Module Dataset
Dataset(name, data_path=None, target_path=None, data=Empty DataFrame,
target=Empty DataFrame, scaler=MinMaxScaler(feature_range=(0.0, 1.0)))
Given two separate pandas DataFrame objects, one for the data and one for the target labels, the module applies preprocessing steps to format the dataset. The data is normalized, by default with min-max normalization.
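As an illustration of the default normalization, the sketch below applies scikit-learn's MinMaxScaler with the same feature_range as the default argument to a small DataFrame. The toy values and column names are made up, and the way the header and index are restored is an assumption about reasonable usage, not a description of the module's internals.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: three samples, two features.
raw = pd.DataFrame(
    {"feature_a": [1.0, 2.0, 3.0], "feature_b": [10.0, 20.0, 40.0]},
    index=["s1", "s2", "s3"],
)

scaler = MinMaxScaler(feature_range=(0.0, 1.0))

# fit_transform returns a NumPy array, so the header and index
# are restored by hand here.
normalized = pd.DataFrame(
    scaler.fit_transform(raw), columns=raw.columns, index=raw.index
)
print(normalized)
```

Each column is rescaled independently, so its smallest value maps to 0 and its largest to 1.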
Two formats are accepted: either pandas DataFrames (data, target), or paths to csv files (data_path, target_path).
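The two input formats can carry the same information. The sketch below builds a pair of DataFrames, writes them to csv files with a header and an index, and reads them back. The file names and toy values are hypothetical, and reading with index_col=0 is an assumption about how such files are usually loaded, not something the module is confirmed to do.

```python
import os
import tempfile

import pandas as pd

# In-memory format: one DataFrame for the data, one for the labels.
data = pd.DataFrame(
    {"feature_a": [0.5, 1.25], "feature_b": [2.0, 4.0]},
    index=["sample_1", "sample_2"],
)
target = pd.DataFrame({"label": [0, 1]}, index=["sample_1", "sample_2"])

# File format: the same content stored as csv with a header and an index.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.csv")      # hypothetical file name
    target_path = os.path.join(tmp_dir, "target.csv")  # hypothetical file name
    data.to_csv(data_path)
    target.to_csv(target_path)

    # index_col=0 treats the first csv column as the sample index.
    data_from_csv = pd.read_csv(data_path, index_col=0)
    target_from_csv = pd.read_csv(target_path, index_col=0)

print(data_from_csv)
print(target_from_csv)
```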
Arguments
- name: string.
Name of the dataset.
- data_path: string, default=None.
Path to a .csv file containing the input data matrix with a header and an index. If None, the module will look for the input data in the data parameter.
- target_path: string, default=None.
Path to a .csv file containing the true labels with a header and an index. The file can have either one or two columns:
  - One column with the label for each sample, each label being represented by an integer.
  - That same column, as well as a column with the label represented in a string format (the name of the class).
If None, the module will look for the labels in the target parameter.
- data:
pandas DataFrame object containing the input data matrix with a header and an index. If data_path was given, data will be overridden.
- target:
pandas DataFrame object containing the true labels with a header and an index. If target_path was given, target will be overridden.
- scaler: default=MinMaxScaler().
Scaler object from sklearn used to normalize the data.
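To make the two accepted label layouts concrete, the sketch below parses a one-column and a two-column label file from in-memory text. The column names label and label_name, the class names, and the use of index_col=0 are all assumptions chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents; column and class names are made up.
one_column_csv = ",label\nsample_1,0\nsample_2,1\nsample_3,0\n"
two_column_csv = (
    ",label,label_name\n"
    "sample_1,0,healthy\n"
    "sample_2,1,tumor\n"
    "sample_3,0,healthy\n"
)

# index_col=0 treats the first csv column as the sample index.
target_one = pd.read_csv(io.StringIO(one_column_csv), index_col=0)
target_two = pd.read_csv(io.StringIO(two_column_csv), index_col=0)

print(target_one.shape)  # (3, 1)
print(target_two.shape)  # (3, 2)
```

In both layouts the first column holds the integer label; the second layout only adds the class name next to it.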
Attributes
- n_classes:
Number of classes.
- original_data:
Data before normalization.
- data:
Data after normalization, ready to be split into a training and a test set and be used to build the neural network.
- target:
Labels, represented as integers.
- target_names:
Dictionary associating each class to its name.
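The attributes above could be derived from a two-column label DataFrame roughly as follows. This is a minimal sketch under stated assumptions: the column names are hypothetical, the dictionary is assumed to map each integer class to its name, and the module's actual implementation may differ.

```python
import pandas as pd

# Hypothetical two-column labels: integer class and class name.
target_df = pd.DataFrame(
    {
        "label": [0, 1, 0, 2],
        "label_name": ["healthy", "tumor", "healthy", "benign"],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Integer labels, one per sample.
target = target_df.iloc[:, 0]

# Number of distinct classes.
n_classes = int(target.nunique())

# One (integer, name) pair per class, taken from the unique rows.
pairs = target_df.drop_duplicates()
target_names = {
    int(label): str(name)
    for label, name in zip(pairs.iloc[:, 0], pairs.iloc[:, 1])
}

print(n_classes)
print(target_names)
```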