Module Dataset

Dataset(name, data_path=None, target_path=None, data=DataFrame(),
        target=DataFrame(), scaler=MinMaxScaler(feature_range=(0.0, 1.0)))

Given two separate pandas DataFrame objects, one for the data and one for the target labels, the module applies preprocessing steps to format the dataset. The data is normalized, by default with Min-Max normalization.

Two input formats are accepted: pandas DataFrames (data, target) or paths to .csv files (data_path, target_path).

Arguments

name: string.

Name of the dataset.

data_path: string, default=None.

Path to a .csv file containing the input data matrix with a header and an index. If None, the module will look for the input data in the data parameter.

target_path: string, default=None.

Path to a .csv file containing the true labels with a header and an index. The file can have either one or two columns:

  • One column with the label for each sample, each label being represented by an integer.

  • That same column, as well as a column with the label represented in a string format (the name of the class).

If None, the module will look for the labels in the target parameter.
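As an illustration of the two-column layout described above, the sketch below reads a small label file with pandas. The file contents, the column names (label, class_name), and the sample ids are made up for the example, and it only shows the expected file shape, not how the module itself parses the file.

```python
import io

import pandas as pd

# Hypothetical contents of a target .csv file: a header row, an index column
# (sample ids), an integer label column, and an optional class-name column.
target_csv = io.StringIO(
    "sample,label,class_name\n"
    "s1,0,setosa\n"
    "s2,1,versicolor\n"
    "s3,0,setosa\n"
)

# header=0 takes the first row as the header; index_col=0 takes the first
# column as the index.
target = pd.read_csv(target_csv, header=0, index_col=0)

print(target.shape)          # (3, 2): integer labels plus class names
print(list(target.columns))  # ['label', 'class_name']
```

A one-column file would look the same without the class_name column.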

data: pandas DataFrame, default=empty DataFrame.

pandas DataFrame object containing the input data matrix with a header and an index. If data_path is given, data will be overridden.

target: pandas DataFrame, default=empty DataFrame.

pandas DataFrame object containing the true labels with a header and an index. If target_path is given, target will be overridden.

scaler: sklearn scaler object, default=MinMaxScaler(feature_range=(0.0, 1.0)).

Scaler object from sklearn used to normalize the data.
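To make the default normalization concrete, here is a minimal sketch of what applying MinMaxScaler to a DataFrame looks like. The toy values and the variable names (raw, scaled) are assumptions for illustration; the module's internal steps may differ.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy input matrix with a header and an index (values are made up).
raw = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 40.0]},
    index=["s1", "s2", "s3"],
)

scaler = MinMaxScaler(feature_range=(0.0, 1.0))

# By default fit_transform returns a NumPy array, so the index and the
# column names are reattached by hand.
scaled = pd.DataFrame(
    scaler.fit_transform(raw), index=raw.index, columns=raw.columns
)

# Each column is rescaled independently so that its minimum maps to 0
# and its maximum maps to 1.
print(scaled)
```

Another sklearn scaler, such as StandardScaler, could be built the same way and passed as the scaler argument.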

Attributes

n_classes:

Number of classes.

original_data:

Data before normalization.

data:

Data after normalization, ready to be split into a training and a test set and be used to build the neural network.

target:

Labels, represented as integers.

target_names:

Dictionary mapping each class label to its name.
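The relationship between the labels and the n_classes and target_names attributes can be sketched as follows. The column names (label, class_name), the toy values, and the direction of the dictionary (integer label to class name) are assumptions for illustration, not the module's confirmed implementation.

```python
import pandas as pd

# Hypothetical two-column target frame: integer labels plus class names.
target = pd.DataFrame(
    {"label": [0, 1, 0, 2], "class_name": ["cat", "dog", "cat", "bird"]},
    index=["s1", "s2", "s3", "s4"],
)

# Number of distinct integer labels.
n_classes = target["label"].nunique()

# One name per integer label, taken from the first row where it appears.
pairs = target.drop_duplicates("label")
target_names = dict(zip(pairs["label"].tolist(), pairs["class_name"].tolist()))

print(n_classes)     # 3
print(target_names)  # {0: 'cat', 1: 'dog', 2: 'bird'}
```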