Module Dataset

Dataset(name, data_path=None, target_path=None, data=Empty DataFrame,
        target=Empty DataFrame, scaler=MinMaxScaler(feature_range=(0.0, 1.0)))

A dataset is built from two separate pandas DataFrame objects, one for the data and one for the target labels. It is then preprocessed: the data is normalized, by default with Min-Max Normalization.

Two input formats are accepted: either pandas DataFrames (data, target), or paths to .csv files (data_path, target_path).

Arguments

name: string.

Name of the dataset.

data_path: string, default=None.

Path to a .csv file containing the input data matrix with a header and an index. If None, the module will look for the input data in the data parameter.

target_path: string, default=None.

Path to a .csv file containing the true labels with a header and an index. The file can have either one or two columns:

One column: the label of each sample, represented by an integer.

Two columns: that same integer column, plus a column giving the label as a string (the name of the class).

If None, the module will look for the labels in the target parameter.
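The file layout described above can be sketched with pandas alone. This is a minimal illustration, not the module's own loading code: the column names, sample ids and class names are hypothetical, and in-memory buffers stand in for the .csv files that data_path and target_path would point to.

```python
import io

import pandas as pd

# Hypothetical toy data: 4 samples, 2 features, indexed by sample id.
data = pd.DataFrame(
    {"feature_a": [0.5, 1.5, 2.5, 3.5], "feature_b": [10.0, 20.0, 30.0, 40.0]},
    index=["s1", "s2", "s3", "s4"],
)

# Two-column target: the integer label, then the class name as a string.
target = pd.DataFrame(
    {"label": [0, 1, 0, 1], "label_name": ["healthy", "tumor", "healthy", "tumor"]},
    index=data.index,
)

# to_csv() without a path returns the CSV text; header and index are
# written by default, which matches the layout described above.
data_csv = data.to_csv()
target_csv = target.to_csv()

# Reading the text back with the first column as the index recovers
# the original tables.
data_loaded = pd.read_csv(io.StringIO(data_csv), index_col=0)
target_loaded = pd.read_csv(io.StringIO(target_csv), index_col=0)
```

For files on disk, the same calls take a file path, for example data.to_csv("data.csv"); the resulting paths are what data_path and target_path expect.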

data:

pandas DataFrame object containing the input data matrix with a header and an index. If data_path was given, data will be overridden.

target:

pandas DataFrame object containing the true labels with a header and an index. If target_path was given, target will be overridden.

scaler: default=MinMaxScaler().

Scaler object from sklearn used to normalize the data.
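As an illustration of what the default scaler does, the sketch below applies sklearn's MinMaxScaler to a small DataFrame. It shows the normalization step in isolation and assumes nothing about how the Dataset class calls the scaler internally; the feature names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical input matrix: 3 samples, 2 features on different scales.
original_data = pd.DataFrame(
    {"feature_a": [0.0, 5.0, 10.0], "feature_b": [100.0, 150.0, 200.0]},
    index=["s1", "s2", "s3"],
)

# Same configuration as the default shown in the signature.
scaler = MinMaxScaler(feature_range=(0.0, 1.0))

# fit_transform learns the per-column min and max, then rescales each
# column independently into the requested range. It returns a NumPy array.
scaled_values = scaler.fit_transform(original_data)

# Wrap the array back into a DataFrame to keep the header and the index.
normalized = pd.DataFrame(
    scaled_values, index=original_data.index, columns=original_data.columns
)
```

Each column is rescaled on its own, so after the transform both features span the same range from 0 to 1 even though their original scales differ.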

Attributes

n_classes: Number of classes.
original_data: Data before normalization.
data: Data after normalization, ready to be split into a training and a test set and be used to build the neural network.
target: Labels, represented as integers.
target_names: Dictionary associating each class to its name.
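To make the label-related attributes concrete, the following sketch derives comparable values from a two-column target table using pandas. It is an assumption about what these attributes contain, not the module's actual implementation; in particular, the direction of the target_names mapping (integer label to class name) and all column and class names are hypothetical.

```python
import pandas as pd

# Hypothetical two-column target table: integer label and class name.
target_frame = pd.DataFrame(
    {"label": [0, 1, 2, 1, 0], "label_name": ["cat", "dog", "bird", "dog", "cat"]},
    index=["s1", "s2", "s3", "s4", "s5"],
)

# Labels as integers (first column), comparable to the target attribute.
target = target_frame.iloc[:, 0]

# Number of distinct labels, comparable to n_classes.
n_classes = int(target.nunique())

# Mapping from integer label to class name, comparable to target_names.
# Repeated pairs simply overwrite each other with the same value.
target_names = dict(
    zip(target_frame.iloc[:, 0].tolist(), target_frame.iloc[:, 1].tolist())
)
```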
