Module utils

Contains the helper function random_mask() to mask labels in a target object, as well as several functions used for visualization and explanation.

Functions

random_mask()

random_mask(y, pct_labeled=0.9, random_state=2)

Randomly masks labels in a target object by replacing their original values with -1.

Args:
y:

Target variable in which values will be masked.

pct_labeled, default=0.9:

Fraction of labeled samples to keep in the target object. To mask (x*100)% of the samples, set pct_labeled = (1-x).

random_state, default=2:

A random state can be specified for reproducibility. If this is unwanted, pass random_state=None.

Returns:

Variable of the same dimension as y, where some values are replaced by -1.
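
The masking behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: the name random_mask_sketch is hypothetical, it uses only the standard library and plain lists, and rounding the number of masked samples to the nearest integer is an assumption.

```python
import random


def random_mask_sketch(y, pct_labeled=0.9, random_state=2):
    """Hypothetical stand-in for random_mask(), for illustration only."""
    # A seeded generator makes the choice of masked positions repeatable;
    # random_state=None seeds from the system instead.
    rng = random.Random(random_state)
    n_samples = len(y)
    # Assumption: the number of masked samples is rounded to the nearest integer.
    n_masked = round((1 - pct_labeled) * n_samples)
    masked_idx = set(rng.sample(range(n_samples), n_masked))
    # Same length as y; masked positions hold -1, the others keep their label.
    return [-1 if i in masked_idx else label for i, label in enumerate(y)]


y = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
masked = random_mask_sketch(y, pct_labeled=0.7)  # masks 3 of the 10 labels
```

With pct_labeled=0.7, three of the ten labels are replaced by -1, and calling the sketch again with the same random_state masks the same positions.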

color_palette()

color_palette(target_names, colors=None)

Associates a color with each class. If distance and ambiguity are among the classes (i.e., the labels are represented after abstention), they are assigned separate reserved colors.

Args:
target_names:

List of classes, or dictionary whose keys are classes. If the name of a class differs from the integer value that represents it, pass the integer values rather than the names.

colors, default=None:

Colors to choose from. If None, colors are picked from a predefined list.

Returns:

A dictionary with classes as keys and colors as values.
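
A minimal sketch of such a class-to-color mapping follows. Everything in it is an assumption made for illustration: the name color_palette_sketch, the default color list, the reserved colors, the string keys "distance" and "ambiguity" used to mark abstention classes, and the cycling through colors when there are more classes than colors. The actual function may encode these differently.

```python
def color_palette_sketch(target_names, colors=None):
    """Hypothetical stand-in for color_palette(), for illustration only."""
    # Assumed predefined list (matplotlib color names), used when colors is None.
    default_colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]
    # Assumed reserved colors and keys for the two abstention classes.
    reserved = {"distance": "lightgray", "ambiguity": "dimgray"}
    pool = list(colors) if colors is not None else default_colors

    palette = {}
    next_color = 0
    # Iterating over a dict yields its keys, so lists and dicts both work here.
    for cls in target_names:
        if cls in reserved:
            palette[cls] = reserved[cls]
        else:
            # Assumption: cycle through the pool if classes outnumber colors.
            palette[cls] = pool[next_color % len(pool)]
            next_color += 1
    return palette


palette = color_palette_sketch([0, 1, "ambiguity"])
# palette == {0: "tab:blue", 1: "tab:orange", "ambiguity": "dimgray"}
```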

som_viz()

som_viz(data, model, distances=None, labels=None, label_names=None, colors=None, legend=False, numbered=False)

Visualizes the self-organizing map trained by the model by coloring neurons according to the labels of the samples associated with them.

Args:
data:

Data samples for which labels are represented on the map.

model:

Model object that was used to train the SOM.

distances, default=None:

Matrix of distances between samples and SOM neurons. Corresponds to the second value returned by model.predict(data). If None, the prediction is made inside som_viz().

labels, default=None:

Labels associated with data, used to color the map neurons. If None, labels are predicted by the model inside the function. Typically these are the true, predicted, or abstained labels associated with data.

label_names, default=None:

Dictionary in which integers representing each class are associated with a string, i.e., the name of the class. Used in the legend. If None, the integers representing the classes will be used as names.

colors, default=None:

Dictionary that associates each class with a color. If None, a palette is created using the color_palette() function.

legend: bool, default=False:

If True, legend will be shown on the plot.

numbered: bool, default=False:

If True, the number of each neuron will be shown on the plot.

Returns:

matplotlib figure object.
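
How a neuron could get its color from the distances and labels arguments can be sketched as follows. This is an assumed scheme, not necessarily the one som_viz() uses: each sample is assigned to its closest neuron, and each neuron takes the most frequent label among its samples. The function name neuron_majority_labels and the nested-list layout of the distance matrix (one row per sample, one column per neuron) are hypothetical.

```python
from collections import Counter


def neuron_majority_labels(distances, labels):
    """Map each neuron index to the most frequent label of its samples."""
    votes = {}
    for row, label in zip(distances, labels):
        # The closest neuron is the column with the smallest distance.
        closest = min(range(len(row)), key=row.__getitem__)
        votes.setdefault(closest, Counter())[label] += 1
    # Ties go to the label seen first; neurons with no samples are omitted.
    return {neuron: counts.most_common(1)[0][0] for neuron, counts in votes.items()}


distances = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.6], [0.7, 0.3]]
labels = [0, 0, 1, 1]
neuron_labels = neuron_majority_labels(distances, labels)
# neuron_labels == {0: 0, 1: 1}
```

The resulting neuron-to-label mapping could then be combined with a class-to-color dictionary, such as the one passed as colors, to pick a color per neuron.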

prototype_features()

prototype_features(model, numbered=False)

Represents each SOM neuron by its value for each of the dataset’s features, giving a prototype (or example) of the data samples represented by that neuron.

Args:
model:

Model that was used to train the SOM.

numbered: bool, default=False:

If True, the number of each neuron will be shown on the plot.

Returns:

matplotlib figure object.
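
The idea of a neuron prototype can be illustrated without plotting. The sketch below assumes the neuron weights are available as one list of feature values per neuron. The name prototype_table, the weights layout, and the feature names are hypothetical and are not taken from the model's API.

```python
def prototype_table(weights, feature_names):
    """Pair each neuron's weight vector with the dataset's feature names."""
    return {
        neuron: dict(zip(feature_names, vector))
        for neuron, vector in enumerate(weights)
    }


# Two neurons described by two illustrative features.
weights = [[5.0, 3.4], [6.3, 2.9]]
table = prototype_table(weights, ["sepal_length", "sepal_width"])
# table[0] == {"sepal_length": 5.0, "sepal_width": 3.4}
```

Each entry reads as an example sample: neuron 0 stands for samples with feature values close to those listed for it.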