BaseTree

class sklearn_nominal.sklearn.tree_base.BaseTree(criterion='', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_error_decrease=0.0, attribute_penalization_importance=1.0, nominal_split='multi')

Base class for decision trees with native nominal attribute support.

This class serves as a shared foundation for both classification and regression trees, managing the hyperparameters that control tree growth, node splitting, and pruning strategies.

Parameters:

criterion (str, default=""): The function used to measure the quality of a split. Supported criteria are “gini” for Gini impurity, and “entropy” or “gain_ratio” for Shannon information gain.
splitter (str or int, default="best"): The strategy used to choose the split at each numeric node.
    - “best”: evaluates all possible split points.
    - int: limits the maximum number of splits to consider per node.
max_depth (int, optional): The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int or float, default=2): The minimum number of samples required to split an internal node.
    - If int, treated as a constant count.
    - If float, treated as a fraction: ceil(min_samples_split * n_samples).
min_samples_leaf (int or float, default=1): The minimum number of samples required at a leaf node.
    - If int, treated as a constant count.
    - If float, treated as a fraction: ceil(min_samples_leaf * n_samples).
min_error_decrease (float, default=0.0): A node is split only if the split decreases the error by at least this value.

Notes

The weighted error decrease is calculated as:

    Δerror = error - Σ_i (N_i / N) * error_i

where error is the error before the split, N is the total number of samples, N_i is the number of samples in branch i, and error_i is the error in that branch.
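
The formula above can be checked by hand with a small sketch. It assumes Gini impurity as the error function and takes N to be the number of samples reaching the node being split; the helper names `gini` and `error_decrease` are illustrative and are not part of sklearn_nominal.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def error_decrease(parent, branches, error=gini):
    """Parent error minus the size-weighted error of the branches."""
    n = len(parent)
    weighted = sum(len(branch) / n * error(branch) for branch in branches)
    return error(parent) - weighted


# Hypothetical node with four samples, split into two pure branches.
parent = ["a", "a", "b", "b"]
branches = [["a", "a"], ["b", "b"]]
decrease = error_decrease(parent, branches)

# The split would be accepted when decrease >= min_error_decrease.
min_error_decrease = 0.1
accept_split = decrease >= min_error_decrease
```

Here the parent has Gini impurity 0.5 and both branches are pure, so the decrease is 0.5 and the split passes a threshold of 0.1.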

Examples

>>> from sklearn_nominal.sklearn.tree_base import BaseTree
>>> class MyTree(BaseTree):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def fit(self, X, y):
...         # implementation that uses self.build_splitter()
...         # and self.build_prune_criteria()
...         pass

Methods

`build_attribute_penalizer`()	Determines the penalization strategy for multi-valued attributes.
`build_prune_criteria`(d)	Translates tree constraints into internal pruning criteria.
`build_splitter`(e, p)	Constructs specialized split scorers for different column types.
`display`([class_names, title])	Displays the tree using the default system viewer or notebook output.
`export_dot`([class_names, title])	Exports the tree as a Graphviz dot string.
`export_dot_file`(filepath[, class_names, title])	Exports the tree as a Graphviz dot file.
`export_image`(filepath[, class_names, title])	Exports the tree as an image file.
`pretty_print`([class_names])	Returns a human-readable string representation of the tree.

build_attribute_penalizer()

Determines the penalization strategy for multi-valued attributes.

This is used to implement “gain_ratio”, which penalizes nominal attributes with many levels to prevent overfitting.

Returns:

sklearn_nominal.shared.ColumnPenalization: The penalization strategy (e.g., GainRatioPenalization or NoPenalization).
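
To illustrate why a gain-ratio style penalty discourages splits on attributes with many levels, here is a minimal sketch of the textbook (C4.5) gain ratio: information gain divided by the split information of the branch sizes. This is an assumption about the general technique; the actual GainRatioPenalization in sklearn_nominal, and the way attribute_penalization_importance enters it, may differ. The function names are illustrative.

```python
import math


def split_information(branch_sizes):
    """Entropy (in bits) of the partition induced by a split."""
    n = sum(branch_sizes)
    return -sum((s / n) * math.log2(s / n) for s in branch_sizes if s > 0)


def gain_ratio(information_gain, branch_sizes):
    """Textbook gain ratio: gain normalized by split information."""
    info = split_information(branch_sizes)
    return information_gain / info if info > 0 else 0.0


# Same raw gain, but the eight-way split is penalized more heavily.
two_way = gain_ratio(0.5, [4, 4])
eight_way = gain_ratio(0.5, [1] * 8)
```

With equal raw gain, the two-way split has split information 1 bit and the eight-way split has 3 bits, so the eight-way split ends up with one third of the score.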

build_prune_criteria(d)

Translates tree constraints into internal pruning criteria.

This method converts user-facing parameters (which can be counts or fractions) into the absolute integer values required by the tree builders.

Parameters:

d (Dataset): The dataset used to calculate relative sample counts.

Returns:

PruneCriteria: The consolidated criteria used for pruning.

Return type:

PruneCriteria
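
The count-or-fraction conversion described for min_samples_split and min_samples_leaf can be sketched as follows. The helper `resolve_min_samples` is hypothetical and only mirrors the documented rule (int is used as-is, float becomes ceil(value * n_samples)); it is not the library's implementation.

```python
import math
import numbers


def resolve_min_samples(value, n_samples):
    """Turn an int count or a float fraction into an absolute sample count."""
    if isinstance(value, numbers.Integral):
        # Integer values are already absolute counts.
        return int(value)
    # Float values are fractions of the dataset size, rounded up.
    return math.ceil(value * n_samples)


count_from_int = resolve_min_samples(2, 150)
count_from_fraction = resolve_min_samples(0.05, 150)
count_from_quarter = resolve_min_samples(0.25, 200)
```

Rounding up means a fraction never resolves to a smaller count than requested: 0.05 of 150 samples is 7.5, which becomes 8.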

build_splitter(e, p)

Constructs specialized split scorers for different column types.

This method maps the general splitter and criterion parameters into concrete ColumnError implementations for both Numeric and Nominal features.

Parameters:

e (sklearn_nominal.shared.TargetError): The target error function (e.g., Gini, Entropy, or MSE).
p (sklearn_nominal.shared.ColumnPenalization): The column-level penalization strategy.

Returns:

dict: A dictionary mapping ColumnType to its corresponding ColumnError scorer.

Raises:

ValueError: If the splitter value is neither “best” nor an integer, or if nominal_split is invalid.
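
One way to picture the difference between splitter=“best” and an integer splitter on a numeric column is the sketch below. It is an assumption, not the library's code: candidate thresholds are taken as midpoints between consecutive distinct values, and an integer limit keeps an evenly spaced subset of them. The function `candidate_thresholds` is hypothetical, and rejecting non-positive integers is a choice made for this sketch.

```python
def candidate_thresholds(values, splitter="best"):
    """Candidate split thresholds for one numeric column (illustrative)."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if splitter == "best":
        # Consider every possible split point.
        return midpoints
    is_int = isinstance(splitter, int) and not isinstance(splitter, bool)
    if is_int and splitter > 0:
        if len(midpoints) <= splitter:
            return midpoints
        # Keep an evenly spaced subset of at most `splitter` thresholds.
        step = len(midpoints) / splitter
        return [midpoints[int(i * step)] for i in range(splitter)]
    raise ValueError(f"splitter must be 'best' or a positive int, got {splitter!r}")


values = [1, 2, 3, 4, 5]
all_thresholds = candidate_thresholds(values)
limited_thresholds = candidate_thresholds(values, splitter=2)
```

Limiting the number of candidates trades split quality for speed on columns with many distinct values.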

display(class_names=None, title='')

Displays the tree using the default system viewer or notebook output.

Parameters:

class_names (list of str, optional): The names of the classes for display.
title (str, default=""): The title for the graph.

Returns:

Any: The image object for display in interactive environments.

export_dot(class_names=None, title='')

Exports the tree as a Graphviz dot string.

Parameters:

class_names (list of str, optional): The names of the classes for display.
title (str, default=""): The title for the graph.

Returns:

str: The tree in Graphviz dot format.
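
For readers unfamiliar with the Graphviz dot format, the sketch below builds a dot string from a toy nested-dict tree. The tree structure and the `to_dot` helper are hypothetical and unrelated to the library's internal representation; the output of export_dot will differ in labels and styling.

```python
def to_dot(tree, title=""):
    """Serialize a toy nested-dict tree as a Graphviz dot string."""
    lines = ["digraph Tree {"]
    if title:
        lines.append(f'  label="{title}";')
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = counter
        counter += 1
        lines.append(f'  n{node_id} [label="{node["label"]}"];')
        # Each child is an (edge label, subtree) pair, e.g. a nominal level.
        for edge_label, child in node.get("children", []):
            child_id = visit(child)
            lines.append(f'  n{node_id} -> n{child_id} [label="{edge_label}"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


toy_tree = {
    "label": "outlook",
    "children": [("sunny", {"label": "no"}), ("rainy", {"label": "yes"})],
}
dot = to_dot(toy_tree, title="toy")
```

A string in this format can be written to a file and rendered with the Graphviz `dot` command-line tool.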

export_dot_file(filepath, class_names=None, title='')

Exports the tree as a Graphviz dot file.

Parameters:

filepath (str): The path to the file to save.
class_names (list of str, optional): The names of the classes for display.
title (str, default=""): The title for the graph.

export_image(filepath, class_names=None, title='')

Exports the tree as an image file.

This requires Graphviz to be installed on the system.

Parameters:

filepath (str): The path to the image file (e.g., “tree.png”).
class_names (list of str, optional): The names of the classes for display.
title (str, default=""): The title for the graph.

pretty_print(class_names=None)

Returns a human-readable string representation of the tree.

Parameters:

class_names (list of str, optional): The names of the classes for display.

Returns:

str: The pretty-printed tree structure.
