BaseTree

class sklearn_nominal.sklearn.tree_base.BaseTree(criterion='', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_error_decrease=0.0, attribute_penalization_importance=1.0, nominal_split='multi')

Base class for decision trees with native nominal attribute support.

This class serves as a shared foundation for both classification and regression trees, managing the hyperparameters that control tree growth, node splitting, and pruning strategies.

Parameters:
criterion : str, default=""

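The function used to measure the quality of a split. Supported criteria are “gini” for Gini impurity, “entropy” for Shannon information gain, and “gain_ratio” for information gain with a penalty on attributes that have many levels (see build_attribute_penalizer).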

splitter : str or int, default="best"

The strategy used to choose the split point for numeric attributes at each node.

- “best”: evaluates all possible split points.
- int: limits the maximum number of splits to consider per node.

max_depth : int, optional

The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node.

- If int, treated as a constant count.
- If float, treated as a fraction: ceil(min_samples_split * n_samples).

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node.

- If int, treated as a constant count.
- If float, treated as a fraction: ceil(min_samples_leaf * n_samples).

min_error_decrease : float, default=0.0

A node will be split if the split induces a decrease of the error greater than or equal to this value.
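To make the impurity measures named under criterion concrete, here is a minimal sketch in plain Python. The helper names (class_proportions, gini, entropy) are illustrative only and are not part of sklearn_nominal; the library's own implementations may differ in detail.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the relative frequency of each class found in labels."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


labels = ["yes", "yes", "no", "no"]
print(gini(labels))     # 0.5
print(entropy(labels))  # 1.0
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a good split is one that lowers them.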

Notes

The weighted error decrease is calculated as:

Δerror = error - Σ_i (N_i / N) * error_i

where N is the total number of samples, N_i is the number of samples in branch i, and error_i is the error in that branch.
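The formula can be worked through with a short sketch. The function name weighted_error_decrease and the (n_samples, error) pair layout are assumptions made for this illustration, not the library's internal representation.

```python
def weighted_error_decrease(parent_error, branches):
    """Compute error - sum_i (N_i / N) * error_i.

    branches is a list of (n_samples, error) pairs, one per branch.
    """
    n_total = sum(n for n, _ in branches)
    weighted_child_error = sum((n / n_total) * err for n, err in branches)
    return parent_error - weighted_child_error


# A node with error 0.5 and 10 samples, split into a pure branch of 6
# samples (error 0.0) and a mixed branch of 4 samples (error 0.5):
# 0.5 - (0.6 * 0.0 + 0.4 * 0.5), which is approximately 0.3.
decrease = weighted_error_decrease(0.5, [(6, 0.0), (4, 0.5)])

# The split is accepted when the decrease reaches min_error_decrease.
min_error_decrease = 0.1
print(decrease >= min_error_decrease)  # True
```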

References

[1] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.

[2] T. Hastie, R. Tibshirani, and J. Friedman, “The Elements of Statistical Learning”, Springer, 2009.

Examples

>>> from sklearn_nominal.sklearn.tree_base import BaseTree
>>> class MyTree(BaseTree):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def fit(self, X, y):
...         # implementation that uses self.build_splitter()
...         # and self.build_prune_criteria()
...         pass

Methods

build_attribute_penalizer()

Determines the penalization strategy for multi-valued attributes.

build_prune_criteria(d)

Translates tree constraints into internal pruning criteria.

build_splitter(e, p)

Constructs specialized split scorers for different column types.

display([class_names, title])

Displays the tree using the default system viewer or notebook output.

export_dot([class_names, title])

Exports the tree as a Graphviz dot string.

export_dot_file(filepath[, class_names, title])

Exports the tree as a Graphviz dot file.

export_image(filepath[, class_names, title])

Exports the tree as an image file.

pretty_print([class_names])

Returns a human-readable string representation of the tree.

build_attribute_penalizer()

Determines the penalization strategy for multi-valued attributes.

This is used to implement “gain_ratio”, which penalizes nominal attributes with many levels to prevent overfitting.

Returns:
sklearn_nominal.shared.ColumnPenalization

The penalization strategy (e.g., GainRatioPenalization or NoPenalization).
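For intuition, the classic gain ratio from C4.5 divides the information gain by the split information, which is the entropy of the branch sizes. The sketch below follows that textbook definition; it is an assumption that GainRatioPenalization behaves similarly, and the way attribute_penalization_importance enters the calculation is not shown here because this page does not document it. The function names are illustrative.

```python
import math


def split_information(branch_sizes):
    """Entropy (in bits) of the branch-size distribution produced by a split."""
    n_total = sum(branch_sizes)
    return -sum(
        (n / n_total) * math.log2(n / n_total) for n in branch_sizes if n > 0
    )


def gain_ratio(information_gain, branch_sizes):
    """Textbook gain ratio: information gain divided by split information."""
    split_info = split_information(branch_sizes)
    if split_info == 0.0:
        # A single branch carries no split information; avoid dividing by zero.
        return 0.0
    return information_gain / split_info


# The same raw gain is worth less when it is spread over many small branches.
print(gain_ratio(0.5, [4, 4]))    # 0.5
print(gain_ratio(0.5, [1] * 8))   # one third of the two-branch value
```

A nominal attribute with one level per sample can reach a high raw gain while generalizing poorly, so dividing by the split information pushes the tree toward attributes with fewer, larger branches.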

build_prune_criteria(d)

Translates tree constraints into internal pruning criteria.

This method converts user-facing parameters (which can be counts or fractions) into the absolute integer values required by the tree builders.

Parameters:
d : Dataset

The dataset used to calculate relative sample counts.

Returns:
PruneCriteria

The consolidated criteria used for pruning.
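The count-or-fraction conversion described for min_samples_split and min_samples_leaf can be sketched as follows. resolve_min_samples is a hypothetical helper written for this illustration; the actual method takes a Dataset and returns a PruneCriteria object, neither of which is reproduced here.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn a count-or-fraction setting into an absolute sample count.

    Hypothetical helper: floats are treated as a fraction of n_samples
    and rounded up with ceil; ints are returned unchanged.
    """
    if isinstance(value, float):
        return math.ceil(value * n_samples)
    return value


n_samples = 150
print(resolve_min_samples(2, n_samples))     # 2
print(resolve_min_samples(0.05, n_samples))  # 8, since ceil(7.5) = 8
```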


build_splitter(e, p)

Constructs specialized split scorers for different column types.

This method maps the general splitter and criterion parameters into concrete ColumnError implementations for both Numeric and Nominal features.

Parameters:
e : sklearn_nominal.shared.TargetError

The target error function (e.g., Gini, Entropy, or MSE).

p : sklearn_nominal.shared.ColumnPenalization

The column-level penalization strategy.

Returns:
dict

A dictionary mapping ColumnType to its corresponding ColumnError scorer.

Raises:
ValueError

If the splitter value is neither “best” nor an integer, or if nominal_split is invalid.
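The splitter validation described under Raises could look roughly like the sketch below. resolve_max_splits is a hypothetical helper, not a function of sklearn_nominal, and the error message is invented for the example; the check on nominal_split is omitted because its accepted values are not listed on this page.

```python
def resolve_max_splits(splitter):
    """Map the splitter parameter to an optional limit on candidate splits.

    Hypothetical helper: returns None for "best" (no limit), the integer
    itself for an int, and raises ValueError for anything else.
    """
    if splitter == "best":
        return None
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(splitter, int) and not isinstance(splitter, bool):
        return splitter
    raise ValueError(f"splitter must be 'best' or an int, got {splitter!r}")


print(resolve_max_splits("best"))  # None
print(resolve_max_splits(10))      # 10
```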

display(class_names=None, title='')

Displays the tree using the default system viewer or notebook output.

Parameters:
class_names : list of str, optional

The names of the classes for display.

title : str, default=""

The title for the graph.

Returns:
Any

The image object for display in interactive environments.

export_dot(class_names=None, title='')

Exports the tree as a Graphviz dot string.

Parameters:
class_names : list of str, optional

The names of the classes for display.

title : str, default=""

The title for the graph.

Returns:
str

The tree in Graphviz dot format.

export_dot_file(filepath, class_names=None, title='')

Exports the tree as a Graphviz dot file.

Parameters:
filepath : str

The path to the file to save.

class_names : list of str, optional

The names of the classes for display.

title : str, default=""

The title for the graph.

export_image(filepath, class_names=None, title='')

Exports the tree as an image file.

This requires Graphviz to be installed on the system.

Parameters:
filepath : str

The path to the image file (e.g., “tree.png”).

class_names : list of str, optional

The names of the classes for display.

title : str, default=""

The title for the graph.

pretty_print(class_names=None)

Returns a human-readable string representation of the tree.

Parameters:
class_names : list of str, optional

The names of the classes for display.

Returns:
str

The pretty-printed tree structure.