BaseTree
- class sklearn_nominal.sklearn.tree_base.BaseTree(criterion='', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_error_decrease=0.0, attribute_penalization_importance=1.0, nominal_split='multi')
Base class for decision trees with native nominal attribute support.
This class serves as a shared foundation for both classification and regression trees, managing the hyperparameters that control tree growth, node splitting, and pruning strategies.
- Parameters:
- criterion : str, default=""
The function to measure the quality of a split. Supported criteria are “gini” for Gini impurity, and “entropy” or “gain_ratio” for Shannon information gain.
- splitter : str or int, default="best"
The strategy used to choose the split point for numeric attributes at each node.
  - “best”: evaluates all possible split points.
  - int: limits the maximum number of splits to consider per node.
- max_depth : int, optional
The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
- min_samples_split : int or float, default=2
The minimum number of samples required to split an internal node.
  - If int, treated as a constant count.
  - If float, treated as a fraction: ceil(min_samples_split * n_samples).
- min_samples_leaf : int or float, default=1
The minimum number of samples required to be at a leaf node.
  - If int, treated as a constant count.
  - If float, treated as a fraction: ceil(min_samples_leaf * n_samples).
- min_error_decrease : float, default=0.0
A node will be split if the split induces a decrease of the error greater than or equal to this value.
Notes
The weighted error decrease is calculated as:
Δerror = error - Σ_i (N_i / N) * error_i
where N is the total number of samples, N_i is the number of samples in branch i, and error_i is the error in that branch.
References
[1] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.
[2] T. Hastie, R. Tibshirani, and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
Examples
>>> from sklearn_nominal.sklearn.tree_base import BaseTree
>>> class MyTree(BaseTree):
...     def __init__(self, **kwargs):
...         super().__init__(**kwargs)
...     def fit(self, X, y):
...         # implementation that uses self.build_splitter()
...         # and self.build_prune_criteria()
...         pass
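The weighted error decrease described in the Notes can also be worked through by hand. The following is a minimal standalone sketch, assuming Gini impurity as the error function; the helpers `gini` and `weighted_error_decrease` are hypothetical names introduced for illustration and are not part of sklearn_nominal.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_error_decrease(parent, branches, error=gini):
    """Compute error(parent) - sum_i (N_i / N) * error(branch_i)."""
    n = len(parent)
    weighted = sum(len(branch) / n * error(branch) for branch in branches)
    return error(parent) - weighted


parent = ["a", "a", "b", "b"]

# A split that separates the classes removes all of the parent's error.
perfect = weighted_error_decrease(parent, [["a", "a"], ["b", "b"]])

# A split whose branches mirror the parent's class mix removes none of it.
useless = weighted_error_decrease(parent, [["a", "b"], ["a", "b"]])

print(perfect, useless)
```

Under this sketch, a split would be accepted only if its decrease is greater than or equal to min_error_decrease, so the second split above would be rejected for any positive threshold.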
Methods
build_attribute_penalizer(): Determines the penalization strategy for multi-valued attributes.
build_prune_criteria(d): Translates tree constraints into internal pruning criteria.
build_splitter(e, p): Constructs specialized split scorers for different column types.
display([class_names, title]): Displays the tree using the default system viewer or notebook output.
export_dot([class_names, title]): Exports the tree as a Graphviz dot string.
export_dot_file(filepath[, class_names, title]): Exports the tree as a Graphviz dot file.
export_image(filepath[, class_names, title]): Exports the tree as an image file.
pretty_print([class_names]): Returns a human-readable string representation of the tree.
- build_attribute_penalizer()
Determines the penalization strategy for multi-valued attributes.
This is used to implement “gain_ratio”, which penalizes nominal attributes with many levels to prevent overfitting.
- Returns:
- sklearn_nominal.shared.ColumnPenalization
The penalization strategy (e.g., GainRatioPenalization or NoPenalization).
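To illustrate why a penalty on many-valued attributes matters, here is a minimal sketch of the textbook gain ratio (information gain divided by split information, as in C4.5). This is an assumption about the general technique only: it does not show how GainRatioPenalization is implemented or how attribute_penalization_importance enters the computation, and the helpers `entropy` and `gain_ratio` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(parent, branches):
    """Textbook gain ratio: information gain / split information."""
    n = len(parent)
    gain = entropy(parent) - sum(len(b) / n * entropy(b) for b in branches)
    # Split information grows with the number of (non-empty) branches.
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches if b)
    return gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]

# Both splits yield pure branches, so their raw information gain is identical.
two_way = gain_ratio(labels, [["yes", "yes"], ["no", "no"]])
four_way = gain_ratio(labels, [["yes"], ["yes"], ["no"], ["no"]])

print(two_way, four_way)
```

The four-way split scores lower despite the same information gain, which is the effect a penalization strategy of this kind is meant to produce.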
- build_prune_criteria(d)
Translates tree constraints into internal pruning criteria.
This method converts user-facing parameters (which can be counts or fractions) into the absolute integer values required by the tree builders.
- Parameters:
- d : Dataset
The dataset used to calculate relative sample counts.
- Returns:
- PruneCriteria
The consolidated criteria used for pruning.
- Return type:
PruneCriteria
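The count-or-fraction conversion described for build_prune_criteria can be sketched as follows. This is a minimal illustration based on the ceil(value * n_samples) rule documented for min_samples_split and min_samples_leaf; the helper name `resolve_min_samples` is hypothetical and the real method returns a PruneCriteria object rather than a bare integer.

```python
import math


def resolve_min_samples(value, n_samples):
    """Hypothetical helper: turn a count (int) or fraction (float) into a count."""
    if isinstance(value, int):
        # Integers are already absolute sample counts.
        return value
    # Floats are fractions of the dataset size, rounded up.
    return math.ceil(value * n_samples)


n_samples = 150

as_count = resolve_min_samples(2, n_samples)        # stays 2
as_fraction = resolve_min_samples(0.05, n_samples)  # ceil(7.5) = 8

print(as_count, as_fraction)
```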
- build_splitter(e, p)
Constructs specialized split scorers for different column types.
This method maps the general splitter and criterion parameters into concrete ColumnError implementations for both Numeric and Nominal features.
- Parameters:
- e : sklearn_nominal.shared.TargetError
The target error function (e.g., Gini, Entropy, or MSE).
- p : sklearn_nominal.shared.ColumnPenalization
The column-level penalization strategy.
- Returns:
- dict
A dictionary mapping ColumnType to its corresponding ColumnError scorer.
- Raises:
- ValueError
If the splitter value is neither “best” nor an integer, or if nominal_split is invalid.
- display(class_names=None, title='')
Displays the tree using the default system viewer or notebook output.
- Parameters:
- class_names : list of str, optional
The names of the classes for display.
- title : str, default=""
The title for the graph.
- Returns:
- Any
The image object for display in interactive environments.
- export_dot(class_names=None, title='')
Exports the tree as a Graphviz dot string.
- Parameters:
- class_names : list of str, optional
The names of the classes for display.
- title : str, default=""
The title for the graph.
- Returns:
- str
The tree in Graphviz dot format.
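For readers unfamiliar with the format, a dot string is plain text that lists nodes and labelled edges. The sketch below builds one from a hypothetical nested-dict tree; the function `to_dot`, the dict layout, and the exact node naming are assumptions for illustration and do not reflect the library's internal tree representation or the precise output of export_dot.

```python
def to_dot(tree, title=""):
    """Render a hypothetical nested-dict tree as a Graphviz dot string."""
    lines = ["digraph Tree {"]
    if title:
        lines.append(f'  label="{title}";')
    counter = 0

    def visit(node):
        # Assign a sequential id to each node in depth-first order.
        nonlocal counter
        node_id = counter
        counter += 1
        lines.append(f'  n{node_id} [label="{node["label"]}"];')
        for edge_label, child in node.get("children", {}).items():
            child_id = visit(child)
            lines.append(f'  n{node_id} -> n{child_id} [label="{edge_label}"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# A nominal split yields one outgoing edge per attribute value.
toy_tree = {
    "label": "outlook",
    "children": {
        "sunny": {"label": "no"},
        "rainy": {"label": "yes"},
    },
}

dot = to_dot(toy_tree, title="toy")
print(dot)
```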
- export_dot_file(filepath, class_names=None, title='')
Exports the tree as a Graphviz dot file.
- Parameters:
- filepath : str
The path to the file to save.
- class_names : list of str, optional
The names of the classes for display.
- title : str, default=""
The title for the graph.
- export_image(filepath, class_names=None, title='')
Exports the tree as an image file.
This requires Graphviz to be installed on the system.
- Parameters:
- filepath : str
The path to the image file (e.g., “tree.png”).
- class_names : list of str, optional
The names of the classes for display.
- title : str, default=""
The title for the graph.
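Because export_image depends on a system Graphviz installation, it can be useful to check for it before exporting. The sketch below uses only the standard library and assumes that the relevant executable is Graphviz's `dot` command on PATH; whether export_image invokes it exactly this way is not stated here, and `graphviz_available` is a hypothetical helper.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable is found on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    print("Graphviz found; image export should be possible.")
else:
    print("Graphviz `dot` not found; install Graphviz before exporting images.")
```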