clustering_metrics.skutils module

This module contains code pasted verbatim from scikit-learn, so as to avoid a dependency on the scikit-learn package.

exception clustering_metrics.skutils.DataConversionWarning[source]

Bases: exceptions.UserWarning

A warning on implicit data conversions happening in the code.

exception clustering_metrics.skutils.UndefinedMetricWarning[source]

Bases: exceptions.UserWarning

Warning used when the metric is invalid.

clustering_metrics.skutils.assert_all_finite(X)[source]

Throw a ValueError if X contains NaN or infinity. Input MUST be an np.ndarray instance or a scipy.sparse matrix.
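A minimal usage sketch (doctest-style, matching the examples below; the exact error message is an implementation detail):

>>> import numpy as np
>>> from clustering_metrics.skutils import assert_all_finite
>>> assert_all_finite(np.array([0.0, 1.0, 2.0]))  # finite input passes silently
>>> try:
...     assert_all_finite(np.array([0.0, np.nan]))  # NaN is rejected
... except ValueError:
...     print('rejected')
rejected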

clustering_metrics.skutils.auc(x, y, reorder=False)[source]

Compute Area Under the Curve (AUC) using the trapezoidal rule.

This is a general function, given points on a curve. For computing the
area under the ROC curve, see roc_auc_score().

Parameters
----------
x : array, shape = [n]
    x coordinates.
y : array, shape = [n]
    y coordinates.
reorder : boolean, optional (default=False)
    If True, assume that the curve is ascending in the case of ties, as
    for an ROC curve. If the curve is non-ascending, the result will be
    wrong.

Returns
-------
auc : float

Examples
--------

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> pred = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
0.75

See also
--------
roc_auc_score
    Compute the area under the ROC curve.
precision_recall_curve
    Compute precision-recall pairs for different probability thresholds.
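Because auc() is simply the trapezoidal rule applied to the given points, the result in the example above can be reproduced with numpy directly (a sketch for intuition, not part of this module):

>>> import numpy as np
>>> fpr = np.array([0.0, 0.5, 0.5, 1.0])
>>> tpr = np.array([0.5, 0.5, 1.0, 1.0])
>>> np.trapz(tpr, fpr)  # trapezoidal rule over the (fpr, tpr) points
0.75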
clustering_metrics.skutils.check_consistent_length(*arrays)[source]

Check that all arrays have consistent first dimensions.

Checks whether all objects in arrays have the same shape or length.

Parameters
----------
*arrays : list or tuple of input objects.
    Objects that will be checked for consistent length.
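A short sketch of the expected behavior (hypothetical inputs):

>>> import numpy as np
>>> from clustering_metrics.skutils import check_consistent_length
>>> check_consistent_length(np.arange(4), [0, 1, 2, 3])  # same length: no error
>>> try:
...     check_consistent_length(np.arange(4), np.arange(3))  # lengths 4 and 3
... except ValueError:
...     print('inconsistent lengths')
inconsistent lengths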
clustering_metrics.skutils.column_or_1d(y, warn=False)[source]

Ravel a column or 1d numpy array, otherwise raise an error.

Parameters
----------
y : array-like
warn : boolean, default False
    To control display of warnings.

Returns
-------
y : array
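A sketch of typical use: with warn=True, a (n, 1) column vector is raveled to shape (n,), and (as in scikit-learn's version of this helper) a DataConversionWarning, defined above, is emitted:

>>> import warnings
>>> import numpy as np
>>> from clustering_metrics.skutils import column_or_1d, DataConversionWarning
>>> col = np.array([[1], [2], [3]])  # shape (3, 1) column vector
>>> with warnings.catch_warnings(record=True) as caught:
...     warnings.simplefilter('always')
...     flat = column_or_1d(col, warn=True)
>>> flat.shape
(3,)
>>> any(issubclass(w.category, DataConversionWarning) for w in caught)
True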

clustering_metrics.skutils.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)[source]

Compute Receiver operating characteristic (ROC).

Note: this implementation is restricted to the binary classification
task. Read more in the User Guide.

Parameters
----------
y_true : array, shape = [n_samples]
    True binary labels in range {0, 1} or {-1, 1}. If labels are not
    binary, pos_label should be explicitly given.
y_score : array, shape = [n_samples]
    Target scores, can either be probability estimates of the positive
    class, confidence values, or non-thresholded measure of decisions
    (as returned by "decision_function" on some classifiers).
pos_label : int
    Label considered as positive; all others are considered negative.
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.
drop_intermediate : boolean, optional (default=True)
    Whether to drop some suboptimal thresholds which would not appear on
    a plotted ROC curve. This is useful in order to create lighter ROC
    curves.

    .. versionadded:: 0.17
       parameter drop_intermediate.

Returns
-------
fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false
    positive rate of predictions with score >= thresholds[i].
tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true
    positive rate of predictions with score >= thresholds[i].
thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute fpr
    and tpr. thresholds[0] represents no instances being predicted and
    is arbitrarily set to max(y_score) + 1.

See also
--------
roc_auc_score
    Compute Area Under the Curve (AUC) from prediction scores.

Notes
-----
Since the thresholds are sorted from low to high values, they are
reversed upon returning them to ensure they correspond to both fpr and
tpr, which are sorted in reversed order during their calculation.

References
----------
.. [R45] `Wikipedia entry for the Receiver operating characteristic
   <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_

Examples
--------

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. ,  0.5,  0.5,  1. ])
>>> tpr
array([ 0.5,  0.5,  1. ,  1. ])
>>> thresholds
array([ 0.8 ,  0.4 ,  0.35,  0.1 ])
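The returned arrays can be verified by hand: positives here are the samples labeled 2, and at each threshold a sample is predicted positive when its score is >= the threshold (a sketch that ignores sample_weight and drop_intermediate, continuing from the example above):

>>> pos = (y == 2)
>>> for t in [0.8, 0.4, 0.35, 0.1]:
...     pred = scores >= t  # predicted positive at threshold t
...     fp = float((pred & ~pos).sum()) / (~pos).sum()
...     tp = float((pred & pos).sum()) / pos.sum()
...     print(t, fp, tp)
0.8 0.0 0.5
0.4 0.5 0.5
0.35 0.5 1.0
0.1 1.0 1.0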
clustering_metrics.skutils.stable_cumsum(arr, rtol=1e-05, atol=1e-08)[source]

Use high precision for cumsum and check that the final value matches the sum.

Parameters
----------
arr : array-like
    To be cumulatively summed as flat.
rtol : float
    Relative tolerance, see np.allclose.
atol : float
    Absolute tolerance, see np.allclose.
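The contract can be sketched in a few lines: accumulate at float64 precision, then verify the final cumulative value against an independently computed sum (an illustration of the idea, not this module's exact code):

>>> import numpy as np
>>> def stable_cumsum_sketch(arr, rtol=1e-05, atol=1e-08):
...     out = np.cumsum(arr, dtype=np.float64)  # accumulate in high precision
...     if not np.allclose(out[-1], np.sum(arr, dtype=np.float64),
...                        rtol=rtol, atol=atol):
...         raise RuntimeError('cumsum was found to be unstable')
...     return out
>>> total = stable_cumsum_sketch(np.full(1000, 0.1, dtype=np.float32))[-1]
>>> round(total, 4)  # float32 inputs are where the extra precision matters
100.0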