clustering_metrics.skutils module

This module contains code copied verbatim from Scikit-Learn so that this package does not have to depend on scikit-learn itself.
exception clustering_metrics.skutils.DataConversionWarning
Bases: exceptions.UserWarning

A warning on implicit data conversions happening in the code.
exception clustering_metrics.skutils.UndefinedMetricWarning
Bases: exceptions.UserWarning

Warning used when the metric is invalid or ill-defined.
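Both warning classes are plain UserWarning subclasses, so they can be filtered or escalated with the standard warnings machinery; a minimal usage sketch::

    import warnings
    from clustering_metrics.skutils import DataConversionWarning, UndefinedMetricWarning

    # Escalate implicit-conversion warnings into errors (e.g. during testing).
    warnings.simplefilter("error", DataConversionWarning)

    # Silence warnings about invalid/ill-defined metrics entirely.
    warnings.simplefilter("ignore", UndefinedMetricWarning)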
clustering_metrics.skutils.assert_all_finite(X)

Throw a ValueError if X contains NaN or infinity. Input MUST be an np.ndarray instance or a scipy.sparse matrix.
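A minimal usage sketch::

    import numpy as np
    from clustering_metrics.skutils import assert_all_finite

    assert_all_finite(np.array([0.1, 0.4, 0.35, 0.8]))  # all finite: passes silently

    try:
        assert_all_finite(np.array([1.0, np.nan]))
    except ValueError as exc:
        print(exc)  # raised because the array contains NaN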
clustering_metrics.skutils.auc(x, y, reorder=False)

Compute Area Under the Curve (AUC) using the trapezoidal rule.

This is a general function, given points on a curve. For computing the area under the ROC curve, see roc_auc_score().

Parameters
----------
x : array, shape = [n]
    x coordinates.
y : array, shape = [n]
    y coordinates.
reorder : boolean, optional (default=False)
    If True, assume that the curve is ascending in the case of ties, as for an ROC curve. If the curve is non-ascending, the result will be wrong.

Returns
-------
auc : float

Examples
--------
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> pred = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr)
0.75

See also
--------
roc_auc_score : Computes the area under the ROC curve
precision_recall_curve : Compute precision-recall pairs for different probability thresholds
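Since auc is just the trapezoidal rule applied to the given curve points, the same value can be reproduced directly with NumPy; a minimal sketch using np.trapz::

    import numpy as np

    fpr = np.array([0.0, 0.5, 0.5, 1.0])
    tpr = np.array([0.5, 0.5, 1.0, 1.0])

    # Trapezoidal integration of tpr over fpr reproduces auc(fpr, tpr).
    print(np.trapz(tpr, fpr))  # 0.75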
clustering_metrics.skutils.check_consistent_length(*arrays)

Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.

Parameters
----------
*arrays : list or tuple of input objects.
    Objects that will be checked for consistent length.
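A minimal usage sketch::

    import numpy as np
    from clustering_metrics.skutils import check_consistent_length

    y_true = np.array([1, 1, 2, 2])
    y_score = np.array([0.1, 0.4, 0.35, 0.8])
    check_consistent_length(y_true, y_score)  # same length: passes silently

    try:
        check_consistent_length(y_true, y_score[:3])
    except ValueError as exc:
        print(exc)  # lengths 4 and 3 are inconsistent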
clustering_metrics.skutils.column_or_1d(y, warn=False)

Ravel a column or 1d numpy array, otherwise raise an error.

Parameters
----------
y : array-like
warn : boolean, default False
    Controls the display of warnings.

Returns
-------
y : array
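A minimal usage sketch, assuming the behavior matches the scikit-learn helper this was copied from::

    import numpy as np
    from clustering_metrics.skutils import column_or_1d

    col = np.array([[1], [2], [3]])      # shape (3, 1) column vector
    flat = column_or_1d(col, warn=True)  # raveled to shape (3,); with warn=True
                                         # this emits a DataConversionWarning
    print(flat.shape)                    # (3,)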
clustering_metrics.skutils.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)

Compute Receiver Operating Characteristic (ROC).

Note: this implementation is restricted to the binary classification task. Read more in the User Guide.

Parameters
----------
y_true : array, shape = [n_samples]
    True binary labels in range {0, 1} or {-1, 1}. If labels are not binary, pos_label should be explicitly given.
y_score : array, shape = [n_samples]
    Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by "decision_function" on some classifiers).
pos_label : int
    Label considered as positive; all others are considered negative.
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.
drop_intermediate : boolean, optional (default=True)
    Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves.

    .. versionadded:: 0.17
        parameter drop_intermediate.

Returns
-------
fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].
tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].
thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.

See also
--------
roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores

Notes
-----
Since the thresholds are sorted from low to high values, they are reversed upon returning them to ensure they correspond to both fpr and tpr, which are sorted in reversed order during their calculation.

References
----------
.. [R45] Wikipedia entry for the Receiver operating characteristic,
    https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Examples
--------
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. ,  0.5,  0.5,  1. ])
>>> tpr
array([ 0.5,  0.5,  1. ,  1. ])
>>> thresholds
array([ 0.8 ,  0.4 ,  0.35,  0.1 ])
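The (fpr, tpr) points returned by roc_curve compose directly with the auc function documented above; a minimal sketch::

    import numpy as np
    from clustering_metrics.skutils import auc, roc_curve

    y_true = np.array([1, 1, 2, 2])
    scores = np.array([0.1, 0.4, 0.35, 0.8])

    # fpr and tpr come back in increasing order, so no reorder is needed.
    fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=2)
    print(auc(fpr, tpr))  # 0.75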
clustering_metrics.skutils.stable_cumsum(arr, rtol=1e-05, atol=1e-08)

Use high precision for cumsum and check that the final value matches the sum.

Parameters
----------
arr : array-like
    To be cumulatively summed as flat.
rtol : float
    Relative tolerance, see np.allclose.
atol : float
    Absolute tolerance, see np.allclose.
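The benefit is easiest to see on float32 data, where naive accumulation drifts; a minimal sketch, assuming this helper mirrors the scikit-learn original (cumsum accumulated in float64, with the final value checked against the sum via the rtol/atol tolerances)::

    import numpy as np
    from clustering_metrics.skutils import stable_cumsum

    # 100,000 float32 copies of 0.1: a naive float32 cumsum drifts,
    # but accumulating in float64 keeps the running total accurate.
    arr = np.full(100000, 0.1, dtype=np.float32)
    out = stable_cumsum(arr)
    print(out[-1])  # ~10000.0, matching the true sum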