clustering_metrics.entropy module

clustering_metrics.entropy.assignment_cost()
clustering_metrics.entropy.centropy()

Entropy of an iterable of counts (integers)

Assumes every entry in the list belongs to a different class. The resulting value is not normalized by N. The entropy is calculated using the natural logarithm (base e); to express it in another base, divide the result by log(base).

The ‘counts’ parameter is expected to be a list or tuple-like iterable. For convenience, it can also be a dict/mapping type, in which case its values will be used to calculate entropy.
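
The following is a minimal sketch of what this description implies, not the module's implementation. It assumes that "not normalized by N" means the returned value is N times the Shannon entropy, with N the sum of the counts. The name count_entropy_sketch is hypothetical.

```python
from collections.abc import Mapping
from math import log


def count_entropy_sketch(counts):
    """Return N * H (natural log) for an iterable or mapping of counts.

    Hypothetical illustration; assumes "not normalized by N" means N * H.
    """
    if isinstance(counts, Mapping):
        counts = counts.values()  # use the mapping's values as the counts
    n = 0
    sum_c_log_c = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing to the entropy
            n += c
            sum_c_log_c += c * log(c)
    return 0.0 if n == 0 else n * log(n) - sum_c_log_c


sizes = {"a": 2, "b": 2}
unnormalized = count_entropy_sketch(sizes)      # N * H, in nats
per_item = unnormalized / sum(sizes.values())   # H, in nats
```

Under this assumption, dividing by N recovers the usual per-item entropy.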

clustering_metrics.entropy.cnum_pairs()

Binomial coefficient for k=2 (integer)

For non-vectorized computation, this is faster than calling scipy.misc.comb(x, 2) or scipy.special.binom(x, 2). Unlike with those two, the domain here extends into negative integers.
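
The closed form x(x - 1)/2 is enough to illustrate both points above: it needs no library call, and it is defined for negative integers. A minimal sketch follows; the name num_pairs_sketch is hypothetical and is not this module's API.

```python
def num_pairs_sketch(x):
    """Binomial coefficient C(x, 2) via the closed form x * (x - 1) / 2."""
    # x * (x - 1) is a product of two consecutive integers, so it is always
    # even and floor division is exact, including for negative x.
    return x * (x - 1) // 2


values = [num_pairs_sketch(x) for x in (-2, 0, 1, 4)]
```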

clustering_metrics.entropy.csum_pairs()

Count sum of possible pairs (integer)

Use n choose 2 to calculate sum of possible pairs.
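
One reading of this description, assumed here rather than confirmed by the source, is that the function sums n choose 2 over each entry of an iterable of counts. For cluster sizes, that sum is the number of item pairs that share a cluster. The name sum_pairs_sketch is hypothetical.

```python
def sum_pairs_sketch(counts):
    """Sum of C(c, 2) over an iterable of integer counts (assumed semantics)."""
    return sum(c * (c - 1) // 2 for c in counts)


cluster_sizes = [3, 2, 1]
same_cluster_pairs = sum_pairs_sketch(cluster_sizes)  # 3 + 1 + 0 pairs
```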

clustering_metrics.entropy.emi_from_margins()

Calculate Expected Mutual Information given margins of RxC table

For the sake of numeric precision, the resulting value is not normalized by N.

License: BSD 3 clause
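
As a rough illustration of the quantity emi_from_margins() describes, the sketch below computes expected mutual information from row and column margins. It assumes the standard hypergeometric model for the cell counts (the one commonly used for adjusted mutual information) and assumes that "not normalized by N" means the result is N times the usual EMI. The function name and its argument names are hypothetical, and the code is not taken from this module.

```python
from math import exp, lgamma, log


def emi_from_margins_sketch(row_margins, col_margins):
    """N * EMI (natural log) under the hypergeometric model (assumed)."""
    n = sum(row_margins)
    if n != sum(col_margins):
        raise ValueError("row and column margins must have the same total")
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for a in row_margins:
        for b in col_margins:
            if a == 0 or b == 0:
                continue
            # Log of the factorial terms that do not depend on the cell count.
            log_const = (lgamma(a + 1) + lgamma(b + 1)
                         + lgamma(n - a + 1) + lgamma(n - b + 1)
                         - log_n_fact)
            # Cell counts of zero contribute nothing, so start at 1.
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                log_prob = log_const - (lgamma(nij + 1)
                                        + lgamma(a - nij + 1)
                                        + lgamma(b - nij + 1)
                                        + lgamma(n - a - b + nij + 1))
                total += nij * log(n * nij / (a * b)) * exp(log_prob)
    return total


emi_times_n = emi_from_margins_sketch([1, 1], [1, 1])
```

Working in log space with lgamma keeps the factorials from overflowing on large tables.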

clustering_metrics.entropy.fentropy()

Entropy of an iterable of frequencies (floating point)

Assumes every entry in the list belongs to a different class. The resulting value is not normalized by N. The entropy is calculated using the natural logarithm (base e); to express it in another base, divide the result by log(base).

The ‘freqs’ parameter is expected to be a list or tuple-like iterable. For convenience, it can also be a dict/mapping type, in which case its values will be used to calculate entropy.
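
The base conversion mentioned above can be sketched as follows. This is an illustration only: the name freq_entropy_sketch and its base parameter are hypothetical, mapping input is left out for brevity, and the formula assumes the result equals N times the entropy, where N is the sum of the frequencies (so N = 1 for relative frequencies).

```python
from math import log


def freq_entropy_sketch(freqs, base=None):
    """Entropy of float frequencies, in nats or in the given base."""
    total = 0.0
    sum_f_log_f = 0.0
    for f in freqs:
        if f > 0.0:  # zero frequencies contribute nothing
            total += f
            sum_f_log_f += f * log(f)
    if total == 0.0:
        return 0.0
    nats = total * log(total) - sum_f_log_f
    # Changing the base is a division by the natural log of that base.
    return nats if base is None else nats / log(base)


bits = freq_entropy_sketch([0.5, 0.25, 0.25], base=2)
```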

clustering_metrics.entropy.fnum_pairs()

Binomial coefficient for k=2 (floating point)

For non-vectorized computation, this is faster than calling scipy.misc.comb(x, 2) or scipy.special.binom(x, 2). Unlike with those two, the domain here extends into negative integers.

clustering_metrics.entropy.fsum_pairs()

Count sum of possible pairs (floating point)

Use n choose 2 to calculate sum of possible pairs.

clustering_metrics.entropy.lgamma()

Log of gamma function for scalar double x

This is a scalar-only replacement for scipy.special.gammaln. On scalar values, this method is ~10x faster than the corresponding SciPy one. On large arrays, however, even when vectorized using np.vectorize, this method is slower than the SciPy one, so use gammaln in those cases.

This function is borrowed verbatim from Scikit-Learn.
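
To show the kind of scalar use case this targets, the sketch below computes a log binomial coefficient one value at a time. It uses Python's standard math.lgamma as a stand-in for this module's function, and the helper name log_binom_sketch is hypothetical.

```python
from math import exp, lgamma


def log_binom_sketch(n, k):
    """Natural log of C(n, k), using log-gamma in place of factorials."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


# Exponentiating recovers the coefficient for small inputs.
pairs = round(exp(log_binom_sketch(5, 2)))
```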

clustering_metrics.entropy.ndarray_from_iter()

Create NumPy arrays from different object types

In addition to standard np.asarray casting functionality, this function handles conversion from the following types: collections.Mapping, collections.Iterator.

If the input object is an instance of collections.Mapping, the function assumes that the NumPy array should be created from the mapping's values.
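
The dispatch described above might look like the following sketch. The name to_ndarray_sketch and its dtype default are hypothetical, and the code imports the abstract base classes from collections.abc, which is where current Python versions provide Mapping and Iterator.

```python
from collections.abc import Iterator, Mapping

import numpy as np


def to_ndarray_sketch(obj, dtype=float):
    """Build a 1-D array from a mapping's values, an iterator, or a sequence."""
    if isinstance(obj, Mapping):
        # Only the values are used; the keys are discarded.
        return np.fromiter(obj.values(), dtype=dtype)
    if isinstance(obj, Iterator):
        # Iterators have no length, so consume them element by element.
        return np.fromiter(obj, dtype=dtype)
    return np.asarray(obj, dtype=dtype)


from_mapping = to_ndarray_sketch({"a": 1, "b": 2})
from_iterator = to_ndarray_sketch(iter([3, 4]))
from_list = to_ndarray_sketch([5, 6])
```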