clustering_metrics.hungarian module¶

clustering_metrics.hungarian.linear_sum_assignment(cost_matrix)[source]¶

Solve the linear sum assignment problem.

The linear sum assignment problem is also known as minimum weight matching in bipartite graphs. A problem instance is described by a matrix C, where each C[i,j] is the cost of matching vertex i of the first partite set (a “worker”) and vertex j of the second set (a “job”). The goal is to find a complete assignment of workers to jobs of minimal cost.

Formally, let X be a boolean matrix where \(X[i,j] = 1\) iff row i is assigned to column j. Then the optimal assignment has cost

\[\min \sum_i \sum_j C_{i,j} X_{i,j}\]

s.t. each row is assignment to at most one column, and each column to at most one row.

This function can also solve a generalization of the classic assignment problem where the cost matrix is rectangular. If it has more rows than columns, then not every row needs to be assigned to a column, and vice versa.

The method used is the Hungarian algorithm, also known as the Munkres or Kuhn-Munkres algorithm.

Parameters:

cost_matrix : array

The cost matrix of the bipartite graph.

Returns:

row_ind, col_ind : array

An array of row indices and one of corresponding column indices giving the optimal assignment. The cost of the assignment can be computed as cost_matrix[row_ind, col_ind].sum(). The row indices will be sorted; in the case of a square cost matrix they will be equal to numpy.arange(cost_matrix.shape[0]).

Notes

New in version 0.17.0.

References

[991]

R.A. Pilgrim’s page on Munkres’ Assignemnt Algorithm

[992]

Harold W. Kuhn. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.

[993]

Harold W. Kuhn. Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3: 253-258, 1956.

[994]

Munkres, J. Algorithms for the Assignment and Transportation Problems. J. SIAM, 5(1):32-38, March, 1957.

[995]

Wikipedia entry for the Hungarian algorithm

Examples

>>> cost = np.array([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
>>> row_ind, col_ind = linear_sum_assignment(cost)
>>> col_ind
array([1, 0, 2])
>>> cost[row_ind, col_ind].sum()
5