cluster.method.hierarchical

class cluster.method.hierarchical.HierarchicalClustering(data, distance_function, linkage=None, num_processes=1, progress_callback=None)
Bases: cluster.method.base.BaseClusterMethod
Implementation of the hierarchical clustering method as explained in a tutorial by matteucc.
Object prerequisites:
 Items must be sortable (See issue #11)
 Items must be hashable.
Example:
>>> from cluster import HierarchicalClustering
>>> # or: from cluster import *
>>> cl = HierarchicalClustering([123, 334, 345, 242, 234, 1, 3], lambda x, y: float(abs(x - y)))
>>> cl.getlevel(90)
[[345, 334], [234, 242], [123], [3, 1]]
Note that all of the returned clusters are more than 90 (getlevel(90)) apart.
See BaseClusterMethod for more details.
Parameters:
 data – The collection of items to be clustered.
 distance_function – A function which takes two elements of data and returns the distance between them. The returned distance must not be negative.
 linkage – The method used to determine the distance between two clusters. See set_linkage_method() for possible values.
 num_processes – To use multiprocessing and run genmatrix() in parallel, specify num_processes > 1. That number of workers will be spun up and the work split evenly among them.
 progress_callback – A function to be called on each iteration to publish the progress. It is called with two integer arguments: the total number of elements in the cluster, and the number of elements remaining to be clustered.
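To illustrate the distance_function contract described above, here is a minimal sketch that is not part of the library. It assumes the items are 2D points stored as tuples, which are both sortable and hashable; the name euclidean is hypothetical.

```python
import math


def euclidean(p, q):
    """Return the straight-line distance between two 2D points.

    The result is never negative, which is what the
    distance_function parameter requires.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Tuples satisfy both object prerequisites: sortable and hashable.
points = [(0, 0), (3, 4), (10, 10)]
print(euclidean(points[0], points[1]))  # 5.0
```

A function like this would be passed as the second constructor argument, for example HierarchicalClustering(points, euclidean).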

cluster(matrix=None, level=None, sequence=None)
Perform hierarchical clustering.
Parameters:
 matrix – The 2D list that is currently being processed. The matrix contains the distance from each item to every other item.
 level – The current level of clustering.
 sequence – The sequence number of the clustering.
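The matrix parameter is described as a 2D list of pairwise distances. The sketch below shows what such a structure could look like for three integers. It is an illustration only: the helper distance_matrix is hypothetical, and the layout that genmatrix() actually produces may differ.

```python
def distance_matrix(items, distance):
    """Build a square list of lists where cell [i][j] holds
    distance(items[i], items[j])."""
    return [[distance(a, b) for b in items] for a in items]


matrix = distance_matrix([1, 3, 123], lambda x, y: float(abs(x - y)))
for row in matrix:
    print(row)
# [0.0, 2.0, 122.0]
# [2.0, 0.0, 120.0]
# [122.0, 120.0, 0.0]
```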

display()
Prints a simple dendrogram-like representation of the full cluster to the console.

getlevel(threshold)
Returns all clusters with a maximum distance of threshold in between each other.
Parameters: threshold – the maximum distance between clusters.
See getlevel()
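The effect of the threshold can be sketched with a naive, self-contained agglomerative procedure. This is not the library's implementation: it assumes closest-pair (single-link) merging and stops as soon as the closest pair of clusters is farther apart than the threshold. The names single_link and clusters_at_threshold are hypothetical.

```python
from itertools import combinations


def single_link(a, b, distance):
    """Distance between the two closest members of clusters a and b."""
    return min(distance(x, y) for x in a for y in b)


def clusters_at_threshold(items, distance, threshold):
    """Merge the closest clusters until none are within threshold."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), best = min(
            (((i, j), single_link(clusters[i], clusters[j], distance))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break  # every remaining pair is more than threshold apart
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


data = [123, 334, 345, 242, 234, 1, 3]
result = clusters_at_threshold(data, lambda x, y: float(abs(x - y)), 90)
print(sorted(sorted(c) for c in result))
# [[1, 3], [123], [234, 242], [334, 345]]
```

For this input the groups agree with the getlevel(90) example shown earlier, although the ordering within and between clusters is not meant to match the library's output.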

publish_progress(total, current)
If a progress function was supplied, this will call that function with the total number of elements and the remaining number of elements.
Parameters:
 total – The total number of elements.
 current – The remaining number of elements.
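A progress function only needs to accept the two integers described above. The following sketch, which is an assumption about how one might be written rather than anything the library ships, collects readable status lines in a list; make_progress_recorder is a hypothetical name.

```python
def make_progress_recorder(log):
    """Return a callback that appends one status line per call to log."""
    def report(total, remaining):
        done = total - remaining
        # Guard against division by zero for an empty data set.
        percent = 100.0 * done / total if total else 100.0
        log.append(f"{done}/{total} items processed ({percent:.0f}%)")
    return report


messages = []
report = make_progress_recorder(messages)

# Simulate three calls as clustering advances over 7 items.
report(7, 7)
report(7, 3)
report(7, 0)
print("\n".join(messages))
```

A callback built this way could be supplied through the progress_callback constructor argument.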

set_linkage_method(method)
Sets the method to determine the distance between two clusters.
Parameters: method – The method to use. It can be one of 'single', 'complete', 'average' or 'uclus', or a callable. The callable should take two collections as parameters and return a distance value between both collections.
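As a sketch of the callable form, the factory below builds a function that takes two collections and returns the mean of all pairwise distances between them. It is an assumption-laden illustration: make_mean_linkage is a hypothetical name, the result is not claimed to match the built-in 'average' method, and whether the library passes any further arguments to the callable is not stated here and should be checked before relying on it.

```python
def make_mean_linkage(distance):
    """Return a two-argument linkage function bound to distance."""
    def linkage(a, b):
        # Sum the distance of every cross-cluster pair, then average.
        total = sum(distance(x, y) for x in a for y in b)
        return total / (len(a) * len(b))
    return linkage


linkage = make_mean_linkage(lambda x, y: float(abs(x - y)))
print(linkage([1, 3], [123]))  # 121.0
```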