cluster

class cluster.cluster.Cluster(level, *args)

Bases: object

A collection of items. This class is used internally to mark clustered items in the data, so that other collection types (lists, dicts, …) can be distinguished from actual clusters. This means that you can also create clusters of lists with this class.
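The reason for a dedicated wrapper type can be illustrated with a small sketch. `MiniCluster` below is a hypothetical stand-in that only mirrors the documented `(level, *args)` constructor signature; its `level` and `items` attributes and the `count_leaves` helper are assumptions made for illustration, not the library's code.

```python
class MiniCluster:
    """Hypothetical stand-in for Cluster; attribute names are assumed."""

    def __init__(self, level, *items):
        self.level = level
        self.items = list(items)


def count_leaves(node):
    """Count data items, descending only into MiniCluster nodes."""
    if isinstance(node, MiniCluster):
        return sum(count_leaves(child) for child in node.items)
    # Anything else, including a plain list, counts as one data item.
    return 1


# Two lists clustered together: the lists are data, not sub-clusters.
pair = MiniCluster(0.5, [1, 2], [3, 4])
print(count_leaves(pair))  # 2
```

Because only `MiniCluster` instances are treated as nodes, a list stored inside a cluster is never mistaken for a nested cluster.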

display(depth=0)

Pretty-prints this cluster. Useful for debugging.

getlevel(threshold)

Retrieve all clusters up to a specific level threshold. The level threshold represents the maximum distance between two clusters. The lower you set this threshold, the more clusters you will receive; the higher you set it, the fewer but bigger clusters you will receive.

Parameters: threshold – The level threshold.

Note

It is debatable whether the value passed into this method should really be as strongly linked to the real cluster levels as it is right now. The end user will not know the range of this value without first inspecting the top-level cluster. A value ranging from 0 to 1 might therefore be a more useful approach.
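The effect of the threshold, and the 0 to 1 alternative raised in the note, can be sketched as follows. This is a simplified illustration and not the library's implementation: `MiniCluster`, `flatten`, `cut`, and `cut_normalized` are hypothetical names, and scaling the fraction by the top-level cluster's level is only one possible way to normalize.

```python
class MiniCluster:
    """Hypothetical stand-in for Cluster; attribute names are assumed."""

    def __init__(self, level, *items):
        self.level = level
        self.items = list(items)


def flatten(node):
    """Return all data items below a node as a flat list."""
    if isinstance(node, MiniCluster):
        return [leaf for child in node.items for leaf in flatten(child)]
    return [node]


def cut(node, threshold):
    """Split the hierarchy into groups whose level is <= threshold."""
    if not isinstance(node, MiniCluster):
        return [[node]]  # a lone item forms its own group
    if node.level <= threshold:
        return [flatten(node)]  # close enough: keep as one group
    groups = []
    for child in node.items:
        groups.extend(cut(child, threshold))
    return groups


def cut_normalized(top, fraction):
    """Assumed variant: fraction in [0, 1], scaled by the top level."""
    return cut(top, fraction * top.level)


inner = MiniCluster(0.2, 'a', 'b')
mid = MiniCluster(0.5, 'c', inner)
top = MiniCluster(0.8, 'd', mid)

print(cut(top, 0.9))  # one big group
print(cut(top, 0.5))  # two groups
print(cut(top, 0.1))  # every item on its own
```

With the normalized variant, the caller does not need to inspect the top-level cluster first: `cut_normalized(top, 1.0)` always yields a single group.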

topology()

Returns the structure (topology) of the cluster as tuples.

Output from cl.data:

[<Cluster@0.833333333333(['CVS',
 <Cluster@0.818181818182(['34.xls',
 <Cluster@0.789473684211([<Cluster@0.555555555556(['0.txt',
 <Cluster@0.181818181818(['ChangeLog', 'ChangeLog.txt'])>])>,
 <Cluster@0.684210526316(['20060730.py',
 <Cluster@0.684210526316(['.cvsignore',
 <Cluster@0.647058823529(['About.py', <Cluster@0.625(['.idlerc',
 '.pylint.d'])>])>])>])>])>])>])>]

Corresponding output from cl.topology():

('CVS', ('34.xls', (('0.txt', ('ChangeLog', 'ChangeLog.txt')),
('20060730.py', ('.cvsignore', ('About.py',
('.idlerc', '.pylint.d')))))))
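The mapping from nested clusters to nested tuples shown above can be sketched with a short recursive function. `MiniCluster` and the free function `topology` here are hypothetical stand-ins written for illustration; the sketch reproduces only the innermost part of the sample output and is not the library's implementation.

```python
class MiniCluster:
    """Hypothetical stand-in for Cluster; attribute names are assumed."""

    def __init__(self, level, *items):
        self.level = level
        self.items = list(items)


def topology(node):
    """Replace every cluster with a tuple of its children's topologies."""
    if isinstance(node, MiniCluster):
        return tuple(topology(child) for child in node.items)
    return node  # plain data items are returned unchanged


# Levels taken from the sample output above.
inner = MiniCluster(0.181818181818, 'ChangeLog', 'ChangeLog.txt')
outer = MiniCluster(0.555555555556, '0.txt', inner)

print(topology(outer))  # ('0.txt', ('ChangeLog', 'ChangeLog.txt'))
```

The levels are discarded in the result, so the tuples describe only which items were merged together and in what nesting order.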