imagecluster.calc.cluster

imagecluster.calc.cluster(fingerprints, sim=0.5, timestamps=None, alpha=0.3, method='average', metric='euclidean', extra_out=False, print_stats=True, min_csize=2)

Hierarchical clustering of images based on image fingerprints, optionally scaled by time distance (alpha).
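To illustrate the kind of computation described here, the following is a minimal sketch using SciPy directly rather than imagecluster. The toy fingerprints and file names are hypothetical, and the way sim is turned into a distance cutoff (a fraction of the largest pairwise distance) is an assumption made for this example, not something this page states.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Hypothetical toy fingerprints: file name -> 1d feature vector.
fingerprints = {
    "a.jpg": np.array([0.0, 0.0]),
    "b.jpg": np.array([0.1, 0.0]),
    "c.jpg": np.array([5.0, 5.0]),
    "d.jpg": np.array([5.1, 5.0]),
}
files = list(fingerprints)
X = np.array([fingerprints[f] for f in files])

# Condensed vector of pairwise distances between all fingerprints.
dists = distance.pdist(X, metric="euclidean")

# Build the dendrogram (linkage matrix) from the pairwise distances.
Z = hierarchy.linkage(dists, method="average")

# ASSUMPTION: sim is mapped to a cut height relative to the largest distance.
sim = 0.5
labels = hierarchy.fcluster(Z, t=dists.max() * (1.0 - sim), criterion="distance")

# labels[i] is the cluster id of files[i].
for fname, label in zip(files, labels):
    print(fname, label)
```

With these values the two near-identical pairs end up in two separate clusters, since the cut height lies well above the within-pair distance and well below the between-pair distance.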

Parameters:

fingerprints: dict

output of fingerprints()

sim : float 0..1

similarity index; sim=0 puts all images into one cluster, sim=1 makes each image its own cluster

timestamps: dict

alpha : float

mixing parameter of image content distance and time distance, ignored if timestamps is None

method : str

see scipy.cluster.hierarchy.linkage(); all methods except ‘centroid’ produce pretty much the same result

metric : str

see scipy.cluster.hierarchy.linkage(); make sure to use ‘euclidean’ in case of method=‘centroid’, ‘median’ or ‘ward’

extra_out : bool

additionally return internal variables for debugging

print_stats : bool

min_csize : int

return clusters with at least that many elements
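One way to read alpha as a "mixing parameter" is as the weight in a convex combination of the two distances. The sketch below shows that reading in plain Python. The helper names pairwise and mix_distances are hypothetical, and normalizing both distances by their maximum before mixing is an assumption, not something this page confirms.

```python
from itertools import combinations
from math import dist


def pairwise(points):
    """Euclidean distance for every unordered pair, in combinations() order."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def mix_distances(d_content, d_time, alpha=0.3):
    """Blend content and time distances; alpha is the weight of the time part.

    ASSUMPTION: both inputs are scaled to 0..1 by their maximum first, so the
    weights are comparable. Requires at least one non-zero distance in each.
    """
    max_c, max_t = max(d_content), max(d_time)
    return [
        (1.0 - alpha) * dc / max_c + alpha * dt / max_t
        for dc, dt in zip(d_content, d_time)
    ]


# Hypothetical data: three images with 2d fingerprints and timestamps (seconds).
fps = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
times = [(0.0,), (10.0,), (100.0,)]

d_content = pairwise(fps)   # pairs (0,1), (0,2), (1,2)
d_time = pairwise(times)
mixed = mix_distances(d_content, d_time, alpha=0.3)
print(mixed)
```

In this reading alpha=0 reduces to the pure image content distance and alpha=1 to the pure time distance.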

Returns:

clusters [, extra]

clusters : dict

We call a list of file names a “cluster”.

keys = cluster size csize (number of images in the cluster)
values = list of clusters with that size
{csize : [[filename, filename, ...],
          [filename, filename, ...],
          ...
          ],
csize : [...]}

extra : dict

if extra_out is True
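The layout of the clusters dict can be made concrete with a short plain-Python sketch. It builds a dict of that shape from one cluster label per file, drops clusters smaller than min_csize, and then iterates over the result. The helper group_by_size, the file names, and the labels are hypothetical and only serve to illustrate the structure.

```python
from collections import defaultdict


def group_by_size(labels, files, min_csize=2):
    """Build {csize: [[filename, ...], ...]} from one cluster label per file."""
    # Collect the file names belonging to each cluster label.
    by_label = defaultdict(list)
    for fname, label in zip(files, labels):
        by_label[label].append(fname)

    # Group clusters of equal size together, skipping those that are too small.
    clusters = defaultdict(list)
    for members in by_label.values():
        if len(members) >= min_csize:
            clusters[len(members)].append(members)
    return dict(clusters)


# Hypothetical input: labels[i] is the cluster id of files[i].
files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"]
labels = [1, 1, 2, 2, 2, 3]
clusters = group_by_size(labels, files, min_csize=2)

# Walk the result from the largest clusters to the smallest.
for csize in sorted(clusters, reverse=True):
    for cluster in clusters[csize]:
        print(csize, cluster)
```

Here "f.jpg" forms a cluster of size 1 and is therefore left out when min_csize=2.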