evoclusterstream.cluster package

Submodules

evoclusterstream.cluster.EvoDBSCAN module

From the paper: Evolutionary Clustering and Community Detection Algorithms for Social Media Health Surveillance

Kyle Spurlock, Tanner Bogart, Heba Elgazzar 2020

Version 1.2 of an Evolutionary adaptation of the DBSCAN clustering algorithm.

Example

    import numpy as np
    import pandas as pd
    from evoclusterstream.cluster.EvoDBSCAN import EvoDBSCAN

    df = pd.read_csv(r"encoded_twitter_dataset.csv")
    X = df.iloc[:1200, [3, 4, 5]]
    t_labels = np.unique(X['Time'])

    # Static DBSCAN baseline
    evo1 = EvoDBSCAN(min_samples=2)
    evo1.callSTATIC(X, beta=1)

    # Evolutionary DBSCAN
    evo2 = EvoDBSCAN(min_samples=2)
    noise0 = evo2.callDBSCAN(X, t_labels, alpha=0.8, beta=1)

class evoclusterstream.cluster.EvoDBSCAN.EvoDBSCAN(min_samples=5, *, eps=0, metric='euclidean', algorithm='auto', leaf_size=30, p=None, n_jobs=None)[source]

Bases: sklearn.cluster._dbscan.DBSCAN

Implementation of Evolutionary DBSCAN with dynamic radius measure.

Notes

Acts as a wrapper for the scikit-learn DBSCAN class.

eps

Radius measure for finding neighbourhood of core point

Type: float

min_samples

Minimum number of neighbours to form a core point

Type: int

clusters_gen

Stores cluster count per generation

Type: list

noise_gen

Stores noise count per time generation

Type: list

eps_gen

Stores eps parameter per time generation

Type: list

metric

Distance metric (manhattan, euclidean, etc.)

Type: str

metric_params

Additional arguments for metric function

Type: dict

algorithm

Algorithm used to compute pointwise distances

Type: str

leaf_size

Specific to BallTree or cKDTree algorithm

Type: int

p

Power of Minkowski metric

Type: int

n_jobs

Number of parallel jobs

Type: int
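
The per-generation attributes above (clusters_gen, noise_gen) can be pictured as running tallies over each generation's label array. The following is a minimal sketch, assuming labels follow the scikit-learn DBSCAN convention in which -1 marks noise. The helper name summarize_labels and the sample labels are hypothetical and are not part of EvoDBSCAN.

```python
def summarize_labels(labels):
    """Return (cluster_count, noise_count) for one generation's labels.

    Assumes the scikit-learn DBSCAN convention: label -1 marks noise.
    """
    noise = sum(1 for lab in labels if lab == -1)
    clusters = len({lab for lab in labels if lab != -1})
    return clusters, noise


# Hypothetical labels for three generations (not real output).
generations = [
    [0, 0, 1, -1],
    [0, 1, 1, 2, -1, -1],
    [-1, -1],
]

clusters_gen, noise_gen = [], []
for labels in generations:
    n_clusters, n_noise = summarize_labels(labels)
    clusters_gen.append(n_clusters)
    noise_gen.append(n_noise)

print(clusters_gen)  # [2, 3, 0]
print(noise_gen)     # [1, 2, 2]
```

Keeping one list entry per generation makes it easy to plot how cluster and noise counts drift over time.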

callDBSCAN(X, times, alpha, beta=1, show_eps=False, plot_gens=None, save_plot=None)[source]

Perform evolutionary DBSCAN clustering.

Parameters

X (pd.DataFrame) – DataFrame of tabular data samples
times (list) – List containing times for X samples
alpha (float) – Parameter used to modulate epsilon by snapshot vs. history
beta (float, optional) – Optional scaling parameter for the radius
show_eps (bool, optional) – If True, print the computed epsilon value
plot_gens (list, optional) – Generations to show as plots
save_plot (str, optional) – Path to save plots

Returns

Return type

None
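
The parameter descriptions say that alpha modulates epsilon by snapshot vs. history and that beta scales the radius, but they do not give a formula. One plausible reading, shown here purely as an assumption, is a convex combination of the current and historical radius, rescaled by beta. The function blend_eps and the sample values are hypothetical and may not match what callDBSCAN actually does.

```python
def blend_eps(snapshot_eps, history_eps, alpha, beta=1.0):
    """Blend the current snapshot radius with the historical radius.

    Assumption (not confirmed by the source): alpha weights the snapshot,
    (1 - alpha) weights the history, and beta rescales the result.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return beta * (alpha * snapshot_eps + (1.0 - alpha) * history_eps)


# Hypothetical per-generation radius estimates.
snapshot_estimates = [0.5, 1.0, 2.0]

eps_gen = []
history = snapshot_estimates[0]  # the first generation has no history yet
for snap in snapshot_estimates:
    eps = blend_eps(snap, history, alpha=0.5)
    eps_gen.append(eps)
    history = eps  # the blended radius becomes the next generation's history

print(eps_gen)  # [0.5, 0.75, 1.375]
```

Under this reading, alpha = 1 ignores history entirely, while smaller values make the radius change more slowly between generations.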

callSTATIC(X, beta, save_plot=None)[source]

Normal (static) DBSCAN implementation.
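
For readers unfamiliar with the static baseline, the following is a minimal pure-Python sketch of textbook DBSCAN. It is not the package's code, which wraps scikit-learn; the function names and sample points are illustrative only. As in scikit-learn, min_samples counts the point itself and -1 marks noise.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The isolated point at (50, 50) has no neighbour within eps, so it is reported as noise.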

showPlot(current_gen, time, labels, noise, alpha, save_plot=None)[source]

Plots the clusters at the given generation.

evoclusterstream.cluster.EvoLouvain module

From the paper: Evolutionary Clustering and Community Detection Algorithms for Social Media Health Surveillance

Kyle Spurlock, Tanner Bogart, Heba Elgazzar 2020

Version 1.2 of an Evolutionary adaptation of the Louvain Method

Example

    import numpy as np
    import pandas as pd
    from evoclusterstream.cluster.EvoLouvain import EvoLouvain

    df = pd.read_csv(r"encoded_twitter_dataset.csv")
    X = df.iloc[:200, [3, 4, 5]]
    t_labels = np.unique(X['Time'])

    evo8 = EvoLouvain()
    evo8.callLouvain(X, t_labels, alpha=.8, save_plot="path/")

class evoclusterstream.cluster.EvoLouvain.EvoLouvain[source]

Bases: object

Implementation of Evolutionary Louvain Method

Notes

Wraps dynamic smoothing around community_louvain by Thomas Aynaud

modularities_

Stores modularities per generation

Type: list
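
As background for the values stored in modularities_, the sketch below computes the standard Newman modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j) for a small undirected graph. It is a plain-Python illustration of the definition, not the package's or community_louvain's implementation; the function name and the toy graph are hypothetical.

```python
def modularity(adjacency, communities):
    """Newman modularity Q of an undirected weighted graph.

    adjacency   -- symmetric matrix as a list of lists
    communities -- communities[i] is the community id of node i
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # equals 2m for an undirected graph
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1.0

q_split = modularity(adjacency, [0, 0, 0, 1, 1, 1])
q_merged = modularity(adjacency, [0, 0, 0, 0, 0, 0])
print(round(q_split, 4))  # 0.3571
```

Splitting the graph into its two triangles scores higher than lumping every node into one community, which scores approximately zero.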

applySmoothing(mat1, mat2, alpha)[source]

Adds and applies a smoothing effect to the distance matrices.
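
The source does not state the smoothing formula. A common choice in evolutionary clustering, shown here only as an assumption, is an element-wise weighted sum of the current and previous distance matrices. The function smooth_matrices, the argument order, and the sample matrices are hypothetical and may differ from applySmoothing.

```python
def smooth_matrices(current, previous, alpha):
    """Element-wise alpha * current + (1 - alpha) * previous.

    Assumption: both matrices have the same shape and alpha is in [0, 1].
    """
    if len(current) != len(previous):
        raise ValueError("matrices must have the same shape")
    return [
        [alpha * c + (1.0 - alpha) * p for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


# Hypothetical 2x2 distance matrices for two consecutive generations.
current = [[0.0, 2.0], [2.0, 0.0]]
previous = [[0.0, 4.0], [4.0, 0.0]]

smoothed = smooth_matrices(current, previous, alpha=0.5)
print(smoothed)  # [[0.0, 3.0], [3.0, 0.0]]
```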

callLouvain(X, times, alpha, show_mod=False, plot_gens=None, save_plot=None)[source]

Perform Evolutionary Community Detection through the Louvain Method

Parameters

X (pd.DataFrame) – Dataframe of tabular data samples
times (list) – List containing times for X samples
alpha (float) – Parameter used to modulate smoothing by snapshot vs. history
show_mod (bool, optional) – If True, print the computed modularity values
plot_gens (list, optional) – Generations to show as plots
save_plot (str, optional) – Directory path to save plot as image

Returns

Return type

None

showPlot(current_gen, time, partition, modularity, alpha, save_plot=None)[source]

Plots the clusters at the given generation.

sparsify(dist)[source]

Introduces sparsity into the distance matrix (loose similarity).
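
The criterion sparsify uses is not documented here. As a rough illustration of what introducing sparsity into a distance matrix can mean, the sketch below zeroes out every entry above a fixed threshold. Both the rule and the name sparsify_by_threshold are hypothetical; the package's method may use a different criterion.

```python
def sparsify_by_threshold(dist, threshold):
    """Zero out entries larger than threshold, leaving a sparser matrix.

    Hypothetical rule for illustration; the package's criterion may differ.
    """
    return [[d if d <= threshold else 0.0 for d in row] for row in dist]


# Hypothetical symmetric distance matrix for three samples.
dist = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 2.0],
    [5.0, 2.0, 0.0],
]

sparse = sparsify_by_threshold(dist, threshold=2.0)
print(sparse)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
```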

Module contents

class evoclusterstream.cluster.EvoDBSCAN(min_samples=5, *, eps=0, metric='euclidean', algorithm='auto', leaf_size=30, p=None, n_jobs=None)[source]

Bases: sklearn.cluster._dbscan.DBSCAN

Implementation of Evolutionary DBSCAN with dynamic radius measure.
