evoclusterstream.cluster package
Submodules
evoclusterstream.cluster.EvoDBSCAN module
From the paper: Evolutionary Clustering and Community Detection Algorithms for Social Media Health Surveillance
Kyle Spurlock, Tanner Bogart, Heba Elgazzar 2020
Version 1.2 of an evolutionary adaptation of the DBSCAN clustering algorithm.
Example
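import numpy as np
import pandas as pd
from evoclusterstream.cluster.EvoDBSCAN import EvoDBSCAN

df = pd.read_csv(r"encoded_twitter_dataset.csv")
X = df.iloc[:1200, [3, 4, 5]]  # first 1200 rows; the selected columns must include 'Time'
t_labels = np.unique(X['Time'])  # distinct time values in X

beta = 1  # radius scaling factor; assumed value, not defined in the original example
evo1 = EvoDBSCAN(min_samples=2)
evo1.callSTATIC(X, beta)

# Evolutionary a = 0
evo2 = EvoDBSCAN(min_samples=2)
noise0 = evo2.callDBSCAN(X, t_labels, alpha=0.8, beta=1)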
- class evoclusterstream.cluster.EvoDBSCAN.EvoDBSCAN(min_samples=5, *, eps=0, metric='euclidean', algorithm='auto', leaf_size=30, p=None, n_jobs=None)
Bases: sklearn.cluster._dbscan.DBSCAN
Implementation of Evolutionary DBSCAN with dynamic radius measure.
Notes
Acts as a wrapper for the scikit-learn DBSCAN class.
- eps
Radius measure for finding neighbourhood of core point
- Type
float
- min_samples
Minimum number of neighbours to form a core point
- Type
int
- clusters_gen
Stores cluster count per generation
- Type
list
- noise_gen
Stores noise count per time generation
- Type
list
- eps_gen
Stores eps parameter per time generation
- Type
list
- metric
Distance metric (manhattan, euclidean, etc.)
- Type
str
- metric_params
Additional arguments for metric function
- Type
dict
- algorithm
Algorithm used to compute pointwise distances
- Type
str
- leaf_size
Specific to BallTree or cKDTree algorithm
- Type
int
- p
Power of Minkowski metric
- Type
int
- n_jobs
Number of parallel jobs
- Type
int
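The per-generation attributes clusters_gen and noise_gen can be pictured as counts derived from each generation's label vector. The following is a minimal sketch, not the package's own code: it assumes the scikit-learn DBSCAN convention that noise points carry the label -1, and summarize_labels together with the label vectors are hypothetical, introduced only for illustration.

```python
def summarize_labels(labels):
    """Return (cluster_count, noise_count) for one generation's labels.

    Assumes the scikit-learn DBSCAN convention: noise points are labeled -1.
    """
    noise_count = sum(1 for label in labels if label == -1)
    cluster_count = len({label for label in labels if label != -1})
    return cluster_count, noise_count


# Hypothetical label vectors for three time generations.
labels_per_generation = [
    [0, 0, 1, -1],
    [0, 0, 0, 0],
    [-1, -1, 0, 1, 2],
]

clusters_gen = []
noise_gen = []
for labels in labels_per_generation:
    n_clusters, n_noise = summarize_labels(labels)
    clusters_gen.append(n_clusters)
    noise_gen.append(n_noise)

print(clusters_gen)  # [2, 1, 3]
print(noise_gen)     # [1, 0, 2]
```

Keeping one entry per generation makes it easy to plot how cluster and noise counts drift over time.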
- callDBSCAN(X, times, alpha, beta=1, show_eps=False, plot_gens=None, save_plot=None)
Perform evolutionary DBSCAN clustering.
- Parameters
X (pd.DataFrame) – DataFrame of tabular data samples
times (list) – List containing times for X samples
alpha (float) – Weight that balances the current snapshot against history when computing epsilon
beta (float, optional) – Optional scaling parameter for the radius
show_eps (bool, optional) – If True, print the computed epsilon value
plot_gens (list, optional) – Generations to show as plots
save_plot (str, optional) – Path to save plots
- Returns
- Return type
None
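The description of alpha as balancing snapshot against history suggests some form of temporal smoothing of the radius. The exact update rule is not given in this reference, so the sketch below is only one plausible reading, not the package's method: a convex combination in which alpha weights the current snapshot's epsilon and 1 - alpha weights the previous smoothed value, with beta applied as a final multiplicative scale. The function smooth_eps and the snapshot values are hypothetical.

```python
def smooth_eps(snapshot_eps, alpha, beta=1.0):
    """Blend per-snapshot radii with their history (ASSUMED update rule).

    history_t = alpha * snapshot_t + (1 - alpha) * history_{t-1}
    eps_t     = beta * history_t
    The first generation has no history, so it uses its snapshot value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    smoothed = []
    history = None
    for eps in snapshot_eps:
        if history is None:
            history = eps
        else:
            history = alpha * eps + (1.0 - alpha) * history
        smoothed.append(beta * history)
    return smoothed


# Hypothetical snapshot radii for three generations.
eps_gen = smooth_eps([1.0, 2.0, 4.0], alpha=0.5)
print(eps_gen)  # [1.0, 1.5, 2.75]
```

Under this assumed rule, alpha = 1 reproduces the snapshot values unchanged, while smaller values let earlier generations damp sudden changes in the radius.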
evoclusterstream.cluster.EvoLouvain module
From the paper: Evolutionary Clustering and Community Detection Algorithms for Social Media Health Surveillance
Kyle Spurlock, Tanner Bogart, Heba Elgazzar 2020
Version 1.2 of an evolutionary adaptation of the Louvain method.
Example
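import numpy as np
import pandas as pd
from evoclusterstream.cluster.EvoLouvain import EvoLouvain

df = pd.read_csv(r"encoded_twitter_dataset.csv")
X = df.iloc[:200, [3, 4, 5]]  # first 200 rows; the selected columns must include 'Time'
t_labels = np.unique(X['Time'])  # distinct time values in X

evo8 = EvoLouvain()
evo8.callLouvain(X, t_labels, alpha=.8, save_plot="path/")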
- class evoclusterstream.cluster.EvoLouvain.EvoLouvain
Bases: object
Implementation of the Evolutionary Louvain Method.
Notes
Wraps dynamic smoothing around community_louvain by Thomas Aynaud
- modularities_
Stores the modularity value per generation
- Type
list
- callLouvain(X, times, alpha, show_mod=False, plot_gens=None, save_plot=None)
Perform Evolutionary Community Detection through the Louvain Method
- Parameters
X (pd.DataFrame) – Dataframe of tabular data samples
times (list) – List containing times for X samples
alpha (float) – Parameter used to modulate epsilon by snapshot vs. history
show_mod (bool, optional) – If True, print the computed modularity values
plot_gens (list, optional) – Generations to show as plots
save_plot (str, optional) – Directory path to save plot as image
- Returns
- Return type
None
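Each entry of modularities_ is a modularity score for one generation's partition. As a reminder of what that number measures, here is a small self-contained sketch of the standard modularity formula for an unweighted, undirected graph. It does not use or reproduce the package's code or community_louvain; the function modularity and the toy graph are illustrative only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of node degrees in c.
    `edges` is a list of (u, v) pairs without duplicates or self-loops.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must contain at least one edge")
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

q = modularity(edges, community_of)
print(round(q, 4))  # 0.3571
```

Placing every node in a single community gives a score of 0, so higher values indicate a partition with more internal edges than chance would predict.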
Module contents
- class evoclusterstream.cluster.EvoDBSCAN(min_samples=5, *, eps=0, metric='euclidean', algorithm='auto', leaf_size=30, p=None, n_jobs=None)