data_describe.core.clustering

Unsupervised determination of clusters.

class data_describe.core.clustering.ClusterWidget(method: str, clusters: List[int] = None, estimator=None, n_clusters=None, search=False, cluster_range=None, **kwargs)
- Bases: data_describe._widget.BaseWidget
- Container for clustering calculations and visualization.
- An instance of this class is returned by the cluster function. The attributes documented below can be accessed or extracted.

method
- {‘kmeans’, ‘hdbscan’} The type of clustering algorithm
- Type
- str

clusters
- The predicted cluster labels
- Type
- List[int]

estimator
- The clustering estimator/model

input_data
- The input data

scaled_data
- The data after applying standardization

viz_data
- The data used for the default visualization, i.e. reduced to two dimensions

dim_method
- The algorithm used for dimensionality reduction
- Type
- str

reductor
- The dimensionality reduction estimator

xlabel
- The x-axis label for the cluster plot
- Type
- str

ylabel
- The y-axis label for the cluster plot
- Type
- str

n_clusters
- (KMeans) The number of clusters (k) used in the final clustering fit.
- Type
- int, optional

search
- (KMeans) If True, a search was performed for the optimal n_clusters.
- Type
- bool, optional

cluster_range
- (KMeans) The range of clusters searched as (min_cluster, max_cluster).
- Type
- Tuple[int, int], optional

metric
- (KMeans) The metric used to evaluate the cluster search.
- Type
- str, optional

scores
- (KMeans) The metric scores in cluster search.
- Type
- List 
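
The scaled_data attribute holds the input after standardization. As a rough illustration of what standardization usually means (each column shifted to zero mean and scaled to unit variance), here is a minimal pure-Python sketch. The helper name standardize_columns is hypothetical, and this page does not confirm which scaler the library actually applies.

```python
from statistics import fmean, pstdev
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column: subtract the column mean, divide by the population std.

    Hypothetical helper for illustration; not part of data_describe.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns have zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Two columns on very different scales end up with identical z-scores.
scaled = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Standardizing first keeps a column with large raw values from dominating distance-based clustering such as KMeans.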
 

show(self, viz_backend=None, **kwargs)
- The default display for this output.
- Displays the clustered, projected data as a scatter plot, with points colored by the cluster labels.
- Parameters
- viz_backend – The visualization backend.
- **kwargs – Keyword arguments.
- Raises
- ValueError – Data to visualize is missing / not calculated.
- Returns
- The cluster plot.

cluster_search_plot(self, viz_backend=None, **kwargs)
- Shows the results of cluster search.
- Cluster search attempts to find an optimal n_clusters by maximizing some criterion. This plot shows a line plot of each n_clusters value that was attempted and its score.
- Parameters
- viz_backend – The visualization backend.
- **kwargs – Additional keyword arguments to pass to the visualization backend.
- Raises
- ValueError – Cluster search is False.
- Returns
- The plot 
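
The selection step behind cluster search can be sketched as picking the candidate with the highest score. This is only an illustration under stated assumptions: the helper best_n_clusters is hypothetical, the scores are made-up values, and the assumption that scores[i] corresponds to n_clusters = min_cluster + i is not confirmed by this page.

```python
from typing import List, Tuple


def best_n_clusters(cluster_range: Tuple[int, int], scores: List[float]) -> int:
    """Return the candidate n_clusters with the highest score.

    Assumption (not confirmed by the library docs): scores[i] belongs to
    n_clusters = min_cluster + i.
    """
    min_cluster, _max_cluster = cluster_range
    if not scores:
        raise ValueError("No scores to search over.")
    # Index of the highest score; ties resolve to the smallest n_clusters.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return min_cluster + best_index


# Made-up scores for n_clusters = 2, 3, 4, 5 (illustrative only).
example_scores = [0.31, 0.52, 0.47, 0.40]
best = best_n_clusters((2, 5), example_scores)
```

With these illustrative scores the peak sits at the second candidate, which is the point a reader would look for in the line plot.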
 
 
data_describe.core.clustering.cluster(data, method='kmeans', dim_method='pca', compute_backend=None, viz_backend=None, **kwargs) → ClusterWidget
- Unsupervised determination of clusters.
- This feature computes clusters using various algorithms (KMeans, HDBSCAN) and then projects the data onto a two-dimensional plot for visualization.
- Parameters
- data (DataFrame) – The data. 
- method (str, optional) – {‘kmeans’, ‘hdbscan’} The clustering method. 
- dim_method (str, optional) – The method to use for dimensionality reduction. 
- compute_backend (str, optional) – The compute backend. 
- viz_backend (str, optional) – The visualization backend. 
- n_clusters (Optional[int], optional) – (KMeans) The number of clusters. 
- cluster_range (Tuple[int, int], optional) – (KMeans) A tuple of the minimum and maximum cluster search range. Defaults to (2, 20). 
- metric (str) – (KMeans) The metric to optimize (from sklearn.metrics). 
- target – (KMeans) The labels for supervised clustering, as a 1-D array. 
- **kwargs – Keyword arguments. 
 
- Raises
- ValueError – Data frame required 
- ValueError – Clustering method not implemented 
 
- Returns
- ClusterWidget
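
A minimal usage sketch, based only on the signature and attributes documented above. The wrapper name run_kmeans_example is hypothetical, and the sketch assumes data_describe is installed and that the input is a numeric pandas DataFrame.

```python
def run_kmeans_example(data, n_clusters=3):
    """Cluster a DataFrame with KMeans and return the resulting ClusterWidget.

    Hypothetical wrapper for illustration; assumes data_describe is installed
    and `data` is a numeric pandas DataFrame.
    """
    # Imported lazily so the sketch can be defined without the library present.
    from data_describe.core.clustering import cluster

    # n_clusters is forwarded through **kwargs, as listed under Parameters.
    return cluster(data, method="kmeans", dim_method="pca", n_clusters=n_clusters)


# Usage (requires data_describe and pandas):
#   widget = run_kmeans_example(df, n_clusters=2)
#   labels = widget.clusters   # predicted cluster labels
#   widget.show()              # scatter plot colored by cluster label
```

The returned widget keeps the fitted estimator and the projected data, so the plot can be redrawn later without refitting.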