data_describe.core.clustering
====================================

.. py:module:: data_describe.core.clustering


.. autoapisummary::

   data_describe.core.clustering.cluster


.. py:class:: ClusterWidget(method: str, clusters: List[int] = None, estimator=None, n_clusters=None, search=False, cluster_range=None, **kwargs)

   Bases: :class:`data_describe._widget.BaseWidget`

   Container for clustering calculations and visualization.

   An instance of this class is returned by the ``cluster`` function. The
   attributes documented below can be accessed directly on that instance.

   .. attribute:: method

      {'kmeans', 'hdbscan'} The type of clustering algorithm

      :type: str

   .. attribute:: clusters

      The predicted cluster labels

      :type: List[int]

   .. attribute:: estimator

      The clustering estimator/model

   .. attribute:: input_data

      The input data

   .. attribute:: scaled_data

      The data after applying standardization
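
      A minimal sketch of standardization follows. It assumes that standardization
      means z-scoring each column to zero mean and unit (population) standard
      deviation. The ``standardize`` helper is hypothetical and only illustrates
      the transformation. It is not the scaler the library applies.

      .. code-block:: python

         from statistics import mean, pstdev
         from typing import List


         def standardize(rows: List[List[float]]) -> List[List[float]]:
             """Hypothetical helper: z-score every column of a row-major table."""
             columns = list(zip(*rows))
             means = [mean(col) for col in columns]
             stds = [pstdev(col) for col in columns]  # population std (ddof=0)
             # A constant column has std 0; map it to 0.0 instead of dividing by zero.
             return [
                 [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
                 for row in rows
             ]


         data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
         scaled = standardize(data)  # every column now has mean 0 and std 1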

   .. attribute:: viz_data

      The data used for the default visualization, i.e., the data reduced to two dimensions

   .. attribute:: dim_method

      The algorithm used for dimensionality reduction

      :type: str

   .. attribute:: reductor

      The dimensionality reduction estimator
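
      The projection to two dimensions for the default ``dim_method='pca'`` can be
      sketched with NumPy's SVD, assuming NumPy is available. ``pca_2d`` is a
      hypothetical helper for illustration. It is not the estimator stored in
      ``reductor``.

      .. code-block:: python

         import numpy as np


         def pca_2d(X: np.ndarray) -> np.ndarray:
             """Hypothetical helper: project rows of X onto the top two principal axes."""
             Xc = X - X.mean(axis=0)  # center every column
             # Rows of Vt are the principal axes, sorted by decreasing singular value.
             _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
             return Xc @ Vt[:2].T  # shape: (n_samples, 2)


         rng = np.random.default_rng(0)
         X = rng.normal(size=(50, 5))
         viz = pca_2d(X)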

   .. attribute:: xlabel

      The x-axis label for the cluster plot

      :type: str

   .. attribute:: ylabel

      The y-axis label for the cluster plot

      :type: str

   .. attribute:: n_clusters

      (KMeans) The number of clusters (``k``) used in the
      final clustering fit.

      :type: int, optional

   .. attribute:: search

      (KMeans) If True, a search was performed for optimal
      ``n_clusters``.

      :type: bool, optional

   .. attribute:: cluster_range

      (KMeans) The range of clusters
      searched as (min_cluster, max_cluster).

      :type: Tuple[int, int], optional

   .. attribute:: metric

      (KMeans) The metric used to evaluate the cluster search.

      :type: str, optional

   .. attribute:: scores

      (KMeans) The metric scores computed during the cluster search.

      :type: List

   .. method:: show(self, viz_backend=None, **kwargs)


      The default display for this output.

      Displays the clustered, projected data as a scatter plot, with points
      colored by their cluster labels.
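
      A scatter plot colored by cluster usually draws one series per label. The
      sketch below shows that grouping step with a hypothetical
      ``series_by_cluster`` helper. Each resulting ``(xs, ys)`` pair could then be
      passed to a plotting call such as Matplotlib's ``Axes.scatter``. This is an
      illustration only. It does not show how the library's visualization backends
      are implemented.

      .. code-block:: python

         from collections import defaultdict
         from typing import Dict, List, Sequence, Tuple

         Series = Tuple[List[float], List[float]]


         def series_by_cluster(
             viz_data: Sequence[Tuple[float, float]], clusters: Sequence[int]
         ) -> Dict[int, Series]:
             """Hypothetical helper: split 2-D points into one (xs, ys) series per label."""
             series: Dict[int, Series] = defaultdict(lambda: ([], []))
             for (x, y), label in zip(viz_data, clusters):
                 xs, ys = series[label]
                 xs.append(x)
                 ys.append(y)
             return dict(series)


         groups = series_by_cluster([(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)], [0, 0, 1])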

      :param viz_backend: The visualization backend.
      :param \*\*kwargs: Additional keyword arguments to pass to the visualization backend.

      :raises ValueError: The data to visualize is missing or has not been calculated.

      :returns: The cluster plot.


   .. method:: cluster_search_plot(self, viz_backend=None, **kwargs)


      Shows the results of cluster search.

      Cluster search attempts to find an optimal ``n_clusters`` by maximizing a
      scoring criterion (the ``metric``). The plot is a line chart of each
      ``n_clusters`` value that was attempted against its score.
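
      The search-and-select logic can be sketched as follows. The sketch assumes
      that both bounds of ``cluster_range`` are inclusive. The library's actual
      behavior is not confirmed here. A hypothetical ``score_fn`` stands in for
      fitting and scoring a model at each ``k``. The returned mapping holds the
      points that the line chart would show.

      .. code-block:: python

         from typing import Callable, Dict, Tuple


         def search_n_clusters(
             score_fn: Callable[[int], float],
             cluster_range: Tuple[int, int] = (2, 20),
         ) -> Tuple[int, Dict[int, float]]:
             """Hypothetical helper: score each k in cluster_range and keep the best."""
             lo, hi = cluster_range
             # Assumption: both bounds are inclusive.
             scores = {k: score_fn(k) for k in range(lo, hi + 1)}
             best_k = max(scores, key=scores.get)  # the criterion is maximized
             return best_k, scores


         # Toy score that peaks at k = 4; a real search would fit and score a model per k.
         best_k, scores = search_n_clusters(lambda k: -float((k - 4) ** 2), cluster_range=(2, 8))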

      :param viz_backend: The visualization backend.
      :param \*\*kwargs: Additional keyword arguments to pass to the visualization backend.

      :raises ValueError: Cluster ``search`` is False.

      :returns: The plot


.. function:: cluster(data, method='kmeans', dim_method='pca', compute_backend=None, viz_backend=None, **kwargs) -> ClusterWidget

   Unsupervised determination of clusters.

   This feature computes clusters using various algorithms (KMeans, HDBSCAN) and then
   projects the data onto a two-dimensional plot for visualization.
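
   The KMeans step can be illustrated with a minimal sketch of Lloyd's algorithm.
   The algorithm alternates between two steps. First, it assigns each point to
   its nearest centroid. Second, it moves each centroid to the mean of its
   assigned points. The ``kmeans`` function below is a hypothetical teaching
   sketch. It is not the estimator that ``cluster`` uses. Production
   implementations such as scikit-learn's ``KMeans`` add k-means++
   initialization and multiple restarts.

   .. code-block:: python

      import random
      from typing import List, Sequence, Tuple

      Point = Tuple[float, ...]


      def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
          """Squared Euclidean distance between two points."""
          return sum((x - y) ** 2 for x, y in zip(a, b))


      def _nearest(p: Point, centroids: List[List[float]]) -> int:
          """Index of the centroid closest to point p."""
          return min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))


      def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
          """Hypothetical helper: label each point with one of k clusters (Lloyd's algorithm)."""
          rng = random.Random(seed)
          # Initialize centroids at k distinct, randomly chosen data points.
          centroids = [list(p) for p in rng.sample(points, k)]
          labels: List[int] = []
          for _ in range(n_iter):
              # Assignment step: each point joins its nearest centroid.
              new_labels = [_nearest(p, centroids) for p in points]
              if new_labels == labels:
                  break  # converged: no assignment changed
              labels = new_labels
              # Update step: move each centroid to the mean of its members.
              for c in range(k):
                  members = [p for p, lab in zip(points, labels) if lab == c]
                  if members:  # keep the old centroid if a cluster is empty
                      centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
          return labels


      points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (10.0, 10.0), (10.1, 10.2), (10.2, 9.9)]
      labels = kmeans(points, k=2)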

   :param data: The data.
   :type data: DataFrame
   :param method: {'kmeans', 'hdbscan'} The clustering method.
   :type method: str, optional
   :param dim_method: The method to use for dimensionality reduction.
   :type dim_method: str, optional
   :param compute_backend: The compute backend.
   :type compute_backend: str, optional
   :param viz_backend: The visualization backend.
   :type viz_backend: str, optional
   :param n_clusters: (KMeans) The number of clusters.
   :type n_clusters: Optional[int], optional
   :param cluster_range: (KMeans) A tuple of the minimum and
                         maximum cluster search range. Defaults to (2, 20).
   :type cluster_range: Tuple[int, int], optional
   :param metric: (KMeans) The metric to optimize (from ``sklearn.metrics``).
   :type metric: str
   :param target: (KMeans) The labels for supervised clustering, as a 1-D array.
   :param \*\*kwargs: Keyword arguments.

   :raises ValueError: ``data`` is not a DataFrame.
   :raises ValueError: The requested clustering method is not implemented.

   :returns: ClusterWidget
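
   The ``metric`` option names a scoring function from ``sklearn.metrics``. The
   silhouette coefficient is one such criterion. It is defined per point as
   s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to
   the point's own cluster and b(i) is the mean distance to the nearest other
   cluster. The sketch below is a hypothetical pure-Python version of the mean
   coefficient with Euclidean distance, which is what
   ``sklearn.metrics.silhouette_score`` computes by default. This document does
   not confirm whether ``cluster`` uses the silhouette coefficient as its
   default metric.

   .. code-block:: python

      from math import dist
      from typing import List, Sequence


      def silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
          """Hypothetical helper: mean silhouette coefficient with Euclidean distance."""
          clusters = set(labels)
          if len(clusters) < 2:
              raise ValueError("the silhouette needs at least two clusters")

          def mean_dist(i: int, c: int) -> float:
              # Average distance from point i to every *other* point labeled c.
              ds = [dist(points[i], q) for j, q in enumerate(points) if labels[j] == c and j != i]
              return sum(ds) / len(ds)

          total = 0.0
          for i, lab in enumerate(labels):
              if labels.count(lab) == 1:
                  continue  # a singleton cluster contributes s(i) = 0, as in scikit-learn
              a = mean_dist(i, lab)  # cohesion: own cluster
              b = min(mean_dist(i, c) for c in clusters if c != lab)  # separation
              denom = max(a, b)
              total += (b - a) / denom if denom else 0.0
          return total / len(labels)


      points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (10.0, 10.0), (10.1, 10.2), (10.2, 9.9)]
      good = silhouette(points, [0, 0, 0, 1, 1, 1])  # labels match the two blobs
      bad = silhouette(points, [0, 1, 0, 1, 0, 1])   # labels ignore the structure

   A higher score indicates tighter, better-separated clusters. This is why a
   cluster search that maximizes the score favors the ``k`` that matches the
   data's structure.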