data_describe.core.correlation

correlation_matrix(data, cluster=False, categorical=False, compute_backend=None, viz_backend=None, **kwargs)

Computes correlations (associations) and visualizes as a heatmap.

class data_describe.core.correlation.CorrelationWidget(association_matrix=None, cluster_matrix=None, categorical=None, viz_data=None, **kwargs)

Bases: data_describe._widget.BaseWidget

Container for correlation calculation and visualization.

An instance of this class is returned by the correlation_matrix function. The attributes documented below can be accessed directly on that instance.

association_matrix

The combined association matrix, i.e., Pearson correlations together with any categorical-numeric and categorical-categorical associations.

cluster_matrix

The clustered association matrix.
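The clustering method used to build cluster_matrix is not specified here. As an illustration only, the sketch below substitutes a simple greedy nearest-neighbour ordering, which is not necessarily what data_describe does. It starts from the column with the largest total association, repeatedly appends the unplaced column most associated with the last placed one, and then permutes rows and columns. The helper names greedy_order and reorder are hypothetical.

```python
from typing import List, Sequence

# Hypothetical helpers: a greedy stand-in, not data_describe's clustering.


def greedy_order(matrix: Sequence[Sequence[float]]) -> List[int]:
    """Return a column order in which each column is followed by the
    not-yet-placed column it is most strongly associated with.

    `matrix` is a symmetric association matrix with values in [0, 1].
    """
    n = len(matrix)
    if n == 0:
        return []
    # Start from the column with the largest total association to the others.
    start = max(range(n), key=lambda i: sum(matrix[i][j] for j in range(n) if j != i))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Pick the remaining column most associated with the last placed one;
        # iterating in sorted order makes ties resolve to the lowest index.
        nxt = max(sorted(remaining), key=lambda j: matrix[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(matrix: Sequence[Sequence[float]], order: Sequence[int]) -> List[List[float]]:
    """Permute the rows and columns of `matrix` according to `order`."""
    return [[matrix[i][j] for j in order] for i in order]


m = [
    [1.0, 0.1, 0.9, 0.25],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.4],
    [0.25, 0.8, 0.4, 1.0],
]
order = greedy_order(m)  # [2, 0, 3, 1]
clustered = reorder(m, order)
```

With this matrix, columns 2 and 0 (association 0.9) end up adjacent, as do columns 3 and 1 (association 0.8), which is the visual effect that reordering similar columns together aims for in the heatmap.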

categorical

True if the association matrix includes categorical associations.

Type

bool

viz_data

The final data to be visualized.

show(self, viz_backend=None, **kwargs)

The default display for this output.

Displays the correlation matrix heatmap.

Parameters
  • viz_backend – The visualization backend.

  • **kwargs – Keyword arguments.

Raises

ValueError – Computed data is missing.

Returns

The correlation matrix plot.

data_describe.core.correlation.correlation_matrix(data, cluster=False, categorical=False, compute_backend=None, viz_backend=None, **kwargs) → CorrelationWidget

Computes correlations (associations) and visualizes as a heatmap.

This feature combines measures of association for pairs of variables:
  • Numeric-numeric pairs: Pearson correlation

  • Categorical-numeric pairs: Correlation ratio

  • Categorical-categorical pairs
    • More than 2 levels in either variable: Cramér’s V

    • Exactly 2 levels in both variables: Matthews correlation coefficient (phi coefficient)
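For reference, the first two measures in the list can be written in a few lines of plain Python. This is a minimal sketch of the textbook definitions, assuming complete (non-missing) data; it is not data_describe's implementation, and the function names pearson and correlation_ratio are hypothetical. The correlation ratio is η = sqrt(SS_between / SS_total), the share of the numeric variable's variation explained by the per-category means.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences, in [-1, 1].

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_ratio(categories: Sequence[Hashable], values: Sequence[float]) -> float:
    """Correlation ratio (eta) of a numeric variable given a categorical one, in [0, 1]."""
    # Group the numeric values by category.
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    mean = sum(values) / len(values)
    # Variation of the group means around the overall mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    ss_total = sum((v - mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant numeric variable has nothing to explain
    return math.sqrt(ss_between / ss_total)
```

For example, correlation_ratio(["a", "a", "b", "b"], [1, 1, 5, 5]) is 1.0 because the category fully determines the value, while shuffling the labels to ["a", "b", "a", "b"] gives 0.0 because both group means equal the overall mean.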

Parameters
  • data (DataFrame) – The input data frame.

  • cluster (bool) – If True, use clustering to reorder the columns so that similar columns are grouped together.

  • categorical (bool) – If True, include categorical associations using Cramér’s V, the correlation ratio, and the Matthews correlation coefficient (phi coefficient). All associations (including Pearson correlation) are scaled to the range [0, 1].

  • compute_backend – The compute backend.

  • viz_backend – The visualization backend.

  • **kwargs – Keyword arguments.
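The categorical-categorical measures enabled by categorical=True, and the level-based choice between them, can be sketched in the same way. This assumes the uncorrected form of Cramér’s V (a library may apply a bias correction) and sortable category labels; the helpers cramers_v, matthews, and categorical_association are hypothetical. The sketches return raw values, so the Matthews coefficient stays in [-1, 1]; the rescaling to [0, 1] described above is not reproduced because its exact form is not documented here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V for two categorical sequences, in [0, 1]."""
    n = len(x)
    counts = Counter(zip(x, y))  # observed contingency table
    row_totals, col_totals = Counter(x), Counter(y)
    chi2 = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n  # count expected under independence
            chi2 += (counts[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * (k - 1)))


def matthews(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Matthews (phi) coefficient for two binary categorical sequences, in [-1, 1]."""
    # Treat the last level in sort order as the "positive" class.
    a = [v == sorted(set(x))[-1] for v in x]
    b = [v == sorted(set(y))[-1] for v in y]
    tp = sum(p and q for p, q in zip(a, b))
    tn = sum((not p) and (not q) for p, q in zip(a, b))
    fp = sum((not p) and q for p, q in zip(a, b))
    fn = sum(p and (not q) for p, q in zip(a, b))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def categorical_association(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Choose the measure by number of levels, following the list above."""
    if len(set(x)) == 2 and len(set(y)) == 2:
        return matthews(x, y)
    return cramers_v(x, y)
```

For a 2×2 table, the uncorrected Cramér’s V equals the absolute value of the phi coefficient, so the two measures agree up to sign; the Matthews coefficient additionally reports the direction of the association.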

Raises

ValueError – Invalid data input type.

Returns

CorrelationWidget