Introduction

data-describe is a Python toolkit for Exploratory Data Analysis (EDA). It aims to accelerate data exploration and analysis by providing automated and opinionated analysis widgets.

Main Features

The main features of data-describe make up the “core”. These features are expected to be useful in most EDA work on tabular data:

clustering: Clustering analysis and visualization on a 2D plot

correlation: Association measures for both numeric and categorical features

data heatmap: Data variation and missingness heatmap

data summary: Selected summary (descriptive) statistics

distribution: Histograms, violin plots, bar charts

scatter plots: Scatterplots

feature importance: Feature ranking

time series: Visualizing time series and other analysis
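
To make the data summary and missingness entries above more concrete, here is a minimal plain-Python sketch of the kind of per-column statistics such a summary reports. It is an illustration only, not data-describe's implementation: the toy `table` and the helper `summarize_column` are hypothetical names, and only the standard library is used.

```python
import statistics

# Hypothetical toy table: column name -> values, with None marking a missing value.
table = {
    "age": [34, 45, None, 23, 51],
    "income": [52000.0, 61000.0, 58000.0, None, None],
}


def summarize_column(values):
    """Return basic descriptive statistics for one numeric column."""
    # Statistics are computed over the non-missing values only.
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }


# One summary dictionary per column.
summary = {name: summarize_column(values) for name, values in table.items()}

for name, stats in summary.items():
    print(name, stats)
```

In practice the library's own summary function would be called on a DataFrame instead; the sketch only shows what "summary statistics" and "missingness" mean for a single column.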

Example Usage

The core features (functions) are exported and can be used directly:

    import data_describe as dd
    # df is assumed to be an existing pandas DataFrame
    dd.data_summary(df)

Non-core features need to be imported explicitly. For example, for text preprocessing:

    from data_describe.text.text_preprocessing import preprocess_texts
    # TEXT_COLUMN stands in for the name of a text column in df
    preprocess_texts(df.TEXT_COLUMN)

Extended Features

Additional features of data-describe include sensitive data detection (e.g. PII), text analysis, dimensionality reduction, and more. For more information on using these, check out the Examples or API Reference sections.