Introduction
data-describe is a Python toolkit for Exploratory Data Analysis (EDA). It aims to accelerate data exploration and analysis by providing automated and opinionated analysis widgets.
Main Features
The main features of data-describe make up the “core”. These features are expected to be useful in most EDA work on tabular data:
clustering: Clustering analysis and visualization on a 2D plot
correlation: Association measures for both numeric and categorical features
data heatmap: Data variation and missingness heatmap
data summary: Selected summary (descriptive) statistics
distribution: Histograms, violin plots, bar charts
scatter plots: Scatterplots
feature importance: Feature ranking
time series: Time series visualization and related analysis
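To make the data summary entry in the list above more concrete, here is a minimal sketch of the kind of per-column descriptive statistics such a summary typically reports. It uses only the Python standard library and is not data-describe's implementation. The helper names `summarize_column` and `summarize_table`, the choice of statistics, and the sample table are all assumptions made for illustration.

```python
import math
import statistics


def _is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_column(values):
    """Return a few descriptive statistics for one column.

    Hypothetical helper for illustration; not part of data-describe.
    """
    present = [v for v in values if not _is_missing(v)]
    summary = {
        "count": len(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }
    # Numeric statistics are only reported when every present value is a number.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        summary["mean"] = statistics.fmean(numeric)
        summary["median"] = statistics.median(numeric)
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
    return summary


def summarize_table(table):
    """Summarize every column of a table given as {column name: list of values}."""
    return {name: summarize_column(column) for name, column in table.items()}


# Made-up sample data with one numeric and one categorical column.
table = {
    "age": [34, 29, None, 41],
    "city": ["Oslo", "Lima", "Oslo", None],
}
report = summarize_table(table)
for name, stats in report.items():
    print(name, stats)
```

In this sketch, numeric columns get location statistics while categorical columns get only counts, which mirrors the split between numeric and categorical features mentioned in the list. The library's own summary may report a different set of statistics.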
Example Usage
The core features (functions) are exported and can be used directly:
    import data_describe as dd
    dd.data_summary(df)

Non-core features need to be imported explicitly. For example, for text preprocessing:

    from data_describe.text.text_preprocessing import preprocess_texts
    preprocess_texts(df.TEXT_COLUMN)
Extended Features
Additional features of data-describe include sensitive data detection (e.g. PII), text analysis, dimensionality reduction, and more. For more information on using these, check out the Examples or API Reference sections.