With unknown "dark" data, "unclean" data, structured and unstructured data, and data embedded in images and documents, it can be difficult to get a clear picture of your data environment. data-describe profiles your data and reveals its true landscape. It gives data scientists a rich set of tools, chained together, to automate common data analysis tasks. The resulting insights help facilitate conversations among data scientists, engineers, and business analysts, which in turn supports future innovation. data-describe was built by contributors who have led projects like TensorFlow, XGBoost, Kubeflow, and MXNet, and who have more than 40 years of combined data science experience.
Streamlined data summaries for important statistics
Cluster similar items with unsupervised techniques
Out-of-the-box correlation matrices with categorical support
Quickly visualize data outliers and missing values with heatmaps
Univariate analysis with quick distribution plots
Baseline feature importance prior to model trials
Smart scatter plots using diagnostics
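To make the summary, missing-value, and outlier features above more concrete, here is a minimal sketch of the kind of per-column profile they describe. It uses only the Python standard library and is not data-describe's API: the `summarize` helper, the `None`-as-missing convention, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
import statistics


def summarize(values):
    """Profile one numeric column; None marks a missing value (assumed convention)."""
    present = [v for v in values if v is not None]
    # Quartiles of the non-missing values.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical sample column with two missing entries and one extreme value.
ages = [23, 25, None, 31, 29, 27, 112, None, 26]
summary = summarize(ages)
print(summary)
```

A profiling toolset would typically run a check like this across every column and present the results as tables or heatmaps rather than a dictionary.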
Feature | Description |
---|---|
Dimensionality Reduction | Visualize high-dimensional data using PCA and t-SNE |
Sensitive Data | Identifies and handles sensitive data such as personally identifiable information (PII) |
Text and NLP | Tools for common tasks like text preprocessing and topic modeling |
Big Data Support | Uses Modin with either Ray (on top of Apache Arrow) or Dask as the engine |
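As a rough illustration of what identifying and handling sensitive data can involve, the sketch below redacts two kinds of PII with regular expressions. It is not how data-describe implements the feature: the `PII_PATTERNS` table, the `redact` helper, and the placeholder format are hypothetical, and the two patterns cover only simple email addresses and US Social Security numbers.

```python
import re

# Illustrative patterns only (assumption); real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a <LABEL> placeholder and count matches per label."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"<{label}>", text)
        if n:
            found[label] = n
    return text, found


# Hypothetical input string.
clean, counts = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
print(clean)
print(counts)
```

Returning the match counts alongside the redacted text keeps the two concerns in the table row, identifying and handling, visible as separate outputs.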