{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.011413,
"end_time": "2020-12-08T22:00:30.201566",
"exception": false,
"start_time": "2020-12-08T22:00:30.190153",
"status": "completed"
},
"tags": []
},
"source": [
"# Data Summary\n",
"\n",
"The Data Summary feature provides an overview of the data using summary statistics. The output is similar to `pandas.DataFrame.describe`; however, it reports a different set of statistics, chosen to address common questions about the data:\n",
"\n",
"- Data Type: The data type (dtype) of each column.\n",
"- Nulls: The number (count) or percentage of null values. Primarily used to identify missing data.\n",
"- Zeros: The number (count) or percentage of zero values. Zero is often used as a special value (for example, as a placeholder) and may indicate abnormalities.\n",
"- Min, Max: The minimum and maximum values. Used to identify extreme values (outliers).\n",
"- Median, Mean, Standard Deviation: Used to identify skew; a large gap between the median and the mean suggests a skewed distribution.\n",
"- Unique: The number of unique values (levels). Used to identify high cardinality.\n",
"- Top Frequency: The number (count) or percentage of values equal to the mode (the most frequent value). Used to identify imbalanced data."
]
},
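{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration only, and not `data_describe`'s actual implementation, most of these statistics can be approximated with plain pandas. The helper `summarize` and the `toy` DataFrame below are hypothetical. The sketch makes three assumptions: Zeros counts values equal to 0, Top Frequency is the count of the most common non-null value, and Min, Median, Max, Mean, and Standard Deviation are computed for numeric columns only. The last assumption differs from `data_summary`, which also reports Min and Max for datetime columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def summarize(df):\n",
"    # Hypothetical helper approximating the statistics listed above; not part of data_describe\n",
"    numeric = df.select_dtypes('number')  # location/spread statistics for numeric columns only\n",
"    return pd.DataFrame({\n",
"        'Data Type': df.dtypes,\n",
"        'Nulls': df.isna().sum(),\n",
"        'Zeros': (df == 0).sum(),\n",
"        'Min': numeric.min(),\n",
"        'Median': numeric.median(),\n",
"        'Max': numeric.max(),\n",
"        'Mean': numeric.mean(),\n",
"        'Standard Deviation': numeric.std(),\n",
"        'Unique': df.nunique(),  # NaN is not counted as a distinct value\n",
"        # Count of the most frequent non-null value (the mode)\n",
"        'Top Frequency': df.apply(lambda s: s.value_counts().max()),\n",
"    })\n",
"\n",
"\n",
"toy = pd.DataFrame({'x': [0, 1, 2, 2, None], 'label': ['a', 'b', 'b', 'b', None]})\n",
"summarize(toy)"
]
},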
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.005885,
"end_time": "2020-12-08T22:00:30.215499",
"exception": false,
"start_time": "2020-12-08T22:00:30.209614",
"status": "completed"
},
"tags": []
},
"source": [
"## Example data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:30.230862Z",
"iopub.status.busy": "2020-12-08T22:00:30.230153Z",
"iopub.status.idle": "2020-12-08T22:00:31.711283Z",
"shell.execute_reply": "2020-12-08T22:00:31.711731Z"
},
"papermill": {
"duration": 1.490596,
"end_time": "2020-12-08T22:00:31.711964",
"exception": false,
"start_time": "2020-12-08T22:00:30.221368",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from datetime import datetime\n",
"import pandas as pd\n",
"from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older scikit-learn\n",
"\n",
"import data_describe as dd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:31.727540Z",
"iopub.status.busy": "2020-12-08T22:00:31.726630Z",
"iopub.status.idle": "2020-12-08T22:00:31.748134Z",
"shell.execute_reply": "2020-12-08T22:00:31.748515Z"
},
"papermill": {
"duration": 0.031568,
"end_time": "2020-12-08T22:00:31.748688",
"exception": false,
"start_time": "2020-12-08T22:00:31.717120",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>CRIM</th><th>ZN</th><th>INDUS</th><th>CHAS</th><th>NOX</th><th>RM</th><th>AGE</th><th>DIS</th><th>RAD</th><th>TAX</th><th>PTRATIO</th><th>B</th><th>LSTAT</th><th>target</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.00632</td><td>18.0</td><td>2.31</td><td>0.0</td><td>0.538</td><td>6.575</td><td>65.2</td><td>4.09</td><td>1.0</td><td>296.0</td><td>15.3</td><td>396.9</td><td>4.98</td><td>24.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO \\\n",
"0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.09 1.0 296.0 15.3 \n",
"\n",
" B LSTAT target \n",
"0 396.9 4.98 24.0 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = load_boston()\n",
"df = pd.DataFrame(data.data, columns=list(data.feature_names))\n",
"df['target'] = data.target\n",
"df.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:31.766432Z",
"iopub.status.busy": "2020-12-08T22:00:31.765707Z",
"iopub.status.idle": "2020-12-08T22:00:31.779538Z",
"shell.execute_reply": "2020-12-08T22:00:31.779930Z"
},
"papermill": {
"duration": 0.025,
"end_time": "2020-12-08T22:00:31.780163",
"exception": false,
"start_time": "2020-12-08T22:00:31.755163",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
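"# Change data types to demonstrate data summary\n",
"# Rescale CRIM to very small floats to exercise float formatting\n",
"df['CRIM'] = df['CRIM'] / 1000000\n",
"# Convert AGE into a categorical (object) column\n",
"df['AGE'] = df['AGE'].map(lambda x: \"young\" if x < 29 else \"old\")\n",
"# Every non-empty string casts to True, so AgeFlag is a constant bool column\n",
"df['AgeFlag'] = df['AGE'].astype(bool)\n",
"df['ZN'] = df['ZN'].astype(int)\n",
"# Add a constant datetime column\n",
"df['Date'] = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')"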
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.0057,
"end_time": "2020-12-08T22:00:31.791982",
"exception": false,
"start_time": "2020-12-08T22:00:31.786282",
"status": "completed"
},
"tags": []
},
"source": [
"## Default\n",
"By default, `data_summary` formats floats for readability: scientific notation is disabled and the number of decimal places shown is limited."
]
},
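{
"cell_type": "markdown",
"metadata": {},
"source": [
"The exact formatting rules are internal to `data_describe`. As a hypothetical sketch only, a similar effect can be obtained with NumPy's `format_float_positional`, which never uses scientific notation. The `format_float` helper below is an assumption, not the library's code. It prints integer-valued floats without decimals, values with a magnitude of at least 1 with two decimal places, and smaller values with two significant digits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def format_float(x, sig_digits=2):\n",
"    # Hypothetical formatter; not data_describe's implementation\n",
"    if float(x).is_integer():\n",
"        return str(int(x))  # e.g. 22.0 -> '22'\n",
"    if abs(x) >= 1:\n",
"        return f'{x:.2f}'  # two decimal places\n",
"    # Positional (non-scientific) notation with a fixed number of significant digits\n",
"    return np.format_float_positional(\n",
"        x, precision=sig_digits, unique=False, fractional=False, trim='-'\n",
"    )\n",
"\n",
"\n",
"[format_float(v) for v in [6.32e-09, 11.3478, 22.0, 0.06917]]"
]
},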
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:31.806589Z",
"iopub.status.busy": "2020-12-08T22:00:31.805868Z",
"iopub.status.idle": "2020-12-08T22:00:32.124701Z",
"shell.execute_reply": "2020-12-08T22:00:32.125088Z"
},
"papermill": {
"duration": 0.327942,
"end_time": "2020-12-08T22:00:32.125266",
"exception": false,
"start_time": "2020-12-08T22:00:31.797324",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Info</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Rows</th><td>506</td></tr>\n",
"    <tr><th>Columns</th><td>16</td></tr>\n",
"    <tr><th>Size in Memory</th><td>59.9 KB</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Info\n",
"Rows 506\n",
"Columns 16\n",
"Size in Memory 59.9 KB"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Data Type</th><th>Nulls</th><th>Zeros</th><th>Min</th><th>Median</th><th>Max</th><th>Mean</th><th>Standard Deviation</th><th>Unique</th><th>Top Frequency</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>CRIM</th><td>float64</td><td>0</td><td>0</td><td>0.0000000063</td><td>0.00000026</td><td>0.000089</td><td>0.0000036</td><td>0.0000086</td><td>504</td><td>2</td></tr>\n",
"    <tr><th>ZN</th><td>int64</td><td>0</td><td>372</td><td>0</td><td>0</td><td>100</td><td>11.35</td><td>23.29</td><td>26</td><td>372</td></tr>\n",
"    <tr><th>INDUS</th><td>float64</td><td>0</td><td>0</td><td>0.46</td><td>9.69</td><td>27.74</td><td>11.14</td><td>6.85</td><td>76</td><td>132</td></tr>\n",
"    <tr><th>CHAS</th><td>float64</td><td>0</td><td>471</td><td>0</td><td>0</td><td>1</td><td>0.069</td><td>0.25</td><td>2</td><td>471</td></tr>\n",
"    <tr><th>NOX</th><td>float64</td><td>0</td><td>0</td><td>0.39</td><td>0.54</td><td>0.87</td><td>0.55</td><td>0.12</td><td>81</td><td>23</td></tr>\n",
"    <tr><th>RM</th><td>float64</td><td>0</td><td>0</td><td>3.56</td><td>6.21</td><td>8.78</td><td>6.28</td><td>0.70</td><td>446</td><td>3</td></tr>\n",
"    <tr><th>AGE</th><td>object</td><td>0</td><td>0</td><td></td><td></td><td></td><td></td><td></td><td>2</td><td>446</td></tr>\n",
"    <tr><th>DIS</th><td>float64</td><td>0</td><td>0</td><td>1.13</td><td>3.21</td><td>12.13</td><td>3.80</td><td>2.10</td><td>412</td><td>5</td></tr>\n",
"    <tr><th>RAD</th><td>float64</td><td>0</td><td>0</td><td>1</td><td>5</td><td>24</td><td>9.55</td><td>8.70</td><td>9</td><td>132</td></tr>\n",
"    <tr><th>TAX</th><td>float64</td><td>0</td><td>0</td><td>187</td><td>330</td><td>711</td><td>408.24</td><td>168.37</td><td>66</td><td>132</td></tr>\n",
"    <tr><th>PTRATIO</th><td>float64</td><td>0</td><td>0</td><td>12.60</td><td>19.050</td><td>22</td><td>18.46</td><td>2.16</td><td>46</td><td>140</td></tr>\n",
"    <tr><th>B</th><td>float64</td><td>0</td><td>0</td><td>0.32</td><td>391.44</td><td>396.90</td><td>356.67</td><td>91.20</td><td>357</td><td>121</td></tr>\n",
"    <tr><th>LSTAT</th><td>float64</td><td>0</td><td>0</td><td>1.73</td><td>11.36</td><td>37.97</td><td>12.65</td><td>7.13</td><td>455</td><td>3</td></tr>\n",
"    <tr><th>target</th><td>float64</td><td>0</td><td>0</td><td>5</td><td>21.20</td><td>50</td><td>22.53</td><td>9.19</td><td>229</td><td>16</td></tr>\n",
"    <tr><th>AgeFlag</th><td>bool</td><td>0</td><td>0</td><td></td><td></td><td></td><td></td><td></td><td>1</td><td>506</td></tr>\n",
"    <tr><th>Date</th><td>datetime64[ns]</td><td>0</td><td>0</td><td>2008-01-01 13:30:00</td><td></td><td>2008-01-01 13:30:00</td><td></td><td></td><td>1</td><td>506</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"None"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"data-describe Summary Widget"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dd.data_summary(df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.009868,
"end_time": "2020-12-08T22:00:32.142324",
"exception": false,
"start_time": "2020-12-08T22:00:32.132456",
"status": "completed"
},
"tags": []
},
"source": [
"## Display counts as percentage\n",
"To display the count statistics (Nulls, Zeros, and Top Frequency) as a percentage of the total number of rows, use `as_percentage=True`."
]
},
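{
"cell_type": "markdown",
"metadata": {},
"source": [
"The percentages are relative to the total row count. Below is a minimal sketch of that conversion. It uses the `ZN` counts from the default summary above (372 zeros out of 506 rows), and its one-decimal formatting is an assumption based on the displayed output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Counts for the ZN column, taken from the default summary (506 rows in total)\n",
"counts = pd.Series({'Nulls': 0, 'Zeros': 372, 'Top Frequency': 372})\n",
"n_rows = 506\n",
"\n",
"# Express each count as a percentage of the row count, with one decimal place\n",
"percentages = (counts / n_rows * 100).map(lambda p: f'{p:.1f}%')\n",
"percentages"
]
},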
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:32.172926Z",
"iopub.status.busy": "2020-12-08T22:00:32.170150Z",
"iopub.status.idle": "2020-12-08T22:00:32.209777Z",
"shell.execute_reply": "2020-12-08T22:00:32.210256Z"
},
"papermill": {
"duration": 0.060745,
"end_time": "2020-12-08T22:00:32.210432",
"exception": false,
"start_time": "2020-12-08T22:00:32.149687",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Info</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Rows</th><td>506</td></tr>\n",
"    <tr><th>Columns</th><td>16</td></tr>\n",
"    <tr><th>Size in Memory</th><td>59.9 KB</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Info\n",
"Rows 506\n",
"Columns 16\n",
"Size in Memory 59.9 KB"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Data Type</th><th>Nulls</th><th>Zeros</th><th>Min</th><th>Median</th><th>Max</th><th>Mean</th><th>Standard Deviation</th><th>Unique</th><th>Top Frequency</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>CRIM</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>0.0000000063</td><td>0.00000026</td><td>0.000089</td><td>0.0000036</td><td>0.0000086</td><td>504</td><td>0.4%</td></tr>\n",
"    <tr><th>ZN</th><td>int64</td><td>0.0%</td><td>73.5%</td><td>0</td><td>0</td><td>100</td><td>11.35</td><td>23.29</td><td>26</td><td>73.5%</td></tr>\n",
"    <tr><th>INDUS</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>0.46</td><td>9.69</td><td>27.74</td><td>11.14</td><td>6.85</td><td>76</td><td>26.1%</td></tr>\n",
"    <tr><th>CHAS</th><td>float64</td><td>0.0%</td><td>93.1%</td><td>0</td><td>0</td><td>1</td><td>0.069</td><td>0.25</td><td>2</td><td>93.1%</td></tr>\n",
"    <tr><th>NOX</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>0.39</td><td>0.54</td><td>0.87</td><td>0.55</td><td>0.12</td><td>81</td><td>4.5%</td></tr>\n",
"    <tr><th>RM</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>3.56</td><td>6.21</td><td>8.78</td><td>6.28</td><td>0.70</td><td>446</td><td>0.6%</td></tr>\n",
"    <tr><th>AGE</th><td>object</td><td>0.0%</td><td>0.0%</td><td></td><td></td><td></td><td></td><td></td><td>2</td><td>88.1%</td></tr>\n",
"    <tr><th>DIS</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>1.13</td><td>3.21</td><td>12.13</td><td>3.80</td><td>2.10</td><td>412</td><td>1.0%</td></tr>\n",
"    <tr><th>RAD</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>1</td><td>5</td><td>24</td><td>9.55</td><td>8.70</td><td>9</td><td>26.1%</td></tr>\n",
"    <tr><th>TAX</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>187</td><td>330</td><td>711</td><td>408.24</td><td>168.37</td><td>66</td><td>26.1%</td></tr>\n",
"    <tr><th>PTRATIO</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>12.60</td><td>19.050</td><td>22</td><td>18.46</td><td>2.16</td><td>46</td><td>27.7%</td></tr>\n",
"    <tr><th>B</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>0.32</td><td>391.44</td><td>396.90</td><td>356.67</td><td>91.20</td><td>357</td><td>23.9%</td></tr>\n",
"    <tr><th>LSTAT</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>1.73</td><td>11.36</td><td>37.97</td><td>12.65</td><td>7.13</td><td>455</td><td>0.6%</td></tr>\n",
"    <tr><th>target</th><td>float64</td><td>0.0%</td><td>0.0%</td><td>5</td><td>21.20</td><td>50</td><td>22.53</td><td>9.19</td><td>229</td><td>3.2%</td></tr>\n",
"    <tr><th>AgeFlag</th><td>bool</td><td>0.0%</td><td>0.0%</td><td></td><td></td><td></td><td></td><td></td><td>1</td><td>100.0%</td></tr>\n",
"    <tr><th>Date</th><td>datetime64[ns]</td><td>0.0%</td><td>0.0%</td><td>2008-01-01 13:30:00</td><td></td><td>2008-01-01 13:30:00</td><td></td><td></td><td>1</td><td>100.0%</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"None"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"data-describe Summary Widget"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dd.data_summary(df, as_percentage=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.008632,
"end_time": "2020-12-08T22:00:32.227499",
"exception": false,
"start_time": "2020-12-08T22:00:32.218867",
"status": "completed"
},
"tags": []
},
"source": [
"## Disable auto float formatting\n",
"To turn off this formatting, use `auto_float=False`. Depending on your data, the difference in the output may be small. Here, values are shown with more digits, and very small values such as those in `CRIM` appear in scientific notation."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2020-12-08T22:00:32.260924Z",
"iopub.status.busy": "2020-12-08T22:00:32.258066Z",
"iopub.status.idle": "2020-12-08T22:00:32.298634Z",
"shell.execute_reply": "2020-12-08T22:00:32.299144Z"
},
"papermill": {
"duration": 0.062812,
"end_time": "2020-12-08T22:00:32.299319",
"exception": false,
"start_time": "2020-12-08T22:00:32.236507",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Info</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Rows</th><td>506</td></tr>\n",
"    <tr><th>Columns</th><td>16</td></tr>\n",
"    <tr><th>Size in Memory</th><td>59.9 KB</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Info\n",
"Rows 506\n",
"Columns 16\n",
"Size in Memory 59.9 KB"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Data Type</th><th>Nulls</th><th>Zeros</th><th>Min</th><th>Median</th><th>Max</th><th>Mean</th><th>Standard Deviation</th><th>Unique</th><th>Top Frequency</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>CRIM</th><td>float64</td><td>0</td><td>0</td><td>6.32e-09</td><td>2.5651e-07</td><td>8.89762e-05</td><td>3.61352e-06</td><td>8.59304e-06</td><td>504</td><td>2</td></tr>\n",
"    <tr><th>ZN</th><td>int64</td><td>0</td><td>372</td><td>0</td><td>0</td><td>100</td><td>11.3478</td><td>23.2875</td><td>26</td><td>372</td></tr>\n",
"    <tr><th>INDUS</th><td>float64</td><td>0</td><td>0</td><td>0.46</td><td>9.69</td><td>27.74</td><td>11.1368</td><td>6.85357</td><td>76</td><td>132</td></tr>\n",
"    <tr><th>CHAS</th><td>float64</td><td>0</td><td>471</td><td>0</td><td>0</td><td>1</td><td>0.06917</td><td>0.253743</td><td>2</td><td>471</td></tr>\n",
"    <tr><th>NOX</th><td>float64</td><td>0</td><td>0</td><td>0.385</td><td>0.538</td><td>0.871</td><td>0.554695</td><td>0.115763</td><td>81</td><td>23</td></tr>\n",
"    <tr><th>RM</th><td>float64</td><td>0</td><td>0</td><td>3.561</td><td>6.2085</td><td>8.78</td><td>6.28463</td><td>0.701923</td><td>446</td><td>3</td></tr>\n",
"    <tr><th>AGE</th><td>object</td><td>0</td><td>0</td><td></td><td></td><td></td><td></td><td></td><td>2</td><td>446</td></tr>\n",
"    <tr><th>DIS</th><td>float64</td><td>0</td><td>0</td><td>1.1296</td><td>3.20745</td><td>12.1265</td><td>3.79504</td><td>2.10363</td><td>412</td><td>5</td></tr>\n",
"    <tr><th>RAD</th><td>float64</td><td>0</td><td>0</td><td>1</td><td>5</td><td>24</td><td>9.54941</td><td>8.69865</td><td>9</td><td>132</td></tr>\n",
"    <tr><th>TAX</th><td>float64</td><td>0</td><td>0</td><td>187</td><td>330</td><td>711</td><td>408.237</td><td>168.37</td><td>66</td><td>132</td></tr>\n",
"    <tr><th>PTRATIO</th><td>float64</td><td>0</td><td>0</td><td>12.6</td><td>19.05</td><td>22</td><td>18.4555</td><td>2.16281</td><td>46</td><td>140</td></tr>\n",
"    <tr><th>B</th><td>float64</td><td>0</td><td>0</td><td>0.32</td><td>391.44</td><td>396.9</td><td>356.674</td><td>91.2046</td><td>357</td><td>121</td></tr>\n",
"    <tr><th>LSTAT</th><td>float64</td><td>0</td><td>0</td><td>1.73</td><td>11.36</td><td>37.97</td><td>12.6531</td><td>7.134</td><td>455</td><td>3</td></tr>\n",
"    <tr><th>target</th><td>float64</td><td>0</td><td>0</td><td>5</td><td>21.2</td><td>50</td><td>22.5328</td><td>9.18801</td><td>229</td><td>16</td></tr>\n",
"    <tr><th>AgeFlag</th><td>bool</td><td>0</td><td>0</td><td></td><td></td><td></td><td></td><td></td><td>1</td><td>506</td></tr>\n",
"    <tr><th>Date</th><td>datetime64[ns]</td><td>0</td><td>0</td><td>2008-01-01 13:30:00</td><td></td><td>2008-01-01 13:30:00</td><td></td><td></td><td>1</td><td>506</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Data Type Nulls Zeros Min Median \\\n",
"CRIM float64 0 0 6.32e-09 2.5651e-07 \n",
"ZN int64 0 372 0 0 \n",
"INDUS float64 0 0 0.46 9.69 \n",
"CHAS float64 0 471 0 0 \n",
"NOX float64 0 0 0.385 0.538 \n",
"RM float64 0 0 3.561 6.2085 \n",
"AGE object 0 0 \n",
"DIS float64 0 0 1.1296 3.20745 \n",
"RAD float64 0 0 1 5 \n",
"TAX float64 0 0 187 330 \n",
"PTRATIO float64 0 0 12.6 19.05 \n",
"B float64 0 0 0.32 391.44 \n",
"LSTAT float64 0 0 1.73 11.36 \n",
"target float64 0 0 5 21.2 \n",
"AgeFlag bool 0 0 \n",
"Date datetime64[ns] 0 0 2008-01-01 13:30:00 \n",
"\n",
" Max Mean Standard Deviation Unique \\\n",
"CRIM 8.89762e-05 3.61352e-06 8.59304e-06 504 \n",
"ZN 100 11.3478 23.2875 26 \n",
"INDUS 27.74 11.1368 6.85357 76 \n",
"CHAS 1 0.06917 0.253743 2 \n",
"NOX 0.871 0.554695 0.115763 81 \n",
"RM 8.78 6.28463 0.701923 446 \n",
"AGE 2 \n",
"DIS 12.1265 3.79504 2.10363 412 \n",
"RAD 24 9.54941 8.69865 9 \n",
"TAX 711 408.237 168.37 66 \n",
"PTRATIO 22 18.4555 2.16281 46 \n",
"B 396.9 356.674 91.2046 357 \n",
"LSTAT 37.97 12.6531 7.134 455 \n",
"target 50 22.5328 9.18801 229 \n",
"AgeFlag 1 \n",
"Date 2008-01-01 13:30:00 1 \n",
"\n",
" Top Frequency \n",
"CRIM 2 \n",
"ZN 372 \n",
"INDUS 132 \n",
"CHAS 471 \n",
"NOX 23 \n",
"RM 3 \n",
"AGE 446 \n",
"DIS 5 \n",
"RAD 132 \n",
"TAX 132 \n",
"PTRATIO 140 \n",
"B 121 \n",
"LSTAT 3 \n",
"target 16 \n",
"AgeFlag 506 \n",
"Date 506 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"None"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"data-describe Summary Widget"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dd.data_summary(df, auto_float=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"papermill": {
"duration": 4.379267,
"end_time": "2020-12-08T22:00:33.482384",
"environment_variables": {},
"exception": null,
"input_path": "/Users/richardtruong-chau/Projects/data-describe/examples/Data_Summary.ipynb",
"output_path": "/Users/richardtruong-chau/Projects/data-describe/examples/Data_Summary.ipynb",
"parameters": {},
"start_time": "2020-12-08T22:00:29.103117",
"version": "2.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}