Scatter Plots

[1]:
import pandas as pd
import data_describe as dd
[2]:
from sklearn.datasets import load_diabetes
data = load_diabetes()
df = pd.DataFrame(data.data, columns=list(data.feature_names))
df['target'] = data.target
df.shape
[2]:
(442, 11)
[3]:
df.head(2)
[3]:
        age       sex       bmi        bp        s1        s2        s3        s4        s5        s6  target
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401 -0.002592  0.019908 -0.017646   151.0
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412 -0.039493 -0.068330 -0.092204    75.0

Scatterplot Matrix

[4]:
dd.scatter_plots(df, mode='matrix')
<seaborn.axisgrid.PairGrid at 0x21644629c88>
[4]:
data-describe Scatter Plot Widget
(Figure: scatterplot matrix of all columns in df)
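The matrix above is returned as a seaborn PairGrid. To illustrate how such a grid is laid out, the sketch below assumes seaborn's PairGrid convention (the panel in row i, column j plots the j-th variable on the x-axis against the i-th variable on the y-axis) and assumes data-describe passes the columns in DataFrame order. The helper name pair_grid_layout is hypothetical and draws nothing; it only lists which pair each panel would show.

```python
from typing import List, Optional, Tuple


def pair_grid_layout(columns: List[str]) -> List[List[Optional[Tuple[str, str]]]]:
    """Return an n x n grid of (x_var, y_var) labels for a scatterplot matrix.

    Hypothetical helper following the PairGrid convention: row i, column j
    plots columns[j] on the x-axis against columns[i] on the y-axis.
    Diagonal panels (i == j) show a single variable, so they are None here.
    """
    n = len(columns)
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            row.append(None if i == j else (columns[j], columns[i]))
        grid.append(row)
    return grid


for row in pair_grid_layout(["age", "sex", "bmi"]):
    print(row)
# [None, ('sex', 'age'), ('bmi', 'age')]
# [('age', 'sex'), None, ('bmi', 'sex')]
# [('age', 'bmi'), ('sex', 'bmi'), None]
```

Each pair of columns appears twice in the matrix, mirrored across the diagonal with the axes swapped.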

Show all plots

[5]:
df_subset = df.iloc[:, :3]  # Keep only the first 3 columns (age, sex, bmi) to limit the number of plots rendered
dd.scatter_plots(df_subset, mode='all')
[<seaborn.axisgrid.JointGrid at 0x216500e9848>,
 <seaborn.axisgrid.JointGrid at 0x216515810c8>,
 <seaborn.axisgrid.JointGrid at 0x2165165ad48>]
[5]:
data-describe Scatter Plot Widget
(Figures: one joint plot for each of the three column pairs in df_subset)
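The three JointGrid objects match the number of unordered pairs among three columns. Assuming mode='all' draws one joint plot per unordered column pair, the count grows as n(n-1)/2, which is why this example uses a subset: all 11 columns of df would give 55 plots. The hypothetical helper below just enumerates the pairs with itertools.combinations; it does not call data-describe.

```python
from itertools import combinations
from typing import List, Tuple


def column_pairs(columns: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of distinct columns, in column order (hypothetical helper)."""
    return list(combinations(columns, 2))


subset = ["age", "sex", "bmi"]  # the columns selected by df.iloc[:, :3]
full = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6", "target"]

print(column_pairs(subset))     # [('age', 'sex'), ('age', 'bmi'), ('sex', 'bmi')]
print(len(column_pairs(full)))  # 55
```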

Show plots of interest using scatterplot diagnostics

Filter plots by a diagnostic

[6]:
dd.scatter_plots(df, mode='diagnostic', threshold={'Outlying': 0.5})
[<seaborn.axisgrid.JointGrid at 0x21651eabd48>]
[6]:
data-describe Scatter Plot Widget
(Figure: the single joint plot that passes the Outlying threshold of 0.5)
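The diagnostic names used here ('Outlying', 'Striated') match measures from Wilkinson's scatterplot diagnostics (scagnostics), which are computed from a minimum spanning tree (MST) of the points. The sketch below is a simplified, hypothetical version of the Outlying idea, not data-describe's implementation: it rescales both axes to [0, 1], builds the MST with Prim's algorithm, flags MST edges longer than q75 + 1.5 * IQR of the edge lengths, and returns their share of the total tree length. The published definition has additional steps (for example, binning the points first), so real scores will differ.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def _normalize(points: List[Point]) -> List[Point]:
    """Min-max scale each axis to [0, 1]; a constant axis maps to 0."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_span = min(xs), max(xs) - min(xs)
    y_lo, y_span = min(ys), max(ys) - min(ys)
    return [((x - x_lo) / x_span if x_span else 0.0,
             (y - y_lo) / y_span if y_span else 0.0) for x, y in points]


def mst_edge_lengths(points: List[Point]) -> List[float]:
    """Prim's algorithm, O(n^2), on the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each vertex to the tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths


def outlying_score(points: List[Point]) -> float:
    """Simplified Outlying measure: share of MST length in unusually long edges."""
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    lengths = mst_edge_lengths(_normalize(points))
    total = sum(lengths)
    if total == 0:
        return 0.0  # all points coincide
    q1, _, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    omega = q3 + 1.5 * (q3 - q1)  # Tukey-style cutoff for "long" edges
    return sum(w for w in lengths if w > omega) / total


# A tight run of points plus one far-away point: the long edge dominates the tree.
cluster = [(0.0, float(y)) for y in range(5)] + [(0.0, 100.0)]
print(round(outlying_score(cluster), 2))  # 0.96
```

With these rules, a regular grid of points has no long edges and scores 0, while a single distant point pushes the score toward 1.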
[7]:
dd.scatter_plots(df, mode='diagnostic', threshold={'Striated': 0.9})
[<seaborn.axisgrid.JointGrid at 0x216545ca108>,
 <seaborn.axisgrid.JointGrid at 0x21656a9ee08>]
[7]:
data-describe Scatter Plot Widget
(Figures: the two joint plots that pass the Striated threshold of 0.9)
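The threshold argument maps a diagnostic name to a cutoff. The sketch below shows one plausible way such a filter could work; the rule is an assumption, not data-describe's documented behavior: a pair is kept when its score is at least the cutoff for every diagnostic named in the dict. The function name filter_pairs, the column names and the scores are all made up for illustration and are not computed from df.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def filter_pairs(scores: Dict[Pair, Dict[str, float]],
                 threshold: Dict[str, float]) -> List[Pair]:
    """Keep pairs whose score meets every cutoff in `threshold` (assumed rule: >=)."""
    return [
        pair
        for pair, diag in scores.items()
        if all(diag.get(name, 0.0) >= cutoff for name, cutoff in threshold.items())
    ]


# Hypothetical scores for illustration only.
scores = {
    ("col_a", "col_b"): {"Outlying": 0.12, "Striated": 0.95},
    ("col_a", "col_c"): {"Outlying": 0.61, "Striated": 0.20},
    ("col_b", "col_c"): {"Outlying": 0.05, "Striated": 0.93},
}
print(filter_pairs(scores, {"Outlying": 0.5}))  # [('col_a', 'col_c')]
print(filter_pairs(scores, {"Striated": 0.9}))  # [('col_a', 'col_b'), ('col_b', 'col_c')]
```

Requiring every named diagnostic to pass makes a multi-key threshold dict narrow the selection; an "any" rule would widen it instead.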