Feature Importance
[1]:
import pandas as pd
import data_describe as dd
from sklearn.naive_bayes import GaussianNB
[2]:
import warnings
warnings.simplefilter("ignore")  # hide library warnings in the rendered notebook
[3]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=list(data.feature_names))
df['target'] = data.target
df.head(1)
[3]:
| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
1 rows × 31 columns
Default
Pass in the DataFrame and the name of the response (target) column.
[4]:
dd.importance(df, 'target')
[4]:
Text(0.5, 1.0, 'Feature Importance')
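This notebook does not say how dd.importance computes its scores. Because the estimator argument accepts GaussianNB (which has no built-in feature_importances_ attribute) and the scores can be negative, a model-agnostic method such as permutation importance seems likely, but that is an assumption. Below is a minimal sketch of permutation importance with scikit-learn that could serve as a cross-check on the default chart; the random forest model and the 75/25 train/test split are illustrative choices, not confirmed data_describe defaults.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Rebuild the notebook's data frame: 30 feature columns plus 'target'
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=list(data.feature_names))
df["target"] = data.target

X = df.drop(columns="target")
y = df["target"]

# Score on held-out rows so importance reflects generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumption: a random forest stands in for the default model (not stated in this notebook)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each column 5 times and record the mean drop in test accuracy
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False).head())
```

Each score is the mean drop in test accuracy when that single column is shuffled, so features the model relies on heavily get larger values.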
Show negative importance
Use truncate=False to disable the removal of negative importance values.
[5]:
dd.importance(df, 'target', truncate=False)
[5]:
Text(0.5, 1.0, 'Feature Importance')
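The notebook does not spell out what truncation does. Below is a minimal sketch of one plausible reading, assuming truncation clips negative scores to zero; the many exact zeros in the return_values output further down are consistent with this but do not prove it. The raw values are made up for illustration.

```python
import numpy as np

# Made-up raw scores for illustration; a negative value means shuffling
# that feature slightly improved the score, which usually reflects noise
raw = np.array([0.0056, -0.0012, 0.0, 0.0074, -0.0003])

# Assumption: truncation replaces negative scores with zero
truncated = np.clip(raw, 0, None)
print(truncated)
```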
Getting the data values
You can get the importance values by setting return_values to True.
[6]:
dd.importance(df, 'target', return_values=True)
[6]:
array([0.00035398, 0.00561404, 0.00106195, 0. , 0.00069865,
0.00035709, 0.00033225, 0.00386586, 0. , 0. ,
0.00105884, 0.00176681, 0.00105263, 0.00247477, 0. ,
0. , 0. , 0. , 0. , 0. ,
0.00738705, 0.00597423, 0. , 0.00386896, 0.00069554,
0.00034467, 0. , 0.00525074, 0. , 0. ])
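The returned array carries no feature names. Below is a minimal sketch for labelling and ranking it with pandas, assuming the values follow the column order of df with 'target' removed (the array has 30 entries, one per feature, which fits this assumption). The numbers are copied from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Values copied from dd.importance(df, 'target', return_values=True) above
values = np.array([
    0.00035398, 0.00561404, 0.00106195, 0.0, 0.00069865,
    0.00035709, 0.00033225, 0.00386586, 0.0, 0.0,
    0.00105884, 0.00176681, 0.00105263, 0.00247477, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0,
    0.00738705, 0.00597423, 0.0, 0.00386896, 0.00069554,
    0.00034467, 0.0, 0.00525074, 0.0, 0.0,
])

# Same order as the feature columns of df (everything except 'target')
feature_names = list(load_breast_cancer().feature_names)

# Assumption: values[i] belongs to the i-th feature column
ranked = pd.Series(values, index=feature_names).sort_values(ascending=False)
print(ranked.head())
```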
Alternate Model
You can also use a different model by passing an estimator instance, such as GaussianNB(), to the estimator argument.
[7]:
dd.importance(df, 'target', estimator=GaussianNB())
[7]:
Text(0.5, 1.0, 'Feature Importance')
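If, as assumed earlier, the scores come from permutation importance, any scikit-learn-style estimator should work here, because that method needs only fit() and score() rather than a built-in importance attribute. Below is a minimal scikit-learn sketch with GaussianNB; the train/test split and n_repeats=5 are arbitrary choices, not data_describe settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)

# GaussianNB has no built-in importance attribute...
print(hasattr(nb, "feature_importances_"))  # False

# ...but permutation importance only needs fit() and score()
result = permutation_importance(nb, X_test, y_test, n_repeats=5, random_state=0)
print(result.importances_mean.round(4))
```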