Sensitive Data
[1]:
import datetime
import pandas as pd
from data_describe.privacy.detection import sensitive_data
Create Sample Profile
[2]:
sample_profile = {
    "company": {
        0: "Fisher, Green and Dixon",
        1: "Lawrence, Herring and Riley",
        2: "Thompson-Ruiz",
        3: "Sloan PLC",
        4: "Smith LLC",
        5: "Nolan, Meyers and Johnson",
    },
    "ssn": {
        0: "415-39-7809",
        1: "462-64-5856",
        2: "420-73-8333",
        3: "119-33-2186",
        4: "532-38-7349",
        5: "152-33-9873",
    },
    "residence": {
        0: "24219 Archer Mountain Suite 924\nNorth Melissaborough, LA 41945",
        1: "5330 Wilson Fields Suite 560\nEast Heiditown, VA 70519",
        2: "PSC 5642, Box 8071\nAPO AA 06490",
        3: "1240 Jamie Forks Apt. 590\nAlistad, NY 60619",
        4: "PSC 9361, Box 5349\nAPO AP 57022",
        5: "7118 Williams Flat Apt. 075\nOwenhaven, LA 50600",
    },
    "website": {
        0: [
            "http://sellers.com/",
            "https://garrett.com/",
            "https://stark.net/",
            "http://kaiser.org/",
        ],
        1: ["https://wood-hooper.com/", "http://martinez.net/"],
        2: [
            "http://www.arroyo-schultz.biz/",
            "https://www.curtis-smith.com/",
            "http://www.gray-hutchinson.com/",
            "http://www.barnes.com/",
        ],
        3: ["http://hernandez.info/", "https://www.williams-martin.org/"],
        4: ["https://www.wilson.com/"],
        5: ["https://mooney.com/"],
    },
    "username": {
        0: "sandraharris",
        1: "jeffreylucas",
        2: "karla07",
        3: "johnwilliams",
        4: "amyhernandez",
        5: "eprice",
    },
    "name": {
        0: "Doris Martinez",
        1: "Jeffery Garcia",
        2: "Kelsey Freeman",
        3: "Kimberly Carter",
        4: "Charles Gonzalez",
        5: "Roger Olson",
    },
    "address": {
        0: "19659 Ivan Stravenue Apt. 471\nLake Nancyside, VT 71358",
        1: "0916 Michael Row\nSellersville, WI 08109",
        2: "63812 Morales Ranch Apt. 300\nLowestad, NM 26520",
        3: "65461 Regina Mall Suite 517\nSouth Benjaminborough, DE 22331",
        4: "Unit 7296 Box 6875\nDPO AP 65859",
        5: "0248 Cook Mews Apt. 466\nBrownfurt, IN 44282",
    },
    "mail": {
        0: "mary84@yahoo.com",
        1: "imoore@yahoo.com",
        2: "yramirez@gmail.com",
        3: "nicholas11@hotmail.com",
        4: "nancy89@hotmail.com",
        5: "johnsonrobert@yahoo.com",
    },
    "birthdate": {
        0: datetime.date(1936, 7, 5),
        1: datetime.date(1920, 5, 30),
        2: datetime.date(1958, 6, 13),
        3: datetime.date(1931, 5, 31),
        4: datetime.date(1905, 10, 12),
        5: datetime.date(1986, 5, 21),
    },
}
[3]:
df = pd.DataFrame(sample_profile)
df.head(2)
[3]:
 | company | ssn | residence | website | username | name | address | mail | birthdate |
---|---|---|---|---|---|---|---|---|---|
0 | Fisher, Green and Dixon | 415-39-7809 | 24219 Archer Mountain Suite 924\nNorth Melissa... | [http://sellers.com/, https://garrett.com/, ht... | sandraharris | Doris Martinez | 19659 Ivan Stravenue Apt. 471\nLake Nancyside,... | mary84@yahoo.com | 1936-07-05 |
1 | Lawrence, Herring and Riley | 462-64-5856 | 5330 Wilson Fields Suite 560\nEast Heiditown, ... | [https://wood-hooper.com/, http://martinez.net/] | jeffreylucas | Jeffery Garcia | 0916 Michael Row\nSellersville, WI 08109 | imoore@yahoo.com | 1920-05-30 |
Redact sensitive data
[4]:
sensitive_data(df, mode='redact', sample_size=len(df))
UserWarning: The Dask Engine for Modin is experimental.
 | company | ssn | residence | website | username | name | address | mail | birthdate |
---|---|---|---|---|---|---|---|---|---|
0 | Fisher, Green and <PERSON> | <US_SSN> | 24219 Archer Mountain Suite 924\nNorth Melissa... | ['http://<DOMAIN_NAME>/', 'https://<DOMAIN_NAM... | <PERSON> | <PERSON> | <DATE_TIME> <PERSON> Apt. 471\nLake Nancyside,... | <EMAIL_ADDRESS> | <DATE_TIME> |
1 | <PERSON>, <PERSON> and <PERSON> | <US_SSN> | 5330 <PERSON> Fields Suite 560\nEast Heiditown... | ['https://<DOMAIN_NAME>/', 'http://<DOMAIN_NAM... | <PERSON> | <PERSON> | 0916 <PERSON> Row\n<US_DRIVER_LICENSE>, <LOCAT... | <EMAIL_ADDRESS> | <DATE_TIME> |
2 | <PERSON> | <US_SSN> | PSC 5642, Box <DATE_TIME>\nAPO AA 06490 | ['http://<DOMAIN_NAME>/', 'https://<DOMAIN_NAM... | karla07 | <PERSON> | 63812 <PERSON> Ranch Apt. 300\nLowestad, NM <D... | <EMAIL_ADDRESS> | <DATE_TIME> |
3 | Sloan PLC | <US_SSN> | <PERSON> Apt. 590\nAlistad, <LOCATION> <DATE_T... | ['http://<DOMAIN_NAME>/', 'https://<DOMAIN_NAM... | johnwilliams | <PERSON> | 65461 Regina Mall Suite 517\nSouth Benjaminbor... | <EMAIL_ADDRESS> | <DATE_TIME> |
4 | Smith LLC | <US_SSN> | PSC 9361, Box 5349\nAPO AP 57022 | ['https://<DOMAIN_NAME>/'] | <PERSON> | <PERSON> | Unit <DATE_TIME> Box 6875\nDPO AP 65859 | <EMAIL_ADDRESS> | <DATE_TIME>-12 |
5 | <PERSON>, <PERSON> and Johnson | <US_SSN> | <PERSON>. 075\nOwenhaven, <LOCATION> <DATE_TIME> | ['https://<DOMAIN_NAME>/'] | <PERSON> | <PERSON> | <DATE_TIME> Cook Mews Apt. 466\nBrownfurt, IN ... | <EMAIL_ADDRESS> | <DATE_TIME> |
[4]:
<data_describe.privacy.detection.SensitiveDataWidget at 0x2a9bfb15308>
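The placeholders in the output above (`<US_SSN>`, `<EMAIL_ADDRESS>`, and so on) stand in for the text spans that were detected. As a rough illustration of the pattern-based part of that idea only, the sketch below swaps SSN-shaped substrings for a placeholder using the standard library. The function name `redact_ssn_like` and the regular expression are assumptions made for this example; this is not the detection logic that `sensitive_data` or Presidio actually uses.

```python
import re

# Hypothetical pattern: three digits, two digits, four digits, separated by hyphens.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn_like(text: str, placeholder: str = "<US_SSN>") -> str:
    """Replace every SSN-shaped substring in `text` with `placeholder`."""
    return SSN_LIKE.sub(placeholder, text)


redacted = redact_ssn_like("SSN on file: 415-39-7809")
print(redacted)
```

Detection is not perfect in the run shown above: `karla07` was left unredacted in the username column, and `Sellersville` was tagged `<US_DRIVER_LICENSE>` in the address column, so the redacted frame is worth reviewing before it is shared.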
Redact sensitive data using selected columns
[5]:
sensitive_data(df, mode="redact", columns=["birthdate", "mail", "ssn"], sample_size=len(df), detect_infotypes=True)
INFO:presidio:nlp_engine not provided. Creating new SpacyNlpEngine instance
INFO:presidio:Loading NLP model: spaCy en_core_web_lg
INFO:presidio:
===================== Info about model 'en_core_web_lg' =====================
INFO:presidio:
lang en
name core_web_lg
license MIT
author Explosion
url https://explosion.ai
email contact@explosion.ai
description English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
sources [{'name': 'OntoNotes 5', 'url': 'https://catalog.ldc.upenn.edu/LDC2013T19', 'license': 'commercial (licensed by Explosion)'}, {'name': 'Common Crawl'}]
pipeline ['tagger', 'parser', 'ner']
version 2.2.0
spacy_version >=2.2.0
parent_package spacy
labels {'tagger': ['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '_SP', '``'], 'parser': ['ROOT', 'acl', 'acomp', 'advcl', 'advmod', 'agent', 'amod', 'appos', 'attr', 'aux', 'auxpass', 'case', 'cc', 'ccomp', 'compound', 'conj', 'csubj', 'csubjpass', 'dative', 'dep', 'det', 'dobj', 'expl', 'intj', 'mark', 'meta', 'neg', 'nmod', 'npadvmod', 'nsubj', 'nsubjpass', 'nummod', 'oprd', 'parataxis', 'pcomp', 'pobj', 'poss', 'preconj', 'predet', 'prep', 'prt', 'punct', 'quantmod', 'relcl', 'xcomp'], 'ner': ['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']}
vectors {'width': 300, 'vectors': 684831, 'keys': 684830, 'name': 'en_core_web_lg.vectors'}
source C:\Users\David\.conda\envs\test-env\lib\site-packages\en_core_web_lg
INFO:presidio:Printing spaCy model and package details:
{'lang': 'en', 'name': 'core_web_lg', 'license': 'MIT', 'author': 'Explosion', 'url': 'https://explosion.ai', 'email': 'contact@explosion.ai', 'description': 'English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.', 'sources': [{'name': 'OntoNotes 5', 'url': 'https://catalog.ldc.upenn.edu/LDC2013T19', 'license': 'commercial (licensed by Explosion)'}, {'name': 'Common Crawl'}], 'pipeline': ['tagger', 'parser', 'ner'], 'version': '2.2.0', 'spacy_version': '>=2.2.0', 'parent_package': 'spacy', 'accuracy': {'las': 90.1644260278, 'uas': 91.9835496082, 'token_acc': 99.7579930934, 'tags_acc': 97.2056522464, 'ents_f': 86.3045056111, 'ents_p': 86.217859334, 'ents_r': 86.391326217}, 'speed': {'cpu': 7127.1086034688, 'gpu': None, 'nwords': 291314}, 'labels': {'tagger': ['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '_SP', '``'], 'parser': ['ROOT', 'acl', 'acomp', 'advcl', 'advmod', 'agent', 'amod', 'appos', 'attr', 'aux', 'auxpass', 'case', 'cc', 'ccomp', 'compound', 'conj', 'csubj', 'csubjpass', 'dative', 'dep', 'det', 'dobj', 'expl', 'intj', 'mark', 'meta', 'neg', 'nmod', 'npadvmod', 'nsubj', 'nsubjpass', 'nummod', 'oprd', 'parataxis', 'pcomp', 'pobj', 'poss', 'preconj', 'predet', 'prep', 'prt', 'punct', 'quantmod', 'relcl', 'xcomp'], 'ner': ['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']}, 'vectors': {'width': 300, 'vectors': 684831, 'keys': 684830, 'name': 'en_core_web_lg.vectors'}, 'source': 
'C:\\Users\\David\\.conda\\envs\\test-env\\lib\\site-packages\\en_core_web_lg'}
INFO:presidio:Recognizer registry not provided. Creating default RecognizerRegistry instance
INFO:presidio:Loaded recognizer: CreditCardRecognizer
INFO:presidio:Loaded recognizer: CryptoRecognizer
INFO:presidio:Loaded recognizer: DomainRecognizer
INFO:presidio:Loaded recognizer: EmailRecognizer
INFO:presidio:Loaded recognizer: IbanRecognizer
INFO:presidio:Loaded recognizer: IpRecognizer
INFO:presidio:Loaded recognizer: NhsRecognizer
INFO:presidio:Loaded recognizer: UsBankRecognizer
INFO:presidio:Loaded recognizer: UsLicenseRecognizer
INFO:presidio:Loaded recognizer: UsItinRecognizer
INFO:presidio:Loaded recognizer: UsPassportRecognizer
INFO:presidio:Loaded recognizer: UsPhoneRecognizer
INFO:presidio:Loaded recognizer: UsSsnRecognizer
INFO:presidio:Loaded recognizer: SpacyRecognizer
INFO:presidio:Loaded recognizer: SgFinRecognizer
 | birthdate | mail | ssn |
---|---|---|---|
0 | <DATE_TIME> | <EMAIL_ADDRESS> | <US_SSN> |
1 | <DATE_TIME> | <EMAIL_ADDRESS> | <US_SSN> |
2 | <DATE_TIME> | <EMAIL_ADDRESS> | <US_SSN> |
3 | <DATE_TIME> | <EMAIL_ADDRESS> | <US_SSN> |
4 | <DATE_TIME>-12 | <EMAIL_ADDRESS> | <US_SSN> |
5 | <DATE_TIME> | <EMAIL_ADDRESS> | <US_SSN> |
[5]:
<data_describe.privacy.detection.SensitiveDataWidget at 0x2aa0d48a5c8>
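The `INFO:presidio:` lines in the output above follow Python's default `LEVEL:logger:message` layout, which suggests they come from a logger named `presidio`. Assuming that is the case, raising that logger's level before calling `sensitive_data` should keep the model and recognizer details out of the cell output. This is a sketch based on that assumption, not a documented data_describe option.

```python
import logging

# Assumption: the messages are emitted by a logger named "presidio",
# as the "INFO:presidio:" prefix in the output suggests.
presidio_logger = logging.getLogger("presidio")
presidio_logger.setLevel(logging.WARNING)  # hide INFO, keep warnings and errors
```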
Encrypt Data
[6]:
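# mode="encrypt" replaces detected values with SHA256 hashes (one-way, so the originals cannot be recovered from the output)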
sensitivewidget = sensitive_data(df, mode="encrypt", sample_size=len(df))
sensitivewidget
 | company | ssn | residence | website | username | name | address | mail | birthdate |
---|---|---|---|---|---|---|---|---|---|
0 | Fisher, Green and 7e0b7a468d5cbb313ac238ae4942... | 9fc2a689acf5f89d888c7acb3d43add7c4ddb320ca0fe0... | 24219 Archer Mountain Suite 924\nNorth Melissa... | ['http://7f6bcdcd8eb28186cd2bd93cdd66bf2abba0f... | 83238faab748c22d8d84203bae0628716e31bd00891ecc... | 779b878ad9f68899e5de5daac8322dd5b3a3991b10e3ed... | 26f70e334556abc2841f5312a9984aa9a7f21b1924015d... | b43ca3c683dabab5b42fee6181836eeea80dad950c1428... | bd4b000796139f1fe01d62a9e6ccc8d50c8ca861fedd9c... |
1 | 76364d81ab4c1832d9181572cc0d408f7bdd3900a61113... | 567e4a168b80d74c435b22ec74bf56f52d1480372ee3db... | 5330 fb7961d139e4da12af18c24571a166fb77c391a15... | ['https://b5e885e576b91d7f332cecd36e1078e1c8ff... | cf7bfbcaf80510fb918e37ad52ff7e9b59e86308a7e9c5... | b79d0309ea6eb45b8e8dee995c996184641de208262b0b... | 0916 f089eaef57aba315bc0e1455985c0c8e40c247f07... | 42ea5e56df211dbed485dc611277fe0932bb3139c4815e... | 7e7bebf19314f96de0b1231c7c89627c3a87dd110c2614... |
2 | 4bf5f01b21094aceb8d03ea384cc90e0c163898704dc2b... | 06f11d164b97a22cd2a3a2a1a1b6df50268a8a21ee8a97... | PSC 5642, Box e58bb6dcf3948e6c37eefaf185c769e0... | ['http://7aad9afb51f77260b55915f74a5498b5850cc... | karla07 | 08471f7bf132fdfb2a1a9ad7529fca6942425dc9d32834... | 63812 6f005a18689861aa2634b7a3155a148a56fcd272... | 4ca073222470748ed4fb77ad0011402af414dd561cd645... | 1b739ba212c2fec7848fbe6cffeffeb2f3737f40748542... |
3 | Sloan PLC | e7ed1ca0c627c7e1928f4a969225b761133dfb3d9f687f... | 2ebf4a7fdd9a4209fd1b52d9b672bccbf65e196f62169f... | ['http://587d2a51617478f91f145b2b261629d100768... | johnwilliams | 29b8f8f54a0405dc20c84f25d2344ddb3fa644459453c1... | 65461 Regina Mall Suite 517\nSouth Benjaminbor... | 14080d914340529a7af82567e749c5443108be5b9b945e... | 8ce998bb9305cb1a1f92f8ce0b17261280cee95c3b4ec6... |
4 | Smith LLC | 6c2264d952abb164d584bb7eba89cf41f94fbba413208c... | PSC 9361, Box 5349\nAPO AP 57022 | ['https://68bd3a706676d6fbfffbde3da0c7da6164da... | 66565c13d4869eda2a02c506c1b3d7e65c769ac209bd00... | 070f7870e5befe421c6dd0e1fd941a34d877b6b4758b90... | Unit 63bfc4219c16f9ba618b6b76d81edef58b912ca65... | 66d6d0dac0ba4a8f0e415c24d87448f6d100cb11083f18... | e80ef1d1d41d0b4b0bc14a77236ee6018cc31db6e9aa07... |
5 | 71f90b7c03aad530eafa769b2ad97cb333ca6dd455fce1... | 2d982e60214173e9d9dcace8376a3b33e224374465e289... | 824d23478ea9d0d571699e83112a567183dbfde775f2f4... | ['https://3328a0e4658c855acde71f8d85e98eac5d63... | eb0f1bd37f4ba7e4ecb33aeec2423538e088530eff1cea... | 3b4cef0bb0914adbe31e209262f261589133e5a90f1788... | 912b3255f7a8016fbf71bd4b2f7ffb646945200bb4b63c... | 731feae7800d583626f6fe1ffda859e274ba4f7813c223... | a8f11f485ed632115deb5cccd3d251ec901c445c6896d7... |
[6]:
<data_describe.privacy.detection.SensitiveDataWidget at 0x2aa6fb3a888>
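The comment in the cell above refers to SHA256. As a minimal sketch of what hashing a single value with SHA-256 looks like, the standard library's `hashlib` can be used as below. The helper name `sha256_hex` and the UTF-8 encoding are assumptions for illustration; whether `sensitive_data` salts or encodes values differently is not shown in this notebook, so the digests here are not expected to match the table.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of `value` as a 64-character hex string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


digest = sha256_hex("415-39-7809")
print(len(digest))  # 64 hex characters
```

The same input always produces the same digest, so hashed values can still be compared or grouped, but a digest cannot be decoded back into the original value. In that sense this is hashing rather than reversible encryption.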
Identify infotypes in each column
[7]:
sensitivewidget.infotypes
[7]:
{'company': ['PERSON'],
'ssn': ['US_SSN'],
'residence': ['DATE_TIME', 'LOCATION', 'PERSON'],
'website': ['DOMAIN_NAME', 'PERSON'],
'username': ['PERSON'],
'name': ['PERSON'],
'address': ['DATE_TIME', 'LOCATION', 'PERSON', 'US_DRIVER_LICENSE'],
'mail': ['DOMAIN_NAME', 'EMAIL_ADDRESS'],
'birthdate': ['DATE_TIME']}
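The mapping above is keyed by column, but it is often handy to ask the opposite question: which columns contain a given infotype? The sketch below treats the output as a plain dictionary (copied by hand from the cell above) and filters it. The helper `columns_with` is hypothetical and not part of data_describe.

```python
# Copied from the `sensitivewidget.infotypes` output above.
infotypes = {
    "company": ["PERSON"],
    "ssn": ["US_SSN"],
    "residence": ["DATE_TIME", "LOCATION", "PERSON"],
    "website": ["DOMAIN_NAME", "PERSON"],
    "username": ["PERSON"],
    "name": ["PERSON"],
    "address": ["DATE_TIME", "LOCATION", "PERSON", "US_DRIVER_LICENSE"],
    "mail": ["DOMAIN_NAME", "EMAIL_ADDRESS"],
    "birthdate": ["DATE_TIME"],
}


def columns_with(infotype, mapping):
    """Return the columns whose detected infotypes include `infotype`."""
    return [column for column, found in mapping.items() if infotype in found]


person_columns = columns_with("PERSON", infotypes)
print(person_columns)
```

A list built this way could be passed to the `columns` argument used earlier to limit redaction to the columns of interest.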