{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.7.7 64-bit ('py37': conda)", "display_name": "Python 3.7.7 64-bit ('py37': conda)", "metadata": { "interpreter": { "hash": "8a09d050f863db8ad263dfeb2d0542738b0c5ac95d8f2f206711295a3f98cf5e" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# Developer Guide" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Contributing Guide\n", "Work in progress!" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Recommended Python Setup\n", "\n", "We recommend the following setup for Python development when contributing to data-describe:\n", "\n", "- [VS Code](https://code.visualstudio.com/)\n", " - Extensions:\n", " - ms-python.python\n", " - njpwerner.autodocstring\n", "- Conda\n", " - [Miniconda](https://docs.conda.io/en/latest/miniconda.html)\n", "\n", "To create a new conda environment for development, run:\n", "\n", "```\n", "conda create -n test-env\n", "conda env update -n test-env -f etc/test-environment.yml\n", "```\n", "\n", "### Pre-commit\n", "[Pre-commit](https://pre-commit.com/) is also recommended for running linting and style checks. To install pre-commit:\n", "```\n", "pip install pre-commit\n", "pre-commit install\n", "```\n", "\n", "The checks will now run on commit.\n", "\n" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Optional Dependencies\n", "data-describe may make use of optional dependencies such as `nltk` or `modin`. When adding or using these optional dependencies in data-describe modules, the following patterns should be used:\n", "\n", "### `_requires` marks functionality that requires a dependency\n", "\n", "Use the `_requires` decorator on any object function or class that needs the optional dependency. Usage:\n", "\n", "```python\n", "from data_describe.compat import _requires\n", "\n", "@_requires(\"nltk\")\n", "def function_that_uses_nltk():\n", " return\n", "```\n", "\n", "`_requires` should generally take the top-level package name as its sole argument. See the section on *packages vs subpackages* for more information.\n", "\n", "### `_compat` is used to lazily import from dependencies\n", "\n", "Instead of having import statements at the top of the file, import and use the `_compat` object to use functionalities from the optional dependency:\n", "\n", "```python\n", "from data_describe.compat import _requires, _compat\n", "\n", "@_requires(\"nltk\")\n", "def function_that_uses_nltk_freqdist():\n", " _compat[\"nltk\"].FreqDist()\n", "```\n", "\n", "`_compat` should generally take the sub-package as its key in a dictionary-style access. See the section on *packages vs subpackages* for more information.\n", "\n", "### packages vs subpackages\n", "\n", "Some packages do not export all of their subpackages. For example, `import statsmodels` does not provide access to `statsmodels.graphics.tsaplots`, as the `graphics` subpackage is not exported.\n", "\n", "As a result, `_requires` generally takes the top-level package name, as this checks if the package itself is installed. In contrast, `_compat` takes the subpackage to enable imports.\n", "\n", "One exception to this are the google client libraries such as `google-cloud-storage` or `google-cloud-bigquery`. Each of these are installed individually, but they are organized as subpackages of the `google` namespace i.e. `google.cloud.storage`. In this case, `_requires` should instead be the specific subpackage (i.e. `_requires(\"google.cloud.storage\")`) since requiring only the `google` package is not specific enough. \n", "\n", "### Side imports\n", "\n", "Some packages require downloads of additional data or models to function. One example is the stopwords for `nltk`. Downloading of these resources is handled in `data_describe/compat/_dependency.py`. When adding a dependency that requires this download, adhere to the following steps:\n", "\n", "1. Add a function that takes the module as its sole argument, checks for the existence of the resource (i.e if it was already downloaded), and executes the download if it doesn't exist. \n", "2. Add this function to the module-import mapping used to initialize `_compat`:\n", "```\n", "_compat = DependencyManager(\n", " {\n", " \"nltk\": nltk_download,\n", " \"spacy\": spacy_download,\n", " # \"new_package\": downloader_function,\n", " }\n", ")\n", "```\n", "\n", "### Add to `extras_require` in setup.py\n", "\n", "The new dependency should be added to the `extras_require` in setup.py. If applicable, try to use existing tags over creating new ones. New tags should be alphabetical and short.\n", "\n", "### Add to conda environments\n", "\n", "The new dependency should be added to all conda environment definitions. These are located in two locations: `etc/*.yml` and `docker/*/*.yml`" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Docstring Checking\n", "\n", "[darglint](https://github.com/terrencepreilly/darglint) can optionally be used to check if docstrings are outdated. It hasn't been added to the pre-commit hooks because it can take a long time to parse.\n", "\n", "To run darglint, use the following command:\n", "\n", "```\n", "darglint -v 2 --strictness=short \n", "```" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Automatic Documentation\n", "Documentation is generated using Sphinx. To run the typical build process, use the provided conda environment definition at `etc/doc-environment.yml` and run `docs/make.py`.\n", "\n", "Note that `sphinx-multiversion` is used to build documentation for multiple versions of `data-describe` and only captures changes in tagged commits and/or the master branch on remote.\n", "\n", "### Notebook Update\n", "Example notebooks are stored in the `examples/` folder for easy access in the Github repository. Run `docs/update_notebook_docs.py` to copy these notebooks (if any updates) into the `docs/source` directory and update the index.\n", "\n", "### Manual Build\n", "To test a build of the Sphinx-generated documentation without using `sphinx-multiversion`, you can run the following command:\n", "\n", "```\n", "sphinx-build -a -E docs/source docs/build\n", "```\n", "\n", "The generated HTML files will be in `docs/build`" ], "cell_type": "markdown", "metadata": {} } ] }