data_describe.dimensionality_reduction.dimensionality_reduction

dim_reduc(data, n_components: int, dim_method: str, apply_tsvd: bool = True, compute_backend=None)

Reduces the number of dimensions of the input data.

run_pca(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using PCA.

run_ipca(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using Incremental PCA.

run_tsne(data, n_components, apply_tsvd=True, compute_backend=None)

Reduces the number of dimensions of the input data using t-SNE.

run_tsvd(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using TSVD.

data_describe.dimensionality_reduction.dimensionality_reduction.dim_reduc(data, n_components: int, dim_method: str, apply_tsvd: bool = True, compute_backend=None)

Reduces the number of dimensions of the input data.

Parameters
  • data – The dataframe

  • n_components – Desired dimensionality for the data set prior to modeling

  • dim_method – The reduction method to use, one of {‘pca’, ‘ipca’, ‘tsne’, ‘tsvd’}:

      - pca: Principal Component Analysis
      - ipca: Incremental Principal Component Analysis. Highly suggested for very large datasets
      - tsne: t-distributed Stochastic Neighbor Embedding
      - tsvd: Truncated Singular Value Decomposition

  • apply_tsvd – If True, TSVD will be run before t-SNE. This is highly recommended when running t-SNE

Returns

The dimensionally-reduced dataframe and the applied reduction object
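
As an illustration of how dim_method and apply_tsvd relate, the following is a minimal sketch of a string-to-function dispatch. It is not the library's implementation: the stub reducers, the METHODS table and the dispatch function are hypothetical names introduced here, and the sketch only assumes that apply_tsvd is meaningful for ‘tsne’ alone, as the parameter description suggests.

```python
from typing import Callable, Dict


def _stub(name: str) -> Callable:
    """Build a hypothetical stand-in for one of the run_* functions."""
    def run(data, n_components, **kwargs):
        # A real reducer would return (reduced_dataframe, fitted_object).
        return (name, n_components, kwargs)
    return run


# Hypothetical lookup table keyed by the documented dim_method values.
METHODS: Dict[str, Callable] = {m: _stub(m) for m in ("pca", "ipca", "tsne", "tsvd")}


def dispatch(data, n_components: int, dim_method: str, apply_tsvd: bool = True):
    """Route a dim_method string to the matching reducer."""
    if dim_method not in METHODS:
        raise ValueError(
            f"{dim_method!r} is not supported; choose from {sorted(METHODS)}"
        )
    reducer = METHODS[dim_method]
    # Assumption: only the t-SNE path takes the apply_tsvd flag.
    if dim_method == "tsne":
        return reducer(data, n_components, apply_tsvd=apply_tsvd)
    return reducer(data, n_components)
```

Failing fast on an unknown method name keeps a typo such as ‘tsvn’ from silently falling through to a default reducer.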

data_describe.dimensionality_reduction.dimensionality_reduction.run_pca(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using PCA.

Parameters
  • data – The dataframe

  • n_components – Desired dimensionality for the data set prior to modeling

Returns

  • reduc_df – The dimensionally-reduced dataframe

  • pca – The applied PCA object
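
The sketch below shows the kind of computation this function describes, written directly against scikit-learn. It assumes, without confirmation from this page, that the default backend wraps sklearn.decomposition.PCA; pca_sketch is a hypothetical helper and it returns a NumPy array rather than a dataframe.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_sketch(data: np.ndarray, n_components: int):
    """Fit PCA and return the reduced data together with the fitted object."""
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(data)  # shape: (n_samples, n_components)
    return reduced, pca


# Synthetic input: 50 samples with 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

reduced, pca = pca_sketch(X, 2)
# Fraction of the total variance captured by each retained component.
ratios = pca.explained_variance_ratio_
```

Returning the fitted object alongside the reduced data lets the caller inspect attributes such as explained_variance_ratio_ or project new samples with transform.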

data_describe.dimensionality_reduction.dimensionality_reduction.run_ipca(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using Incremental PCA.

Parameters
  • data – The dataframe

  • n_components – Desired dimensionality for the data set prior to modeling

Returns

  • reduc_df – The dimensionally-reduced dataframe

  • ipca – The applied IncrementalPCA object
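
Incremental PCA suits large datasets because it can be fitted one batch at a time. The following sketch uses scikit-learn's IncrementalPCA directly to show that pattern; whether the library batches the data in this way is an assumption, and the batch count of 5 is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic input: 100 samples with 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

ipca = IncrementalPCA(n_components=3)

# Fit on 5 batches of 20 rows each, so the full matrix never has to be
# decomposed in one pass. Each batch needs at least n_components rows.
for batch in np.array_split(X, 5):
    ipca.partial_fit(batch)

reduced = ipca.transform(X)  # shape: (100, 3)
```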

data_describe.dimensionality_reduction.dimensionality_reduction.run_tsne(data, n_components, apply_tsvd=True, compute_backend=None)

Reduces the number of dimensions of the input data using t-SNE.

Parameters
  • data – The dataframe

  • n_components – Desired dimensionality for the output dataset

  • apply_tsvd – If True, TSVD will be run before t-SNE. This is highly recommended when running t-SNE

Returns

  • reduc_df – The dimensionally-reduced dataframe

  • tsne – The applied t-SNE object
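
The apply_tsvd option describes a two-stage pipeline: compress the features with TSVD first, then embed the result with t-SNE. A minimal sketch of that pipeline with scikit-learn follows. The helper tsne_sketch, the tsvd_components parameter and the perplexity value are assumptions made for this example and are not taken from the library.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE


def tsne_sketch(data, n_components, apply_tsvd=True, tsvd_components=10):
    """Optionally compress with TSVD, then embed with t-SNE."""
    # TSVD requires fewer components than input features.
    if apply_tsvd and data.shape[1] > tsvd_components:
        tsvd = TruncatedSVD(n_components=tsvd_components, random_state=0)
        data = tsvd.fit_transform(data)
    # Perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=n_components, perplexity=10, random_state=0)
    return tsne.fit_transform(data), tsne


# Synthetic input: 60 samples with 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))

embedding, tsne = tsne_sketch(X, 2)
```

Reducing the feature count first lowers the cost of the pairwise-distance computation that t-SNE performs on its input.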

data_describe.dimensionality_reduction.dimensionality_reduction.run_tsvd(data, n_components, compute_backend=None)

Reduces the number of dimensions of the input data using TSVD.

Parameters
  • data – The dataframe

  • n_components – Desired dimensionality for the output dataset

Returns

  • reduc_df – The dimensionally-reduced dataframe

  • tsvd – The applied TSVD object
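
Unlike PCA, truncated SVD does not center the data, so it can be applied to sparse matrices without densifying them. The sketch below demonstrates this with scikit-learn's TruncatedSVD on a random sparse matrix; it assumes the library wraps this class, which this page does not confirm.

```python
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse input: 40 samples, 12 features, about 20% non-zero.
X = sparse.random(40, 12, density=0.2, format="csr", random_state=0)

tsvd = TruncatedSVD(n_components=3, random_state=0)
reduced = tsvd.fit_transform(X)  # dense array of shape (40, 3)
```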