dynamo.pp.Preprocessor

class dynamo.pp.Preprocessor(collapse_species_adata_function=<function collapse_species_adata>, convert_gene_name_function=<function convert2symbol>, filter_cells_by_outliers_function=<function filter_cells_by_outliers>, filter_cells_by_outliers_kwargs={}, filter_genes_by_outliers_function=<function filter_genes_by_outliers>, filter_genes_by_outliers_kwargs={}, filter_cells_by_highly_variable_genes_function=<function filter_cells_by_highly_variable_genes>, filter_cells_by_highly_variable_genes_kwargs={}, normalize_by_cells_function=<function normalize>, normalize_by_cells_function_kwargs={}, size_factor_function=<function calc_sz_factor>, size_factor_kwargs={}, select_genes_function=<function select_genes_monocle>, select_genes_kwargs={}, normalize_selected_genes_function=None, normalize_selected_genes_kwargs={}, norm_method=<function log1p>, norm_method_kwargs={}, pca_function=<function pca>, pca_kwargs={}, gene_append_list=[], gene_exclude_list=[], force_gene_list=None, sctransform_kwargs={}, regress_out_kwargs={}, cell_cycle_score_enable=False, cell_cycle_score_kwargs={})[source]

Methods table

add_experiment_info(adata[, tkey, ...])

Infer the experiment type and experiment layers stored in the AnnData object and record the info in unstructured metadata (.uns).

config_monocle_pearson_residuals_recipe(adata)

Automatically configure the preprocessor for using the Monocle-Pearson-residuals style recipe.

config_monocle_recipe(adata[, n_top_genes])

Automatically configure the preprocessor for the monocle recipe.

config_pearson_residuals_recipe(adata)

Automatically configure the preprocessor for using the Pearson residuals style recipe.

config_sctransform_recipe(adata)

Automatically configure the preprocessor for using the sctransform style recipe.

config_seurat_recipe(adata)

Automatically configure the preprocessor for using the seurat style recipe.

preprocess_adata(adata[, recipe, tkey, ...])

Preprocess the AnnData object with the recipe specified.

preprocess_adata_monocle(adata[, tkey, ...])

Preprocess the AnnData object based on Monocle style preprocessing recipe.

preprocess_adata_monocle_pearson_residuals(adata)

A combined pipeline of monocle and pearson_residuals.

preprocess_adata_pearson_residuals(adata[, ...])

A pipeline proposed in Pearson residuals (Lause, Berens & Kobak, 2021).

preprocess_adata_sctransform(adata[, tkey, ...])

Python implementation of https://github.com/satijalab/sctransform.

preprocess_adata_seurat(adata[, tkey, ...])

The Seurat preprocessing pipeline based on dispersion, implemented by the dynamo authors.

preprocess_adata_seurat_wo_pca(adata[, ...])

Preprocess the AnnData object according to the standard Seurat recipe, without PCA.

standardize_adata(adata, tkey, experiment_type)

Process the AnnData object to make it meet the standards of dynamo.

Methods

Preprocessor.add_experiment_info(adata, tkey=None, experiment_type=None)[source]

Infer the experiment type and experiment layers stored in the AnnData object and record the info in unstructured metadata (.uns).

Parameters:
  • adata (AnnData) – an AnnData object.

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type. If set to None, the experiment type is inferred from the data.

Raises:

ValueError – the tkey is invalid.

Return type:

None
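
The inference described above can be pictured with a minimal sketch. It is not dynamo's implementation: the layer names (spliced, unspliced, new, total), the returned keys, and the rule that a tkey is invalid when it is missing from .obs are all assumptions made for illustration.

```python
# Hypothetical layer groups; dynamo's real allowed names may differ.
SPLICING_LAYERS = {"spliced", "unspliced"}
LABELING_LAYERS = {"new", "total"}


def infer_experiment_info(layer_names, obs_columns, tkey=None):
    """Return a small metadata dict similar in spirit to what is stored in .uns.

    layer_names: names of the layers present in the AnnData object.
    obs_columns: column names of .obs.
    tkey: optional name of the .obs column holding labeling time.
    """
    # Assumption: an invalid tkey is one that is not a column of .obs.
    if tkey is not None and tkey not in obs_columns:
        raise ValueError(f"tkey {tkey!r} is not a column of .obs")

    layers = set(layer_names)
    return {
        "has_splicing": SPLICING_LAYERS <= layers,  # both splicing layers present
        "has_labeling": LABELING_LAYERS <= layers,  # both labeling layers present
        "tkey": tkey,
    }


info = infer_experiment_info(["spliced", "unspliced"], ["time"], tkey="time")
```

Passing a tkey that is absent from obs_columns raises ValueError in this sketch, matching the documented exception type.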

Preprocessor.config_monocle_pearson_residuals_recipe(adata)[source]

Automatically configure the preprocessor for using the Monocle-Pearson-residuals style recipe.

Useful when you want to use Pearson residuals to select feature genes and perform PCA, while still using the standard size-factor normalization and log1p transformation to normalize data for RNA velocity and vector field analyses.

Parameters:

adata (AnnData) – an AnnData object.

Return type:

None
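
To make "size-factor normalization and log1p" concrete, here is a minimal pure-Python sketch. It assumes a size factor equal to each cell's total count divided by the mean total across cells; dynamo's calc_sz_factor may use a different reference value, so treat this as an illustration of the idea only.

```python
import math


def normalize_log1p(counts):
    """Size-factor normalize each cell, then apply log1p.

    counts: list of cells, each a list of non-negative gene counts.
    Assumptions: size factor = cell total / mean cell total, and every
    cell has a total count greater than zero.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    size_factors = [total / mean_total for total in totals]
    return [
        [math.log1p(x / sf) for x in cell]
        for cell, sf in zip(counts, size_factors)
    ]


# Three cells with the same composition but different sequencing depth.
counts = [[1, 3], [2, 6], [3, 9]]
normalized = normalize_log1p(counts)
```

After normalization the three cells have identical values, and every value is non-negative, which is the property that matters for downstream velocity estimation.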

Preprocessor.config_monocle_recipe(adata, n_top_genes=2000)[source]

Automatically configure the preprocessor for the monocle recipe.

Parameters:
  • adata (AnnData) – an AnnData object.

  • n_top_genes (int (default: 2000)) – Number of top feature genes to select in the preprocessing step. Defaults to 2000.

Return type:

None
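
The role of n_top_genes can be sketched as a ranking step. The example below scores genes by the variance-to-mean ratio and keeps the highest-scoring ones; this scoring rule is an assumption chosen for simplicity and is not necessarily what select_genes_monocle computes.

```python
from statistics import mean, pvariance


def top_genes_by_dispersion(counts, gene_names, n_top_genes=2000):
    """Rank genes by variance-to-mean ratio and return the top names.

    counts: list of cells, each a list of gene counts (cells x genes).
    gene_names: one name per gene column.
    """
    scores = {}
    for name, column in zip(gene_names, zip(*counts)):
        gene_mean = mean(column)
        # Genes that are never expressed get a score of zero.
        scores[name] = pvariance(column) / gene_mean if gene_mean > 0 else 0.0
    ranked = sorted(gene_names, key=lambda gene: scores[gene], reverse=True)
    return ranked[:n_top_genes]


counts = [[0, 5, 1], [10, 5, 1], [0, 5, 2]]
genes = ["g1", "g2", "g3"]
selected = top_genes_by_dispersion(counts, genes, n_top_genes=2)
```

Here g2 is constant across cells and is dropped, while the more variable g1 and g3 are kept.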

Preprocessor.config_pearson_residuals_recipe(adata)[source]

Automatically configure the preprocessor for using the Pearson residuals style recipe.

Parameters:

adata (AnnData) – an AnnData object.

Return type:

None

Preprocessor.config_sctransform_recipe(adata)[source]

Automatically configure the preprocessor for using the sctransform style recipe.

Parameters:

adata (AnnData) – an AnnData object.

Return type:

None

Preprocessor.config_seurat_recipe(adata)[source]

Automatically configure the preprocessor for using the seurat style recipe.

Parameters:

adata (AnnData) – an AnnData object.

Return type:

None

Preprocessor.preprocess_adata(adata, recipe='monocle', tkey=None, experiment_type=None)[source]

Preprocess the AnnData object with the recipe specified.

Parameters:
  • adata (AnnData) – An AnnData object.

  • recipe (Literal['monocle', 'seurat', 'sctransform', 'pearson_residuals', 'monocle_pearson_residuals'] (default: 'monocle')) – The recipe used to preprocess the data. Defaults to “monocle”.

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data.

Raises:

NotImplementedError – the recipe is invalid.

Return type:

None
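
A minimal usage sketch, assuming dynamo is installed and importable as dyn. The wrapper run_preprocessing and the VALID_RECIPES tuple are hypothetical helpers, not part of dynamo; the recipe names and the preprocess_adata signature are taken from the documentation above.

```python
# Recipe names as listed in the `recipe` parameter above.
VALID_RECIPES = (
    "monocle",
    "seurat",
    "sctransform",
    "pearson_residuals",
    "monocle_pearson_residuals",
)


def run_preprocessing(adata, recipe="monocle", tkey=None, experiment_type=None):
    """Hypothetical convenience wrapper around Preprocessor.preprocess_adata."""
    # Fail early, with the same exception type the method documents.
    if recipe not in VALID_RECIPES:
        raise NotImplementedError(f"unknown preprocessing recipe: {recipe!r}")

    # Imported lazily so the recipe check works without dynamo installed.
    import dynamo as dyn

    preprocessor = dyn.pp.Preprocessor()
    preprocessor.preprocess_adata(
        adata, recipe=recipe, tkey=tkey, experiment_type=experiment_type
    )
    # preprocess_adata is documented to return None, so the results are
    # presumably written into adata itself; hand the same object back.
    return adata
```

With an AnnData object in hand, run_preprocessing(adata, recipe="pearson_residuals") would select that recipe, while an unrecognized name fails before any data is touched.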

Preprocessor.preprocess_adata_monocle(adata, tkey=None, experiment_type=None)[source]

Preprocess the AnnData object based on Monocle style preprocessing recipe.

Parameters:
  • adata (AnnData) – an AnnData object.

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data.

Return type:

None

Preprocessor.preprocess_adata_monocle_pearson_residuals(adata, tkey=None, experiment_type=None)[source]

A combined pipeline of monocle and pearson_residuals.

Results after running pearson_residuals can contain negative values, which is undesirable for later RNA velocity analysis. This function combines the pearson_residuals and monocle recipes: it uses Pearson residuals to obtain feature genes and perform PCA, but uses the monocle recipe to generate X_spliced, X_unspliced, X_new, X_total or other data values for RNA velocity and downstream vector field analyses.

Parameters:
  • adata (AnnData) – an AnnData object

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data.

Return type:

None

Preprocessor.preprocess_adata_pearson_residuals(adata, tkey=None, experiment_type=None)[source]

A pipeline proposed in Pearson residuals (Lause, Berens & Kobak, 2021).

Lause, J., Berens, P. & Kobak, D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome Biol 22, 258 (2021). https://doi.org/10.1186/s13059-021-02451-7

Parameters:
  • adata (AnnData) – an AnnData object

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data. Defaults to None.

Return type:

None
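
For orientation, the analytic Pearson residuals of the cited paper can be sketched in a few lines of pure Python. The expected count for a cell and gene is (cell total × gene total) / grand total, the residual is (x − mu) / sqrt(mu + mu² / theta), and residuals are clipped to ±sqrt(number of cells). The value theta = 100 follows the paper's suggested default; how dynamo configures these values is not shown here, so this is an illustration rather than the library's code.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a cells x genes count matrix.

    counts: list of cells, each a list of gene counts.
    Assumption: every cell and every gene has at least one count,
    so the expected value mu is strictly positive.
    """
    n_cells = len(counts)
    cell_totals = [sum(cell) for cell in counts]
    gene_totals = [sum(column) for column in zip(*counts)]
    grand_total = sum(cell_totals)
    clip = math.sqrt(n_cells)  # clipping bound

    residuals = []
    for cell, cell_total in zip(counts, cell_totals):
        row = []
        for x, gene_total in zip(cell, gene_totals):
            mu = cell_total * gene_total / grand_total
            r = (x - mu) / math.sqrt(mu + mu * mu / theta)
            row.append(max(-clip, min(clip, r)))
        residuals.append(row)
    return residuals


residuals = pearson_residuals([[10, 0], [0, 10]])
```

Counts below their expected value produce negative residuals, which is why such output is not a drop-in replacement for non-negative normalized layers.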

Preprocessor.preprocess_adata_sctransform(adata, tkey=None, experiment_type=None)[source]

Python implementation of https://github.com/satijalab/sctransform.

Hao and Hao et al. Integrated analysis of multimodal single-cell data. Cell (2021)

Parameters:
  • adata (AnnData) – an AnnData object

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data. Defaults to None.

Return type:

None

Preprocessor.preprocess_adata_seurat(adata, tkey=None, experiment_type=None)[source]

The Seurat preprocessing pipeline based on dispersion, implemented by the dynamo authors.

Stuart and Butler et al. Comprehensive Integration of Single-Cell Data. Cell (2019)

Butler et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol

Parameters:
  • adata (AnnData) – an AnnData object

  • tkey (Optional[str] (default: None)) – the key for time information (labeling time period for the cells) in .obs. Defaults to None.

  • experiment_type (Optional[str] (default: None)) – the experiment type of the data. If not provided, it is inferred from the data.

Return type:

None

Preprocessor.preprocess_adata_seurat_wo_pca(adata, tkey=None, experiment_type=None)[source]

Preprocess the AnnData object according to the standard Seurat recipe, without PCA. This can be used to test different dimension reduction methods.

Return type:

None

Preprocessor.standardize_adata(adata, tkey, experiment_type)[source]

Process the AnnData object to make it meet the standards of dynamo.

Observation indices are made unique. Layers stored as sparse matrices are converted to compressed sparse row format (csr_matrix). DKM.allowed_layer_raw_names() is used to define the only_splicing, only_labeling and splicing_labeling keys. Genes are renamed to their official names.

Parameters:
  • adata (AnnData) – an AnnData object.

  • tkey (str) – the key for time information (labeling time period for the cells) in .obs.

  • experiment_type (str) – the experiment type.

Return type:

None
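
The first standardization step, making observation names unique, can be illustrated with a small pure-Python sketch. It appends a numeric suffix to repeated names, a convention similar to AnnData's obs_names_make_unique; the function make_unique is a hypothetical stand-in, and whether dynamo uses exactly this scheme is an assumption.

```python
def make_unique(names, join="-"):
    """Return names with duplicates renamed to name-1, name-2, ...

    The first occurrence of each name is kept unchanged. Suffixes that
    would collide with an existing name are skipped.
    """
    counts = {}          # how many suffixes have been tried per name
    used = set(names)    # every name that is already taken
    unique = []
    for name in names:
        if name not in counts:
            counts[name] = 0
            unique.append(name)
            continue
        # Find the next free suffixed candidate for this duplicate.
        while True:
            counts[name] += 1
            candidate = f"{name}{join}{counts[name]}"
            if candidate not in used:
                break
        used.add(candidate)
        unique.append(candidate)
    return unique


cells = make_unique(["c1", "c2", "c1", "c1"])
```

The result keeps the order of the input, so per-cell data aligned with the original names stays aligned after renaming.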