dynamo.pp.filter_cells_by_outliers

dynamo.pp.filter_cells_by_outliers(adata, filter_bool=None, layer='all', keep_filtered=False, min_expr_genes_s=50, min_expr_genes_u=25, min_expr_genes_p=1, max_expr_genes_s=inf, max_expr_genes_u=inf, max_expr_genes_p=inf, max_pmito_s=None, shared_count=None, spliced_key='spliced', unspliced_key='unspliced', protein_key='protein', obs_store_key='pass_basic_filter')

Select valid cells based on a collection of filters including spliced, unspliced and protein min/max vals.

Parameters:
  • adata (AnnData) – an AnnData object.

  • filter_bool (Optional[ndarray] (default: None)) – a user-supplied boolean array selecting cells for downstream analysis. Defaults to None.

  • layer (str (default: 'all')) – the layer (including X) used for feature selection. Defaults to “all”.

  • keep_filtered (bool (default: False)) – whether to keep cells that don’t pass the filtering in the adata object. Defaults to False.

  • min_expr_genes_s (int (default: 50)) – minimal number of genes with expression for a cell in the data from the spliced layer (also used for X). Defaults to 50.

  • min_expr_genes_u (int (default: 25)) – minimal number of genes with expression for a cell in the data from the unspliced layer. Defaults to 25.

  • min_expr_genes_p (int (default: 1)) – minimal number of genes with expression for a cell in the data from the protein layer. Defaults to 1.

  • max_expr_genes_s (float (default: inf)) – maximal number of genes with expression for a cell in the data from the spliced layer (also used for X). Defaults to np.inf.

  • max_expr_genes_u (float (default: inf)) – maximal number of genes with expression for a cell in the data from the unspliced layer. Defaults to np.inf.

  • max_expr_genes_p (float (default: inf)) – maximal number of proteins with expression for a cell in the data from the protein layer. Defaults to np.inf.

  • max_pmito_s (Optional[float] (default: None)) – maximal percentage of mitochondrial genes for a cell in the data from the spliced layer. Defaults to None.

  • shared_count (Optional[int] (default: None)) – the minimal shared number of counts for each cell across genes between layers. Defaults to None.

  • spliced_key (default: 'spliced') – name of the layer storing spliced data. Defaults to “spliced”.

  • unspliced_key (default: 'unspliced') – name of the layer storing unspliced data. Defaults to “unspliced”.

  • protein_key (default: 'protein') – name of the layer storing protein data. Defaults to “protein”.

  • obs_store_key (default: 'pass_basic_filter') – name of the key in adata.obs under which the filtering result is stored. Defaults to “pass_basic_filter”.
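
The min/max thresholds above can be illustrated with a small NumPy sketch. This is not dynamo's implementation: the helper expressed_gene_mask is hypothetical, and it assumes that "with expression" means a nonzero value, that both bounds are inclusive, and that a cell must pass the thresholds of every layer considered.

```python
import numpy as np

def expressed_gene_mask(counts, min_expr_genes, max_expr_genes=np.inf):
    """Hypothetical helper: True for cells whose number of expressed genes
    lies within [min_expr_genes, max_expr_genes] (bounds assumed inclusive)."""
    counts = np.asarray(counts)
    # Count, per cell (row), the genes with a nonzero value.
    n_expressed = (counts > 0).sum(axis=1)
    return (n_expressed >= min_expr_genes) & (n_expressed <= max_expr_genes)

# Toy data: 3 cells x 4 genes for a "spliced" and an "unspliced" layer.
spliced = np.array([[1, 2, 0, 3], [0, 0, 0, 1], [4, 1, 1, 1]])
unspliced = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]])

# Assumption: a cell is kept only if it passes in every layer.
keep = expressed_gene_mask(spliced, min_expr_genes=2) & expressed_gene_mask(
    unspliced, min_expr_genes=2
)
print(keep)  # cells 0 and 2 pass; cell 1 expresses too few genes
```

The toy thresholds are deliberately small; the documented defaults (50 spliced, 25 unspliced) apply to real data with thousands of genes.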

Raises:

ValueError – the layer provided is invalid.

Return type:

AnnData

Returns:

An updated AnnData object indicating the selection of cells for downstream analysis. If keep_filtered is False, adata is subset to only the cells that pass filtering.
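
The effect of keep_filtered on the returned object can be sketched without dynamo or AnnData. The function apply_cell_filter below is a hypothetical stand-in that models cells as an array of names. It assumes the pass/fail result is kept as a boolean flag per cell, which is what obs_store_key suggests but is not confirmed here.

```python
import numpy as np

def apply_cell_filter(obs_names, passed, keep_filtered=False):
    """Hypothetical stand-in for the two return behaviors.

    Returns (remaining cell names, boolean flag per remaining cell).
    """
    obs_names = np.asarray(obs_names)
    passed = np.asarray(passed, dtype=bool)
    if keep_filtered:
        # All cells are retained; the flag records which ones passed.
        return obs_names, passed
    # Only passing cells are retained, so every remaining flag is True.
    return obs_names[passed], np.ones(int(passed.sum()), dtype=bool)

cells = ["c0", "c1", "c2"]
passed = [True, False, True]

names_all, flags_all = apply_cell_filter(cells, passed, keep_filtered=True)
names_sub, flags_sub = apply_cell_filter(cells, passed, keep_filtered=False)
print(names_all, flags_all)
print(names_sub, flags_sub)
```

With dynamo installed, the corresponding call would follow the signature documented above, for example dynamo.pp.filter_cells_by_outliers(adata, keep_filtered=True), which keeps every cell in adata and leaves the selection to the stored flag.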