- Project Name: Single Cell Annotation by cellidentifierdx.
- Description: Annotate your single cell using cellidentifierd.
- Installation: Instructions for how to install your project.
- Usage: Instructions for how to use your project after it's installed.
- License: The license your project is distributed under.
- Note: This software package is yet to be published (unpublished)! If you have used it for publication purpose, please share authorship with MD. BABU MIA, PHD; ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI. Contact for detials: [email protected]
CellIdentifierDX is a Python package for identifying cell types using Bayesian scoring.
You can install CellIdentifierDX using pip:
pip install cellidentifierdxHere's a basic example of how to use --> Extract 250 genes per cluster: #getting to gene lists - step 2 continue import os import numpy as np import pandas as pd import anndata import scanpy as sc import scvi import scipy.io import matplotlib.pyplot as plt
sc.tl.rank_genes_groups(combined_adata, groupby="leiden_scvi", key_added="rank_genes", method="t-test_overestim_var", n_genes=250)
marker_genes_250 = pd.DataFrame(combined_adata.uns['rank_genes']['names']).head(250) marker_scores_250 = pd.DataFrame(combined_adata.uns['rank_genes']['scores']).head(250) marker_logfoldchanges_250 = pd.DataFrame(combined_adata.uns['rank_genes']['logfoldchanges']).head(250) marker_pvals_250 = pd.DataFrame(combined_adata.uns['rank_genes']['pvals']).head(250) marker_pvals_adj_250 = pd.DataFrame(combined_adata.uns['rank_genes']['pvals_adj']).head(250)
melted_dataframes = [] for cluster_idx in marker_genes_250.columns: cluster_df = pd.DataFrame( { "Cluster": cluster_idx, "Gene": marker_genes_250[cluster_idx], "Score": marker_scores_250[cluster_idx], "Log2FoldChange": marker_logfoldchanges_250[cluster_idx], "P-value": marker_pvals_250[cluster_idx], "Adjusted P-value": marker_pvals_adj_250[cluster_idx], } ) melted_dataframes.append(cluster_df)
marker_genes_info_250 = pd.concat(melted_dataframes, axis=0, ignore_index=True)
dataset_identifier = "dataset name"
marker_genes_info_250.to_csv(f"pathtosave/{dataset_identifier}_marker_genes_details_250.csv", index=False)
marker_genes_250.to_csv(f"pathtosave/{dataset_identifier}_marker_genes_only_250.csv", index=False)
--> CellIdentifierDX:
from cellidentifierdx import annotate_cell_types
reference_file = "/path/to/your/reference_file.xlsx"
expr_file = "/path/to/your/expr_file.csv"
tissue, cell_type_annotations = annotate_cell_types(reference_file, expr_file, 'sheetname')
for cluster_idx, cell_type, avg_score in cell_type_annotations:
print(f"Cluster {cluster_idx} is annotated as {cell_type} with average Bayesian score of {avg_score}.")MIT