PARC
PARC, "phenotyping by accelerated refined community-partitioning", is a fast, automated, combinatorial graph-based clustering approach that integrates hierarchical graph construction (HNSW) and data-driven graph pruning with the new Leiden community-detection algorithm.
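As a rough illustration of the pruning idea (not the PARC implementation itself): build a kNN graph, weight each edge by the Jaccard similarity of its endpoints' neighbor sets, and drop edges below a global threshold of mean minus a multiple of the standard deviation. A minimal NumPy sketch; `jaccard_prune`, `k`, and `num_std` are hypothetical names, and PARC replaces the brute-force kNN step with HNSW at scale:

```python
import numpy as np

def jaccard_prune(X, k=3, num_std=0.0):
    """Build a kNN graph, weight edges by Jaccard similarity of
    neighbor sets, and keep edges above mean - num_std * std
    (loosely analogous to PARC's jac_std_global pruning)."""
    n = X.shape[0]
    # brute-force pairwise squared distances (HNSW would replace this at scale)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nbrs = [set(np.argsort(row)[:k]) for row in d]
    edges, weights = [], []
    for i in range(n):
        for j in nbrs[i]:
            # add each undirected edge exactly once
            if i < j or i not in nbrs[j]:
                jac = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
                edges.append((i, int(j)))
                weights.append(jac)
    weights = np.array(weights)
    thresh = weights.mean() - num_std * weights.std()
    return [e for e, w in zip(edges, weights) if w >= thresh]

# two well-separated clusters of three points each
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
edges = jaccard_prune(X, k=2, num_std=0.0)
print(len(edges))  # 6: only within-cluster edges exist and all survive
```

The community-detection step (Leiden, via leidenalg) then runs on the pruned graph.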
Getting started
Install using pip:
conda create --name ParcEnv pip  # (optional)
pip install parc  # tested on linux
Install by cloning the repository and running setup.py
If needed, install the dependencies separately:
pip install leidenalg igraph hnswlib
Example Usage 1. (small test sets): sklearn's iris and digits datasets
import parc
import matplotlib.pyplot as plt
from sklearn import datasets

# load sample iris data (n_obs x k_dim, 150 x 4)
iris = datasets.load_iris()
X = iris.data
y = iris.target

plt.scatter(X[:, 0], X[:, 1], c=y)  # colored by 'ground truth'
plt.show()

Parc1 = parc.PARC(X, y)  # instantiate PARC
Parc1.run_PARC()  # run the clustering
parc_labels = Parc1.labels

# view scatterplot colored by PARC labels
plt.scatter(X[:, 0], X[:, 1], c=parc_labels)
plt.show()

# load sample digits data
digits = datasets.load_digits()
X = digits.data  # (n_obs x k_dim, 1797 x 64)
y = digits.target

Parc2 = parc.PARC(X, y, jac_std_global='median')  # 'median' is the default pruning level
Parc2.run_PARC()
parc_labels = Parc2.labels
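Since the examples pass the ground-truth labels `y`, one quick sanity check (not part of the original README) is the adjusted Rand index between `y` and the cluster labels. A sketch using a stand-in for the PARC output:

```python
from sklearn import datasets
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()
y = iris.target

# stand-in for Parc1.labels; substitute the actual PARC output here
parc_labels = list(y)

ari = adjusted_rand_score(y, parc_labels)
print(round(ari, 2))  # 1.0 only for a perfect match; a real run will differ
```

ARI is invariant to label permutation, so it does not matter that PARC's cluster IDs are arbitrary integers.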
Example Usage 2. (mid-scale scRNA-seq): 10X PBMC (Zheng et al., 2017)
import parc
import csv
import numpy as np
import pandas as pd

# load data (50 PCs of filtered gene matrix, pre-processed as per Zheng et al. 2017)
X = csv.reader(open("./pca50_pbmc68k.txt", 'rt'), delimiter=",")
X = np.array(list(X))  # (n_obs x k_dim, 68579 x 50)
X = X.astype("float")
# OR with pandas: X = pd.read_csv("./pca50_pbmc68k.txt").values.astype("float")

y = []  # annotations
with open('./annotations_zhang.txt', 'rt') as f:
    for line in f:
        y.append(line.strip().replace('"', ''))
# OR with pandas: y = list(pd.read_csv('./data/zheng17_annotations.txt', header=None)[0])

parc1 = parc.PARC(X, y)  # instantiate PARC
parc1.run_PARC()  # run the clustering
parc_labels = parc1.labels
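With annotations and cluster labels in hand, a cross-tabulation shows how the clusters map onto the known cell types. A minimal sketch with stand-in labels; replace them with `y` and `parc_labels` from above:

```python
import pandas as pd

y = ["B", "B", "T", "T", "NK", "NK"]  # stand-in annotations
parc_labels = [0, 0, 1, 1, 1, 2]      # stand-in cluster labels

# rows: annotated cell type, columns: PARC cluster, cells: counts
composition = pd.crosstab(pd.Series(y, name="annotation"),
                          pd.Series(parc_labels, name="cluster"))
print(composition)
```

Each row of the table shows how one annotated population is distributed across the clusters, which makes over- and under-merging easy to spot.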
t-SNE plots colored by annotations and by PARC clusters
Example Usage 3. 10X PBMC (Zheng et al., 2017), integrated with the Scanpy pipeline
pip install scanpy
import parc
import scanpy.api as sc  # in newer Scanpy versions: import scanpy as sc
import pandas as pd

# load data
path = './data/zheng17_filtered_matrices_mex/hg19/'
adata = sc.read(path + 'matrix.mtx', cache=True).T  # transpose the data
adata.var_names = pd.read_csv(path + 'genes.tsv', header=None, sep='\t')[1]
adata.obs_names = pd.read_csv(path + 'barcodes.tsv', header=None)[0]

# annotations as per correlation with pure samples
annotations = list(pd.read_csv('./data/zheng17_annotations.txt', header=None)[0])
adata.obs['annotations'] = pd.Categorical(annotations)

# pre-process as per Zheng et al. and take the first 50 PCs for analysis
sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata, n_comps=50)

# run PARC on the PCA representation
parc1 = parc.PARC(adata.obsm['X_pca'], annotations)
parc1.run_PARC()
parc_labels = parc1.labels
adata.obs["PARC"] = pd.Categorical(parc_labels)

# visualize (the UMAP plots need a neighbors graph and embedding first)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color='annotations')
sc.pl.umap(adata, color='PARC')
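To keep the cluster assignments for downstream tools, the relevant `obs` columns can be written out as CSV. A sketch with a small stand-in DataFrame in place of `adata.obs`, round-tripped through an in-memory buffer instead of a file:

```python
import io
import pandas as pd

# stand-in for adata.obs[['annotations', 'PARC']]
obs = pd.DataFrame({"annotations": ["CD8 T", "B", "NK"],
                    "PARC": [0, 1, 1]},
                   index=["AAAC-1", "AAAG-1", "AACT-1"])

buf = io.StringIO()  # use a real path, e.g. 'parc_labels.csv', in practice
obs.to_csv(buf, index_label="barcode")
buf.seek(0)
restored = pd.read_csv(buf, index_col="barcode")
print(restored.shape)  # (3, 2)
```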
Example Usage 4. Large-scale (70K subset and 1.1M cells) lung cancer cells (multi-ATOM imaging cytometry based features)
- normalized image-based feature matrix (70K cells)
- lung cancer cell annotations (70K cells)
- 1.1M cell features and annotations
import parc
import pandas as pd

# load data: digital mix of 7 cell lines from 7 sets of pure samples (1.1M cells x 26 features)
X = pd.read_csv("./LungData.txt").values.astype("float")
y = list(pd.read_csv('./LungData_annotations.txt', header=None)[0])  # list of cell-type annotations

# run PARC
parc1 = parc.PARC(X, y)
parc1.run_PARC()
parc_labels = parc1.labels
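At 1.1M cells, parameter choices (e.g. the `jac_std_global` pruning level) are cheaper to explore on a random subsample before the full run. A sketch with synthetic stand-in data of the same width (26 features); the sizes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1_100, 26))          # stand-in for the 1.1M x 26 matrix
y = list(rng.integers(0, 7, size=1_100))  # stand-in labels for the 7 cell lines

idx = rng.choice(len(X), size=200, replace=False)  # pilot subsample
X_sub = X[idx]
y_sub = [y[i] for i in idx]
# pilot run: parc.PARC(X_sub, y_sub, jac_std_global='median').run_PARC()
print(X_sub.shape)  # (200, 26)
```

Once the parameters look reasonable on the pilot, rerun on the full matrix.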
t-SNE plots colored by annotations and by PARC clusters, and a heatmap of features
References to dependencies
- Leiden (pip install leidenalg): V. A. Traag, 2019, doi.org/10.1038/s41598-019-41695-z
- hnswlib: Malkov, Yu A., and D. A. Yashunin. "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." TPAMI, preprint: https://arxiv.org/abs/1603.09320
- igraph (igraph.org/python/)
Citing
If you find this code useful in your work, please consider citing the paper: PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells