Variant annotation in Python

Detailed description of the varcode Python project



varcode

varcode is a Python library for manipulating genomic variants and predicting their effects on protein sequences.

Installation

You can install varcode using pip:

pip install varcode

You can install the required reference genome data through PyEnsembl as follows:

# Downloads and installs the Ensembl releases (75 and 76)
pyensembl install --release 75 76

Example

import varcode

# Load TCGA MAF containing variants
variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")

print(variants)
### <VariantCollection from 'tcga-ovarian-cancer-variants.maf' with 6428 elements>
###  -- Variant(contig=1, start=69538, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=881892, ref=T, alt=G, genome=GRCh37)
###  -- Variant(contig=1, start=3389714, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=3624325, ref=G, alt=T, genome=GRCh37)
###  ...

# you can index into a VariantCollection and get back a Variant object
variant = variants[0]

# groupby_gene_name returns a dictionary whose keys are gene names
# and whose values are themselves VariantCollections
gene_groups = variants.groupby_gene_name()

# get variants which affect the TP53 gene
TP53_variants = gene_groups["TP53"]

# predict protein coding effect of every TP53 variant on
# each transcript of the TP53 gene
TP53_effects = TP53_variants.effects()

print(TP53_effects)
### <EffectCollection with 789 elements>
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R342*)
### -- ThreePrimeUTR(variant=chr17 g.7574003G>A, transcript_name=TP53-005, transcript_id=ENST00000420246)
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-002, transcript_id=ENST00000445888, effect_description=p.R342*)
### -- FrameShift(variant=chr17 g.7574030_7574030delG, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R333fs)
### ...

premature_stop_effect = TP53_effects[0]

print(str(premature_stop_effect.mutant_protein_sequence))
### 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMF'

print(premature_stop_effect.aa_mutation_start_offset)
### 341

print(premature_stop_effect.transcript)
### Transcript(id=ENST00000269305, name=TP53-001, gene_name=TP53, biotype=protein_coding, location=17:7571720-7590856)

print(premature_stop_effect.gene.name)
### 'TP53'
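The output above pairs the effect description p.R342* with an aa_mutation_start_offset of 341, which suggests that the offset is zero-based while the description uses one-based residue numbering. The sketch below illustrates that relationship. It relies on a hypothetical helper, parse_simple_protein_change, which is not part of varcode and only handles single-residue descriptions of this shape (it rejects frameshift descriptions such as p.R333fs).

```python
import re

# Hypothetical helper, NOT part of varcode: parses only simple descriptions
# such as "p.R342*" (reference residue, one-based position, new residue or stop).
_SIMPLE_CHANGE = re.compile(r"^p\.([A-Z])(\d+)([A-Z*])$")


def parse_simple_protein_change(description):
    """Return (ref_aa, one_based_position, alt_aa) for a simple protein change."""
    match = _SIMPLE_CHANGE.match(description)
    if match is None:
        raise ValueError("unsupported effect description: %r" % description)
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


ref_aa, position, alt_aa = parse_simple_protein_change("p.R342*")

# Assumption: the zero-based offset is the one-based position minus one,
# which matches the 342 / 341 pair shown in the example output.
zero_based_offset = position - 1

print(ref_aa, position, alt_aa, zero_based_offset)  # R 342 * 341
```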

If you are looking for a quick start guide, you can check out this iPython book, which demonstrates simple use cases of varcode.

Effect Types

Effect type | Description
AlternateStartCodon | Replace annotated start codon with alternative start codon (e.g. "ATG>CAG").
ComplexSubstitution | Insertion and deletion of multiple amino acids.
Deletion | Coding mutation which causes deletion of amino acid(s).
ExonLoss | Deletion of entire exon, significantly disrupts protein.
ExonicSpliceSite | Mutation at the beginning or end of an exon, may affect splicing.
FivePrimeUTR | Variant affects 5' untranslated region before start codon.
FrameShiftTruncation | A frameshift which leads immediately to a stop codon (no novel amino acids created).
FrameShift | Out-of-frame insertion or deletion of nucleotides, causes novel protein sequence and often premature stop codon.
IncompleteTranscript | Can't determine effect since transcript annotation is incomplete (often missing either the start or stop codon).
Insertion | Coding mutation which causes insertion of amino acid(s).
Intergenic | Occurs outside of any annotated gene.
Intragenic | Within the annotated boundaries of a gene but not in a region that's transcribed into pre-mRNA.
IntronicSpliceSite | Mutation near the beginning or end of an intron but less likely to affect splicing than donor/acceptor mutations.
Intronic | Variant occurs between exons and is unlikely to affect splicing.
NoncodingTranscript | Transcript doesn't code for a protein.
PrematureStop | Insertion of stop codon, truncates protein.
Silent | Mutation in coding sequence which does not change the amino acid sequence of the translated protein.
SpliceAcceptor | Mutation in the last two nucleotides of an intron, likely to affect splicing.
SpliceDonor | Mutation in the first two nucleotides of an intron, likely to affect splicing.
StartLoss | Mutation causes loss of start codon, likely result is that an alternate start codon will be used down-stream (possibly in a different frame).
StopLoss | Loss of stop codon, causes extension of protein by translation of nucleotides from 3' UTR.
Substitution | Coding mutation which causes simple substitution of one amino acid for another.
ThreePrimeUTR | Variant affects 3' untranslated region after stop codon of mRNA.
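To make a few of the coding effect types concrete, here is a minimal sketch that labels a single-codon change as Silent, Substitution, PrematureStop, or StopLoss using the standard genetic code. The function classify_codon_change is a hypothetical illustration and is not how varcode itself predicts effects: it ignores start codons, splice sites, frameshifts, and everything else in the table.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Hypothetical helper: name the effect of replacing one codon with another."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "Silent"
    if alt_aa == "*":
        return "PrematureStop"
    if ref_aa == "*":
        return "StopLoss"
    return "Substitution"


print(classify_codon_change("CGA", "TGA"))  # Arg -> stop
print(classify_codon_change("CGA", "CGG"))  # Arg -> Arg
print(classify_codon_change("CGA", "CAA"))  # Arg -> Gln
print(classify_codon_change("TGA", "CGA"))  # stop -> Arg
```

The checks are ordered so that a synonymous change is reported as Silent before any stop-codon test runs, which means a stop codon replaced by another stop codon also counts as Silent in this sketch.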

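Coordinate System

varcode currently uses a "base counted, one start" genomic coordinate system, to match the Ensembl annotation database. We are planning to switch over to "space counted, zero start" (a.k.a. interbase) coordinates, since that system allows for more uniform logic (no special cases for insertions). To learn more about genomic coordinate systems, read this blog post.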
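The difference between the two systems can be sketched with a pair of conversion functions. These are hypothetical helpers, not varcode code, and they assume the usual conventions: a "base counted, one start" interval is inclusive on both ends, while an interbase interval counts the gaps between bases from zero, so its length is simply end minus start.

```python
def one_based_to_interbase(start, end):
    """Convert a one-based inclusive interval to a zero-based interbase interval."""
    return start - 1, end


def interbase_to_one_based(start, end):
    """Convert an interbase interval back to one-based inclusive coordinates."""
    if start == end:
        # An empty interbase interval is a point between two bases (for example
        # an insertion site); it has no one-based inclusive equivalent, which is
        # the special case the interbase system avoids.
        raise ValueError("empty interbase interval cannot be expressed as bases")
    return start + 1, end


# The single base at one-based position 7574003 spans one interbase unit.
snv = one_based_to_interbase(7574003, 7574003)
print(snv, snv[1] - snv[0])

# An insertion between one-based bases 10 and 11 is just the empty interval (10, 10).
insertion_site = (10, 10)
print(insertion_site[1] - insertion_site[0])
```

In the interbase form a substitution, a deletion, and an insertion are all plain (start, end) pairs, with the insertion having length zero.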
