Extracts mutational signatures from mutational catalogues




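SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.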

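Installation

In the command line, please type the following line:

$ pip install sigproextractor

Install your desired reference genome from the command line/terminal as follows (available reference genomes are: GRCh37, GRCh38, mm9 and mm10):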

$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37')

This will install the human GRCh37 assembly as a reference genome. You may install as many genomes as you wish.
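To install several genomes in one session, the call above can be wrapped in a small loop. The sketch below is illustrative only: `install_genomes` and its `installer` parameter are hypothetical helpers written for this example and are not part of SigProfilerMatrixGenerator. Only the `install(name)` call and the list of genome names come from the instructions above.

```python
AVAILABLE_GENOMES = ("GRCh37", "GRCh38", "mm9", "mm10")  # names listed above


def install_genomes(names, installer=None):
    """Install each requested genome once; return the names in install order.

    `installer` is a hypothetical hook (any callable taking a genome name).
    When it is omitted, SigProfilerMatrixGenerator's install function is used.
    """
    if installer is None:
        # Imported lazily so this helper can be loaded without the package.
        from SigProfilerMatrixGenerator import install as genInstall
        installer = genInstall.install

    canonical = {genome.lower(): genome for genome in AVAILABLE_GENOMES}
    installed = []
    for name in names:
        key = name.lower()
        if key not in canonical:
            raise ValueError(f"unknown reference genome: {name!r}")
        genome = canonical[key]
        if genome not in installed:  # skip duplicates
            installer(genome)
            installed.append(genome)
    return installed
```

With the package installed, `install_genomes(["GRCh37", "mm10"])` would install both genomes in turn.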

Open a Python interpreter and import the SigProfilerExtractor module. Please see the examples of the functions below.

Functions

importdata

Imports the path of example data.

importdata(datatype="matobj")

Example: 
-------
>>> from sigproextractor import sigpro as sig
>>> path_to_example_table = sig.importdata("table")
>>> data = path_to_example_table 
This "data" variable can be used as a parameter of the "project" argument of the sigProfilerExtractor function.

To get help on the parameters and outputs of the "importdata" function, run the following line:

>>> help(sig.importdata)

sigProfilerExtractor

Extracts mutational signatures from an array of samples.

sigProfilerExtractor(input_type, out_put, project, refgen="GRCh37", genome_build = "GRCh37", startProcess=1, endProcess=10, totalIterations=8, 
cpu=-1, hierarchy = False, mtype = ["default"],exome = False)


Parameters
----------

input_type: A string. Type of input. The type of input should be one of the following:
        - "vcf": used for vcf format inputs.
        - "table": used for table format inputs using a tab separated file.


out_put: A string. The name of the output folder. The output folder will be generated in 
the current working directory. 

project: A string. Name of the input folder (in case of "vcf" type input) or the 
input file (in case of "table" type input). The project file or folder should be inside the 
current working directory. For the "vcf" type input, the project has to be a folder which 
contains the vcf files in vcf format or text formats. The "table" type projects have to be a file.

refgen: A string, optional. The name of the reference genome. The default reference genome is "GRCh37". 
This parameter is applicable only if the input_type is "vcf".

genome_build: A string, optional. The build or version of the reference signatures for the refgen. 
The default genome build is GRCh37. If the input_type is "vcf", the genome_build automatically 
matches the input refgen value.        

startProcess: A positive integer, optional. The minimum number of signatures to be extracted. 
The default value is 1. 

endProcess: A positive integer, optional. The maximum number of signatures to be extracted. 
The default value is 10.

totalIterations: A positive integer, optional. The number of iterations to be performed to extract 
each number of signatures. The default value is 8. However, we STRONGLY RECOMMEND USING 1000 
iterations to get valid results.

cpu: An integer, optional. The number of processors to be used to extract the signatures. 
The default value is -1 which will use all available processors. 

hierarchy: Boolean, optional. Defines if the signature will be extracted in a hierarchical fashion. 
The default value is "False".

par_h: Float, optional. Ranges from 0 to 1. Default is 0.90. Active only if "hierarchy" is True. 
Sets the cutoff used to select the unexplained samples in a hierarchical layer, based on the cosine 
similarity between the original and reconstructed samples.

mtype: A list of strings, optional. The items in the list define the mutational contexts to be considered 
when extracting the signatures. The default value, ["default"], corresponds to ["96", "DINUC", "ID"], where 
"96" is the SBS96 context, "DINUC" is the dinucleotide context and "ID" is the indel context. Other options 
are: '6144', '384', '1536', '6', '24'.

exome: Boolean, optional. Defines if the exomes will be extracted. The default value is "False".

penalty: Float, optional. Takes any positive float. Default is 0.05. Defines the threshold cutoff 
used to assign signatures to a sample.

resample: Boolean, optional. Default is True. If True, adds Poisson noise to the samples by resampling.
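
The `par_h` cutoff can be illustrated with a small, self-contained sketch. It assumes that each sample is a vector of mutation counts and that a sample counts as unexplained when the cosine similarity between its original and reconstructed vectors falls below the cutoff. `cosine_similarity` and `unexplained_samples` are hypothetical helpers written for this illustration; they are not the package's internal code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def unexplained_samples(original, reconstructed, par_h=0.90):
    """Return the names of samples whose reconstruction is below the cutoff.

    original / reconstructed: dicts mapping sample name -> count vector.
    """
    return [
        name
        for name, counts in original.items()
        if cosine_similarity(counts, reconstructed[name]) < par_h
    ]
```

Under these assumptions, a higher `par_h` marks more samples as unexplained, so more samples are carried into the next hierarchical layer.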
    Examples
    --------

    >>> from sigproextractor import sigpro as sig

    # to get input from vcf files
    >>> path_to_example_folder_containing_vcf_files = sig.importdata("vcf")
    >>> data = path_to_example_folder_containing_vcf_files # you can put the path to your folder containing the vcf samples
    >>> sig.sigProfilerExtractor("vcf", "example_output", data, startProcess=1, endProcess=3)

    Wait until the execution is finished. The process may take a couple of hours, depending on the size of the data.
    Check the current working directory for the "example_output" folder.

    # to get input from table format (mutation catalog matrix)
    >>> path_to_example_table = sig.importdata("table")
    >>> data = path_to_example_table # you can put the path to your tab delimited file containing the mutational catalog matrix/table
    >>> sig.sigProfilerExtractor("table", "example_output", data, genome_build="GRCh38", startProcess=1, endProcess=3)

    To get help on the parameters and outputs of the "sigProfilerExtractor" function, run the following line:

    >>> help(sig.sigProfilerExtractor)
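
For "table" input, a tab-delimited catalog can be written with the standard library. The sketch below is only an illustration under stated assumptions: the layout (a header row, one row per mutation type, one column per sample) and the "Mutation Types" header label are assumptions that should be checked against the example table returned by `sig.importdata("table")`. `write_catalog_table` is a hypothetical helper, and the sample names and counts are toy values.

```python
import csv
import os
import tempfile


def write_catalog_table(path, mutation_types, samples):
    """Write a tab-separated mutational catalog and return its path.

    Assumed layout: one header row, one row per mutation type,
    one column per sample.
    samples: dict mapping sample name -> list of counts, one per mutation type.
    """
    for name, counts in samples.items():
        if len(counts) != len(mutation_types):
            raise ValueError(
                f"sample {name!r} has {len(counts)} counts, "
                f"expected {len(mutation_types)}"
            )
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Mutation Types", *samples])
        for i, mutation_type in enumerate(mutation_types):
            writer.writerow([mutation_type, *(counts[i] for counts in samples.values())])
    return path


# Toy data: two mutation types and two made-up samples, written to a temp folder.
table_path = write_catalog_table(
    os.path.join(tempfile.mkdtemp(), "toy_catalog.txt"),
    ["A[C>A]A", "A[C>A]C"],
    {"sample_1": [12, 3], "sample_2": [0, 7]},
)
```

A real catalog would carry every mutation type of the chosen context rather than two rows, and the parameter description above asks for the file to sit inside the current working directory before it is passed as `project`.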

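GPU support

SigProfilerExtractor is GPU-enabled and can run on single or multi-GPU systems to significantly increase performance in most circumstances.

To use this feature, set the gpu flag to True: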

    sigProfilerExtractor(input_type, out_put, project, refgen="GRCh37", genome_build = "GRCh37", startProcess=1, endProcess=10, totalIterations=8, 
    cpu=-1, hierarchy = False, mtype = ["default"],exome = False, gpu=True)

If a CUDA out-of-memory exception occurs, you will need to reduce the number of CPU processes used (the cpu parameter).
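
One way to automate that retry is sketched below. It rests on an assumption that should be verified for your setup: the out-of-memory condition surfaces as a `RuntimeError` whose message mentions "out of memory". `run_with_cpu_backoff` is a hypothetical helper, not part of the package.

```python
def run_with_cpu_backoff(run, cpu_counts):
    """Call run(cpu) with successively smaller process counts.

    run: callable taking the number of CPU processes, for example a lambda
         wrapping sig.sigProfilerExtractor(..., cpu=cpu, gpu=True).
    cpu_counts: process counts to try, largest first.
    Returns (cpu, result) for the first count that succeeds.
    Assumption: running out of GPU memory raises a RuntimeError whose
    message contains "out of memory"; adjust the check if it does not.
    """
    last_error = None
    for cpu in cpu_counts:
        try:
            return cpu, run(cpu)
        except RuntimeError as error:
            if "out of memory" not in str(error).lower():
                raise  # unrelated failure: do not hide it
            last_error = error
    raise RuntimeError("all CPU process counts ran out of GPU memory") from last_error
```

A call might look like `run_with_cpu_backoff(lambda cpu: sig.sigProfilerExtractor("table", "example_output", data, cpu=cpu, gpu=True), [8, 4, 2, 1])`, where the list of counts is chosen to suit the machine.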

For more information, help and examples, please visit: https://osf.io/t6j7u/wiki/home/

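Copyright

This software and its documentation are copyright 2018 as a part of the SigProfiler project. The SigProfilerExtractor framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Contact information

Please address any queries or bug reports to S M Ashiqul Islam (Mishu) at m0islam.ucsd.edu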
