将元数据摄取到candig存储库的例程

candig-ingest的Python项目详细描述


将元数据摄取到Candig1.0服务器的例程 需要[Candig服务器](https://github.com/candig/candig-server), [docopt](http://docopt.readthedocs.io/en/latest/) 还有[熊猫](https://github.com/pandas-dev/pandas)。

  • 自由软件:GNU通用公共许可v3

您可以运行摄取,并使用生成的repo测试服务器,如下所示 (Candig服务器需要Python2.7<;1.0.0,Candig服务器需要Python3.6>;=1.0.0,请注意,当前不支持Python3.7。)

# Install
virtualenv test_server # If you are running Python 2
python3 -m venv test_server # If you are running Python 3.6
cd test_server
source bin/activate
pip install --upgrade pip setuptools
pip install candig-server # Specify anything <1.0.0 for Python 2.7, or >=1.0.0 for Python 3.6.
pip install candig-ingest

# ingest data and make the repo
mkdir ga4gh-example-data
ingest ga4gh-example-data/registry.db <path to example data, like: mock_data/clinical_metadata_tier1.json>

# optional
# add peer site addresses
candig_repo add-peer ga4gh-example-data/registry.db <peer site IP address, like: http://127.0.0.1:8001>

# optional
# create dataset for reads and variants
candig_repo add-dataset --description "Reads and variants dataset" ga4gh-example-data/registry.db read_and_variats_dataset

# optinal
# add reference set, data source: https://www.ncbi.nlm.nih.gov/grc/human/ or http://genome.wustl.edu/pub/reference/
candig_repo add-referenceset ga4gh-example-data/registry.db <path to downloaded reference set, like GRCh37-lite.fa> -d "GRCh37-lite human reference genome" --name GRCh37-lite --sourceUri "http://genome.wustl.edu/pub/reference/GRCh37-lite/GRCh37-lite.fa.gz"# optional
# add reads
candig_repo add-readgroupset -r -I <path to bam index file> -R GRCh37-lite ga4gh-example-data/registry.db read_and_variats_dataset <path to bam file>

# optional
# add variants
candig_repo add-variantset -I <path to variants index file> -R GRCh37-lite ga4gh-example-data/registry.db read_and_variats_dataset <path to vcf file>

# optional
# add sequence ontology set
# wget https://raw.githubusercontent.com/The-Sequence-Ontology/SO-Ontologies/master/so.obo
candig_repo add-ontology ga4gh-example-data/registry.db <path to sequence ontology set, like: so.obo> -n so-xp

# optional
# add features/annotations
#
## get the following scripts
# https://github.com/ga4gh/ga4gh-server/blob/master/scripts/glue.py
# https://github.com/ga4gh/ga4gh-server/blob/master/scripts/generate_gff3_db.py
#
## download the relevant annotation release from Gencode
# https://www.gencodegenes.org/releases/current.html
#
## decompress
# gunzip gencode.v27.annotation.gff3.gz
#
## buid the annotation database
# python generate_gff3_db.py -i gencode.v27.annotation.gff3 -o gencode.v27.annotation.db -v
#
# add featureset
candig_repo add-featureset ga4gh-example-data/registry.db read_and_variats_dataset <path to the annotation.db> -R GRCh37-lite -O so-xp

# optional
# add phenotype association set from Monarch Initiative
# wget http://nif-crawler.neuinfo.org/monarch/ttl/cgd.ttl
candig_repo add-phenotypeassociationset ga4gh-example-data/registry.db read_and_variats_dataset <path to the folder containing cdg.ttl>

# optional
# add disease ontology set, like: NCIT
# wget http://purl.obolibrary.org/obo/ncit.obo
candig_repo add-ontology -n NCIT ga4gh-example-data/registry.db ncit.obo

# launch the server
# at different IP and/or port: ga4gh_server --host 127.0.0.1 --port 8000
candig_server --host 127.0.0.1 --port 8000 -c NoAuth


https://127.0.0.1:8000/

然后,从另一个终端:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json'\
    http://127.0.0.1:8000/datasets/search \
| jq '.'

给予:

{"datasets":[{"description":"PROFYLE test metadata","id":"WyJQUk9GWUxFIl0","name":"PROFYLE"}]}

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
java如何在没有代码气味的情况下编写遵循Liskov替代和其他可靠原则的不可变映射?   java最新jre上的压缩字符串对旧编译代码有好处吗?   java是否可以在javascript中取消PrimeFaces menuitem onclick函数   mysql从SQL数据库中访问java中xml名称空间标记的值   从java程序打开excel文件   java在方法中使用“var”是否会使执行(并发)线程不安全?   java使搜索视图以一种关于AndroidManifest的通用方式可用。xml   java对如何准确使用正则表达式感到困惑?   mule如何访问java文件中的记录变量   java在从2D数组引发异常后继续   枚举当前设置为的java值   java当listview只有几个项目时,如何使alert对话框显示listview的所有项目?   java getTableRow()返回大于项大小的索引   c用java传输二进制文件(数据)   java更改多选列表项复选框颜色