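I am currently working with a database for a bioinformatics class, and I am having trouble formatting my SQL output. My Python script queries these tuples. Here is my output: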
CENPVP2 441495 9606 NR_033773.1 None NC_000023.11 None
CENPVP2 441495 9606 NR_033773.1 None NT_011630.15 None
CENPVP2 441495 9606 None None NG_022599.1 None
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NC_000023.11 12477932
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NC_000023.11 16382448
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NC_000023.11 18976975
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NG_027735.1 12477932
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NG_027735.1 16382448
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NG_027735.1 18976975
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NT_011786.17 12477932
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NT_011786.17 16382448
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NT_011786.17 18976975
CT47A11 255313 9606 None None NG_027735.1 12477932
CT47A11 255313 9606 None None NG_027735.1 16382448
CT47A11 255313 9606 None None NG_027735.1 18976975
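Each field is separated by a tab. What I need is the following, where each field is still separated by a tab, but there are now multi-valued fields in which the values are separated by pipes and null values are represented by a dash: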
CENPVP2 441495 9606 NR_033773.1 - NC_000023.11|NG_022599.1|NT_011630.15 -
CT47A11 255313 9606 NM_173571.2 NP_775842.2 NC_000023.11|NG_027735.1|NT_011786.17 12477932|16382448|18976975
What is the best way to format the output to match the second table?
Here is my Python script:
import sys
import getopt
import psycopg2

def writerows(row, outFile):
    outFile.write("%s\t" % row[1])
    outFile.write("%s\t" % row[0])
    outFile.write("%s\t" % row[2])
    outFile.write("%s\t" % row[3])
    outFile.write("%s\t" % row[5])
    outFile.write("%s\t" % row[4])
    outFile.write("%s\n" % row[6])

def usage(err):
    print("I will handle this later")

def main():
    inFile = sys.stdin
    outFile = sys.stdout
    try:
        opts, args = getopt.getopt(sys.argv[1:], "i:o:")
    except getopt.GetoptError as err:
        usage(err)
        sys.exit(2)
    for (opt, arg) in opts:
        if(opt == "-i"):
            inFile = open(arg, "r")
        if(opt == "-o"):
            outFile = open(arg, "w")
    line = inFile.readline()
    line = line.replace("\n", "")
    conn = psycopg2.connect("dbname=********* user=*********** "
                            "password=********** host=localhost")
    cursor = conn.cursor()
    while (line):
        cursor.execute("SELECT DISTINCT geneinfo.gene_id, geneinfo.symbol, "
                       "geneinfo.tax_id, gene2refseq.rna_accession, "
                       "gene2refseq.gen_accession, "
                       "gene2refseq.pro_accession, gene2pubmed.pubmed_id FROM "
                       "geneinfo LEFT JOIN gene2refseq ON "
                       "geneinfo.gene_id = gene2refseq.gene_id LEFT JOIN "
                       "gene2pubmed ON geneinfo.gene_id = gene2pubmed.gene_id "
                       "WHERE geneinfo.symbol ILIKE '"+line+"' OR "
                       "geneinfo.synonyms ILIKE '%"+line+"%' ORDER BY "
                       "geneinfo.symbol ASC, geneinfo.tax_id ASC;")
        result = cursor.fetchone()
        if result:
            while result:
                writerows(result, outFile)
                result = cursor.fetchone()
        else:
            outFile.write("\n")
        line = inFile.readline()
        line = line.replace("\n", "")
    cursor.close()
    conn.close()

if (__name__=='__main__'):
    main()
As Parfait said in the comments, the query would look something like:
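A minimal sketch of such a query, not necessarily the one Parfait had in mind: it assumes PostgreSQL's string_agg aggregate together with GROUP BY, so that each gene collapses to one row, with COALESCE turning an empty aggregate into a dash. The table and column names come from the script in the question; QUERY, query_params and format_row are hypothetical names made up for illustration, and the sketch assumes the accession columns are text and that pubmed_id can be cast to text.

```python
# Hypothetical rewrite of the question's query (an assumption, not the
# original answer). string_agg skips NULLs, DISTINCT drops the duplicates
# produced by the two LEFT JOINs, and COALESCE supplies '-' when a gene has
# no value at all for a column.
QUERY = """
SELECT g.symbol,
       g.gene_id,
       g.tax_id,
       COALESCE(string_agg(DISTINCT r.rna_accession, '|'
                           ORDER BY r.rna_accession), '-'),
       COALESCE(string_agg(DISTINCT r.pro_accession, '|'
                           ORDER BY r.pro_accession), '-'),
       COALESCE(string_agg(DISTINCT r.gen_accession, '|'
                           ORDER BY r.gen_accession), '-'),
       COALESCE(string_agg(DISTINCT p.pubmed_id::text, '|'
                           ORDER BY p.pubmed_id::text), '-')
FROM geneinfo AS g
LEFT JOIN gene2refseq AS r ON g.gene_id = r.gene_id
LEFT JOIN gene2pubmed AS p ON g.gene_id = p.gene_id
WHERE g.symbol ILIKE %s OR g.synonyms ILIKE %s
GROUP BY g.gene_id, g.symbol, g.tax_id
ORDER BY g.symbol ASC, g.tax_id ASC;
"""


def query_params(name):
    """Build the two ILIKE parameters: exact symbol, then synonym substring."""
    return (name, "%" + name + "%")


def format_row(row):
    """Join one result row with tabs, writing '-' for any remaining None."""
    return "\t".join("-" if value is None else str(value) for value in row)
```

In the loop from the question this would be used roughly as cursor.execute(QUERY, query_params(line)) followed by outFile.write(format_row(row) + "\n") for each row in the cursor. Passing the search term as a parameter instead of concatenating it into the SQL string also avoids quoting problems when a gene name contains a quote character.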