Match and merge two text tables?

Table 1:

Disease/Trait               Mapped_gene     p-Value
Wegener's granulomatosis    HLA-DPB1        2.00E-50
Wegener's granulomatosis    TENM3 - DCTD    2.00E-06
Brugada syndrome            SCN5A           1.00E-14
Brugada syndrome            SCN10A          1.00E-68
Brugada syndrome            HEY2 - NCOA7    5.00E-17
Major depressive disorder   IRF8 - FENDRR   3.00E-07

Table 2:

Identifier  Homologues  Symbol
CG11621     5286        HEY2
CG11621     5287        IRF8
CG11621     5287        PIK3C2B
CG11621     5288        PIK3C2G
CG11621     5288        PIK3C2G
CG11949     2035        DCTD
CG11949     2035        EPB41
CG11949     2036        EPB41L1
CG11949     2037        EPB41L2

User · answer #1 · posted 2024-09-30 03:26:11

This should do what you want:

import csv

diseases = {}

# Load the disease table into memory, keyed by gene name
with open('table1.csv', newline='') as f:
    dfile = csv.reader(f)
    next(dfile)  # skip the header row
    for disease, gene, pvalue in dfile:
        diseases[gene] = (disease, pvalue)

# Walk the identifier table and write out every row whose symbol has a disease entry
with open('table2.csv', newline='') as f_in, open('output.csv', 'w', newline='') as f_out:
    idfile = csv.reader(f_in)
    output = csv.writer(f_out)
    next(idfile)  # skip the header row
    for ident, homologue, symbol in idfile:
        if symbol in diseases:
            output.writerow((ident, homologue, symbol) + diseases[symbol])

It assumes that every gene name under Mapped_gene is unique; otherwise, it can easily be extended to handle duplicates.
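Besides assuming unique gene names, the lookup above is an exact string match, so a combined Mapped_gene cell such as "HEY2 - NCOA7" will not match the symbol HEY2. Below is a minimal sketch of one possible extension. It assumes that " - " (with spaces) separates genes within a cell, so hyphenated names like HLA-DPB1 stay intact, and it stores a list of (disease, p-value) pairs per gene so that duplicates each produce their own output row. The helper names `build_gene_index` and `merge_tables` are hypothetical; the sample rows are taken from the tables in the question.

```python
from collections import defaultdict


def build_gene_index(disease_rows):
    """Map each individual gene name to every (disease, p-value) pair it appears in.

    Assumption: a Mapped_gene cell such as "HEY2 - NCOA7" lists several genes
    separated by " - ", so the cell is split and each gene is indexed separately.
    """
    index = defaultdict(list)
    for disease, mapped_gene, pvalue in disease_rows:
        for gene in mapped_gene.split(" - "):
            index[gene.strip()].append((disease, pvalue))
    return index


def merge_tables(disease_rows, id_rows):
    """Yield one merged row per (identifier row, matching disease entry) pair."""
    index = build_gene_index(disease_rows)
    for ident, homologue, symbol in id_rows:
        # .get() does not insert missing keys into the defaultdict
        for disease, pvalue in index.get(symbol, []):
            yield (ident, homologue, symbol, disease, pvalue)


# Sample rows taken from the question (headers already removed)
table1 = [
    ("Wegener's granulomatosis", "TENM3 - DCTD", "2.00E-06"),
    ("Brugada syndrome", "HEY2 - NCOA7", "5.00E-17"),
    ("Major depressive disorder", "IRF8 - FENDRR", "3.00E-07"),
]
table2 = [
    ("CG11621", "5286", "HEY2"),
    ("CG11621", "5287", "IRF8"),
    ("CG11949", "2035", "DCTD"),
    ("CG11949", "2035", "EPB41"),
]

for row in merge_tables(table1, table2):
    print(row)
# ('CG11621', '5286', 'HEY2', 'Brugada syndrome', '5.00E-17')
# ('CG11621', '5287', 'IRF8', 'Major depressive disorder', '3.00E-07')
# ('CG11949', '2035', 'DCTD', "Wegener's granulomatosis", '2.00E-06')
```

To use it with the actual files, pass `csv.reader` objects over table1.csv and table2.csv (after skipping each header with `next()`) instead of the sample lists, and write the results with `csv.writer(...).writerows(...)`.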
