Python: Extracting text strings from the web for a list of URLs

DR Proteomes; UP000005640; Chromosome 3.
DR Bgee; C9J872; -.
DR ExpressionAtlas; C9J872; baseline and differential.
DR GO; GO:0005634; C:nucleus; IBA:GO_Central.
DR GO; GO:0005667; C:transcription factor complex; IEA:InterPro.
DR GO; GO:0003677; F:DNA binding; IEA:UniProtKB-KW.
DR GO; GO:0000981; F:sequence-specific DNA binding RNA polymerase II transcription factor activity; IBA:GO_Central.
DR GO; GO:0003712; F:transcription cofactor activity; IEA:InterPro.
DR GO; GO:0000278; P:mitotic cell cycle; IEA:InterPro.

import urllib2
import sys
import re

IDlist = ['C9JVZ1', 'C9JLN0', 'C9J872']
URLlist = ["http://www.uniprot.org/uniprot/"+x+".txt" for x in IDlist]
function_list = []
for item in URLlist:
    textfile = urllib2.urlopen(item)
    myfile = textfile.read()
    for line in myfile:
        function = re.search('P:(.+?);', line).group(1)
        function_list.append(function)

1 answer

User

Reply #1 · Posted 2024-06-01 13:05:48

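Here is an updated version that builds your dictionary. Note that I changed the loop to iterate over the file IDs, so that each ID can serve as the dictionary key.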

import urllib2
import re

IDlist = ['C9JVZ1', 'C9JLN0', 'C9J872']
function_dict = {}

# Cycle through the data files, keyed by ID
for id in IDlist:

    # Start a new list of functions for this file.
    # Open the file and read line by line.
    function_list = []
    textfile = urllib2.urlopen("http://www.uniprot.org/uniprot/"+id+".txt")
    myfile = textfile.readlines()

    for line in myfile:

        # When you find a function tag, extract the function and add it to the list.
        found = re.search(' [PCF]:(.+?);', line)
        if found:
            function = found.group(1)
            function_list.append(function)

    # At end of file, insert the list into the dictionary.
    function_dict[id] = function_list

print function_dict

The result I get from your data is:

{'C9JVZ1': [], 'C9J872': ['nucleus', 'transcription factor complex', 'DNA binding', 'sequence-specific DNA binding RNA polymerase II transcription factor activity', 'transcription cofactor activity', 'mitotic cell cycle', 'regulation of transcription from RNA polymerase II promoter', 'transcription, DNA-templated'], 'C9JLN0': ['cytosol']}
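
For readers on Python 3, where `urllib2` no longer exists, the same approach might look like the sketch below. The helper names `extract_go_terms`, `fetch_lines` and `build_function_dict` are made up for illustration, the UTF-8 decoding is an assumption, and the URL pattern is copied from the question and may no longer be served in this form. Compared with the code in the question, two changes matter: the loop runs over lines rather than over the characters of one long string, and the match is checked before `group` is called, because `re.search` returns `None` on lines without a GO term.

```python
import re
from urllib.request import urlopen

# Same pattern as in the answer: a space, an aspect letter (P, C or F),
# a colon, then the term name up to the next semicolon.
GO_TERM = re.compile(r" [PCF]:(.+?);")


def extract_go_terms(lines):
    """Return the GO term names found in an iterable of text lines."""
    terms = []
    for line in lines:
        found = GO_TERM.search(line)
        if found:  # search() returns None when the line has no GO term
            terms.append(found.group(1))
    return terms


def fetch_lines(url):
    """Download a text resource and return its lines (assumes UTF-8 text)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8").splitlines()


def build_function_dict(id_list, fetch=fetch_lines):
    """Map each ID to the list of GO terms found in its text record."""
    # URL pattern taken from the question; it is an assumption that it still works.
    base = "http://www.uniprot.org/uniprot/"
    return {uid: extract_go_terms(fetch(base + uid + ".txt")) for uid in id_list}


# Offline check against a few lines from the sample data in the question.
sample = [
    "DR Proteomes; UP000005640; Chromosome 3.",
    "DR GO; GO:0005634; C:nucleus; IBA:GO_Central.",
    "DR GO; GO:0000278; P:mitotic cell cycle; IEA:InterPro.",
]
print(extract_go_terms(sample))  # ['nucleus', 'mitotic cell cycle']
```

Calling `build_function_dict(['C9JVZ1', 'C9JLN0', 'C9J872'])` would perform the downloads. Passing a different `fetch` function makes it possible to try the parsing without network access.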
