Proteomics: calculating masses from a file containing several protein sequences

Published 2024-06-26 16:57:01


I want to calculate the mass of every protein in my file.

My code so far:

from pyteomics import mass

with open('file.txt') as f:
    for line in f:
        mass.calculate_mass(line)

When I replace mass.calculate_mass with print(line), every line is printed correctly. But mass.calculate_mass(line) produces several error messages:

Traceback (most recent call last):
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 275, in parse
    n, body, c = re.match(_modX_sequence, sequence).groups()
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 304, in __init__
    self._from_sequence(args[0], aa_comp)
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 200, in _from_sequence
    show_unmodified_termini=True)
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/parser.py", line 277, in parse
    raise PyteomicsError('Not a valid modX sequence: ' + sequence)
pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 307, in __init__
    self._from_formula(args[0], mass_data)
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 205, in _from_formula
    raise PyteomicsError('Invalid formula: ' + formula)
pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: "Invalid formula: 'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'\n"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michaela/calculatemass.py", line 5, in <module>
    mass.calculate_mass(line)
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 499, in calculate_mass
    else Composition(*args, **kwargs))
  File "/home/michaela/.local/lib/python3.5/site-packages/pyteomics/mass/mass.py", line 312, in __init__
    'formula'.format(args[0]))
pyteomics.auxiliary.PyteomicsError: Pyteomics error, message: 'Could not create a Composition object from string: "\'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY\'\n": not a valid sequence or formula'

My file looks like this:

'MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY'

I also tried this as the file content:

sequence='melnltqllplvlvhitfcgrpavsigvnlvglfgstddyvllqrigsqtlrkgdgggrhskdsrdsslidenrvrssnmklcrntglpvgcynvveggiydvvrysdlrkvkgmdfatlnrhsdgrpktrrgcrsrrrdgtvenaaqstpststsfsfkqpstppsgtsgngvsqrakvraaqpserkahqkatkvsqtgkqtgkeapvdeknsksverttkprgkvqkpvaeapqhapsrprqansfaavltasdlrscdlgsssvctdkaetqmtsmqlnkskhvpsstgrtaaqdngakktpqvatpvgesanakkqqdvdvdvdnallvghgsssngkkeggstglanvrtdhsrdvvdrraaaaapsnisvecpcapdaaspelgfvsalsrdfslgsslassadsvy'

There are no blank lines in my file. If I try the same call in the Python shell, it works:

mass.calculate_mass('MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY')

49589.2790365072

I also tried mass.calculate_mass(str(line)), but that did not help.

Does anyone know what I am doing wrong?


3 answers

I think you have a problem with the end-of-line character ('\n'). Try:

mass.calculate_mass(line.strip())

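Does that work?

line.strip() removes leading and trailing whitespace, including the newline, from the line. For details, see the Python documentation for str.strip.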
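As a quick way to see what strip() does and does not remove, here is a minimal sketch. It assumes the file really contains the quote characters shown in the question, and it uses a shortened stand-in sequence ("MELNLTQ") rather than the full protein. The variable names are only illustrative.

```python
# A line as it would be read from a file that wraps the sequence in quotes.
# "MELNLTQ" is a shortened stand-in for the real sequence.
line = "'MELNLTQ'\n"

# With no arguments, strip() removes whitespace only (spaces, tabs, newlines).
stripped = line.strip()
print(repr(line))      # raw line, quotes and newline visible
print(repr(stripped))  # newline removed, quotes still present

# strip() also accepts a set of characters to remove from both ends,
# so a second call can drop surrounding single or double quotes.
cleaned = stripped.strip("'\"")
print(repr(cleaned))
```

Printing with repr() rather than plain print() makes invisible or easily overlooked characters, such as the trailing newline and the quotes, explicit.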

After changing the file to:

MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY

(without "sequence=" or quotes)

and the code to:

from pyteomics import mass

with open('file.txt') as f:
    for line in f:
        print(mass.calculate_mass(line))
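Before editing the file by hand, it may help to find which lines contain characters other than amino-acid letters. The sketch below is one way to do that in plain Python. The 20-letter alphabet and the helper name unexpected_characters are assumptions made for illustration. Pyteomics accepts a broader syntax than this, so treat the check as a rough filter, not a replacement for the library's parser.

```python
# Assumption: only the 20 standard one-letter amino-acid codes are expected.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def unexpected_characters(line):
    """Return the sorted characters in a line that are not standard residues.

    The trailing newline is ignored; everything else, including quotes,
    spaces, '=' signs, and lowercase letters, is reported.
    """
    return sorted(set(line.rstrip("\n")) - STANDARD_AMINO_ACIDS)


# Shortened stand-in lines: one quoted as in the question, one clean.
sample_lines = ["'MELNLTQ'\n", "MELNLTQ\n"]
for number, line in enumerate(sample_lines, start=1):
    print(number, unexpected_characters(line))
```

For a real file, the same loop could iterate over the open file object instead of sample_lines.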

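Your file appears to contain quote characters ('). They are interpreted as part of the sequence and break the parser. If you put only the sequence in the file, with no other characters, it should work: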

MELNLTQLPLVHITFCGRPAVSIGVVNLVGLFGSTDYVLLQRIGSQGQTALRKGDGGGRHSKDSRDSSLDSLEIENRVRSSNMKLCRNTGLPVGCYNVVEGGIYDVVRYSDLRKGKVKGMDFATLNRHSDGRPKTRRGCRSRRKRRRDGTVENAAQSTPSDTVSSSFKQPSTPVPTDPSGTSGGTNGVSQRAKVVRAAQPSERKAHQKATKVSQTSKQTGGKEAPAVDEKNSNGTKVERTRTTKPRAPGIPKERPPRVGKEKVQQLKPVAEAAPQHAPSRSPSPRQANSNFAAVVLTASDLRSCDLGSSLSNVSVCTDKAETQMTPTTGPVTTSMQLNKSKHVPSSTGRTAAQDNGAKKTPQVATPVGESANAKKQQDVVDVDNALLVGHGSSSNGKKEGGSTGLANVRTDHSRDVVDRRAAAAPSNSIVECPCAPDAASPELGFVTVESALSRDFSLGSSLASSADSVY

Note that, although this is not documented, the end-of-line character does not cause any problems.
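If changing the file is not convenient, the lines could instead be cleaned as they are read. The following is a minimal sketch under that assumption. The generator name read_sequences is hypothetical, and an in-memory io.StringIO with shortened stand-in sequences takes the place of file.txt so the example runs on its own.

```python
import io


def read_sequences(handle):
    """Yield one cleaned sequence per non-empty line of an open text file.

    Cleaning removes surrounding whitespace, an optional 'name=' prefix
    (such as 'sequence='), and surrounding single or double quotes.
    """
    for line in handle:
        text = line.strip()
        if "=" in text:
            # Keep only what follows the first '=' sign.
            text = text.split("=", 1)[1]
        text = text.strip().strip("'\"")
        if text:
            yield text


# Stand-in for open('file.txt'): quoted, blank, prefixed, and clean lines.
fake_file = io.StringIO("'MELNLTQ'\n\nsequence='PEPTIDE'\nACDK\n")
sequences = list(read_sequences(fake_file))
print(sequences)
```

With pyteomics installed, each yielded string could then be passed to mass.calculate_mass, as in the code above.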
