I have the following CSV file:
ID,PDBID,FirstResidue,FirstChain,SecondResidue,SecondChain,ThirdResidue,ThirdChain,FourthResidue,FourthChain,Pattern
RZ_AUTO_505,1hmh,A22L,C,A22L,A,G21L,A,A23L,A,AA/GA Naked ribose
RZ_AUTO_506,1hmh,A22L,C,A22L,A,G114,A,A23L,A,AA/GA Naked ribose
RZ_AUTO_507,1hmh,A130,E,A90,A,G80,A,A130,A,AA/GA Naked ribose
RZ_AUTO_508,1hmh,A140,E,A90,E,G120,A,A90,A,AA/GA Naked ribose
RZ_AUTO_509,1hmh,G102,A,C103,A,G102,E,A90,E,GC/GA Single ribose
RZ_AUTO_510,1hmh,G102,A,C103,A,G120,E,A90,E,GC/GA Single ribose
RZ_AUTO_511,1hmh,G113,C,C112,C,G21L,A,A23L,A,GC/GA Single ribose
RZ_AUTO_512,1hmh,G113,C,C112,C,G114,A,A23L,A,GC/GA Single ribose
RZ_AUTO_513,1hnw,C1496,A,G1497,A,A1518,A,A1519,A,CG/AA Canonical ribose
RZ_AUTO_514,1hnw,C1496,A,G1497,A,A1519,A,A1518,A,CG/AA Canonical ribose
RZ_AUTO_515,1hnw,C221,A,U222,A,A195,A,A196,A,CU/AA Canonical ribose
RZ_AUTO_516,1hnw,C221,A,U222,A,A196,A,A195,A,CU/AA Canonical ribose
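If the value of FirstResidue, SecondResidue, ThirdResidue, or FourthResidue matches the regular expression "[A-Za-z]$", the CSV row needs to be deleted, so the output should contain only the remaining rows.
So far I have stored each column as a list, but I don't know what to do next. Here is my code: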
import csv
import re
rzid = []
pdbid = []
first_residue = []
first_chain = []
second_residue = []
second_chain = []
third_residue = []
third_chain = []
fourth_residue = []
fourth_chain = []
rz_pattern = []
#open csv file rz45.csv
f = open( 'rz45.csv', 'rU' ) #open the file in read universal mode
for line in f:
    cells = line.split( "," )
    rzid.append( (cells[0]) )
    pdbid.append( (cells[1]) )
    first_residue.append( (cells[2]) )
    first_chain.append( (cells[3]) )
    second_residue.append( (cells[4]) )
    second_chain.append( (cells[5]) )
    third_residue.append( (cells[6]) )
    third_chain.append( (cells[7]) )
    fourth_residue.append( (cells[8]) )
    fourth_chain.append( (cells[9]) )
    rz_pattern.append( (cells[10]) )
f.close()
Can anyone help? Thanks.
Update 1
import re
import csv
output = []
regex = '[AUGC]\d{1,4}'
#open csv file test_regex.csv
f = open( 'test_regex.csv', 'rU' ) #open the file in read universal mode
for line in f:
    cells = line.split( "," )
    output.append( [ cells[ 2 ], cells[ 4 ], cells[ 6 ], cells[ 8 ] ] )
    match = re.search(regex, str(output))
    if match:
        print line
f.close()
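I made some changes to the code, but I am still not sure how to check whether all of the values in cells[2], cells[4], cells[6], and cells[8] match the given regex. Can anyone suggest what to do next?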
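A note on the updated code: str(output) converts the whole accumulated list into one string, so a single re.search is likely to succeed for nearly every line once any earlier row has been appended, and the unanchored pattern also matches the "A22" inside "A22L". A minimal sketch of checking the four cells one by one, assuming Python 3; the helper name all_plain is hypothetical:

```python
import re

# One base letter followed by 1-4 digits; used with fullmatch so nothing may trail.
PLAIN_RESIDUE = re.compile(r"[AUGC]\d{1,4}")

def all_plain(cells):
    """Return True if cells 2, 4, 6 and 8 all look like 'A130' (no trailing letter)."""
    residues = [cells[i] for i in (2, 4, 6, 8)]
    # fullmatch requires the entire string to match, so 'A22L' is rejected.
    return all(PLAIN_RESIDUE.fullmatch(r) for r in residues)

line_bad = "RZ_AUTO_505,1hmh,A22L,C,A22L,A,G21L,A,A23L,A,AA/GA Naked ribose"
line_good = "RZ_AUTO_507,1hmh,A130,E,A90,A,G80,A,A130,A,AA/GA Naked ribose"

print(all_plain(line_bad.split(",")))   # False
print(all_plain(line_good.split(",")))  # True
```

Testing each cell separately, rather than the string form of the whole list, keeps the result for one row independent of the rows read before it.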
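Filtering each row against the regex works, at least for your example.
Once the data has been filtered by the regex, you are left with row, which is what you want. You can then write it to a new CSV or do whatever else you like with it.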
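A minimal sketch of that filtering step, assuming Python 3 and the csv module. The function name filter_rows and the in-memory sample are illustrative and not taken from the original answer. It skips any row in which one of the four residue columns ends in a letter (the "[A-Za-z]$" rule from the question) and writes the rest, header included:

```python
import csv
import io
import re

ENDS_WITH_LETTER = re.compile(r"[A-Za-z]$")
RESIDUE_COLUMNS = ("FirstResidue", "SecondResidue", "ThirdResidue", "FourthResidue")

def filter_rows(infile, outfile):
    """Copy CSV rows from infile to outfile, skipping rows where a residue ends in a letter."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        # Drop the row if any of the four residue values ends in a letter.
        if any(ENDS_WITH_LETTER.search(row[col]) for col in RESIDUE_COLUMNS):
            continue
        writer.writerow(row)
        kept += 1
    return kept

# Small in-memory sample built from rows in the question.
sample = io.StringIO(
    "ID,PDBID,FirstResidue,FirstChain,SecondResidue,SecondChain,"
    "ThirdResidue,ThirdChain,FourthResidue,FourthChain,Pattern\n"
    "RZ_AUTO_505,1hmh,A22L,C,A22L,A,G21L,A,A23L,A,AA/GA Naked ribose\n"
    "RZ_AUTO_507,1hmh,A130,E,A90,A,G80,A,A130,A,AA/GA Naked ribose\n"
    "RZ_AUTO_511,1hmh,G113,C,C112,C,G21L,A,A23L,A,GC/GA Single ribose\n"
    "RZ_AUTO_513,1hnw,C1496,A,G1497,A,A1518,A,A1519,A,CG/AA Canonical ribose\n"
)
result = io.StringIO()
kept_count = filter_rows(sample, result)
print(kept_count)
print(result.getvalue())
```

For real files, the same function can be called with two handles opened with newline="", for example the question's rz45.csv as input and an output path of your choosing.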