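This is my code for the Week 6 DNA problem set. It works correctly when I test it with small.csv, but with large.csv it seems to count the repeated sequences incorrectly. Can someone help me find the bug in my code? I'm very new to this.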
import csv
import sys

if len(sys.argv) != 3:
    sys.exit("Usage: python dna.py STRcounts DNASequence")

check = True
STRlist = []
Humanlist = []
# copy person list
with open(sys.argv[1], "r") as STR:
    readSTR = csv.reader(STR)
    for row in readSTR:
        if check:
            STRlist.append(row)
            check = False
        else:
            Humanlist.append(row)
Slist = STRlist[0]
Slist.remove("name")
# print(Humanlist)
# print(Slist)
seq = []
# copy sequence
with open(sys.argv[2], "r") as text:
    readtext = csv.reader(text)
    for i in readtext:
        seq = i
text = seq[0]
# print(text)
# create dictionary for STR
STRdict = {}
for STR in Slist:
    STRdict[STR] = 0
for STR in Slist:
    for letter in range(len(text)):
        if STR == text[letter:letter+len(STR)]:
            STRdict[STR] += 1
check = False
for human in range(len(Humanlist)):
    for STR in range(len(Slist)):
        if str(STRdict[Slist[STR]]) == str(Humanlist[human][STR+1]):
            check = True
        else:
            check = False
            break
    if check:
        print(Humanlist[human][0])
        break
if not check:
    print("no match")
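I commented out the unnecessary parts and added code to get the max length of each STR's run of repeats. The rest of the code is unchanged, and I got the expected results. I did not review all of the code for possible improvements, but it does produce the correct results.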
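Your code is incorrect because it counts every occurrence of each STR anywhere in the string, rather than counting consecutive repeats and then taking the longest run.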
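To make the difference concrete, here is a minimal sketch of counting the longest run of back-to-back repeats. It is not the answerer's actual code (which is not shown above); the helper name `longest_run` and the sample string are made up for illustration.

```python
def longest_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence.

    Hypothetical helper for illustration, not part of the original program.
    """
    step = len(motif)
    if step == 0:
        return 0
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Step forward one motif at a time while the copies stay adjacent.
        while sequence[pos:pos + step] == motif:
            run += 1
            pos += step
        best = max(best, run)
    return best


# Made-up sample: two copies of AGAT, a gap, then three copies in a row.
sample = "AGATAGATTTAGATAGATAGATC"

# What the question's loop computes: every occurrence, adjacent or not.
total = sum(sample[i:i + 4] == "AGAT" for i in range(len(sample)))

# What the problem asks for: the longest consecutive run.
longest = longest_run(sample, "AGAT")

print(total, longest)  # 5 3
```

In the question's program, this would presumably replace the loop that fills `STRdict`, for example `STRdict[STR] = longest_run(text, STR)`. On a small database the two counts can happen to coincide, which may be why the bug only shows up with large.csv.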
This question comes from a Harvard problem set.