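I posted a question on the CS50 Stack Exchange but have not received an answer yet, and I have been stuck on this problem set for a week.
Link: https://cs50.stackexchange.com/questions/37281/pset6-dna-loop-miscounts-longest-run-of-str
In short, my code is supposed to iterate over a DNA sequence, count the longest consecutive run of each short tandem repeat (STR), and then compare those counts against a given database to determine whose DNA it is.
Right now I am working on counting the longest run of each STR. I believe my logic is correct, but I cannot figure out why my code miscounts most of the time.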
For clarity:
if the expected output is [18, 23, 35, 13, 11, 19, 14, 24]
the actual output is [18, 23, 36, 13, 15, 19, 15, 26]
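PSET materials: https://cdn.cs50.net/2019/fall/psets/6/dna/dna.zip
Basically, pass the database as the first command-line argument and the DNA sequence as the second. In the example above, I used large.csv 6.txt
I have been stuck on this for far too long, so any help would be greatly appreciated. Thanks, everyone.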
# modules
import sys
import csv

# ensure proper usage
if len(sys.argv) != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit()

database = sys.argv[1]
sequence = sys.argv[2]
run = 0
longrun = 0
STR_Count = []

# open database
with open(database, newline='') as csvfile:
    db_reader = csv.reader(csvfile)
    # a list of STRs
    STR = list(next(db_reader))
    STR.pop(0)

# open DNA sequence file
with open(sequence, newline='') as txtfile:
    seq_reader = txtfile.read()

# for every STR
for x in range(len(STR)):
    # iterate over the entire sequence
    for y in range(len(seq_reader)):
        # if STR matches sub-string
        if STR[x] == seq_reader[y:y + len(STR[x])]:
            run += 1
            # update index
            y = y + len(STR[x])
            # if STR no longer matches sub-string
            if not STR[x] == seq_reader[y:y + len(STR[x])]:
                if run > longrun:
                    longrun = run
                    run = 0
    STR_Count.append(longrun)
    run = 0
    longrun = 0
print(STR_Count)
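Two things may be worth checking in a loop like this. In Python, reassigning the loop variable inside `for y in range(...)` has no effect on the next iteration, so the index is never actually skipped ahead. And if `run` is not reset at the end of every run, counts from separate runs can add together, which would produce totals that are too high. Below is a minimal sketch of one way to count the longest run, restarting the count at every starting position. The names `longest_run` and `find_match` are hypothetical helpers, not part of the pset, and `find_match` assumes the database rows are dicts (for example from `csv.DictReader`) with a `name` column.

```python
def longest_run(sequence, pattern):
    """Return the largest number of back-to-back copies of pattern in sequence."""
    if not pattern:
        return 0
    best = 0
    step = len(pattern)
    for start in range(len(sequence)):
        # Count consecutive copies of pattern beginning at this start index.
        run = 0
        pos = start
        while sequence.startswith(pattern, pos):
            run += 1
            pos += step
        # The counter is local to this start index, so runs never accumulate.
        best = max(best, run)
    return best


def find_match(rows, counts):
    """Return the name of the first row whose STR counts all equal counts.

    Assumption: each row is a dict with a "name" key plus one key per STR,
    with the counts stored as strings, as csv.DictReader would yield them.
    """
    for row in rows:
        if all(int(row[str_name]) == n for str_name, n in counts.items()):
            return row["name"]
    return None
```

With this shape, the per-STR list could be built as `[longest_run(seq_reader, s) for s in STR]`. Checking every start index is slower than skipping ahead, but it avoids missing a run that begins inside an earlier partial match.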