Time complexity of large lists in Python 2.7

import csv

def parse_data(file, op_file_test):
    ins = csv.reader(open(file, 'rb'), delimiter = '\t')
    pc = list()
    rd = list()
    deltas = list()
    reoccurance = list()
    try:
        for row in ins:
            pc.append(int(row[0]))
            rd.append(int(row[1]))
    except:
        print row
        pass
    unique_pc = set(pc)
    unique_pc = list(unique_pc)
    print "closing file"
    #takes a long time from here!
    for a in range(0, len(unique_pc)):
        index_first_occurance = pc.index(unique_pc[a])
        index_last_occurance = len(pc) - 1 - pc[::-1].index(unique_pc[a])
        delta_rd = rd[index_last_occurance] - rd[index_first_occurance]
        deltas.append(int(delta_rd))
        reoccurance.append(pc.count(unique_pc[a]))
        print unique_pc[a] , delta_rd, reoccurance[a]
    print "printing to file"
    map_file = open(op_file_test,'a')
    for a in range(0, len(unique_pc)):
        # reoccurance[a], not the whole list, matches the third %d
        print >>map_file, "%d, %d, %d" % (unique_pc[a], deltas[a], reoccurance[a])
    map_file.close()

3 Answers

User

Answer 1 · edited 2024-09-30 23:35:44

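While scanning the items in the input file, put them into a collections.defaultdict(list) whose keys are the items and whose values are lists of the indices at which each item occurs. Reading the file and building this data structure takes linear time. After that, getting the first or last occurrence index of an item takes constant time, and so does getting its number of occurrences.

Here is how it works: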

import collections

mydict = collections.defaultdict(list)
# itemfilereader stands for any iterable yielding (item, index) pairs
for item, index in itemfilereader: # O(n)
    mydict[item].append(index)

# first occurrence of item, O(1)
mydict[item][0]

# last occurrence of item, O(1)
mydict[item][-1]

# number of occurrences of item, O(1)
len(mydict[item])
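
Applied to the pc and rd lists from the question, the same idea might look like the sketch below. The function name `summarize` and the sample data are made up for illustration; the code is written so that it runs under both Python 2.7 and Python 3.

```python
import collections

def summarize(pc, rd):
    """Return {pc_value: (delta_rd, count)} using one pass over pc."""
    positions = collections.defaultdict(list)
    for index, value in enumerate(pc):          # O(n) for the whole loop
        positions[value].append(index)

    summary = {}
    for value, indexes in positions.items():
        first, last = indexes[0], indexes[-1]   # O(1) lookups
        summary[value] = (rd[last] - rd[first], len(indexes))
    return summary

# Hypothetical sample data: pc[i] and rd[i] come from the same input row.
pc = [10, 8, 10, 7, 8, 10]
rd = [100, 105, 130, 131, 150, 190]
result = summarize(pc, rd)
```

This replaces the per-value calls to pc.index, pc[::-1].index and pc.count, each of which scans the whole list, with list lookups on data gathered during a single scan.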

User

Answer 2 · edited 2024-09-30 23:35:44

It may be worth changing the data structure you use. I would use a dict with pc as the key and the list of occurrences as the value.

lookup = {}
counter = 0
for line in ins:
    # one list of (row index, rd value) tuples per pc value
    values = lookup.setdefault(int(line[0]), [])
    values.append((counter, int(line[1])))
    counter += 1

for key, val in lookup.iteritems():
    # index 0 is the first occurrence, index -1 the last
    value_of_first_occurence = lookup[key][0][1]
    value_of_last_occurence = lookup[key][-1][1]
    first_occurence = lookup[key][0][0]
    last_occurence = lookup[key][-1][0]
    value = lookup[key][0]
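
If only the first rd value, the last rd value and the count are needed, it is not strictly necessary to keep every occurrence. The following is a variation on the dict approach, not something the answer itself proposes; the function name `first_last_count` and the sample rows are assumptions for illustration.

```python
def first_last_count(rows):
    """Map each pc to [first_rd, last_rd, count] in one pass over rows."""
    stats = {}
    for pc, rd in rows:
        entry = stats.get(pc)
        if entry is None:
            # first time this pc is seen: first and last rd coincide
            stats[pc] = [rd, rd, 1]
        else:
            entry[1] = rd      # overwrite the last-seen rd
            entry[2] += 1      # bump the occurrence count
    return stats

# Hypothetical (pc, rd) rows in file order.
rows = [(10, 100), (8, 105), (10, 130), (7, 131), (8, 150), (10, 190)]
stats = first_last_count(rows)
deltas = dict((pc, last - first) for pc, (first, last, _) in stats.items())
```

Memory use then grows with the number of distinct pc values rather than with the number of rows, at the cost of losing the row indices.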

User

Answer 3 · edited 2024-09-30 23:35:44

Try the following:

import csv
from collections import defaultdict

# Keep a dictionary of our rd and pc values, with the value as a list of the line numbers each occurs on
# e.g. {10: [1, 45, 79]}
pc_elements = defaultdict(list)
rd_elements = defaultdict(list)

with open(file, 'rb') as f:
    line_number = 0
    csvin = csv.reader(f, delimiter='\t')
    for row in csvin:
        try:
            pc_elements[int(row[0])].append(line_number)
            rd_elements[int(row[1])].append(line_number)
            line_number += 1
        except ValueError:
            print("Not a number")
            print(row)
            line_number += 1
            continue

for pc, indexes in pc_elements.iteritems():
    print("pc  {0} appears {1} times. First on row {2}, last on row {3}".format(
        pc,
        len(indexes),
        indexes[0],
        indexes[-1]
    ))

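This works by building a dictionary while reading the TSV, with the pc value as the key and a list of occurrences as the value. By the nature of a dict the keys must be unique, so we avoid the set, and the list values are used only to record the rows on which each key occurs.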

Example output:

"pc 10 appears 4 times. First on row 4, last on row 101"
"pc 8 appears 3 times. First on row 3, last on row 13"
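
The code in this answer reports counts and row numbers but not the rd delta that the question writes to its output file. A minimal sketch of how the two could be combined is given below. It assumes Python 3, where csv.reader takes text input (under 2.7 the file would be opened with 'rb' as above), and the names `summarize_tsv`, `rows_by_pc` and `rd_by_row` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def summarize_tsv(tsv_file, out_file):
    """Write one 'pc, delta_rd, count' line per distinct pc value."""
    rows_by_pc = defaultdict(list)   # pc -> row numbers it occurs on
    rd_by_row = {}                   # row number -> rd value on that row
    reader = csv.reader(tsv_file, delimiter='\t')
    for line_number, row in enumerate(reader):
        try:
            pc, rd = int(row[0]), int(row[1])
        except (ValueError, IndexError):
            continue                 # skip malformed rows, numbering is kept
        rows_by_pc[pc].append(line_number)
        rd_by_row[line_number] = rd
    for pc, rows in rows_by_pc.items():
        delta = rd_by_row[rows[-1]] - rd_by_row[rows[0]]
        out_file.write("%d, %d, %d\n" % (pc, delta, len(rows)))

# Hypothetical in-memory input standing in for the real TSV file.
sample = io.StringIO("10\t100\n8\t105\nbad\trow\n10\t130\n")
out = io.StringIO()
summarize_tsv(sample, out)
```

With a real file, `tsv_file` and `out_file` would be the objects returned by open(); the in-memory buffers are used here only to keep the sketch self-contained.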
