Loading a large file with Python

Partition of a set of 36043 objects. Total size = 5307704 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0  15934  44  1301016  25    1301016  25 str
     1     50   0   628400  12    1929416  36 dict of __main__.NodeStatistics
     2   7584  21   620936  12    2550352  48 tuple
     3    781   2   590776  11    3141128  59 dict (no owner)
     4     90   0   278640   5    3419768  64 dict of module
     5   2132   6   255840   5    3675608  69 types.CodeType
     6   2059   6   247080   5    3922688  74 function
     7   1716   5   245408   5    4168096  79 list
     8    244   1   218512   4    4386608  83 type
     9    224   1   213632   4    4600240  87 dict of type
<104 more rows. Type e.g. '_.more' to view.>

Partition of a set of 15934 objects. Total size = 1301016 bytes.
 Index  Count   %     Size   % Cumulative   % Referred Via:
     0   2132  13   274232  21     274232  21 '.co_code'
     1   2132  13   189832  15     464064  36 '.co_filename'
     2   2024  13   114120   9     578184  44 '.co_lnotab'
     3    247   2   110672   9     688856  53 "['__doc__']"
     4    347   2    92456   7     781312  60 '.func_doc', '[0]'
     5    448   3    27152   2     808464  62 '[1]'
     6    260   2    15040   1     823504  63 '[2]'
     7    201   1    11696   1     835200  64 '[3]'
     8    188   1    11080   1     846280  65 '[0]'
     9    157   1     8904   1     855184  66 '[4]'
<4717 more rows. Type e.g. '_.more' to view.>

Partition of a set of 7584 objects. Total size = 620936 bytes.
 Index  Count   %     Size   % Cumulative   % Referred Via:
     0   1995  26   188160  30     188160  30 '.co_names'
     1   2096  28   171072  28     359232  58 '.co_varnames'
     2   2078  27   157608  25     516840  83 '.co_consts'
     3    261   3    21616   3     538456  87 '.__mro__'
     4    331   4    21488   3     559944  90 '.__bases__'
     5    296   4    20216   3     580160  93 '.func_defaults'
     6     55   1     3952   1     584112  94 '.co_freevars'
     7     47   1     3456   1     587568  95 '.co_cellvars'
     8     35   0     2560   0     590128  95 '[0]'
     9     27   0     1952   0     592080  95 '.keys()[0]'
<189 more rows. Type e.g. '_.more' to view.>

Partition of a set of 781 objects. Total size = 590776 bytes.
 Index  Count   %     Size   % Cumulative   % Referred Via:
     0      1   0    98584  17      98584  17 "['locale_alias']"
     1     29   4    35768   6     134352  23 '[180]'
     2     28   4    34720   6     169072  29 '[90]'
     3     30   4    34512   6     203584  34 '[270]'
     4     27   3    33672   6     237256  40 '[0]'
     5     25   3    26968   5     264224  45 "['data']"
     6      1   0    24856   4     289080  49 "['windows_locale']"
     7     64   8    20224   3     309304  52 "['inters']"
     8     64   8    17920   3     327224  55 "['galog']"
     9     64   8    17920   3     345144  58 "['salog']"
<84 more rows. Type e.g. '_.more' to view.>

s 1.231932886 _25_ AGT --- 0 exp 10 [0 0 0 0 Y Y] ------- [25:0 0:0 32 0 0]
s 1.232087886 _25_ MAC --- 0 ARP 86 [0 ffffffff 67 806 Y Y] ------- [REQUEST 103/25 0/0]
r 1.232776108 _42_ MAC --- 0 ARP 28 [0 ffffffff 67 806 Y Y] ------- [REQUEST 103/25 0/0]
r 1.232776625 _34_ MAC --- 0 ARP 28 [0 ffffffff 67 806 Y Y] ------- [REQUEST 103/25 0/0]
r 1.232776633 _9_ MAC --- 0 ARP 28 [0 ffffffff 67 806 Y Y] ------- [REQUEST 103/25 0/0]
r 1.232776658 _0_ MAC --- 0 ARP 28 [0 ffffffff 67 806 Y Y] ------- [REQUEST 103/25 0/0]
r 1.232856942 _35_ MAC --- 0 ARP 28 [0 ffffffff 64 806 Y Y] ------- [REQUEST 100/25 0/0]
s 1.232871658 _0_ MAC --- 0 ARP 86 [13a 67 1 806 Y Y] ------- [REPLY 1/0 103/25]
r 1.233096712 _29_ MAC --- 0 ARP 28 [0 ffffffff 66 806 Y Y] ------- [REQUEST 102/25 0/0]
r 1.233097047 _4_ MAC --- 0 ARP 28 [0 ffffffff 66 806 Y Y] ------- [REQUEST 102/25 0/0]
r 1.233097050 _26_ MAC --- 0 ARP 28 [0 ffffffff 66 806 Y Y] ------- [REQUEST 102/25 0/0]
r 1.233097051 _1_ MAC --- 0 ARP 28 [0 ffffffff 66 806 Y Y] ------- [REQUEST 102/25 0/0]
r 1.233109522 _25_ MAC --- 0 ARP 28 [13a 67 1 806 Y Y] ------- [REPLY 1/0 103/25]
s 1.233119522 _25_ MAC --- 0 ACK 38 [0 1 67 0 Y Y]
r 1.233236204 _17_ MAC --- 0 ARP 28 [0 ffffffff 65 806 Y Y] ------- [REQUEST 101/25 0/0]
r 1.233236463 _20_ MAC --- 0 ARP 28 [0 ffffffff 65 806 Y Y] ------- [REQUEST 101/25 0/0]
D 1.233236694 _18_ MAC COL 0 ARP 86 [0 ffffffff 65 806 67 1] ------- [REQUEST 101/25 0/0]

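import gc

# open() replaces the Python 2-only file() builtin; print(...) with a single
# formatted argument runs under both Python 2 and Python 3.
for i, line in enumerate(open(datafile)):
    if i % 500000 == 0:
        print('-----------This is line number %d' % i)
        # Force a collection pass every 500000 lines and report the result.
        collected = gc.collect()
        print("Garbage collector: collected %d objects." % collected)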

Partition of a set of 35474 objects. Total size = 5273376 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0  15889  45  1283640  24    1283640  24 str
     1     50   0   628400  12    1912040  36 dict of __main__.NodeStatistics
     2   7559  21   617496  12    2529536  48 tuple
     3    781   2   589240  11    3118776  59 dict (no owner)
     4     90   0   278640   5    3397416  64 dict of module
     5   2132   6   255840   5    3653256  69 types.CodeType
     6   2059   6   247080   5    3900336  74 function
     7   1716   5   245408   5    4145744  79 list
     8    244   1   218512   4    4364256  83 type
     9    224   1   213632   4    4577888  87 dict of type
<104 more rows. Type e.g. '_.more' to view.>

class PacketStatistics(object):
    __slots__ = ('event_id', 'event_source', 'event_dest',...)
    def __init__(self):
        self.event_id = 0
        self.event_source = 0
        self.event_dest = 0
        ...

Partition of a set of 36157 objects. Total size = 4758960 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0  15966  44  1304424  27    1304424  27 str
     1   7592  21   624776  13    1929200  41 tuple
     2    780   2   587424  12    2516624  53 dict (no owner)
     3     90   0   278640   6    2795264  59 dict of module
     4   2132   6   255840   5    3051104  64 types.CodeType
     5   2059   6   247080   5    3298184  69 function
     6   1715   5   245336   5    3543520  74 list
     7    225   1   232344   5    3775864  79 dict of type
     8    244   1   223952   5    3999816  84 type
     9    166   0   190096   4    4189912  88 dict of class
<101 more rows. Type e.g. '_.more' to view.>
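
For readers unfamiliar with `__slots__`, here is a minimal Python 3 sketch of what the declaration in the class above changes. The class names `WithDict` and `WithSlots` are hypothetical and only reuse the three attribute names visible in the question; the elided attributes are left out. Exact byte counts vary between Python versions, so the sketch only compares sizes rather than quoting them.

```python
import sys


class WithDict(object):
    """Ordinary class: each instance carries its own attribute dict."""

    def __init__(self):
        self.event_id = 0
        self.event_source = 0
        self.event_dest = 0


class WithSlots(object):
    """Same attributes, stored in fixed slots instead of a per-instance dict."""

    __slots__ = ('event_id', 'event_source', 'event_dest')

    def __init__(self):
        self.event_id = 0
        self.event_source = 0
        self.event_dest = 0


plain = WithDict()
slotted = WithSlots()

has_dict_plain = hasattr(plain, '__dict__')
has_dict_slotted = hasattr(slotted, '__dict__')

# Shallow sizes only: the plain instance also pays for its attribute dict.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)

# A slotted instance rejects attributes that are not listed in __slots__.
try:
    slotted.unexpected = 1
    rejected_new_attribute = False
except AttributeError:
    rejected_new_attribute = True

print(has_dict_plain, has_dict_slotted, slotted_size < plain_size)
```

The saving is per instance, so it only matters when many such objects are alive at once.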

3 Answers

User
Answer 1 · edited 2024-06-28 15:03:53

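@mgilson's answer is correct. Still, the simple solution deserves an explicit mention (@HerrKaputt pointed it out in a comment):
file = open('datafile')
for line in file:
    process(line)
file.close()
This is simple, Pythonic, and easy to understand. If you don't understand how with works, just use this.
As another poster noted, this does not build a big list the way file.readlines() does. Instead, it processes one line at a time, in the traditional manner of Unix files and pipes.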
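
The difference between iterating over the file and calling `readlines()` can be measured roughly with the standard `tracemalloc` module. The following is a Python 3 sketch, not part of the original answer: the helper names (`peak_bytes`, `count_streaming`, `count_with_readlines`, `make_sample_file`) and the synthetic trace-like sample file are assumptions made for illustration.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Run func(path) and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def count_streaming(path):
    # Iterating over the file object keeps only the current line alive.
    count = 0
    with open(path) as f:
        for _line in f:
            count += 1
    return count


def count_with_readlines(path):
    # readlines() builds a list holding every line before anything is counted.
    with open(path) as f:
        return len(f.readlines())


def make_sample_file(n_lines=20000):
    """Write a synthetic trace-like file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".tr")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("r 1.%09d _%d_ MAC --- 0 ARP 28 [0 ffffffff 67 806 Y Y]\n"
                    % (i, i % 50))
    return path


sample = make_sample_file()
n_stream, peak_stream = peak_bytes(count_streaming, sample)
n_list, peak_list = peak_bytes(count_with_readlines, sample)
os.remove(sample)

print(n_stream, n_list, peak_stream < peak_list)
```

Both functions see the same lines; only the amount of data held in memory at once differs, and that gap grows with the size of the file.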

User
Answer 2 · edited 2024-06-28 15:03:53

The fileinput module lets you read the file line by line without loading the whole file into memory (see the Python docs).
import fileinput
for line in fileinput.input(['myfile']):
    do_something(line)
Code example taken from yak.net.
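
Because `fileinput.input` accepts a list of names, the same loop can walk several files in sequence while still reading one line at a time. A minimal Python 3 sketch follows; the generator `numbered_lines` and the two small sample files are hypothetical, while `filename()` and `filelineno()` are methods of the object that `fileinput.input` returns.

```python
import fileinput
import os
import shutil
import tempfile


def numbered_lines(paths):
    """Lazily yield (path, line number within that file, line text)."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")


# Two tiny sample files standing in for large trace files.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in (("a.tr", "s 1.0\nr 1.1\n"), ("b.tr", "D 1.2\n")):
    path = os.path.join(tmpdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

records = list(numbered_lines(paths))
shutil.rmtree(tmpdir)

for record in records:
    print(record)
```

For a single file this offers little over iterating the file object directly; it is mainly convenient when the input is split across several files.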

User
Answer 3 · edited 2024-06-28 15:03:53

with open('datafile') as f:
    for line in f:
        process(line)

This works because file objects are iterators that yield one line at a time until there are no more lines to yield.
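
The iterator behaviour can be checked directly. In this Python 3 sketch an `io.StringIO` stands in for an open text file, which is an assumption made to keep the example self-contained; objects returned by `open()` in text mode follow the same `io` iterator protocol.

```python
import io

# Stand-in for a file opened with open(); the contents are made up.
f = io.StringIO("first\nsecond\n")

# A file object is its own iterator, so a for loop needs no extra wrapper.
is_own_iterator = iter(f) is f

first = next(f)            # reads exactly one line
rest = list(f)             # consumes whatever is left
exhausted = next(f, None)  # nothing more to yield, so the default is returned

print(is_own_iterator, repr(first), rest, exhausted)
```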
