Splitting a line in Python; taking only the first 4 values

1020 200123242151111231 bla bla bla 200123331231231441 bla bla bla 1030 200123242151111231 bla bla bla 200123331231231441 bla bla bla

200123242151111231 bla bla bla 200123331231231441 bla bla bla

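3 Answers

User

#1 · Edited 2024-10-02 14:26:08

Reading line by line would work well. You can check whether a line (without its trailing newline) is 4 characters long and, if so, skip it.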
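A minimal sketch of that check, assuming the markers are lines that hold exactly four digits and nothing else. The helper name `is_marker` and the sample lines are made up for illustration, and the extra `isdigit()` test is an assumption on top of the length check:

```python
def is_marker(line):
    """Return True if the line holds only a 4-digit number (hypothetical helper)."""
    stripped = line.strip()  # drop the trailing newline and surrounding spaces
    return len(stripped) == 4 and stripped.isdigit()


# sample lines, invented for this illustration
lines = [
    "1020\n",
    "200123242151111231 bla bla bla\n",
    "1030\n",
    "200123331231231441 bla bla bla\n",
]

# keep only the lines that are not 4-digit markers
kept = [line for line in lines if not is_marker(line)]
print(kept)
```

When reading a real file, the same test can be applied inside a `for line in f:` loop, so only one line is held in memory at a time.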

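User

#2 · Edited 2024-10-02 14:26:08

Reading the file line by line is probably better. That way you won't run into memory problems if the file is very large, and you can run the 4-digit check on each line itself, without any awkward splitting.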

doc = 0  # file name used for any text that comes before the first 4-digit line
towrite = ""
with open("somefile.txt", "r") as f:
    for i, line in enumerate(f):
        stripped = line.strip()
        if len(stripped) == 4 and stripped.isdigit():
            if i > 0:  # write txt from prior parse
                with open("{}.txt".format(doc), "w") as wfile:
                    wfile.write(towrite)
            doc = stripped  # the 4-digit line names the next output file
            towrite = ""  # reset
        else:
            towrite += line
# write the last section, which has no 4-digit line after it
with open("{}.txt".format(doc), "w") as wfile:
    wfile.write(towrite)

Test file:

1234
43267583291483 1234 3213213
57489367483929 32133248 3728913
3267
32163721837362 4723 3291832
42189323471911 321113 3211111132
326189183828327 3218484828283 828238281
21838282387 3726173 6278
1111
1236274818 327813678
32167382167894829013 321

Result:

1234.txt

43267583291483 1234 3213213
57489367483929 32133248 3728913

3267.txt

32163721837362 4723 3291832
42189323471911 321113 3211111132
326189183828327 3218484828283 828238281
21838282387 3726173 6278

1111.txt

1236274818 327813678
32167382167894829013 321

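User

#3 · Edited 2024-10-02 14:26:08

^ matches the start of the string

$ matches the end of the string

findall returns a list of all matches; if the pattern contains a (capturing group), it returns the captured groups instead

(?:...) is a non-capturing group

* is greedy, *? is not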
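The notes above can be tried out with a few short calls. This is only an illustrative sketch; the sample strings and variable names are invented and have nothing to do with the file from the question:

```python
import re

text = "ab12 cd34"

# ^ anchors the match at the start of the string, $ at the end
start = re.findall(r"^\w+", text)
end = re.findall(r"\w+$", text)

# with one capturing group, findall returns only what the group captured
captured = re.findall(r"[a-z]+(\d+)", text)

# a non-capturing group (?:...) groups without capturing,
# so findall returns the whole match again
whole = re.findall(r"(?:[a-z]+)\d+", text)

# * is greedy (takes as much as it can), *? is lazy (takes as little as it can)
greedy = re.findall(r"<.*>", "<a><b>")
lazy = re.findall(r"<.*?>", "<a><b>")

print(start, end)      # ['ab12'] ['cd34']
print(captured)        # ['12', '34']
print(whole)           # ['ab12', 'cd34']
print(greedy, lazy)    # ['<a><b>'] ['<a>', '<b>']
```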

This solution should work:

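import re

# read the whole file into memory and close it again
with open('testnew.txt', 'r') as file:
    text = file.read()

i = 0
# each match captures the text up to the next 4-digit line (or the end of the text)
for x in re.findall(r"((?:.|\n)*?)(?:(?:^|\n)\d{4}\n|$)", text):
    if x.strip():  # skip empty or whitespace-only matches
        with open('%d.txt' % i, 'w') as f:
            f.write(x)
        print(x, i)
        i = i + 1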
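The code above names the output files with a running counter. If the 4-digit number itself should name each section, one possible variation is `re.split` with a capturing group. This is a sketch using a different pattern from the one in this answer; the helper name `split_sections` and the sample text are hypothetical. It assumes every marker line ends with a newline, and a repeated marker would overwrite the earlier section:

```python
import re


def split_sections(text):
    """Map each 4-digit marker to the text that follows it (hypothetical helper)."""
    # re.MULTILINE lets ^ match at the start of every line;
    # the capturing group makes re.split keep the markers in its result
    parts = re.split(r"^(\d{4})\n", text, flags=re.MULTILINE)
    # parts is [text before the first marker, marker, body, marker, body, ...]
    return dict(zip(parts[1::2], parts[2::2]))


# sample text, invented for this illustration
sample = "1234\naaa 1234 bbb\nccc\n3267\nddd\n1111\neee\n"
sections = split_sections(sample)
print(sections)
```

Each key could then be used as a file name, for example `open("{}.txt".format(marker), "w")`. A 4-digit number in the middle of a line, such as the `1234` in `aaa 1234 bbb`, is not treated as a marker because the pattern only matches a whole line.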
