<p>鉴于问题<a href="https://docs.google.com/document/d/1Vv78_UAW3t3dtDka9QTKAiwIilS-OMzWI8-DqbBArVc/edit" rel="nofollow noreferrer">Google doc</a>的案文:</p>
<pre><code>001ea9a6-2c30-4ce6-b4ee-f445803303db 596d89bf-e641-4e9c-9374-695241694a6d 45640 MUNDO JURIDICO 26204 20350 \N A0000000031010 10530377 MATEO FLORES ELMER ASOC INTEGRACION LOS OLIVOS AV NAPOLES MZ I LTE 31 visanet 650011581 1 0 mpos contactless 0 0 0 visa 421410 5969 12 24 c6d06de1-c9f0-4b4a-992b-ad3a5f263039 000066 666407 2021-07-20 09:31:23 000 301201522830426 995212014496392 158348 \N \N \N PEN 200.00 200.00 0.00 1 0 0.00 200026214 5999 ASOC INTEGRACION LOS OLIVOS AV NAPOLES MZ I LTE 31 MUNDO JURIDICO 947761525 AUTHORIZED 2021-07-20 09:31:24 2021-07-20 09:32:00 210720 093123
001ec0fd-8d0e-4332-851a-bcd93fdf0a37 ee32d094-8fc4-4d92-b788-58ca2ae2a590 36750 Chifa Maryori 18923 25313 \N A0000000031010 46753818 Chifa Maryori San isidro Peru visanet 650011581 1 0 mpos contactless 0 0 0 visa 421355 5765 04 26 7d00f10e-d0b7-40ec-9b97-d20590cb7710 000708 620744 2021-06-27 16:52:21 000 301178787424243 996211783887688 100513 \N \N \N PEN 17.00 17.00 0.00 0 0 0.00 400034782 5499 San isidro Peru Chifa Maryori +51 01 5179805 AUTHORIZED 2021-06-27 16:52:23 2021-06-27 16:52:23 210627 165221
001ec214-97e6-4d84-a64a-d1bee0079fd8 3c4a98cc-d279-4f9e-af8b-5198647889c5 33214 Inversiones Polleria el rey MPOS 15699 23053 \N \N 88846910264 JoseOrbegoso Puelle D 5 Lt 2 urb Mariscal
\
ref : altura de elektra visanet 650011581 1 0 mpos chip 0 0 0 visa 455788 2123 09 22 8b3fd975-140a-42d5-8601-1b4bcc16bd60 000022 170790 2020-10-20 11:32:44 687 \N \N \N \N \N \N PEN 1.00 1.00 0.00 0 0 0.00 200020168 5999 D 5 Lt 2 urb Mariscal
\
ref : altura de elektra Inversiones Polleria el rey MPOS 964974226 DENIED 2020-10-20 11:32:44 2020-10-20 11:32:44 201020 113244
001ec66f-6350-4a33-a1fe-b9375dac2161 34169a7a-a66f-4258-80c2-8c0d4512aa36 27044 ABRAHAM LORENZO MELGAREJO 10353 13074 \N \N 99944748991 ABRAHAM LORENZO MELGAREJO JR RENOVACION 399 visanet 650011581 1 0 mpos chip 0 0 0 visa 455788 4712 08 24 83915520-1a1f-4d5e-b118-f0c57a9d96ae 000367 161286 2020-10-14 15:59:51 000 300288755920408 995202889603642 242407 \N \N \N PEN 15.00 15.00 0.00 1 0 0.00 200012523 5811 JR RENOVACION 399 ABRAHAM LORENZO MELGAREJO 957888909 AUTHORIZED 2020-10-14 15:59:52 2020-10-14 16:00:21 201014 155951
001eebaf-bccc-47a7-87a3-be8b09eb971b c5e14889-d61c-4f18-8000-d3bac0564cfb 27605 Polleria Arroyo Express 10792 21904 \N A0000000031010 41429707 Polleria Arroyo Express San isidro Peru visanet 650011581 1 0 mpos contactless 0 0 0 visa 421355 9238 09 25 3c5268e8-5731-4fea-905d-45b03ed623d2 000835 379849 2021-01-30 19:43:36 000 301031026163928 995210303271986 928110 \N \N \N PEN 24.00 24.00 0.00 0 0 0.00 400026444 5499 San isidro Peru Polleria Arroyo Express +51 01 5179805 AUTHORIZED 2021-01-30 19:43:37 2021-01-30 19:43:37 210130 194336
</code></pre>
<p>…使用来自<a href="https://stackoverflow.com/questions/47535168/concatenate-lines-with-previous-line-based-on-number-of-letters-in-first-column">Concatenate lines with previous line based on number of letters in first column</a>的代码,如果“好的行”都从<code>001</code>开始(这就是下面正则表达式检查的内容),您可以尝试</p>
<p><strong>代码:</strong></p>
<pre><code>import re
import pandas as pd
all_the_data = ""
pattern = r'^001.*$'
with open('aws.txt') as f, open('aws_out.csv','w') as output:
all_the_data = ""
for line in f:
if not re.search(pattern, line):
all_the_data = re.sub("\n$", "", all_the_data)
all_the_data = "".join([all_the_data, line])
output.write(all_the_data)
df = pd.read_csv('aws_out.csv', header=None, sep='\t')
print(df)
</code></pre>
<p><strong>输出示例:</strong></p>
<p><a href="https://i.stack.imgur.com/qg06Q.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/qg06Q.png" alt="enter image description here"/></a></p>