How to extract data between two lines in a Python script

for name in files:
    with open(name, encoding="utf8") as infile:
        copy = False
        cnt=0
        for line in infile:
            if line.strip()=="Page":
                copy = True
                continue
            if line.strip()=="TAX":
                copy = True
                continue
            elif line.strip() == "State":
                copy = False
                continue
            elif copy:
                print(line)
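The loop above compares the whole stripped line with `==`. A line such as `Page 1 of 1` is never equal to `"Page"`, so copying probably never starts. Below is a minimal sketch of the same flag-based approach. It assumes the markers only need to appear at the *start* of a line. The marker tuples and the `lines_between` helper are hypothetical names, not taken from the question.

```python
from typing import Iterable, List

# Hypothetical marker sets, taken from the question's code; adjust to your data.
START_MARKERS = ("Page", "TAX")
END_MARKERS = ("State",)


def lines_between(lines: Iterable[str]) -> List[str]:
    """Collect lines after a start-marker line, up to (not including) an end-marker line."""
    collected = []
    copy = False
    for line in lines:
        stripped = line.strip()
        # str.startswith accepts a tuple, so one call checks every marker.
        if stripped.startswith(START_MARKERS):
            copy = True   # start collecting from the next line
            continue
        if stripped.startswith(END_MARKERS):
            copy = False  # stop collecting; the end line itself is skipped
            continue
        if copy:
            collected.append(line.rstrip("\n"))
    return collected


sample = """TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017)
ANURAG ENTERPRISES ANURAG ENTERPRISES, VEDAVATHI NAGAR,CHALLAKERE ROAD HIRIYUR
State Code: 29

Page 1 of 1
KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road, Hosapete-583201 State Karnataka
State Code 29"""

for line in lines_between(sample.splitlines()):
    print(line)
```

With a real file, `lines_between(infile)` works the same way, because iterating a file yields its lines.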

1 Answer

Anonymous user

#1 · Posted on 2024-09-28 01:33:04

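As Onno Rouast commented, it isn't entirely clear what the extraction rule is. Both examples below work for the sample data, but who knows what future input will look like.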

import re

rex = r"""(?xm)         # extended mode and multiline
(?:^(?:Page|TAX).*\n)   # preceded by a line starting with either Page or TAX
\b([A-Z ]+)\b           # Looking for all capital letters or spaces"""

text = """TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017)
ANURAG ENTERPRISES ANURAG ENTERPRISES, VEDAVATHI NAGAR,CHALLAKERE ROAD HIRIYUR
State Code: 29

Page 1 of 1
KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road, Hosapete-583201 State Karnataka
State Code 29"""

companies = [s.strip() for s in re.findall(rex, text)]
print(companies)

Prints:

['ANURAG ENTERPRISES ANURAG ENTERPRISES', 'KS LINGAPPA AND SON']
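The verbose regex may be hard to maintain. It encodes roughly this rule: "take the leading run of capitals and spaces on the line right after a line starting with `Page` or `TAX`". The same rule can be spelled out as an explicit loop. The sketch below uses `companies_after_markers`, a hypothetical helper name. One difference: unlike `findall`, the loop never consumes a marker line as a company line, so two consecutive marker lines are handled independently.

```python
import re
from typing import List, Sequence

# Leading run of capitals/spaces ending on a word boundary, as in the regex above.
LEADING_CAPS = re.compile(r"\b[A-Z ]+\b")


def companies_after_markers(text: str, markers: Sequence[str] = ("Page", "TAX")) -> List[str]:
    """Hypothetical helper: for each line starting with a marker, grab the capitals on the next line."""
    lines = text.splitlines()
    found = []
    # zip pairs every line with the line that follows it.
    for current, following in zip(lines, lines[1:]):
        if current.startswith(tuple(markers)):
            m = LEADING_CAPS.match(following)
            if m:
                found.append(m.group().strip())
    return found


text = """TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017)
ANURAG ENTERPRISES ANURAG ENTERPRISES, VEDAVATHI NAGAR,CHALLAKERE ROAD HIRIYUR
State Code: 29

Page 1 of 1
KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road, Hosapete-583201 State Karnataka
State Code 29"""

print(companies_after_markers(text))
```

The trailing `\b` makes the match back off from `KS LINGAPPA AND SON I` (where `I` starts `Industrial`) to `KS LINGAPPA AND SON `. This is the same backtracking the original pattern relies on.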

Update

import re

rex = r"""(?xm)         # extended mode and multiline
(?:^(?:Page|TAX).*\n)   # preceded by a line starting with either Page or TAX
\b([A-Z ]+)\b           # Looking for all capital letters or spaces"""

files = ['name1', 'name2', 'etc.']  # placeholder file names; replace with your own
all_companies = []
for name in files:
    with open(name, encoding="utf8") as infile:
        text = infile.read()
        # in case there can be multiple occurrences in each file (it's not clear):
        companies = [s.strip() for s in re.findall(rex, text)]
        print(companies)
        all_companies.extend(companies) # list of all companies found in all files
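The update reads each file in full and runs `findall` on its contents. Below is a slightly more structured sketch. It assumes the input files can be located with a glob pattern; the `*.txt` pattern and the helper name `companies_by_file` are assumptions, not part of the answer. It compiles the pattern once and keeps each file's results separate before flattening them as `all_companies` does.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List

# Same verbose pattern as in the answer, compiled once and reused for every file.
REX = re.compile(r"""(?xm)       # extended mode and multiline
(?:^(?:Page|TAX).*\n)            # preceded by a line starting with either Page or TAX
\b([A-Z ]+)\b                    # all capital letters or spaces""")


def companies_by_file(paths: Iterable[Path]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each file's name to the company names found in it.

    Keyed by bare file name for readability; use str(path) instead if names
    can repeat across directories.
    """
    result: Dict[str, List[str]] = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf8")
        result[Path(path).name] = [s.strip() for s in REX.findall(text)]
    return result


# Usage: write two small sample files to a throwaway directory and scan them.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "invoice1.txt").write_text(
        "Page 1 of 1\nKS LINGAPPA AND SON Industrial Area\n", encoding="utf8")
    (folder / "invoice2.txt").write_text(
        "TAX INVOICE\nANURAG ENTERPRISES, HIRIYUR\nState Code: 29\n", encoding="utf8")
    # "*.txt" is an assumed naming pattern for the input files.
    found = companies_by_file(sorted(folder.glob("*.txt")))

# Flatten into one list, like all_companies in the answer.
all_companies = [c for companies in found.values() for c in companies]
print(found)
print(all_companies)
```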
