How to extract a specific line from a text file

Posted 2024-05-18 11:41:25


I am doing text mining on a large document, and I want to extract one specific line.

CONTINUED ON NEXT PAGE   CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES  

SPE2DH-20-T-0133   SECTION B  

PR: 0081939954   NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP   RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:

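I want to extract the description that comes immediately below ITEM DESCRIPTION.

I have made many attempts, none of them successful.

My most recent attempt was: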

for line in text:
    if 'ITEM' and 'DESCRIPTION'in line:
        print ('Possibe Descript:\n', line)

But it did not find the text.

Is there a way to find ITEM DESCRIPTION and get the line that follows it, or something similar?


3 answers

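The function below looks for the description on the line that follows a given pattern, e.g. "ITEM DESCRIPTION", ignoring any blank lines in between. Note, however, that it does not handle the special case where the pattern is present but the description is not.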

txt = '''
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED:    PAGE 4 OF 16 PAGES

SPE2DH-20-T-0133 SECTION B

PR: 0081939954 NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''

I assume you have the text as a single string, so the function below splits it into a list of lines.

pattern = "ITEM DESCRIPTION"  # heading to search for

def find_pattern_in_txt(txt, pattern):
    # Strip surrounding whitespace and drop empty lines so blank lines are skipped
    lines = [line.strip() for line in txt.split("\n") if line.strip()]
    if pattern in lines:
        return lines[lines.index(pattern) + 1]
    return None

print(find_pattern_in_txt(txt, pattern))  # prints: BOTTLE, SAFETY CAP
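
One possible way to cover the case where the pattern is present but no description follows it is sketched below. The name `find_line_after` and its `default` parameter are illustrative assumptions, not part of the answer above, and the sample string is an abridged copy of the text in the question.

```python
def find_line_after(txt, pattern, default=None):
    """Return the first non-empty line after the line equal to `pattern`.

    Hypothetical helper: returns `default` when the pattern is missing
    or when no non-empty line follows it.
    """
    lines = iter(txt.splitlines())
    for line in lines:
        if line.strip() == pattern:
            # Keep consuming the same iterator to scan what follows the heading
            for following in lines:
                if following.strip():
                    return following.strip()
            return default  # heading found, but nothing after it
    return default  # heading not found


sample = "PR: 0081939954\n\nITEM DESCRIPTION\n\nBOTTLE, SAFETY CAP\n"
print(find_line_after(sample, "ITEM DESCRIPTION"))  # BOTTLE, SAFETY CAP
print(find_line_after("ITEM DESCRIPTION\n\n", "ITEM DESCRIPTION"))  # None
```

Returning a default instead of raising keeps the calling code simple when many documents are processed in a loop. Whether a missing description should be silent or an error depends on your data.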

Try the following:

description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    if description:
        print(line)

I know this will work, but you need something that stops reading the description, perhaps another heading, like this:

description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    # "END OF SOMETHING" is a placeholder for whatever heading ends the description
    if "END OF SOMETHING" in line:
        description = False
    if description:
        print(line)
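
The same start/stop idea can be wrapped in a function that returns the lines instead of printing them. This is only a sketch: `lines_between` is a hypothetical name, and using "RAQO1:" as the closing marker is an assumption based on the sample in the question, not something the document format guarantees. It also calls `splitlines()`, because looping directly over a string yields single characters rather than lines.

```python
def lines_between(text, start, end):
    """Collect non-empty lines after the line containing `start`,
    stopping at the first line that contains `end`."""
    collected = []
    inside = False
    # splitlines() matters: iterating over a str directly yields characters
    for line in text.splitlines():
        if inside:
            if end in line:
                break
            if line.strip():
                collected.append(line.strip())
        elif start in line:
            inside = True
    return collected


# Abridged sample from the question; "RAQO1:" as end marker is an assumption
sample = """SPE2DH-20-T-0133 SECTION B

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS
"""
print(lines_between(sample, "ITEM DESCRIPTION", "RAQO1:"))  # ['BOTTLE, SAFETY CAP']
```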

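Use the string method find, as shown below. find returns the index of the string you are looking for, or -1 if it is not present, so any result of 0 or greater means you have found it.

Code:

txt = "Hello, welcome to my world."
x = txt.find("welcome")
if x >= 0:  # 0 is a valid index (match at the very start); -1 means not found
    print(x)

Output:

7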
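
Applied to the question, `find` can locate the heading and the text after it can then be scanned for the next non-empty line. The following is a minimal sketch under that assumption; `line_after_with_find` is a hypothetical helper name and the sample string is abridged from the question.

```python
def line_after_with_find(text, marker):
    """Return the first non-empty line after `marker`, located with str.find."""
    pos = text.find(marker)
    if pos == -1:  # find() returns -1 when the marker is absent
        return None
    rest = text[pos + len(marker):]  # everything after the marker
    for line in rest.splitlines():
        if line.strip():
            return line.strip()
    return None


sample = "PR: 0081939954\n\nITEM DESCRIPTION\n\nBOTTLE, SAFETY CAP\n"
print(line_after_with_find(sample, "ITEM DESCRIPTION"))  # BOTTLE, SAFETY CAP
```

Note that if other text follows the marker on the same line, this sketch returns the remainder of that line rather than the next one.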
