Python: extracting information after a word or phrase

2 answers

User

#1 · edited 2024-09-24 00:31:06

I think this should be what you want:

import re

textlist = ["some other amount as $32,4545.34 and Total Cumulative Payment (USD) $999,999.00 and such","Total cumulative payment $55587323.23"]

matchlist = []

for text in textlist:
    # Raw string so "\$" and "\d" reach the regex engine unchanged:
    # a "$" followed by any run of digits, commas, and periods
    match = re.findall(r"(\$[.\d,]+)", text)
    if match:
        matchlist.extend(match)

print(matchlist)

Result:

['$32,4545.34', '$999,999.00', '$55587323.23']

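The regular expression looks for a $ and then grabs periods, commas, and digits until it reaches a character that is none of those, typically the next space. Depending on what other kinds of data you are parsing, it may need adjusting; I am assuming you only want to capture periods, commas, and digits.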
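As one way the pattern might be adjusted, here is a minimal sketch that requires a digit right after the $ and allows at most one decimal part, so a sentence-ending period or a trailing comma is not captured. The pattern, the name AMOUNT_RE, and the sample text are assumptions for illustration, not part of the answer above.

```python
import re

# Hypothetical tightened pattern: "$", digits, optional ",digits" groups,
# then an optional ".digits" decimal part. Trailing punctuation is left out.
AMOUNT_RE = re.compile(r"\$\d+(?:,\d+)*(?:\.\d+)?")

# Made-up sample text with punctuation directly after each amount
text = "You owe $1,234.56. Fees were $20, tax was $3.5."

# The loose pattern from the answer keeps the trailing "." and ","
loose = re.findall(r"\$[.\d,]+", text)
# The tightened pattern stops at the last digit of each amount
tight = AMOUNT_RE.findall(text)

print(loose)
print(tight)
```

Whether the extra strictness is worth it depends on the data: the loose pattern is more forgiving of oddly formatted amounts, while the tightened one avoids a cleanup step afterwards.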

Update:

It now finds any number of occurrences and puts them all into a single list.

User

#2 · edited 2024-09-24 00:31:06

This kind of thing can be done with a regular expression:

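import re

source = 'total cumulative payment $2000.00;   some other amount $1234.56.    Total Cumulative Payment (USD) $5,600,000.06'
# Capture the amount that follows "total cumulative payment", ignoring case
matches = re.findall(r'total\s+cumulative\s+payment[^$0-9]+\$([0-9,.]+)', source, re.IGNORECASE)
# Drop thousands separators and any trailing sentence period, then convert to float
amounts = [float(x.replace(',', '').rstrip('.')) for x in matches]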

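This matches the two specific examples you gave. But you have not said much about how loose the matching criteria should be, or what the rules are. If the source document misspells the word "cumulative", the solution above will miss the amount. The same happens if the amount appears without a dollar sign. It also allows any text that contains no digits or dollar signs to sit between "total cumulative payment" and the dollar amount, so you would get a false positive from source = "This document contains information about total cumulative payment values, (...several more pages of introductory material...) and by the way you owe me $20." Now, these things can be adjusted and improved, but only if you know what matters and what does not, and tighten the specification of the problem accordingly.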
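To make the false-positive concern concrete, here is a minimal sketch comparing the pattern above with a variant that caps the gap between the phrase and the amount. The 20-character limit, the sample strings, and the names LOOSE, BOUNDED, and find_amounts are assumptions for illustration; the right limit depends on your documents.

```python
import re

# The pattern from the answer: any run of non-digit, non-"$" text may intervene
LOOSE = re.compile(
    r"total\s+cumulative\s+payment[^$0-9]+\$([0-9,.]+)", re.IGNORECASE
)
# Assumption: allow at most 20 such characters between phrase and amount
BOUNDED = re.compile(
    r"total\s+cumulative\s+payment[^$0-9]{0,20}\$([0-9,.]+)", re.IGNORECASE
)

def find_amounts(pattern, source):
    """Return matched amounts as floats, dropping commas and a trailing period."""
    return [float(m.replace(",", "").rstrip(".")) for m in pattern.findall(source)]

good = "Total Cumulative Payment (USD) $5,600,000.06"
noisy = ("This document contains information about total cumulative payment "
         "values, and a lot of introductory material, and by the way you owe me $20.")

print(find_amounts(LOOSE, good))     # amount found
print(find_amounts(BOUNDED, good))   # amount still found
print(find_amounts(LOOSE, noisy))    # unrelated $20 picked up
print(find_amounts(BOUNDED, noisy))  # nothing matched
```

A hard cap is only one possible tightening. If the only text you ever expect between the phrase and the amount is something like "(USD)", spelling that out in the pattern would be stricter still.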
