Python file and text processing

text:'frank bora three', entityType:'noun'
text:'jack blad four', entityType:'noun'
text:'go', entityType:'action'
text:'stay', entityType:'action'
text:'three hundred sixty', entityType:'value'
text:'two hundred eleven', entityType:'value'

2 answers

User

#1 · edited 2024-09-30 12:16:53

You don't really need regular expressions.

Just slice the string at the brackets :)

s = "- [frank bora three]asdasd(noun) [go](action) level [three hundred sixty](value)"

print(s[s.find("[")+1:s.find("]")])  # text inside the first []
print(s[s.find("(")+1:s.find(")")])  # entity type inside the first ()

Now split the file contents into lines and loop over them:

stringfile = """- [frank bora three](noun) [go](action) level [three hundred sixty](value)
- [jack blad four](noun) [stay](action) level [two hundred eleven](value)"""


for s in stringfile.splitlines():
    text = s[s.find("[")+1:s.find("]")]
    noun = s[s.find("(")+1:s.find(")")]

    print(text)
    print(noun)
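
Note that the loop above only picks up the first [text](type) pair on each line. A minimal sketch of the same str.find idea extended to every pair on a line, assuming each "[" is eventually followed by "]", "(" and ")" in that order; the helper name extract_pairs is made up for illustration:

```python
def extract_pairs(line):
    """Return every (text, entity_type) pair found in one line."""
    pairs = []
    pos = 0
    while True:
        open_sq = line.find("[", pos)
        if open_sq == -1:
            break  # no more bracketed text on this line
        close_sq = line.find("]", open_sq + 1)
        open_par = line.find("(", close_sq + 1)
        close_par = line.find(")", open_par + 1)
        if -1 in (close_sq, open_par, close_par):
            break  # malformed tail, stop rather than slice with -1
        pairs.append((line[open_sq + 1:close_sq], line[open_par + 1:close_par]))
        pos = close_par + 1  # continue scanning after this pair
    return pairs


stringfile = """- [frank bora three](noun) [go](action) level [three hundred sixty](value)
- [jack blad four](noun) [stay](action) level [two hundred eleven](value)"""

for s in stringfile.splitlines():
    for text, entity in extract_pairs(s):
        print(text, entity)
```

The explicit -1 check matters because str.find returns -1 when nothing is found, and slicing with -1 would silently return the wrong substring instead of raising an error.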

User

#2 · edited 2024-09-30 12:16:53

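When parsing a complex string like this, a two-stage approach is often easier. First split each line on the closing parenthesis (here foo is assumed to hold one input line without its leading "- "):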

temp = foo.split(')')[0:3]

For the first line, this gives a list of strings:

temp = ['[frank bora three](noun', ' [go](action', ' level [three hundred sixty](value']

Now we can write simpler regular expressions to pull the text we want out of each substring:

import re

re_text = re.compile(r'\[.+\]')   # bracketed text, e.g. "[go]"
re_entity = re.compile(r'\(.+')   # from "(" to the end of the piece, e.g. "(action"
mynouns = []
myentities = []
for target in temp:
    mynouns.append(re_text.search(target).group().strip('[]'))
    myentities.append(re_entity.search(target).group().strip('()'))

Now you have two lists:

mynouns = ['frank bora three', 'go', 'three hundred sixty']
myentities = ['noun', 'action', 'value']

Zip them together to create a new list of tuple pairs:

result = list(zip(mynouns, myentities))

It looks like this:

[('frank bora three', 'noun'),
 ('go', 'action'),
 ('three hundred sixty', 'value')]
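
For comparison, the same list of pairs can also be produced in a single pass with re.findall. This is an alternative to the two-stage split, not part of the answer above, and it assumes every "[text]" is immediately followed by "(type)"; the name PAIR_RE is made up for illustration:

```python
import re

# Two capture groups: the text inside [...] and the entity type inside (...).
PAIR_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

line = "- [frank bora three](noun) [go](action) level [three hundred sixty](value)"

# With two groups, findall returns a list of (text, entity_type) tuples.
result = PAIR_RE.findall(line)
print(result)
# [('frank bora three', 'noun'), ('go', 'action'), ('three hundred sixty', 'value')]
```

Because the character classes exclude the closing delimiters, the pattern cannot run past one pair into the next, which is the risk with a greedy ".+".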

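Now you can write these out as strings. (To group the strings for the desired output, you could build a list of strings and then sort it by the last word before writing it to the file.)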
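
A rough sketch of that last step, assuming the pairs from both input lines have already been collected into one list. A plain alphabetical sort on the last word would put "action" before "noun", so this version uses an explicit order (noun, action, value) taken from the desired output in the question; the names ORDER and format_entities are made up for illustration:

```python
# Desired grouping order, taken from the expected output in the question.
ORDER = {"noun": 0, "action": 1, "value": 2}


def format_entities(pairs):
    """Group (text, entity_type) pairs by type and format one per line."""
    # sorted() is stable, so pairs of the same type keep their input order.
    # Unknown entity types are placed after the known ones.
    ordered = sorted(pairs, key=lambda pair: ORDER.get(pair[1], len(ORDER)))
    return "\n".join(f"text:'{text}', entityType:'{etype}'" for text, etype in ordered)


results = [
    ("frank bora three", "noun"), ("go", "action"), ("three hundred sixty", "value"),
    ("jack blad four", "noun"), ("stay", "action"), ("two hundred eleven", "value"),
]

output = format_entities(results)
print(output)
```

The resulting string can then be written out with an ordinary open(path, "w") call, where the path is whatever output file you choose.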
