Removing characters from the middle of a string (iterable) in Python

import sys
import pprint

occ_list = []
observed = {}
lines = sys.stdin.readlines()
for line in lines:
    l = line.strip()
    i = l.split(' ')
    word = i[0]
    rel = i[1]
    wirts = i[2:-1]
    wirt = ' '.join(wirts)  # Word-in-relation-to (which may be compound)
    occ = i[-1]  # Frequency of specific "word, rel, wirt"
    arb = (word, rel, wirt)
    occ_list.append(int(occ))
    if not arb in observed.keys():
        observed[arb] = []
    if not occ in observed[arb]:
        observed[arb].append(int(occ)/float(1064542))
pprint.pprint(observed)

3 Answers

User

Answer 1 · edited 2024-06-28 19:41:05

Use a regular expression:

#!/usr/bin/env python
import fileinput
import re
from collections import defaultdict
from pprint import pprint

occ_list = []
observed = defaultdict(list)
for line in fileinput.input():
    m = re.search(r"(\S+)\s+([^:]+:[^:]+:\S+)\s+(\S+)\s+(\d+)", line)
    if m:
        word, rel, wirt, occ = m.groups()
        occ = int(occ)
        occ_list.append(occ)
        observed[word, rel, wirt].append(occ / 1064542.0)

pprint(occ_list)
pprint(dict(observed))

Output

[1, 1, 6, 1, 1, 1, 1]
{('abroad', 'a:at:n', 'request'): [9.393711098293914e-07],
 ('abroad', 'a:at:n', 'silence'): [9.393711098293914e-07],
 ('abroad', 'a:at:n', 'time'): [5.636226658976349e-06],
 ('abroad', 'a:because of:n', 'schedule'): [9.393711098293914e-07],
 ('abroad', 'a:by:n', 'american'): [9.393711098293914e-07],
 ('abroad', 'a:by:n', 'bank'): [9.393711098293914e-07],
 ('abroad', 'a:by:n', 'blow'): [9.393711098293914e-07]}

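User

Answer 2 · edited 2024-06-28 19:41:05

Do you expect colons anywhere in the text file other than in the second piece of information? If not, I suggest splitting on colons to remove the spaces. However, if you want to allow colons in the other fields, I suggest using the re (regex) module instead.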

# Split on colons:
bits = l.split(':')
# remove spaces in the second part
bits[1] = bits[1].replace(' ','')
# join again
l = ':'.join(bits)
# do rest of code.
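
As a quick check of the colon-split idea above, here is a minimal sketch that wraps those steps in a function and runs it on one sample line. The helper name strip_rel_spaces and the sample line are assumptions for illustration, not part of the original question, and the approach only holds if colons appear nowhere except in the relation field.

```python
def strip_rel_spaces(line):
    """Remove spaces from the middle part of the colon-separated relation.

    Assumes the line contains exactly two colons, both inside the
    relation field (e.g. "a:because of:n").
    """
    bits = line.split(':')
    if len(bits) != 3:
        # Unexpected number of colons: leave the line untouched.
        return line
    bits[1] = bits[1].replace(' ', '')
    return ':'.join(bits)


# Hypothetical input line in the "word rel wirt occ" layout.
sample = "abroad a:because of:n schedule 1"
cleaned = strip_rel_spaces(sample)
fields = cleaned.split(' ')
print(fields)  # ['abroad', 'a:becauseof:n', 'schedule', '1']
```

Note that the preposition loses its internal space ("becauseof"), so this only suits cases where the relation is used as an opaque key.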

Also, I think you touched on this in your question, but I want to clarify: do you have lines like this?

abroad a:by:because of american 1

In that case, do you want rel to be a:by:because of?

Can the third piece of information (wirts) be more than one word? For example:

abroad a:by:because of american silence 2

How would you know which word belongs to which field?

I think you would need a dictionary of the phrases in which spaces are allowed in this case.
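
One way the dictionary idea could look is sketched below. The whitelist contents, the function name split_rel_wirt, and the "longest valid rel wins" rule are all assumptions made for illustration; the thread does not specify how ambiguous lines should be resolved.

```python
# Hypothetical whitelist of multi-word phrases allowed inside rel.
ALLOWED_PHRASES = {"because of"}


def split_rel_wirt(line, allowed=ALLOWED_PHRASES):
    """Return (word, rel, wirt, occ), using a whitelist to decide how
    many whitespace-separated tokens belong to rel."""
    tokens = line.split()
    word, occ = tokens[0], int(tokens[-1])
    middle = tokens[1:-1]  # rel tokens followed by wirt tokens

    best = None
    for end in range(1, len(middle) + 1):
        rel = " ".join(middle[:end])
        pieces = rel.split(":")
        # A valid rel has three colon-separated pieces, each either a
        # single token or a whitelisted multi-word phrase.
        if len(pieces) == 3 and all(
            p and (" " not in p or p in allowed) for p in pieces
        ):
            best = end  # keep the longest valid candidate
    if best is None:
        raise ValueError("no valid rel found in: %r" % line)
    return word, " ".join(middle[:best]), " ".join(middle[best:]), occ


print(split_rel_wirt("abroad a:by:because of american silence 2"))
# ('abroad', 'a:by:because of', 'american silence', 2)
```

Any phrase missing from the whitelist is treated as the start of wirt, so the result is only as good as the phrase list.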

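User

Answer 3 · edited 2024-06-28 19:41:05

Start by splitting on whitespace. If the second item contained no space, it should have 2 colons. If it has only one colon, the second item had whitespace in it, so the second and third items are parts of a single item.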

parts = line.split()
if parts[1].count(":") == 1:
    parts[1 : 3] = [" ".join(parts[1 : 3])]
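
To show how that merge step fits into parsing a whole line, here is a small sketch. The function name parse_line and the sample lines are assumptions for illustration. It handles a relation broken by a single space only; a preposition of three or more words would need a loop rather than one merge.

```python
def parse_line(line):
    """Split a line into (word, rel, wirt, occ), merging a rel that was
    broken in two by a single internal space."""
    parts = line.split()
    # A complete rel has two colons; one colon means the rel was split.
    if parts[1].count(":") == 1:
        parts[1:3] = [" ".join(parts[1:3])]
    word, rel = parts[0], parts[1]
    wirt = " ".join(parts[2:-1])  # may be more than one word
    occ = int(parts[-1])
    return word, rel, wirt, occ


# Hypothetical input lines in the "word rel wirt occ" layout.
print(parse_line("abroad a:because of:n schedule 1"))
# ('abroad', 'a:because of:n', 'schedule', 1)
print(parse_line("abroad a:by:n bank 1"))
# ('abroad', 'a:by:n', 'bank', 1)
```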
