Python: parsing text in a .txt file

3 Answers

User

#1 · Edited 2024-09-29 17:15:24

If I've understood you correctly (though I'm not entirely sure I have), this will produce the result I think you need.

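import re

with open('data.txt', 'r') as f:
    # Read the whole file as one string; drop trailing whitespace so the
    # final newline does not produce an empty extra row
    f_txt = f.read().rstrip()

# Start a new record at every newline that is followed by a digit
f_lines = re.split(r'\n(?=\d)', f_txt)
matrix = []
for line in f_lines:
    inner1 = line.split('\n')
    # Columns are separated by runs of two or more whitespace characters
    inner2 = [re.split(r'\s{2,}', l.rstrip()) for l in inner1]
    matrix.append(inner2)

print(matrix)
print('')
for row in matrix:
    print(row)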

Program output:

[[['1', 'firm A', 'Manhattan (company name)', '25,000'], ['', 'SK Ventures', '25,000'], ['', 'AEA investors', '10,000']], [['2', 'firm B', 'Tencent collaboration', '16,000'], ['', 'id TechVentures', '4,000']], [['3', 'firm C', 'xxx', '625']]]

[['1', 'firm A', 'Manhattan (company name)', '25,000'], ['', 'SK Ventures', '25,000'], ['', 'AEA investors', '10,000']]
[['2', 'firm B', 'Tencent collaboration', '16,000'], ['', 'id TechVentures', '4,000']]
[['3', 'firm C', 'xxx', '625']]

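I based this on your wanting the first row of the matrix to be: [[1,Firm A,Manhattan,25,000],['',SK Ventures,25,000],['',AEA investors,10,000]]

However, to do this with more rows, we end up with a list nested three levels deep. That is what print(matrix) outputs. It can be a bit unwieldy to work with, which is why TessellatingHeckler's answer uses a dictionary to store the data, and I think that is a better way to get at what you need. But if a "matrix" list is what you want, the code above produces it.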
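If the three-level nesting is awkward, one option is to flatten it into one row per company, carrying the index and firm name down from the first row of each group. This is only a sketch: the function name flatten is made up for illustration, and it assumes every group starts with a full four-column row, as in the output above.

```python
# Nested structure in the same shape as the matrix printed above
matrix = [
    [['1', 'firm A', 'Manhattan (company name)', '25,000'],
     ['', 'SK Ventures', '25,000'],
     ['', 'AEA investors', '10,000']],
    [['2', 'firm B', 'Tencent collaboration', '16,000'],
     ['', 'id TechVentures', '4,000']],
    [['3', 'firm C', 'xxx', '625']],
]

def flatten(matrix):
    """Return one [idx, firm, company, value] row per company."""
    flat = []
    for group in matrix:
        # The first row of each group carries the index and the firm name
        idx, firm = group[0][0], group[0][1]
        for row in group:
            # Company and value are always the last two columns
            company, value = row[-2], row[-1]
            flat.append([idx, firm, company, value])
    return flat

flat_rows = flatten(matrix)
for row in flat_rows:
    print(row)
```

Every row then has the same four columns, which is usually easier to loop over or write out with the csv module.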

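User

#2 · Edited 2024-09-29 17:15:24

Going by the data you've given*, the input differs depending on whether a line starts with a number or with spaces, and the data can be separated as

(number)(spaces)(letters with single spaces)(spaces)(letters with single spaces)(spaces)(digits and commas)

or

(spaces)(letters with single spaces)(spaces)(digits and commas)

That is what the two regexes below look for. They build a dictionary keyed by the leading number, where each entry holds a firm name and a list of (company, value) pairs.

I don't really know what your matrix arrangement is meant to be.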

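import pprint
import re

data = {}
with open('data.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        if re.match(r'^\d', line):
            # Row that starts a new record: index, firm, company, value
            matches = re.findall(r'^(\d+)\s+((\S\s|\s\S|\S)+)\s\s+((\S\s|\s\S|\S)+)\s\s+([0-9,]+)', line)
            # x and y are leftovers of the repeated inner groups and are ignored
            idx, firm, x, company, y, value = matches[0]
            data[idx] = {}
            data[idx]['firm'] = firm.strip()
            data[idx]['company'] = [(company.strip(), value)]
        else:
            # Continuation row: company and value only, attached to the latest idx
            matches = re.findall(r'\s+((\S\s|\s\S|\S)+)\s\s+([0-9,]+)', line)
            company, x, value = matches[0]
            data[idx]['company'].append((company.strip(), value))

pprint.pprint(data)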

{'1': {'company': [('Manhattan (company name)', '25,000'),
                   ('SK Ventures', '25,000'),
                   ('AEA investors', '10,000')],
       'firm': 'firm A'},

 '2': {'company': [('Tencent collaboration', '16,000'),
                   ('id TechVentures', '4,000')],
       'firm': 'firm B'},

 '3': {'company': [('xxx', '625')], 
       'firm': 'firm C'}
}

*This works on your example, but it might not work so well on your real data. YMMV.
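Since real data may contain lines that fit neither shape, a more defensive variant could collect the lines it cannot parse instead of failing on them. The sketch below is an assumption-laden illustration, not the code from this answer: the names FULL_ROW, CONT_ROW and parse_lines are hypothetical, and the patterns assume columns are separated by at least two spaces while names contain only single spaces.

```python
import re

# Hypothetical patterns: a field is words joined by single spaces,
# and columns are separated by two or more whitespace characters
FIELD = r'\S+(?: \S+)*'
FULL_ROW = re.compile(
    rf'^(?P<idx>\d+)\s+(?P<firm>{FIELD})\s{{2,}}(?P<company>{FIELD})\s{{2,}}(?P<value>[0-9,]+)\s*$'
)
CONT_ROW = re.compile(rf'^\s+(?P<company>{FIELD})\s{{2,}}(?P<value>[0-9,]+)\s*$')

def parse_lines(lines):
    """Return (data, skipped): the parsed dictionary and the unparsable lines."""
    data, skipped, idx = {}, [], None
    for line in lines:
        full = FULL_ROW.match(line)
        cont = CONT_ROW.match(line) if full is None else None
        if full:
            idx = full['idx']
            data[idx] = {'firm': full['firm'],
                         'company': [(full['company'], full['value'])]}
        elif cont and idx is not None:
            data[idx]['company'].append((cont['company'], cont['value']))
        elif line.strip():
            skipped.append(line)  # non-blank line matching neither shape
    return data, skipped

sample = [
    "1       firm A         Manhattan (company name)     25,000",
    "                       SK Ventures                  25,000",
    "??? not a data row",
    "2       firm B         Tencent collaboration        16,000",
]
data, skipped = parse_lines(sample)
print(data)
print(skipped)
```

Inspecting skipped after a run on the real file shows quickly whether the assumed line shapes hold.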

User

#3 · Edited 2024-09-29 17:15:24

If you know all of the starting positions:

# 0123456789012345678901234567890123456789012345678901234567890
# 1       firm A         Manhattan (company name)     25,000 
#                        SK Ventures                  25,000
#                        AEA investors                10,000 
# 2       firm B         Tencent collaboration        16,000 
#                        id TechVentures              4,000 
# 3       firm C         xxx                          625 
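# Field #1 is 8 wide (0 -> 7)
# Field #2 is 15 wide (8 -> 22)
# Field #3 is 29 wide (23 -> 51)
# Field #4 is arbitrarily wide (52 -> end of line)
field_lengths = [8, 15, 29]
data = []
with open('/path/to/file', 'r') as f:
    for row in f:
        # Strip only the right-hand side: the leading spaces are part of
        # the fixed-width layout and must stay in place
        row = row.rstrip()
        if not row:
            continue  # skip blank lines
        pieces = []
        for x in field_lengths:
            piece = row[:x].strip()
            pieces.append(piece)
            row = row[x:]
        # Whatever is left is the last, arbitrarily wide field
        pieces.append(row.strip())
        data.append(pieces)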
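The same fixed-width idea can be written as a small helper that turns the widths into slice boundaries once. This is a sketch: the name split_fixed_width is hypothetical, and the widths are assumptions that match the sample lines built inside the code, not necessarily your real file.

```python
from itertools import accumulate

def split_fixed_width(line, widths):
    """Split line into len(widths) + 1 stripped fields; the last runs to end of line."""
    starts = [0, *accumulate(widths)]   # e.g. [0, 8, 23, 52]
    ends = starts[1:] + [None]          # None means "to the end of the line"
    return [line[s:e].strip() for s, e in zip(starts, ends)]

widths = [8, 15, 29]  # assumed column widths for the sample below

def make_line(*fields):
    # Build a sample line by padding every field but the last to its width
    padded = [f.ljust(w) for f, w in zip(fields, widths)]
    return ''.join(padded) + fields[-1]

sample = [
    make_line('1', 'firm A', 'Manhattan (company name)', '25,000'),
    make_line('', '', 'SK Ventures', '25,000'),
]
rows = [split_fixed_width(line, widths) for line in sample]
for row in rows:
    print(row)
```

Unlike splitting on whitespace, this keeps empty columns as empty strings, so every row has the same number of fields.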
