Converting a whitespace-aligned text table into an array

Posted 2024-10-05 14:23:10


What is the best way to convert a table made of text and spaces into an array?

I get the following output from a hardware reader. I need to parse it and get the value of each cell.

TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE

  1   REGULAR UNLEADED  87    56987   75.77     0.0   83.3     4785
  2   18                       4578   86.08           83.6     1661
  3   SAMPLE                   1234   77.94     0.0   86.4     2140

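The test cases to cover:

  • When the product column contains one space, e.g. "12 Data"
  • When the product column contains two spaces, e.g. "REGULAR UNLEADED 87"
  • When any cell is empty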

2 answers

Because there are spaces both between the cells and inside the data, this needs a bit of RegExp magic:

import re

s = """
TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE

  1   12 Data                 56987   75.77     0.0   83.3     4785
  2   18                       4578   86.08     0.0   83.6     1661
  3   SAMPLE                   1234   77.94     0.0   86.4     2140
"""

s = re.sub(r'^\n', '', s)     # remove the empty line at the start of 's'
s = re.sub(r'\n+ +', '\n', s) # remove empty lines and spaces at the start of lines
s = re.sub(r' {2,}', '\t', s) # replace two or more spaces with a tab

table = [row.split('\t') for row in s.splitlines()]

print(table)

Output:

[
    ['TANK', 'PRODUCT', 'GALLONS', 'INCHES', 'WATER', 'DEG F', 'ULLAGE'], 
    ['1', '12 Data', '56987', '75.77', '0.0', '83.3', '4785'], 
    ['2', '18', '4578', '86.08', '0.0', '83.6', '1661'], 
    ['3', 'SAMPLE', '1234', '77.94', '0.0', '86.4', '2140']
]

This only works if the cells are separated by at least two spaces and any space inside a cell is a single space.
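
To illustrate the two-space requirement, here is a minimal sketch that applies the same "two or more spaces" split to a row whose WATER cell is empty. The helper name split_row is hypothetical and not part of the answer. The empty cell is swallowed by the surrounding spaces, so the row comes back one field short and the later values shift left:

```python
import re

def split_row(row):
    """Split one table row on runs of two or more spaces (hypothetical helper)."""
    return re.split(r" {2,}", row.strip())

header = "TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE"
full_row = "  1   12 Data                 56987   75.77     0.0   83.3     4785"
gap_row = "  2   18                       4578   86.08           83.6     1661"

print(len(split_row(header)))    # 7 columns in the header
print(len(split_row(full_row)))  # 7 cells: every column is filled
print(len(split_row(gap_row)))   # 6 cells: the empty WATER cell is lost
print(split_row(gap_row))        # '83.6' now sits where WATER should be
```

Comparing each row's length against the header's is a cheap way to detect this case before trusting the result.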


A shorter variant (same output):

import re

s = """
TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE

  1   12 Data                 56987   75.77     0.0   83.3     4785
  2   18                       4578   86.08     0.0   83.6     1661
  3   SAMPLE                   1234   77.94     0.0   86.4     2140
"""

s = re.sub('  +', '\t', s)   # replace two or more spaces with tab

table = [row.strip().split('\t') for row in s.splitlines() if len(row) > 1]

print(table)

An extended version that can handle empty cells:

import re
from pprint import pprint

s = """
TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE

  1   REGULAR UNLEADED 87     56987   75.77     0.0   83.3     4785
  2   18                       4578   86.08           83.6     1661
  3   SAMPLE                   1234   77.94     0.0   86.4     2140
"""

lines = s.splitlines()                  # split the data by lines
lines = [l for l in lines if len(l)>0]  # remove empty lines

# function takes a line
# and returns a list of separators (positions of the right edges of the cells)
def get_separators(line):
    separators = []
    i = 0
    while i < len(line)-1:
        i += 1
        if line[i] == " ": continue      # skip all spaces
        while i < len(line)-2:
            i += 1
            if line[i] != " ": continue  # then skip all non spaces
            if line[i+1] == " ":         # if there are two spaces
                separators.append(i)     # add the position to the separators
                break
    separators.append(len(line))         # add a separator at the end
    return separators

# get separators from the first line
separators = get_separators(lines[0])

# go through all lines and adjust positions of separators
for line in lines:
    separators_cur_line = get_separators(line)
    if len(separators) != len(separators_cur_line): continue
    for i, sep in enumerate(separators_cur_line):
        if sep > separators[i]:
            separators[i] = sep

# function takes a line and a list of separators
# and returns a list of cells (the line divided by the separators)
def get_cells(line, separators):
    res=[]
    start = 0
    for end in separators:
        cell = line[start:end].strip()
        start = end
        res.append(cell)
    return res

# get cells from all lines
data = [get_cells(line, separators) for line in lines]

pprint(data)

Output:

[
   ['TANK', 'PRODUCT', 'GALLONS', 'INCHES', 'WATER', 'DEG F', 'ULLAGE'],
   ['1', 'REGULAR UNLEADED 87', '56987', '75.77', '0.0', '83.3', '4785'],
   ['2', '18', '4578', '86.08', '', '83.6', '1661'],
   ['3', 'SAMPLE', '1234', '77.94', '0.0', '86.4', '2140']
]

Limitations:

  • Only single spaces are allowed inside a cell
  • Columns must be separated by at least two spaces
  • A cell can only span one line
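
Once the rows are parsed, every cell is still a string. As a rough sketch of a possible next step, each row can be turned into a dictionary keyed by the header, with numeric cells converted and empty cells mapped to None. The names to_number and rows_to_records and the text_columns parameter are illustrative assumptions, and the data literal is copied from the output above:

```python
def to_number(cell):
    """Convert a cell to int or float if possible; an empty cell becomes None."""
    if cell == "":
        return None
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell  # leave non-numeric text unchanged

def rows_to_records(table, text_columns=("PRODUCT",)):
    """Build one dict per data row; columns in text_columns stay as strings."""
    header, *rows = table
    records = []
    for row in rows:
        record = {}
        for name, cell in zip(header, row):
            record[name] = cell if name in text_columns else to_number(cell)
        records.append(record)
    return records

data = [
    ['TANK', 'PRODUCT', 'GALLONS', 'INCHES', 'WATER', 'DEG F', 'ULLAGE'],
    ['1', 'REGULAR UNLEADED 87', '56987', '75.77', '0.0', '83.3', '4785'],
    ['2', '18', '4578', '86.08', '', '83.6', '1661'],
    ['3', 'SAMPLE', '1234', '77.94', '0.0', '86.4', '2140'],
]

records = rows_to_records(data)
print(records[1])
```

Keeping PRODUCT as text avoids turning a product named "18" into the number 18.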

Try using the str.splitlines() and str.split() methods, for example:

s = """
TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE

  1   12                      56987   75.77     0.0   83.3     4785
  2   18                       4578   86.08     0.0   83.6     1661
  3   SAMPLE                   1234   77.94     0.0   86.4     2140
"""
result = []
for row in s.splitlines():
    result.append(row.split())

Or, using a list comprehension:

result = [row.split() for row in s.splitlines()]
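
One caveat is worth checking before relying on this approach: str.split() with no argument splits on every run of whitespace and never returns empty strings. The sketch below, using sample rows from the question, shows what that means for a header or product name containing a space, for a row with an empty cell, and for blank lines:

```python
header = "TANK  PRODUCT               GALLONS  INCHES   WATER  DEG F   ULLAGE"
spaced = "  1   REGULAR UNLEADED 87     56987   75.77     0.0   83.3     4785"
gap_row = "  2   18                       4578   86.08           83.6     1661"

print(header.split())         # 'DEG F' becomes 'DEG' and 'F', giving 8 items
print(len(spaced.split()))    # 9 items: the product name is split into three
print(len(gap_row.split()))   # 6 items: the empty WATER cell is lost
print("".split())             # a blank line yields an empty list

# Blank lines can at least be skipped with a filter:
s = "\n" + header + "\n\n" + gap_row + "\n"
result = [row.split() for row in s.splitlines() if row.strip()]
print(len(result))            # 2 rows, no empty lists
```

So this version fits only tables where no cell contains a space and no cell is empty.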
