How can I parse some text data?

import re

text = """B2100 Door Driver Key Cylinder Switch Failure B2101 Head Rest Switch Circuit Failure B2102 Antenna Circuit Short to Ground B2103 Antenna Not Connected B2104 Door Passenger Key Cylinder Switch Failure B2105 Throttle Position Input Out of Range Low B2106 Throttle Position Input Out of Range High B2107 Front Wiper Motor Relay Circuit Short to Vbatt B2108 Trunk Key Cylinder Switch Failure"""

# text_arr = text.split("\^B[0-9][0-9][0-9][0-9]$\gi");
l = re.compile('\^B[0-9][0-9][0-9][0-9]$\gi').split(text)
print(l)

['B2100\tDoor Driver Key Cylinder Switch Failure B2101\tHead Rest Switch Circuit Failure B2102\tAntenna Circuit Short to Ground B2103\tAntenna Not Connected B2104\tDoor Passenger Key Cylinder Switch Failure B2105\tThrottle Position Input Out of Range Low B2106\tThrottle Position Input Out of Range High B2107\tFront Wiper Motor Relay Circuit Short to Vbatt B2108\tTrunk Key Cylinder Switch Failure']

3 Answers

User

#1 · Edited 2024-10-03 15:25:01

import re
text = """B2100 Door Driver Key Cylinder Switch Failure B2101   Head Rest Switch Circuit Failure B2102  Antenna Circuit Short to Ground B2103   Antenna Not Connected B2104 Door Passenger Key Cylinder Switch Failure B2105    Throttle Position Input Out of Range Low B2106  Throttle Position Input Out of Range High B2107 Front Wiper Motor Relay Circuit Short to Vbatt B2108    Trunk Key Cylinder Switch Failure"""

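# Split on each code; the capturing group keeps the codes in the result.
l = [i for i in re.split(r'(B[0-9]{4}\s+)', text) if i]
# Pair each code (even index) with the label that follows it (odd index).
print('\n'.join('{}*{}'.format(id_.strip(), label.strip()) for id_, label in zip(l[0::2], l[1::2])))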

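re.split keeps the delimiters in the result if the pattern contains a capturing group, that is, parentheses (). The code above produces this output: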

B2100*Door Driver Key Cylinder Switch Failure
B2101*Head Rest Switch Circuit Failure
B2102*Antenna Circuit Short to Ground
B2103*Antenna Not Connected
B2104*Door Passenger Key Cylinder Switch Failure
B2105*Throttle Position Input Out of Range Low
B2106*Throttle Position Input Out of Range High
B2107*Front Wiper Motor Relay Circuit Short to Vbatt
B2108*Trunk Key Cylinder Switch Failure
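
To see the capturing-group behavior on its own, here is a small sketch on a shortened sample. The sample string and the variable names are made up for illustration and are not part of the original question.

```python
import re

# Shortened, made-up sample in the same shape as the question's data.
sample = "B2100 Door Switch Failure B2101   Head Rest Failure"

# Without a capturing group, the matched delimiters are dropped.
without_group = re.split(r'B[0-9]{4}\s+', sample)

# With a capturing group, the matched delimiters are kept as separate items.
with_group = re.split(r'(B[0-9]{4}\s+)', sample)

print(without_group)
print(with_group)
```

Both results start with an empty string because the sample begins with a match, which is why the answer above filters with `if i`.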

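User

#2 · Edited 2024-10-03 15:25:01

Basically, you want to:

Find every Bxxxx string in the input.
Replace the whitespace before it with a newline.
Replace the whitespace after it with *.

All of this can be done with a single re.sub():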

re.sub(r'\s*(B\d{4})\s*', r'\n\1*', text).strip()

Match pattern:

\s*              # Any amount of whitespace
   (B\d{4})      # "B" followed by exactly 4 digits
           \s*   # Any amount of whitespace

Replacement pattern:

\n               # Newline
  \1             # The first parenthesized sequence from the matching pattern (B####)
    *            # Literal "*"

The strip() call removes any leading or trailing whitespace, including the newline that the substitution inserts in front of the first B#### sequence.
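
Put together as a runnable sketch, using a shortened version of the question's text (the tab and extra spaces are added here only to show that any run of whitespace is handled):

```python
import re

# Shortened sample; whitespace between fields is deliberately uneven.
text = ("B2100 Door Driver Key Cylinder Switch Failure "
        "B2101\tHead Rest Switch Circuit Failure "
        "B2102  Antenna Circuit Short to Ground")

# Replace "<whitespace>Bxxxx<whitespace>" with "<newline>Bxxxx*",
# then strip the newline that ends up in front of the first code.
result = re.sub(r'\s*(B\d{4})\s*', r'\n\1*', text).strip()

print(result)
```

Calling result.splitlines() afterwards gives one "code*label" string per entry.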

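User

#3 · Edited 2024-10-03 15:25:01

First of all, your regular expression "^B[0-9][0-9][0-9][0-9]$\gi" is wrong:

Modifiers do not work that way in Python.
^ and $ anchor the pattern to the start and end of a line, so it matches nothing inside the text.
The repeated [0-9] can be replaced with "[0-9]{4}".
If you want to ignore case, use the Python equivalent, the re.IGNORECASE flag.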
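
A minimal sketch of these points, on a made-up sample string (the variable names are hypothetical as well):

```python
import re

# Made-up sample with one lowercase code to show case-insensitive matching.
sample = "b2100 Door Switch Failure B2101 Head Rest Failure"

# Flags are passed through the flags argument, not appended to the pattern.
pattern = re.compile(r'B[0-9]{4}', re.IGNORECASE)
codes = pattern.findall(sample)

# With ^ and $ the pattern only matches when the whole string is one code.
anchored = re.compile(r'^B[0-9]{4}$', re.IGNORECASE)
whole_text_match = anchored.search(sample)
single_code_match = anchored.search("B2100")

print(codes)
print(whole_text_match)
print(single_code_match is not None)
```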

With that in mind, simple code that does what you need looks like this:

l = [x.strip() for x in re.compile(r'\s*(B\d{4})\s*', re.IGNORECASE).split(text)]
# l[0] is the empty string before the first code, so pairs start at index 1.
lines = ['*'.join(l[i:i+2]) for i in range(1, len(l), 2)]
