Testing and extracting multi-line text with regex and Python

Posted 2024-09-28 05:27:42


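I want to use regex in Python to extract blocks of data, keeping only the blocks that contain a certain feature. In short, this pseudocode explains what I am trying to achieve: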

If (Color feature is in the block message):
   bring that block

Here is a sample of my data, from the file str.txt:

.
.
This file contains various types of data formats and blocks

Country of the survey
CONTRY CODE: AAAA
POPULATION: 11111
GDP RANK: 22222

.
BLOCK MESSAGE
      BLOCK A:
LENGTH(M): 1.6
WEIGHT(KG):    76
    DISSABLITIY STATUS(Y/N): N
CHRONIC DISEASE: NONE

FAMILY MEMBERS: 3

END BLOCK

BLOCK MESSAGE

    BLOCK B:
EYE COLOR: BLACK

LENGTH(M): 1.9
     WEIGHT(KG): 89
DISSABLITIY STATUS(Y/N): N
   CHRONIC DISEASE: NONE
           FAMILY MEMBERS: 1
END BLOCK
BLOCK MESSAGE
BLOCK C:
     LENGTH(M): 17
WEIGHT(KG): 90
        DISSABLITIY STATUS(Y/N): Y

CHRONIC DISEASE: Yes
FAMILY MEMBERS: 4
END BLOCK

BLOCK MESSAGE
   BLOCK D:
   LENGTH(M): 195
   WEIGHT(KG): 90
   EYE COLOR: BROWN
DISSABLITIY STATUS(Y/N): N
CHRONIC DISEASE: NONE
FAMILY MEMBERS: 2
END BLOCK

.
.

What I expect to get is:

BLOCK MESSAGE
BLOCK B:
EYE COLOR: BLACK
LENGTH(M): 1.9
WEIGHT(KG): 89
DISSABLITIY STATUS(Y/N): N
CHRONIC DISEASE: NONE
FAMILY MEMBERS: 1
END BLOCK

BLOCK MESSAGE
BLOCK D:
LENGTH(M): 195
WEIGHT(KG): 90
EYE COLOR: BROWN
DISSABLITIY STATUS(Y/N): N
CHRONIC DISEASE: NONE
FAMILY MEMBERS: 2
END BLOCK

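My question is: how can I get the block messages, from "BLOCK MESSAGE" to "END BLOCK", that have the eye color feature? The following criteria apply:

  1. The text may contain different data blocks
  2. It may contain many spaces and blank lines
  3. The required feature, "EYE COLOR", may appear at different positions within a message

I would greatly appreciate any ideas, explanations, and code for this problem.

Thanks in advance, everyone.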


1 answer

Answer 1 · posted 2024-09-28 05:27:42

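A simple approach is to use a loop:

  1. Open the text file and start reading it line by line
  2. Read lines until the start of a block is found
  3. Keep reading lines until the end of that block
  4. Check whether the block contains a color
  5. If step 4 holds, add the block to the output
  6. Go back to step 2

Notes:

  • Simply use the "in" operator to check whether a line contains a string
  • I use the regex module (re) to strip the whitespace at the start of each line (only for prettier output)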
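
The two notes can be checked in isolation. Below is a small sketch on a hand-written sample string (the sample and the variable names are made up for illustration). It shows the "in" test and what the pattern \n\s+ does: since \s also matches newline characters, the substitution removes not only the indentation at the start of each line but also blank lines inside the block.

```python
import re

# Hypothetical sample block, shaped like the data in the question
block = "BLOCK MESSAGE\n\n    BLOCK B:\nEYE COLOR: BLACK\n\n     WEIGHT(KG): 89\n"

# Substring test with the "in" operator
has_color = "COLOR" in block

# "\n\s+" matches a newline plus any whitespace that follows it (including
# further newlines), so indentation and blank lines are both removed
cleaned = re.sub(r"\n\s+", "\n", block)

print(has_color)
print(cleaned)
```

Note that whitespace before the very first line of the string is not touched, because the pattern requires a preceding newline.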

Code:

# Import the regex module
import re

# Collect the matching blocks in a list
output = []
# Open the file (adjust the path to your data file, e.g. str.txt)
with open("../temp.txt", "r") as f:
    # Read the file line by line
    line = f.readline()
    # Loop until the end of the file (readline() returns "" at EOF)
    while line:
        # Look for the start of a block: "BLOCK MESSAGE"
        if "BLOCK MESSAGE" in line:
            # Initialize the block buffer
            block = ""

            # Accumulate lines until the string "END BLOCK" (or EOF)
            while line and "END BLOCK" not in line:
                # Add the line
                block += line
                # Read the next line
                line = f.readline()

            # If COLOR is in the block
            if "COLOR" in block:
                # Add the last line ("END BLOCK")
                block += line
                # Remove leading whitespace (and blank lines) inside the block
                block = re.sub(r'\n\s+', '\n', block)
                # Add the block to the outputs
                output.append(block)
        # Read the next line
        line = f.readline()

Output:


print(output)
# ['BLOCK MESSAGE\nBLOCK B:\nEYE COLOR: BLACK\nLENGTH(M): 1.9\nWEIGHT(KG): 89\nDISSABLITIY STATUS(Y/N): N\nCHRONIC DISEASE: NONE\nFAMILY MEMBERS: 1\nEND BLOCK\n',
#  'BLOCK MESSAGE\nBLOCK D:\nLENGTH(M): 195\nWEIGHT(KG): 90\nEYE COLOR: BROWN\nDISSABLITIY STATUS(Y/N): N\nCHRONIC DISEASE: NONE\nFAMILY MEMBERS: 2\nEND BLOCK\n']

for o in output:
    print(o)
# BLOCK MESSAGE
# BLOCK B:
# EYE COLOR: BLACK
# LENGTH(M): 1.9
# WEIGHT(KG): 89
# DISSABLITIY STATUS(Y/N): N
# CHRONIC DISEASE: NONE
# FAMILY MEMBERS: 1
# END BLOCK

# BLOCK MESSAGE
# BLOCK D:
# LENGTH(M): 195
# WEIGHT(KG): 90
# EYE COLOR: BROWN
# DISSABLITIY STATUS(Y/N): N
# CHRONIC DISEASE: NONE
# FAMILY MEMBERS: 2
# END BLOCK
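
Since the question asks about regex specifically, here is a possible alternative to the loop, offered as a sketch rather than a definitive solution. It assumes the whole file fits in memory and that every "BLOCK MESSAGE" has a matching "END BLOCK". The function name extract_blocks, its feature parameter, and the shortened sample text are all hypothetical, introduced only for this illustration.

```python
import re

def extract_blocks(text, feature="EYE COLOR"):
    """Return the cleaned blocks of `text` that contain `feature`."""
    # Non-greedy match from a start marker to the nearest end marker;
    # re.DOTALL lets "." match line breaks as well
    pattern = re.compile(r"BLOCK MESSAGE.*?END BLOCK", re.DOTALL)
    blocks = []
    for raw in pattern.findall(text):
        if feature in raw:
            # Strip indentation and blank lines, as in the loop version
            blocks.append(re.sub(r"\n\s+", "\n", raw))
    return blocks

# Shortened, hypothetical sample in the shape of the question's data
sample = """Country of the survey
CONTRY CODE: AAAA

BLOCK MESSAGE
      BLOCK A:
LENGTH(M): 1.6
END BLOCK

BLOCK MESSAGE

    BLOCK B:
EYE COLOR: BLACK

     WEIGHT(KG): 89
END BLOCK
"""

result = extract_blocks(sample)
print(result)
```

With a real file, the text could be obtained with open(path).read() and passed to the function. Unlike the loop version, the matched blocks end at "END BLOCK" without a trailing newline. If a block were missing its "END BLOCK" line, the non-greedy match would run on into the next block, so the loop may be the safer choice for malformed input.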
