Need a Python solution to extract data between multiple patterns and save it to CSV

Posted 2024-09-27 20:16:14

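Can someone help me with this problem? I have a set of questions in a text file. I want to extract the content found between multiple regex patterns and write it to a text or CSV file.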

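I want to append the content of each question (test data is given at the bottom) to a new text/CSV file with "," as the separator. The first line of the output file should look like this:

Which of the following …, Fast processor …, They must be dual …, Similar RAM …, Fast network …, B, Dual-homed …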

The next line of the output text/CSV file should contain:

Which of the following is …, Micro, Worm, Trojan, Virus, B, Dual-homed or dual-homing can refer to …

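Note: ignore the "…" above; it only stands in for the remaining content of each field. The information for the two questions is kept separate.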

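I want to do this with regex and a loop, since the question number, the multiple-choice options, the answer, and the explanation are the main fields to extract.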

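Test data:

QUESTION NO: 1

Which of the following is a hardware requirement that either an IDS/IPS system or a proxy server must have in order to properly function?

A. Fast processor to help with network traffic analysis

B. They must be dual-homed

C. Similar RAM requirements

D. Fast network interface cards

Answer: B Explanation:

Dual-homed or dual-homing can refer to either an Ethernet device that has more than one network interface, for redundancy purposes, or in firewall technology, dual-homed is one of the firewall architectures, such as an IDS/IPS system, for implementing preventive security.

QUESTION NO: 2

Which of the following is an application that requires a host application for replication?

A. Micro

B. Worm

C. Trojan

D.

Virus

Answer: D Explanation:

Computer viruses infect a variety of different subsystems on their hosts. A computer virus is a malware that, when executed, replicates by reproducing itself or infecting other programs by modifying them. Infecting computer programs can include as well, data files, or the boot sector of the hard drive. When this replication succeeds, the affected areas are then said to be "infected".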

import re
inFile = open("input_new.txt",encoding='utf-8')
outFile = open("result.txt", "w")
buffer1 = ""
keepCurrentSet = True
for line in inFile:
   buffer1=buffer1+(line)

buffer1=re.findall(r"(?<=QUESTION NO:\s\d\s*) (.*?) (?=A\.)", buffer1)  
outFile.write("".join(buffer1))  
inFile.close()
outFile.close()

1 answer

Answer 1, posted 2024-09-27 20:16:14

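Since your question-and-answer text file is structured, you can use the following approach to turn the data for a single question into a CSV row: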

import re

text = """Which of the following is a hardware requirement that either an IDS/IPS system or a proxy server must have in order to properly function?

A. Fast processor to help with network traffic analysis

B. They must be dual-homed

C. Similar RAM requirements

D. Fast network interface cards

Answer: B Explanation:

Dual-homed or dual-homing can refer to either an Ethernet device that has more than one network interface, for redundancy purposes, or in firewall technology, dual-homed is one of the firewall architectures, such as an IDS/IPS system, for implementing preventive security."""

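# Replace each option marker ("A." to "D.") and the newlines before it with a comma.
text1 = re.sub(r'(\n+A\.)|(\n+B\.)|(\n+C\.)|(\n+D\.)', ',', text)
# Replace the "Answer:" and "Explanation:" labels (and the adjacent newlines) with commas.
text1 = re.sub(r'\n+Answer:|Explanation:\n+', ',', text1)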

print(text1)

Final output:

Which of the following is a hardware requirement that either an IDS/IPS system or a proxy server must have in order to properly function?, Fast processor to help with network traffic analysis, They must be dual-homed, Similar RAM requirements, Fast network interface cards, B ,Dual-homed or dual-homing can refer to either an Ethernet device that has more than one network interface, for redundancy purposes, or in firewall technology, dual-homed is one of the firewall architectures, such as an IDS/IPS system, for implementing preventive security.

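As you can see, the text variable holds the data for a single question and answer (if you read this data from a file with Python, you will get the newline characters as well).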
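On the point about newlines, here is a minimal sketch of reading the file into one string. It assumes the file is UTF-8 and reuses the input_new.txt name from the question; the temporary directory and the short sample text are only there so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

# Hypothetical, shortened sample standing in for the real question file.
sample = "Question text?\n\nA. first\n\nB. second\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "input_new.txt"  # file name borrowed from the question
    path.write_text(sample, encoding="utf-8")
    # Read the whole file at once so the regexes can match across lines.
    text = path.read_text(encoding="utf-8")

# repr() makes the newline characters visible.
print(repr(text))
```

Reading the whole file into a single string, rather than line by line, is what lets patterns such as \n+A\. see the blank lines before each option.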

Next, what I did was:

text1 = re.sub(r'(\n+A\.)|(\n+B\.)|(\n+C\.)|(\n+D\.)', ',', text)

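For example, \n+A\. matches text like "A." in the multiple-choice options (\n+ matches any newlines that precede the option). With the code above, we convert the MCQ option markers into commas.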
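As a side note, the four alternatives can probably be collapsed into a character class. This is a small sketch on a made-up sample string (not the original data) showing that both patterns give the same result here:

```python
import re

# Hypothetical sample with the same layout as one question block.
sample = "Question text?\n\nA. first\n\nB. second\n\nC. third\n\nD. fourth"

# Pattern from the answer: one alternative per option letter.
verbose = re.sub(r'(\n+A\.)|(\n+B\.)|(\n+C\.)|(\n+D\.)', ',', sample)

# Shorter pattern: a character class covering the letters A to D.
compact = re.sub(r'\n+[A-D]\.', ',', sample)

print(compact)
# → Question text?, first, second, third, fourth
```

The space after each comma is the one that followed the option marker in the input; the pattern does not consume it.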

In the last step, I do this:

text1 = re.sub(r'\n+Answer:|Explanation:\n+', ',', text1)

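This takes care of the actual answer and the explanation: the Answer: and Explanation: labels are replaced with commas.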

You can extend this by looping the logic above over every question in the file to get the desired output.
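One way such a loop might look is sketched below. It assumes every question starts with a "QUESTION NO: n" header, as in the test data; the helper name question_to_line and the short sample text are made up for illustration.

```python
import re

def question_to_line(block):
    """Apply the two substitutions from the answer to one question block."""
    line = re.sub(r'(\n+A\.)|(\n+B\.)|(\n+C\.)|(\n+D\.)', ',', block.strip())
    line = re.sub(r'\n+Answer:|Explanation:\n+', ',', line)
    return line

# Hypothetical, shortened sample in the same layout as the test data.
text = """QUESTION NO: 1

First question?

A. one

B. two

C. three

D. four

Answer: B Explanation:

Because two.

QUESTION NO: 2

Second question?

A. red

B. green

C. blue

D. black

Answer: D Explanation:

Because black."""

# Split on the question headers and drop the empty piece before the first one.
blocks = [b for b in re.split(r'QUESTION NO:\s*\d+', text) if b.strip()]
lines = [question_to_line(b) for b in blocks]

for line in lines:
    print(line)
```

Splitting first keeps each question independent, so a formatting problem in one block does not spill into the next line of output.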

Edit:

You can also process multiple questions at once with the following code:

import re

text = """QUESTION NO: 1

Which of the following is a hardware requirement that either an IDS/IPS system or a proxy server must have in order to properly function?

A. Fast processor to help with network traffic analysis

B. They must be dual-homed

C. Similar RAM requirements

D. Fast network interface cards

Answer: B Explanation:

Dual-homed or dual-homing can refer to either an Ethernet device that has more than one network interface, for redundancy purposes, or in firewall technology, dual-homed is one of the firewall architectures, such as an IDS/IPS system, for implementing preventive security.

QUESTION NO: 2

Which of the following is an application that requires a host application for replication?

A. Micro

B. Worm

C. Trojan

D.

Virus

Answer: D Explanation:

Computer viruses infect a variety of different subsystems on their hosts. A computer virus is a malware that, when executed, replicates by reproducing itself or infecting other programs by modifying them. Infecting computer programs can include as well, data files, or the boot sector of the hard drive. When this replication succeeds, the affected areas are then said to be "infected"."""


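# Replace each "QUESTION NO: n" header with a newline so every question starts on its own line.
text1 = re.sub(r'QUESTION NO: \d+', '\n', text)
# Replace the option markers and the Answer/Explanation labels with commas, as before.
text1 = re.sub(r'(\n+A\.)|(\n+B\.)|(\n+C\.)|(\n+D\.)', ',', text1)
text1 = re.sub(r'\n+Answer:|Explanation:\n+', ',', text1)
# Collapse runs of newlines into a single newline.
text1 = re.sub(r'\n+', '\n', text1)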

print(text1)

This prints the following:

Which of the following is a hardware requirement that either an IDS/IPS system or a proxy server must have in order to properly function?, Fast processor to help with network traffic analysis, They must be dual-homed, Similar RAM requirements, Fast network interface cards, B ,Dual-homed or dual-homing can refer to either an Ethernet device that has more than one network interface, for redundancy purposes, or in firewall technology, dual-homed is one of the firewall architectures, such as an IDS/IPS system, for implementing preventive security.
Which of the following is an application that requires a host application for replication?, Micro, Worm, Trojan,
Virus, D ,Computer viruses infect a variety of different subsystems on their hosts. A computer virus is a malware that, when executed, replicates by reproducing itself or infecting other programs by modifying them. Infecting computer programs can include as well, data files, or the boot sector of the hard drive. When this replication succeeds, the affected areas are then said to be "infected".
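Two things stand out in that output: the second question is split across two lines because its option D text sits on a separate line, and several fields contain commas of their own, which a CSV reader would treat as extra columns. A possible alternative, sketched here under the assumption that every question follows the "QUESTION NO / A. to D. / Answer / Explanation" layout of the test data, is to capture each field with named groups and let the standard csv module handle quoting. The pattern, the helper names, and the sample text are all illustrative, not part of the original answer.

```python
import csv
import io
import re

# Hypothetical pattern: one named group per field of a question block.
QUESTION_RE = re.compile(
    r'QUESTION NO:\s*\d+\s*'
    r'(?P<question>.*?)'
    r'\n+A\.\s*(?P<a>.*?)'
    r'\n+B\.\s*(?P<b>.*?)'
    r'\n+C\.\s*(?P<c>.*?)'
    r'\n+D\.\s*(?P<d>.*?)'
    r'\n+Answer:\s*(?P<answer>[A-D])\s*Explanation:\s*'
    r'(?P<explanation>.*?)\s*(?=QUESTION NO:|\Z)',
    re.DOTALL,  # let "." match newlines so fields may span several lines
)

FIELDS = ("question", "a", "b", "c", "d", "answer", "explanation")

def parse_questions(text):
    """Return one list of fields per question, with whitespace normalized."""
    rows = []
    for match in QUESTION_RE.finditer(text):
        rows.append([" ".join(match.group(name).split()) for name in FIELDS])
    return rows

def to_csv(rows):
    """Serialize rows with csv.writer so fields containing commas get quoted."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()

# Hypothetical sample; note the option D text on its own line and the
# commas inside the question and the explanation.
sample = """QUESTION NO: 1

Which one, if any?

A. Micro

B. Worm

C. Trojan

D.

Virus

Answer: D Explanation:

Viruses need a host, unlike worms.

QUESTION NO: 2

Second question?

A. red

B. green

C. blue

D. black

Answer: A Explanation:

Because red."""

rows = parse_questions(sample)
print(to_csv(rows), end="")
```

To write a real file instead of a string, the same csv.writer call can be pointed at open("result.csv", "w", newline="", encoding="utf-8"); the file name is an assumption. The pattern would misfire if an option line itself started with a marker such as "B." inside another field, so it is worth checking against the full data.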
