Python: converting a large text file into a DataFrame using a form

Posted 2024-10-17 08:22:09


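I'm building a web scraper to collect lists, such as Spotify playlist info, Indeed job descriptions or LinkedIn company listings. I now have large text files that I want to turn into a DataFrame by converting them to CSV or a dictionary.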

Text file:

Scribd
MobileQAEngineer




VitaminT
MobileQAEngineer




Welocalize
MobileQAEngineer




RWSMoravia
MobileQAEngineer



Desired output:

Scribd,MobileQAEngineer
VitaminT,MobileQAEngineer
Welocalize,MobileQAEngineer
RWSMoravia,MobileQAEngineer

I thought I could try something like this:

if line of text does not have 4 \n afterwards
    then it is the 1st element of the tuple
if line of text has 4 \n afterwards
    then it is the 2nd element of the tuple

with open(input("Enter a file to read: "),'r') as f:
    for line in f:
        newline = line + ":"
        #f.write(newline)
        print(newline)

When I try to put a ":" at the end of each line, I get one both before and after every line instead:

:
Scribd
:
MobileQAEngineer
:


:
VitaminT
:
MobileQAEngineer
:


:
Welocalize
:
MobileQAEngineer
:


:
RWSMoravia
:
MobileQAEngineer
:
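The stray colons come from the newline each line keeps: iterating over a file yields "Scribd\n", so line + ":" produces "Scribd\n:" and the colon is printed at the start of the following line, which makes every line look wrapped in colons. A minimal sketch of the usual fix strips the newline before appending; the helper name mark_line_ends is hypothetical, and io.StringIO stands in for the opened file so the demo needs no file on disk:

```python
import io


def mark_line_ends(f):
    """Append ':' to the text of each line, not after its newline."""
    # Iterating over a file yields each line with its trailing "\n";
    # rstrip("\n") removes it so the colon stays on the same line.
    return [line.rstrip("\n") + ":" for line in f]


# io.StringIO stands in for open(...) in this demo
demo = io.StringIO("Scribd\nMobileQAEngineer\n")
for marked in mark_line_ends(demo):
    print(marked)  # prints "Scribd:" then "MobileQAEngineer:"
```

Blank lines in the file come out as a lone ":", so they still need to be skipped before building records.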

1 answer
User
#1 · Posted 2024-10-17 08:22:09

You can parse the data with a regex and then convert the matches into a DataFrame:

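import re
import pandas as pd

# read the whole file into one string
with open('data.txt', 'r') as f:
    data = f.read()

# raw string, so '\w' is passed to the regex engine instead of being an invalid string escape
# each match is a (company, position) tuple: a word, a newline, then another word
m = re.findall(r'(\w+)\n(\w+)', data)
df = pd.DataFrame(m, columns=['Company', 'Position'])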

Output:

      Company          Position
0      Scribd  MobileQAEngineer
1    VitaminT  MobileQAEngineer
2  Welocalize  MobileQAEngineer
3  RWSMoravia  MobileQAEngineer
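The regex above only works while each field consists solely of word characters (letters, digits, underscores). A regex-free sketch, closer to the pairing idea in the question, groups the non-blank lines two at a time and writes the header-less CSV from the desired output using only the standard library. The inline SAMPLE string and the helper names pair_nonblank_lines and to_csv_text are hypothetical, and the sketch assumes every record is exactly two non-blank lines with the company first:

```python
import csv
import io

# Hypothetical inline copy of the sample file; in practice, iterate over open("data.txt").
SAMPLE = (
    "Scribd\nMobileQAEngineer\n\n\n\n\n"
    "VitaminT\nMobileQAEngineer\n\n\n\n\n"
    "Welocalize\nMobileQAEngineer\n\n\n\n\n"
    "RWSMoravia\nMobileQAEngineer\n\n\n"
)


def pair_nonblank_lines(lines):
    """Group non-blank lines into (company, position) tuples.

    Assumes every record is exactly two non-blank lines, company first;
    the number of blank lines between records does not matter.
    """
    entries = [ln.strip() for ln in lines if ln.strip()]
    it = iter(entries)
    # zip(it, it) takes two consecutive items per tuple; an unpaired last line is dropped
    return list(zip(it, it))


def to_csv_text(pairs):
    """Render the pairs as header-less CSV, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(pairs)
    return buf.getvalue()


pairs = pair_nonblank_lines(io.StringIO(SAMPLE))
print(to_csv_text(pairs), end="")
```

The resulting list can also go straight into pd.DataFrame(pairs, columns=['Company', 'Position']), and df.to_csv(path, index=False, header=False) writes the same header-less rows. Because \w does not match spaces, the regex would capture only fragments of a line such as "Mobile QA Engineer", whereas pairing whole lines keeps it intact.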
