Converting a text file into rows and columns with Python

Posted 2024-09-27 09:32:00


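I've been trying this for a while and seem to have hit a wall, so I'm hoping for some help.

I have several text files. Without writing them all out, here is an example: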

2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

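And so on. Some files look like this, with a new stat sheet starting every 6 lines; other text files have more fields, so a new stat sheet starts every 10 lines.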

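My goal is to put each stat sheet into rows and columns as soon as the sheet ends. I think in spreadsheet terms this is called transposing, but I know I'm doing something wrong, and I'm not even sure that is the right term.

As an example, I want the file to look like this when I'm done: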

Year | Name | Stamina | Agility | Str | Res
2020 | Grum Grum | Stamina: 20 | Agility: 23 | Strength: 20.5% | Resistances: 20-21-30

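I've tried NumPy and Pandas and I don't know what I'm doing wrong; I really don't know what to search for to find the right answer.

I'd appreciate any help. The files are very large, and I'd like to be able to specify how many columns are needed to fill a stat sheet.

Thanks in advance to anyone who can help.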


3 answers

You can try the following to get the dataframe you want:

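import pandas as pd

# Records are separated by blank lines; each record becomes one row
with open(r'test1.txt', 'r') as file:
    data = file.read().strip().split('\n\n')   # strip() avoids an empty trailing field
data = [i.split('\n') for i in data]
df = pd.DataFrame(data, columns=['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res'])

print(df)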

Output:

   Year        Name  ...              Str                    Res
0  2020   Grum Grum  ...  Strength: 20.5%  Resistances: 20-21-30
1  2020  Mondo Silo  ...  Strength: 10.5%  Resistances: 20-21-20
2  2020   Grum Grum  ...  Strength: 20.5%  Resistances: 20-21-30
3  2020  Mondo Silo  ...  Strength: 10.5%  Resistances: 20-21-20

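To build one dataframe from a list of .txt files that share the same structure but hold different numbers of records, and write it to a file, you can try:

Option 1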

import os
import pandas as pd

files = ['test1.txt', 'test2.txt']                    # list of files
columns = ['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res']

frames = []                                           # one dataframe per input file
for file in files:                                    # open each file
    # 'path_of_files' is a placeholder for the folder that holds the files
    with open(os.path.join('path_of_files', file), 'r') as file_r:
        data = file_r.read().strip().split('\n\n')
    data = [i.split('\n') for i in data if i != '']   # get the rows
    frames.append(pd.DataFrame(data, columns=columns))

df = pd.concat(frames, ignore_index=True)             # stack the rows of all files

print(df)
df.to_csv(r'test3.txt', sep='|', index=False)         # write the final dataframe to 'test3.txt' with '|' as separator

Option 2

import os
import pandas as pd

files = ['test1.txt', 'test2.txt']                     # list of files
columns = ['Year', 'Name', 'Stamina', 'Agility', 'Str', 'Res']

with open(r'test3.txt', 'w') as fout:                  # open the output once; 'w' starts from an empty file
    for n, file in enumerate(files):                   # open each input file
        # 'path_of_files' is a placeholder for the folder that holds the files
        with open(os.path.join('path_of_files', file), 'r') as file_r:
            data = file_r.read().strip().split('\n\n')
        data = [i.split('\n') for i in data if i != '']
        df = pd.DataFrame(data, columns=columns)       # dataframe with the data of the current file
        # write the column names only for the first file; never write the index
        fout.write(df.to_string(header=(n == 0), index=False))
        fout.write('\n')                               # newline so the next rows start on their own line

Example
For the dummy files below (test1.txt, test2.txt), you can see the result (test3.txt) produced by each option:

test1.txt

2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

test2.txt

2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20

test3.txt with Option 1 (output file)

Year|Name|Stamina|Agility|Str|Res
2020|Grum Grum|Stamina: 20|Agility: 23|Strength: 20.5%|Resistances: 20-21-30
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Grum Grum|Stamina: 20|Agility: 23|Strength: 20.5%|Resistances: 20-21-30
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20
2020|Mondo Silo|Stamina: 23|Agility: 13|Strength: 10.5%|Resistances: 20-21-20

test3.txt with Option 2 (output file)

 Year        Name      Stamina      Agility              Str                    Res
 2020   Grum Grum  Stamina: 20  Agility: 23  Strength: 20.5%  Resistances: 20-21-30
 2020  Mondo Silo  Stamina: 23  Agility: 13  Strength: 10.5%  Resistances: 20-21-20
 2020   Grum Grum  Stamina: 20  Agility: 23  Strength: 20.5%  Resistances: 20-21-30
 2020  Mondo Silo  Stamina: 23  Agility: 13  Strength: 10.5%  Resistances: 20-21-20
 2020  Mondo Silo  Stamina: 23  Agility: 13  Strength: 10.5%  Resistances: 20-21-20
 2020  Mondo Silo  Stamina: 23  Agility: 13  Strength: 10.5%  Resistances: 20-21-20
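
The question also asks for a way to specify how many columns make up one stat sheet, since some files use 6 lines per sheet and others 10. Here is a minimal sketch of one way to do that, assuming every sheet in a given file has the same number of lines: drop the blank lines and cut what remains into chunks whose size is the length of the column list. The helper name records_to_frame is hypothetical and not part of pandas.

```python
import pandas as pd


def records_to_frame(text, columns):
    """Hypothetical helper: one DataFrame row per len(columns) non-blank lines."""
    # Blank lines are ignored, so the number of blank lines between sheets does not matter.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n = len(columns)
    if len(lines) % n != 0:
        raise ValueError(f"{len(lines)} lines cannot be split into rows of {n}")
    rows = [lines[i:i + n] for i in range(0, len(lines), n)]
    return pd.DataFrame(rows, columns=columns)


sample = """2020
Grum Grum
Stamina: 20
Agility: 23
Strength: 20.5%
Resistances: 20-21-30

2020
Mondo Silo
Stamina: 23
Agility: 13
Strength: 10.5%
Resistances: 20-21-20
"""

df = records_to_frame(sample, ["Year", "Name", "Stamina", "Agility", "Str", "Res"])
print(df.shape)
```

For a file with 10 lines per sheet, the same call would take a list of 10 column names. Raising an error on a leftover partial sheet is a design choice; it makes a wrong column count visible instead of silently shifting fields.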

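You can read the file line by line, append each line to an output row, and write that row out when you reach a blank line. One final write is needed afterwards in case the file does not end with a blank line. I wrote a small program that takes your input as test.txt and writes it to test_out.txt: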

2020 | Grum Grum | Stamina: 20 | Agility: 23 | Strength: 20.5% | Resistances: 20-21-30
2020 | Mondo Silo | Stamina: 23 | Agility: 13 | Strength: 10.5% | Resistances: 20-21-20

The code is as follows:

with open("test.txt", "r") as infile, open("test_out.txt", "w") as outfile:
    columns = ""
    for line in infile:
        line = line.rstrip("\n")  # remove newline from end of line
        if line == "":
            if columns:  # a blank line ends the current record: write it as one row
                outfile.write(columns + "\n")
                columns = ""  # reset row
        else:
            if columns:  # put a separator before every column except the first
                columns += " | "
            columns += line
    if columns:  # write the final row if the file does not end with a blank line
        outfile.write(columns + "\n")
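
The same blank-line logic can also be sketched with itertools.groupby from the standard library, which splits a stream of lines into alternating runs of blank and non-blank lines. This is an alternative formulation, not the code above; the helper name iter_rows is hypothetical, and the sample list stands in for an open file.

```python
from itertools import groupby


def iter_rows(lines):
    """Hypothetical helper: yield one list of fields per blank-line-separated record."""
    stripped = (line.strip() for line in lines)
    # key=bool is False for empty strings, so each run of text lines forms one group.
    for has_text, group in groupby(stripped, key=bool):
        if has_text:
            yield list(group)


# Stand-in for an open file; note the two consecutive blank lines.
sample = [
    "2020\n", "Grum Grum\n", "Stamina: 20\n", "\n", "\n",
    "2020\n", "Mondo Silo\n", "Stamina: 23\n",
]
rows = [" | ".join(fields) for fields in iter_rows(sample)]
print(rows)
```

Because iter_rows accepts any iterable of lines, passing a file object keeps memory use low for very large files: only one record is held at a time.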
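  • This option fixes the data format before loading the data into a dataframe.
    • It presents the data in a standard tabular format as an alternative, since other good answers already convert the data to the requested format.
      • A header at the top of each column, with the data for each row below the header
    • From an information storage and retrieval standpoint, this is the standard way to represent and store data
    • Storing data in a standard way makes it easier to retrieve and to visualize with other tools
  • [0::6]: list slicing, takes every 6th value in the list starting from index 0
  • [1::6]: list slicing, takes every 6th value in the list starting from index 1
  • Use collections.defaultdict to collect the list elements into a dictionary
  • Save the dataframe to a csv using sep='|' rather than the default sep=','
  • Read the file back with df = pd.read_csv('characters.csv', sep='|')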
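
The slice notation in the list above can be tried on a small list first. This is only an illustration, assuming 6 lines per record as in the sample data; the variable names are made up for the example.

```python
lines = [
    "2020", "Grum Grum", "Stamina: 20", "Agility: 23",
    "Strength: 20.5%", "Resistances: 20-21-30",
    "2020", "Mondo Silo", "Stamina: 23", "Agility: 13",
    "Strength: 10.5%", "Resistances: 20-21-20",
]

years = lines[0::6]   # indexes 0, 6, ...: the first line of every record
names = lines[1::6]   # indexes 1, 7, ...: the second line of every record
print(years)  # ['2020', '2020']
print(names)  # ['Grum Grum', 'Mondo Silo']

# Assigning to a stepped slice replaces those positions in place;
# the new list must have exactly as many items as the slice.
lines[0::6] = [f"Year: {y}" for y in years]
print(lines[0], lines[6])
```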
import pandas as pd
from collections import defaultdict as dd

# read the file
with open('test.txt', 'r') as f:
    # results in a list of strings, with newlines and empty rows removed
    text_list = [r.strip() for r in f.readlines() if r.strip()]

# add Year: in front of each year number
years = text_list[0::6]  # create a list of each year
text_list[0::6] = [f'Year: {y}' for y in years]

# add Name: in front of each name
names = text_list[1::6]  # create a list of each name
text_list[1::6] = [f'Name: {n}' for n in names]

# split each string at the first ': '
text_list = [x.split(': ', 1) for x in text_list]

# create a dict that maps each label to the list of its values
data = dd(list)
for key, value in text_list:
    data[key].append(value)

# load data into a dataframe
df = pd.DataFrame(data)

# display df
print(df)
#    Year        Name Stamina Agility Strength Resistances
# 0  2020   Grum Grum      20      23    20.5%    20-21-30
# 1  2020  Mondo Silo      23      13    10.5%    20-21-20

# save
df.to_csv('characters.csv', sep='|', index=False)

# file output
# Year|Name|Stamina|Agility|Strength|Resistances
# 2020|Grum Grum|20|23|20.5%|20-21-30
# 2020|Mondo Silo|23|13|10.5%|20-21-20
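
Once the labels are out of the cells, the values can be turned into numbers after reading the file back. This is a rough sketch, assuming the column layout shown in the file output; io.StringIO stands in for characters.csv so the example runs without touching the disk, and the column names Res1 to Res3 are made up for the illustration.

```python
import io
import pandas as pd

# Stand-in for characters.csv (assumption: same layout as the file output).
csv_text = """Year|Name|Stamina|Agility|Strength|Resistances
2020|Grum Grum|20|23|20.5%|20-21-30
2020|Mondo Silo|23|13|10.5%|20-21-20
"""

df = pd.read_csv(io.StringIO(csv_text), sep="|")

# "20.5%" -> 20.5
df["Strength"] = df["Strength"].str.rstrip("%").astype(float)

# "20-21-30" -> three integer columns (hypothetical names Res1..Res3)
res = df["Resistances"].str.split("-", expand=True).astype(int)
res.columns = ["Res1", "Res2", "Res3"]
df = pd.concat([df.drop(columns="Resistances"), res], axis=1)

print(df.dtypes)
```

Whether the three resistance values deserve their own columns depends on how they will be used; the source does not say what each number means, so the generic names are placeholders.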
