Importing a CSV with a different number of columns per row using Pandas

Published 2024-10-04 09:23:36


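What is the best way to import a CSV that has a different number of columns in each row into a Pandas DataFrame, using either Pandas or the csv module?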

"H","BBB","D","Ajxxx Dxxxs"
"R","1","QH","DTR"," "," ","spxxt rixxls, raxxxd","1"

Using this code:

import pandas as pd
data = pd.read_csv("smallsample.txt",header = None)

produces the following error:

Error tokenizing data. C error: Expected 4 fields in line 2, saw 8

3 Answers

Supplying a list of column names to read_csv() should do the trick.

For example: names=['a','b','c','d','e']

https://github.com/pydata/pandas/issues/2981
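As a minimal sketch of this suggestion, the snippet below reads the two sample rows from the question with an explicit names list. The in-memory `sample` string, the use of `io.StringIO` instead of a file on disk, and the eight names 'a' to 'h' are assumptions made for illustration; the list is sized to the widest row (8 fields) rather than the five names shown above.

```python
import io

import pandas as pd

# The two sample rows from the question, held in memory (illustrative only)
sample = (
    '"H","BBB","D","Ajxxx Dxxxs"\n'
    '"R","1","QH","DTR"," "," ","spxxt rixxls, raxxxd","1"\n'
)

# One name per column of the widest row (8 fields); the names are arbitrary
names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# With names supplied, the short first row is padded instead of raising an error
df = pd.read_csv(io.StringIO(sample), header=None, names=names)
print(df.shape)
```

The first row only has four fields, so its columns 'e' to 'h' come back as missing values.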

Edit: if you do not want to supply column names, do what Nicholas suggested.

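We can even use pd.read_table() to read the CSV file. It loads the file into a DataFrame with a single column, where each cell holds one whole line, and that column can then be split on ",".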
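A rough sketch of the read_table() idea follows. It assumes the file contains no tab characters (the default separator of read_table), so every line lands in one cell. Two choices here are mine rather than the answer's: `quoting=csv.QUOTE_NONE` keeps the quote characters intact while reading, and `csv.reader` does the splitting instead of a plain split on ",", because one field in the sample data contains a comma inside quotes. The `sample` string and `io.StringIO` stand in for the real file.

```python
import csv
import io

import pandas as pd

# The two sample rows from the question, held in memory (illustrative only)
sample = (
    '"H","BBB","D","Ajxxx Dxxxs"\n'
    '"R","1","QH","DTR"," "," ","spxxt rixxls, raxxxd","1"\n'
)

# Read every line as one raw string; QUOTE_NONE leaves the quotes untouched
raw = pd.read_table(io.StringIO(sample), header=None, quoting=csv.QUOTE_NONE)

# Split each raw line with csv.reader so quoted commas stay inside their field
rows = list(csv.reader(raw[0]))

# Rows of unequal length are padded with missing values by the DataFrame constructor
df = pd.DataFrame(rows)
print(df.shape)
```

Every value produced this way is a string, so numeric columns would need converting afterwards.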

The column names can be generated dynamically as a simple counter (0, 1, 2, and so on).

Generating column names dynamically

import csv

import pandas as pd

# Input
data_file = "smallsample.txt"

# Delimiter
data_file_delimiter = ','

# The max column count a line in the file could have
largest_column_count = 0

# Loop over the data lines; csv.reader keeps quoted fields that contain
# the delimiter (e.g. "spxxt rixxls, raxxxd") together as one column
with open(data_file, 'r', newline='') as temp_f:
    for row in csv.reader(temp_f, delimiter=data_file_delimiter):
        # Keep the largest column count seen so far
        largest_column_count = max(largest_column_count, len(row))

# No explicit close is needed: the with block closes the file

# Generate column names (will be 0, 1, 2, ..., largest_column_count - 1)
column_names = list(range(largest_column_count))

# Read csv
df = pd.read_csv(data_file, header=None, delimiter=data_file_delimiter, names=column_names)
# print(df)

Columns for which a CSV row has no value are filled with missing values (NaN).
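To illustrate the padding, here is a small sketch that counts the missing and present cells per row after reading the question's sample data with integer column names. The in-memory `sample` string and the hard-coded width of 8 are assumptions for illustration only.

```python
import io

import pandas as pd

# The two sample rows from the question, held in memory (illustrative only)
sample = (
    '"H","BBB","D","Ajxxx Dxxxs"\n'
    '"R","1","QH","DTR"," "," ","spxxt rixxls, raxxxd","1"\n'
)

# Integer column names 0..7, matching the widest row
df = pd.read_csv(io.StringIO(sample), header=None, names=list(range(8)))

# Number of padded (missing) cells in each row
missing_per_row = df.isna().sum(axis=1).tolist()

# Number of cells that actually came from the file in each row
present_per_row = df.notna().sum(axis=1).tolist()

print(missing_per_row, present_per_row)
```

Note that the fields containing a single space in the second row are kept as strings and are not counted as missing.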
