How can I keep the comment lines of a CSV file in Pandas?

Posted 2024-10-05 14:22:17


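I've just started digging into the world of pandas, and the first odd CSV file I came across has two comment lines (with differing numbers of columns) at the top: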

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

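I know how to skip these lines with skiprows= or header=, but how do I keep the comments when using read_csv? Sometimes the comments are needed as file meta-information, and I don't want to throw them away.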


2 Answers

You can read the metadata first and then use read_csv on the same file handle:

import pandas as pd

with open('f.csv') as file:
    # read the first 2 rows as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # the handle is now positioned after the comment lines,
    # so read_csv starts at the header row
    df = pd.read_csv(file)
    print(df)
    #       *header*
    # 0  actual data

Pandas is designed for reading structured data.

For unstructured data, just use the built-in csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    row2 = next(reader)  # gets the second line
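
To show how the csv approach might be carried through to the rest of the file, here is a minimal sketch. The helper name `split_comments`, its `n_comments` parameter, and the `col_a,col_b` sample data are hypothetical and not part of the answer; it assumes the number of comment lines is known in advance and that the header row follows them directly.

```python
import csv
import io


def split_comments(f, n_comments=2):
    """Return (comments, header, rows) from an open text file.

    Hypothetical helper: assumes the first n_comments lines are comments
    and the line right after them is the column header.
    """
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    comments = [next(reader) for _ in range(n_comments)]
    header = next(reader)
    rows = list(reader)  # everything that is left is data
    return comments, header, rows


# In-memory stand-in for a file laid out like the one in the question
sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "col_a,col_b\n"
    "1,2\n"
    "3,4\n"
)
comments, header, rows = split_comments(sample)
print(comments)
print(header, rows)
```

The values in `rows` stay as strings here; converting them to numbers is left to the caller or to pandas.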

You can attach a string to the DataFrame like this:

df.comments = 'My Comments'

But note

Note, however, that while you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
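
One option the answers do not mention is `DataFrame.attrs`, a dictionary for dataset-level metadata that the pandas documentation marks as experimental. The following is only a sketch under that assumption: `read_with_comments`, the `"comments"` key, and the sample data are hypothetical names chosen for illustration, and `attrs` is not guaranteed to survive every operation either.

```python
import io

import pandas as pd


def read_with_comments(f, n_comments=2):
    """Read leading comment lines, then the table, from one open handle.

    Hypothetical helper: assumes exactly n_comments comment lines
    precede the header row.
    """
    comments = [f.readline().rstrip("\n") for _ in range(n_comments)]
    df = pd.read_csv(f)  # continues from the current file position
    df.attrs["comments"] = comments  # experimental metadata dictionary
    return df


# In-memory stand-in for a file laid out like the one in the question
sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "col_a,col_b\n"
    "1,2\n"
    "3,4\n"
)
df = read_with_comments(sample)
print(df.attrs["comments"])
print(df)
```

Keeping the comments in a separate variable next to the DataFrame, as in the first answer, remains the more predictable choice when the frame goes through many transformations.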
