How can Pandas DataFrames look identical but fail equals()?

import io

import pandas as pd

csv_text = """\
Title,Year,Director
North by Northwest,1959,Alfred Hitchcock
Notorious,1946,Alfred Hitchcock
The Philadelphia Story,1940,George Cukor
To Catch a Thief,1955,Alfred Hitchcock
His Girl Friday,1940,Howard Hawks
"""

# Parse the inline CSV text and lower-case the column names.
df1 = pd.read_csv(io.StringIO(csv_text))
df1.columns = df1.columns.str.lower()
print(df1)

# Round trip: group by director and row label, then restore the original layout.
df2 = df1.groupby(['director', df1.index]).first()
df3 = df2.reset_index('director')
df4 = df3[['title', 'year', 'director']]
df5 = df4.sort_index()
print(df5)
print()

# Columns, dtypes, values and index all compare as equal...
print(repr(df1.columns))
print(repr(df5.columns))
print()
print(df1.dtypes)
print(df5.dtypes)
print()
print(df1 == df5)
print()
print(df1.index == df5.index)
print()

# ...yet equals() reports False.
print(df1.equals(df5))

1 Answer

User

#1 · Posted 2024-09-30 22:16:43

This feels like a bug to me, but it could simply be that I'm misunderstanding something. The blocks are listed in a different order:

>>> df1._data
BlockManager
Items: Index(['title', 'year', 'director'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3, 4], dtype='int64')
IntBlock: slice(1, 2, 1), 1 x 5, dtype: int64
ObjectBlock: slice(0, 4, 2), 2 x 5, dtype: object
>>> df5._data
BlockManager
Items: Index(['title', 'year', 'director'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3, 4], dtype='int64')
ObjectBlock: slice(0, 4, 2), 2 x 5, dtype: object
IntBlock: slice(1, 2, 1), 1 x 5, dtype: int64

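In core/internals.py, BlockManager defines an equals method. (Its listing did not survive in this copy; what matters here is its last statement, an all(...) over the blocks of self paired with the blocks of other.)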

That final all assumes that the blocks in self and other correspond position by position. But if we add some print calls just before it, we see:

>>> df1.equals(df5)
blocks self: (IntBlock: slice(1, 2, 1), 1 x 5, dtype: int64, ObjectBlock: slice(0, 4, 2), 2 x 5, dtype: object)
blocks other: (ObjectBlock: slice(0, 4, 2), 2 x 5, dtype: object, IntBlock: slice(1, 2, 1), 1 x 5, dtype: int64)
False
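To make the failure mode concrete, here is a minimal pure-Python sketch of an order-sensitive pairwise comparison. ToyBlock, equals_pairwise and equals_canonical are hypothetical names invented for illustration; this is not the pandas source, only a model of the behaviour the printed output suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBlock:
    """Hypothetical stand-in for one internal block (not a pandas class)."""
    dtype: str        # e.g. "int64" or "object"
    positions: tuple  # column positions stored in this block
    values: tuple     # one tuple of values per stored column

    def equals(self, other):
        return self == other


def equals_pairwise(blocks_a, blocks_b):
    """Compare blocks strictly in the order in which they are stored."""
    return all(a.equals(b) for a, b in zip(blocks_a, blocks_b))


def equals_canonical(blocks_a, blocks_b):
    """Sort both block lists by (dtype, positions) before pairing them up."""
    if len(blocks_a) != len(blocks_b):
        return False

    def key(block):
        return (block.dtype, block.positions)

    return equals_pairwise(sorted(blocks_a, key=key), sorted(blocks_b, key=key))


int_block = ToyBlock("int64", (1,), ((1959, 1946, 1940),))
obj_block = ToyBlock("object", (0, 2), (("a", "b", "c"), ("x", "y", "z")))

# Same blocks, same order: the pairwise check passes.
same_order = equals_pairwise((int_block, obj_block), (int_block, obj_block))
# Same blocks, swapped order: an int block gets paired with an object block.
swapped_order = equals_pairwise((int_block, obj_block), (obj_block, int_block))
# Sorting into a canonical order first makes the comparison order-independent.
swapped_canonical = equals_canonical((int_block, obj_block), (obj_block, int_block))
```

With the blocks swapped, the pairwise check compares the integer block against the object block and fails even though the two collections hold the same data. Sorting by a stable key first is one possible way to avoid that, offered here as an assumption rather than a description of what pandas does.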

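So we're comparing the wrong things. The reason I'm not sure whether this is a bug is that I'm not sure whether equals is meant to be this finicky. If it is, then I think there is at least a documentation bug, because equals should then say loudly that it is not meant for what its name and docstring would lead you to expect.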
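If the goal is simply to check that two frames hold the same labels, dtypes and values, one option is to compare through the public testing helper rather than equals. The sketch below assumes a pandas version that provides pandas.testing.assert_frame_equal; frames_match is a hypothetical helper name, and the sample data is a shortened version of the table from the question.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

CSV_TEXT = """\
title,year,director
North by Northwest,1959,Alfred Hitchcock
Notorious,1946,Alfred Hitchcock
The Philadelphia Story,1940,George Cukor
"""


def frames_match(left, right):
    """Return True if both frames agree on index, columns, dtypes and values."""
    try:
        # The helper compares column by column, by label.
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


df_a = pd.read_csv(io.StringIO(CSV_TEXT))

# Reorder the columns and then restore the original order.
df_b = df_a[["director", "year", "title"]][["title", "year", "director"]]

# A frame that really does differ, for contrast.
df_changed = df_a.assign(year=df_a["year"] + 1)

print(frames_match(df_a, df_b))
print(frames_match(df_a, df_changed))
```

Wrapping the assertion in a function that returns a boolean keeps the call site close to what equals was expected to do, while the AssertionError raised by the helper can still be inspected directly when a mismatch needs explaining.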
