Why are these two sets of operations not order-invariant?

    # Operation set 1: dropping duplicates, sorting and reindexing the table
    jdf1.drop_duplicates(subset=dateColName, inplace=True)
    jdf1.sort_values(dateColName, inplace=True)
    jdf1.reset_index(drop=True, inplace=True)
    # Operation set 2: converting the column type and filtering the rows
    # in case the CSV's contents cover a wider interval
    jdf1[dateColName] = pd.to_datetime(jdf1[jdf1.columns[0]], format="%Y-%m-%d")
    maskL = jdf1[dateColName] < interval[0]
    maskR = jdf1[dateColName] > interval[1]
    mask = maskL | maskR
    jdf1.drop(jdf1[mask].index, inplace=True)

    # Operation set 2: converting the column type and filtering the rows
    # in case the CSV's contents cover a wider interval
    jdf2[dateColName] = pd.to_datetime(jdf2[jdf2.columns[0]], format="%Y-%m-%d")
    maskL = jdf2[dateColName] < interval[0]
    maskR = jdf2[dateColName] > interval[1]
    mask = maskL | maskR
    jdf2.drop(jdf2[mask].index, inplace=True)
    # Operation set 1: dropping duplicates, sorting and reindexing the table
    jdf2.drop_duplicates(subset=dateColName, inplace=True)
    jdf2.sort_values(dateColName, inplace=True)
    jdf2.reset_index(drop=True, inplace=True)

1 answer

User

#1 · Posted 2024-09-30 01:24:02

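At first glance they look the same, but they are not.

That is because there are two different filtering steps, and each one affects the other: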

drop_duplicates() -> removes M rows, leaving ALL - M rows
boolean indexing with mask -> removes N rows, leaving ALL - M - N rows

versus

boolean indexing with mask -> removes K rows, leaving ALL - K rows
drop_duplicates() -> removes L rows, leaving ALL - K - L rows

K != M
L != N

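If you swap these operations, the result should be different, because both of them remove rows. The order in which they are called matters, because some rows are removed only by drop_duplicates() and some only by boolean indexing.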
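
To see the counts in practice, here is a minimal sketch on made-up data. The column name `date`, the `value` column, the sample rows, the interval and the two helper functions are all assumptions for illustration, not the asker's actual CSV. It uses non-inplace calls and `Series.between` in place of the two masks, which keeps the same rows.

```python
import pandas as pd

# Hypothetical sample data: column names, values and interval are assumptions.
dateColName = "date"
interval = (pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-04"))
raw = pd.DataFrame({
    dateColName: ["2024-01-05", "2024-01-03", "2024-01-01",
                  "2024-01-03", "2024-01-01", "2024-01-02"],
    "value": [1, 2, 3, 4, 5, 6],
})


def dedupe(df):
    """Operation set 1: drop duplicate dates, sort, reindex."""
    out = df.drop_duplicates(subset=dateColName)
    out = out.sort_values(dateColName).reset_index(drop=True)
    return out, len(df) - len(out)


def convert_and_filter(df):
    """Operation set 2: parse dates, keep rows inside the interval."""
    out = df.copy()
    out[dateColName] = pd.to_datetime(out[dateColName], format="%Y-%m-%d")
    keep = out[dateColName].between(interval[0], interval[1])  # inclusive bounds
    out = out[keep]
    return out, len(df) - len(out)


# Order A: set 1, then set 2
step, m_removed = dedupe(raw)
result_a, n_removed = convert_and_filter(step)

# Order B: set 2, then set 1
step, k_removed = convert_and_filter(raw)
result_b, l_removed = dedupe(step)

print(m_removed, n_removed)  # 2 2
print(k_removed, l_removed)  # 3 1
print(list(result_a.index))  # [1, 2]
print(list(result_b.index))  # [0, 1]
```

On this toy data each step removes a different number of rows depending on the order, and the final index labels differ because reset_index() runs before the filter in one order and after it in the other. The surviving dates happen to coincide here, since the mask depends only on the column used for deduplication; that need not hold for other data or other subsets.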

In my opinion both approaches are correct; it depends on what you need.
