Handling duplicates within each row of a pandas DataFrame, keeping only the last occurrence

Posted 2024-07-06 23:54:58



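I'm stuck on a pandas data-cleaning task and it has been painful. I've put together a simple example to illustrate the problem. Within each row, I want to mark duplicate values and keep only the last occurrence. My current DataFrame is animals; I want it to become the DataFrame animals_clean.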

Consider this DataFrame. Some values are repeated within a row; for example, 'cat' appears twice in row 0.

import pandas as pd
list_of_animals = [['cat','dog','monkey','sparrow', 'cat'],['cow', 'eagle','rat', 'eagle', 'owl'],['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)

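It looks like this:

      0      1       2        3       4
0   cat    dog  monkey  sparrow     cat
1   cow  eagle     rat    eagle     owl
2  deer  horse    goat   falcon  falcon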

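This is the result I want. In each row, the earlier copies of a repeated value are replaced with 'X' and only the last occurrence is kept.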

list_of_animals_clean = [['X','dog','monkey','sparrow', 'cat'],['cow', 'X','rat', 'eagle', 'owl'], ['deer', 'horse', 'goat', 'X', 'falcon']]
animals_clean = pd.DataFrame(list_of_animals_clean)

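It should look like this:

      0      1       2        3       4
0     X    dog  monkey  sparrow     cat
1   cow      X     rat    eagle     owl
2  deer  horse    goat        X  falcon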


1 answer

Community user

#1 · Posted 2024-07-06 23:54:58

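Try apply + mask + duplicated with keep='last'. With axis=1, apply passes each row to the function as a Series; duplicated(keep='last') flags every occurrence of a value except the last one, and mask replaces the flagged cells: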

import pandas as pd

list_of_animals = [['cat', 'dog', 'monkey', 'sparrow', 'cat'],
                   ['cow', 'eagle', 'rat', 'eagle', 'owl'],
                   ['deer', 'horse', 'goat', 'falcon', 'falcon']]
animals = pd.DataFrame(list_of_animals)

# For each row, flag every occurrence except the last,
# then replace the flagged cells with 'X' (the marker the question asks for).
animals = animals.apply(
    lambda s: s.mask(s.duplicated(keep='last'), 'X'),
    axis=1
)

print(animals)

Output:

      0      1       2        3       4
0     X    dog  monkey  sparrow     cat
1   cow      X     rat    eagle     owl
2  deer  horse    goat        X  falcon
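
To see why this works, it may help to run the same two steps on a single row first. The snippet below is only a sketch: the names row, is_earlier_copy, cleaned_row and result are made up for illustration, and it uses 'X' as the marker because that is what the question's target frame contains.

```python
import pandas as pd

# Row 0 of the example data as a Series; apply(axis=1) hands the
# lambda one such Series per row.
row = pd.Series(['cat', 'dog', 'monkey', 'sparrow', 'cat'])

# keep='last' marks every occurrence of a value except the final one.
is_earlier_copy = row.duplicated(keep='last')
print(is_earlier_copy.tolist())  # [True, False, False, False, False]

# mask() replaces the cells where the condition is True and keeps the rest.
cleaned_row = row.mask(is_earlier_copy, 'X')
print(cleaned_row.tolist())  # ['X', 'dog', 'monkey', 'sparrow', 'cat']

# The same step applied to every row, checked against the target
# values from the question.
list_of_animals = [['cat', 'dog', 'monkey', 'sparrow', 'cat'],
                   ['cow', 'eagle', 'rat', 'eagle', 'owl'],
                   ['deer', 'horse', 'goat', 'falcon', 'falcon']]
list_of_animals_clean = [['X', 'dog', 'monkey', 'sparrow', 'cat'],
                         ['cow', 'X', 'rat', 'eagle', 'owl'],
                         ['deer', 'horse', 'goat', 'X', 'falcon']]

animals = pd.DataFrame(list_of_animals)
result = animals.apply(
    lambda s: s.mask(s.duplicated(keep='last'), 'X'),
    axis=1
)
print(result.values.tolist() == list_of_animals_clean)  # True
```

Passing keep='first' instead would keep the first occurrence and mark the later ones, so the choice of keep is what decides which copy survives.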
