Creating a backup DataFrame from the dropped entries

Posted 2024-10-01 02:27:54


I have a pandas DataFrame, ds. I want to remove duplicate entries based on a specific column named 'Name'.

+---------+------+-------+----------+--------+
| Invoice | Name | Price |   Date   | Coupon |
+---------+------+-------+----------+--------+
|  123412 | Jim  |    50 | 12/01/17 | ALBB1  |
|  431311 | Jane |    25 | 12/02/17 | BB2    |
|  134123 | Joe  |    70 | 12/03/17 | BB2    |
|  333131 | Jim  |    85 | 12/04/17 | ALBB1  |
+---------+------+-------+----------+--------+

Here is my code:

ds = ds.drop_duplicates(subset='Name', keep='first')

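I use the keep='first' option to keep the first instance found in the DataFrame.

What I would like to do is create a separate DataFrame from all of the dropped entries.

So, in this example, the second DataFrame, ds2, would be: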

+---------+------+-------+----------+--------+
| Invoice | Name | Price |   Date   | Coupon |
+---------+------+-------+----------+--------+
|  333131 | Jim  |    85 | 12/04/17 | ALBB1  |
+---------+------+-------+----------+--------+

1 answer
User
#1 · Posted 2024-10-01 02:27:54

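Use duplicated to build a boolean mask, then filter the rows with boolean indexing.

Note: keep='first' can be omitted, because it is the default value.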

df1 = df[df.duplicated(subset='Name')]
print (df1)
   Invoice Name  Price      Date Coupon
3   333131  Jim     85  12/04/17  ALBB1
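
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the variable names `dropped` and `same_mask` are only illustrative. It also checks that spelling out keep='first' gives the same mask as the default.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Invoice": [123412, 431311, 134123, 333131],
    "Name": ["Jim", "Jane", "Joe", "Jim"],
    "Price": [50, 25, 70, 85],
    "Date": ["12/01/17", "12/02/17", "12/03/17", "12/04/17"],
    "Coupon": ["ALBB1", "BB2", "BB2", "ALBB1"],
})

# Rows whose Name already appeared earlier in the frame.
dropped = df[df.duplicated(subset="Name")]

# keep='first' is the default, so both calls should give the same mask.
same_mask = df.duplicated(subset="Name").equals(
    df.duplicated(subset="Name", keep="first")
)

print(dropped)
print(same_mask)
```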

The boolean mask can be stored and used to select rows from the DataFrame, and ~ inverts the mask:

m = df.duplicated(subset='Name')
df1 = df[m]
print (df1)
   Invoice Name  Price      Date Coupon
3   333131  Jim     85  12/04/17  ALBB1

df1 = df[~m]
print (df1)

   Invoice  Name  Price      Date Coupon
0   123412   Jim     50  12/01/17  ALBB1
1   431311  Jane     25  12/02/17    BB2
2   134123   Joe     70  12/03/17    BB2

Details:

print (m)
0    False
1    False
2    False
3     True
dtype: bool

print (~m)

0     True
1     True
2     True
3    False
dtype: bool
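
Putting the mask and its inverse together, the question's ds and ds2 can be produced in a single step. This is only a sketch: `split_duplicates` is a hypothetical helper name, not a pandas function, and the frame is again rebuilt from the question's table.

```python
import pandas as pd

def split_duplicates(frame, column):
    """Hypothetical helper: return (kept, dropped) for the given column."""
    mask = frame.duplicated(subset=column)  # True for every repeat after the first
    return frame[~mask], frame[mask]

# Sample data copied from the table in the question.
ds = pd.DataFrame({
    "Invoice": [123412, 431311, 134123, 333131],
    "Name": ["Jim", "Jane", "Joe", "Jim"],
    "Price": [50, 25, 70, 85],
    "Date": ["12/01/17", "12/02/17", "12/03/17", "12/04/17"],
    "Coupon": ["ALBB1", "BB2", "BB2", "ALBB1"],
})

ds_kept, ds2 = split_duplicates(ds, "Name")

# Every original row ends up in exactly one of the two frames.
assert len(ds_kept) + len(ds2) == len(ds)

print(ds_kept)
print(ds2)
```

Here ds_kept holds the same rows that drop_duplicates(subset='Name', keep='first') would return, and ds2 is the backup of what was dropped.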

Edit:

You can also use keep='last' to extract every duplicate except the last occurrence, or keep=False to extract all duplicated rows:

print (df)
   Invoice  Name  Price      Date Coupon
0   123412   Jim     50  12/01/17  ALBB1
1   431311  Jane     25  12/02/17    BB2
2   134123   Joe     70  12/03/17    BB2
3   333131   Jim     85  12/04/17  ALBB1
4   333131   Jim     86  12/04/17  ALBB2 <- added new dupe row

m = df.duplicated(subset='Name')
df11 = df[m]
print (df11)
   Invoice Name  Price      Date Coupon
3   333131  Jim     85  12/04/17  ALBB1
4   333131  Jim     86  12/04/17  ALBB2

m = df.duplicated(subset='Name', keep='last')
df12 = df[m]
print (df12)
   Invoice Name  Price      Date Coupon
0   123412  Jim     50  12/01/17  ALBB1
3   333131  Jim     85  12/04/17  ALBB1

m = df.duplicated(subset='Name', keep=False)
df13 = df[m]
print (df13)
   Invoice Name  Price      Date Coupon
0   123412  Jim     50  12/01/17  ALBB1
3   333131  Jim     85  12/04/17  ALBB1
4   333131  Jim     86  12/04/17  ALBB2
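
The three keep settings can also be compared side by side. The sketch below rebuilds the five-row frame from this edit by hand and collects the index labels that each setting flags; the dictionary name `flagged` is just illustrative.

```python
import pandas as pd

# Five-row sample from the edit above, including the added duplicate row.
df = pd.DataFrame({
    "Invoice": [123412, 431311, 134123, 333131, 333131],
    "Name": ["Jim", "Jane", "Joe", "Jim", "Jim"],
    "Price": [50, 25, 70, 85, 86],
    "Date": ["12/01/17", "12/02/17", "12/03/17", "12/04/17", "12/04/17"],
    "Coupon": ["ALBB1", "BB2", "BB2", "ALBB1", "ALBB2"],
})

# Index labels flagged as duplicates under each keep setting.
flagged = {
    keep: df.index[df.duplicated(subset="Name", keep=keep)].tolist()
    for keep in ("first", "last", False)
}

print(flagged)
```

The lists match the indexes of df11, df12 and df13 printed above.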
