How do I drop duplicate rows based on specific columns with pandas?

# Python 3.5.2
# Pandas library version 0.22
import pandas as pd

# Save the Excel workbook in a variable
current_workbook = pd.ExcelFile('C:\\Users\\userX\\Desktop\\cost_values.xlsx')

# Convert the workbook to a data frame
current_worksheet = pd.read_excel(current_workbook, index_col = 'vend_num')

# Current output
print(current_worksheet)

| vend_number | vend_name            | quantity  | source  |
| ----------- | -------------------- | --------- | ------- |
| CHARLS      | Charlie & Associates | $5,700.00 | Central |
| CHARLS      | Charlie & Associates | $5,700.00 | South   |
| CHARLS      | Charlie & Associates | $5,700.00 | North   |
| CHARLS      | Charlie & Associates | $5,700.00 | West    |
| HUGHES      | Hughinos             | $3,800.00 | Central |
| HUGHES      | Hughinos             | $3,800.00 | South   |
| FERNAS      | Fernanda Industries  | $3,500.00 | South   |
| FERNAS      | Fernanda Industries  | $3,500.00 | North   |
| FERNAS      | Fernanda Industries  | $3,000.00 | West    |

....

2 answers

User

#1 · Edited 2024-09-30 23:29:12

Here is one way.

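df['CentralFlag'] = (df['source'] == 'Central')

# A stable sort (mergesort) keeps the original row order within each flag value
df = df.sort_values('CentralFlag', ascending=False, kind='mergesort')\
       .drop_duplicates(['vend_name', 'quantity'])\
       .drop(columns='CentralFlag')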

#   vend_number           vend_name   quantity   source
# 0      CHARLS  Charlie&Associates  $5,700.00  Central
# 4      HUGHES            Hughinos  $3,800.00  Central
# 6      FERNAS  FernandaIndustries  $3,500.00    South
# 8      FERNAS  FernandaIndustries  $3,000.00     West

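Explanation

Create a flag column and sort by it in descending order, so that rows whose source is 'Central' come first and are preferred.
Drop duplicates by vend_name and quantity, then drop the flag column.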
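
The steps above can be tried end to end with a small sketch. The sample frame below is hypothetical, typed in from the table in the question, and the helper name dedupe_prefer_central is an assumption made for illustration; it is not part of pandas or of the original answer.

```python
import pandas as pd

# Hypothetical sample data modelled on the table in the question.
df = pd.DataFrame({
    'vend_number': ['CHARLS'] * 4 + ['HUGHES'] * 2 + ['FERNAS'] * 3,
    'vend_name': (['Charlie & Associates'] * 4 + ['Hughinos'] * 2
                  + ['Fernanda Industries'] * 3),
    'quantity': (['$5,700.00'] * 4 + ['$3,800.00'] * 2
                 + ['$3,500.00', '$3,500.00', '$3,000.00']),
    'source': ['Central', 'South', 'North', 'West',
               'Central', 'South',
               'South', 'North', 'West'],
})


def dedupe_prefer_central(frame):
    """Illustrative helper: keep one row per (vend_name, quantity),
    preferring rows whose source is 'Central'."""
    # Flag the preferred rows without modifying the caller's frame.
    flagged = frame.assign(CentralFlag=frame['source'].eq('Central'))
    # Stable descending sort: flagged rows first, ties keep original order.
    flagged = flagged.sort_values('CentralFlag', ascending=False,
                                  kind='mergesort')
    # Keep the first row of each (vend_name, quantity) group.
    deduped = flagged.drop_duplicates(['vend_name', 'quantity'])
    # Remove the helper column and restore the original row order.
    return deduped.drop(columns='CentralFlag').sort_index()


result = dedupe_prefer_central(df)
print(result)
```

Using assign leaves the input frame untouched, and the final sort_index is optional; it only puts the surviving rows back in their original order.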

User

#2 · Edited 2024-09-30 23:29:12

You can do it in two steps.

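# Step 1: keep every row whose source is 'Central'
s = df.loc[df['source'] == 'Central', :]
# Step 2: keep rows for vendors that have no 'Central' row
t = df.loc[~df['vend_number'].isin(s['vend_number']), :]

# Combine, dropping duplicate (vend_number, quantity) pairs among the rest
pd.concat([s, t.drop_duplicates(['vend_number', 'quantity'], keep='first')])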
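
A self-contained sketch of this two-step approach follows. The sample frame is again hypothetical, typed in from the question's table, and the function name keep_central_then_rest is an illustrative assumption rather than anything from pandas or the answer.

```python
import pandas as pd

# Hypothetical sample data modelled on the table in the question.
df = pd.DataFrame({
    'vend_number': ['CHARLS'] * 4 + ['HUGHES'] * 2 + ['FERNAS'] * 3,
    'vend_name': (['Charlie & Associates'] * 4 + ['Hughinos'] * 2
                  + ['Fernanda Industries'] * 3),
    'quantity': (['$5,700.00'] * 4 + ['$3,800.00'] * 2
                 + ['$3,500.00', '$3,500.00', '$3,000.00']),
    'source': ['Central', 'South', 'North', 'West',
               'Central', 'South',
               'South', 'North', 'West'],
})


def keep_central_then_rest(frame):
    """Illustrative helper: keep all 'Central' rows, then de-duplicate
    the rows of vendors that have no 'Central' row."""
    # Step 1: every row whose source is 'Central'.
    central = frame.loc[frame['source'] == 'Central']
    # Step 2: rows for vendors that never appear in the 'Central' subset.
    others = frame.loc[~frame['vend_number'].isin(central['vend_number'])]
    others = others.drop_duplicates(['vend_number', 'quantity'], keep='first')
    # Stack the two pieces; the original index labels are preserved.
    return pd.concat([central, others])


result = keep_central_then_rest(df)
print(result)
```

One difference from the flag-and-sort approach is worth keeping in mind: once a vendor has any 'Central' row, all of that vendor's other rows are discarded, even those with a different quantity. On this sample data both approaches keep the same four rows.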
