删除重复的问题

2024-05-19 15:20:21 发布

您现在位置:Python中文网/ 问答频道 /正文

这里是初学者! 我的守则的目的:

  • 导入csv和excel文件(无问题)
  • 格式、筛选器将条件应用于两者(无问题)
  • 连接两个df(无问题)
  • 从上次df中删除重复项(问题)

我遇到的问题是没有从df2中删除重复项,我使用['Scaffold ID']作为子集

**df3 sample**

  

   

 

     Scaffold ID | Supervisor | Client Requester |BLK Location  \
                 42324   Dave            Ben            11A Block   
                 42836   Dave            Ben            11A Block   
                 29601   Phil            Paul           FMT   
                 43140   Ryan            Jess           11A Block   
                 45044   Ryan            Leon           11b Block                                          
                 ...           ...              ...          ...   
                 3420    Alan            Mike           31A Block   
                 3421    Alan            Mike           31B Block   
                 3422    Alan            Mike           35A Block   
                 3461    Ryan            Ben            11A Block   
                 3463    Ryan            Kev            11b Block 
    
         [3045 rows x 18 columns]
    
    **df2 sample**
    
     
    
        Scaffold ID | Supervisor | Client Requester |BLK Location  \
                 42324   Dave            Ben            11A Block   
                 42836   Dave            Ben            11A Block   
                 29601   Phil            Paul           FMT   
                 43140   Ryan            Jess           11A Block   
                 45044   Ryan            Leon           11b Block                                          
                 ...           ...              ...          ...   
                 3420    Alan            Mike           31A Block   
                 3421    Alan            Mike           31B Block   
                 3464    Alex            Joe            56A Block   
                 3468    Paul            Lucy           42A Block   
                 3467    Tony            Adam           13b Block   
    
        [1105 rows x 6 columns]
    
         
    **Desired output**
    
    Scaffold ID | Supervisor | Client Requester |BLK Location  \
                 42324   Dave            Ben            11A Block   
                 42836   Dave            Ben            11A Block   
                 29601   Phil            Paul           FMT   
                 43140   Ryan            Jess           11A Block   
                 45044   Ryan            Leon           11b Block                                          
                 ...           ...              ...          ...   
                 3420    Alan            Mike           31A Block   
                 3421    Alan            Mike           31B Block   
                 3422    Alan            Mike           35A Block   
                 3461    Ryan            Ben            11A Block   
                 3463    Ryan            Kev            11b Block
                 3464    Alex            Joe            56A Block   
                 3468    Paul            Lucy           42A Block   
                 3467    Tony            Adam           13b Block    
    
       [3048 rows x 18 columns]



import pandas as pd
import datetime

# Task details rpt / re measure mapping rpt  path's 
df = pd.read_csv('/content/TaskDetails_rpt (3).csv')
df3 = pd.read_excel('/content/RE-Measure Mapping Report.xlsx', skiprows=1)
#print(list(df))
df2 = df[['CapeNo3', 'Supervisor', 'ClientRequester', 'Location2', 'LocationDescription', 'ErCompleteDate', 'Trade']]
# Filter trade col for Access
access_filt = df.Trade == 'Access'
is_access = pd.Series(access_filt)
df2 = df2[is_access]

# remove NaN from df2 ErCompleteDate col
df2 = df2[df2['ErCompleteDate'].notna()]

# Convert ErCompleteDate from object to datetime, Filter ErCompleteDate to last 90 days 
df2['ErCompleteDate'] = pd.to_datetime(df2['ErCompleteDate'],format="%d/%m/%Y")
date_range_filt = df2.ErCompleteDate > datetime.datetime.now() - pd.to_timedelta("90day")  
df2 = df2[date_range_filt]

# rename df2 col
df2.rename(columns={'CapeNo3': 'Scaffold ID', 
                    'ClientRequester': 'Client Requester',
                    'Location2': 'BLK Location',
                    'LocationDescription': 'Location Description',
                    'ErCompleteDate': 'Erect date'}, inplace=True)
# Drop Trade from df2 
df2.drop('Trade', axis=1, inplace=True)

# Concatenate df3 and df2 and drop duplicates from df2 (assign to df4)
df4 = pd.concat([df3,df2]).drop_duplicates(subset=['Scaffold ID'], keep='first')
df4

数据样本(我突出显示了df3结束和df2开始的位置)

enter image description here

提前感谢您的帮助


Tags: iddflocationblockpdmikescaffoldben