这里是初学者! 我的守则的目的:
我遇到的问题是没有从df2中删除重复项,我使用['Scaffold ID']作为子集
**df3 sample**
Scaffold ID | Supervisor | Client Requester |BLK Location \
42324 Dave Ben 11A Block
42836 Dave Ben 11A Block
29601 Phil Paul FMT
43140 Ryan Jess 11A Block
45044 Ryan Leon 11b Block
... ... ... ...
3420 Alan Mike 31A Block
3421 Alan Mike 31B Block
3422 Alan Mike 35A Block
3461 Ryan Ben 11A Block
3463 Ryan Kev 11b Block
[3045 rows x 18 columns]
**df2 sample**
Scaffold ID | Supervisor | Client Requester |BLK Location \
42324 Dave Ben 11A Block
42836 Dave Ben 11A Block
29601 Phil Paul FMT
43140 Ryan Jess 11A Block
45044 Ryan Leon 11b Block
... ... ... ...
3420 Alan Mike 31A Block
3421 Alan Mike 31B Block
3464 Alex Joe 56A Block
3468 Paul Lucy 42A Block
3467 Tony Adam 13b Block
[1105 rows x 6 columns]
**Desired output**
Scaffold ID | Supervisor | Client Requester |BLK Location \
42324 Dave Ben 11A Block
42836 Dave Ben 11A Block
29601 Phil Paul FMT
43140 Ryan Jess 11A Block
45044 Ryan Leon 11b Block
... ... ... ...
3420 Alan Mike 31A Block
3421 Alan Mike 31B Block
3422 Alan Mike 35A Block
3461 Ryan Ben 11A Block
3463 Ryan Kev 11b Block
3464 Alex Joe 56A Block
3468 Paul Lucy 42A Block
3467 Tony Adam 13b Block
[3048 rows x 18 columns]
import pandas as pd
import datetime
# Task details rpt / re measure mapping rpt path's
df = pd.read_csv('/content/TaskDetails_rpt (3).csv')
df3 = pd.read_excel('/content/RE-Measure Mapping Report.xlsx', skiprows=1)
#print(list(df))
df2 = df[['CapeNo3', 'Supervisor', 'ClientRequester', 'Location2', 'LocationDescription', 'ErCompleteDate', 'Trade']]
# Filter trade col for Access
access_filt = df.Trade == 'Access'
is_access = pd.Series(access_filt)
df2 = df2[is_access]
# remove NaN from df2 ErCompleteDate col
df2 = df2[df2['ErCompleteDate'].notna()]
# Convert ErCompleteDate from object to datetime, Filter ErCompleteDate to last 90 days
df2['ErCompleteDate'] = pd.to_datetime(df2['ErCompleteDate'],format="%d/%m/%Y")
date_range_filt = df2.ErCompleteDate > datetime.datetime.now() - pd.to_timedelta("90day")
df2 = df2[date_range_filt]
# rename df2 col
df2.rename(columns={'CapeNo3': 'Scaffold ID',
'ClientRequester': 'Client Requester',
'Location2': 'BLK Location',
'LocationDescription': 'Location Description',
'ErCompleteDate': 'Erect date'}, inplace=True)
# Drop Trade from df2
df2.drop('Trade', axis=1, inplace=True)
# Concatenate df3 and df2 and drop duplicates from df2 (assign to df4)
df4 = pd.concat([df3,df2]).drop_duplicates(subset=['Scaffold ID'], keep='first')
df4
数据样本(我突出显示了df3结束和df2开始的位置)
提前感谢您的帮助
目前没有回答
相关问题 更多 >
编程相关推荐