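<p>This solution uses pandas: it computes the difference in days between <code>Start</code> and <code>End</code>, assigns interval labels using arbitrarily chosen thresholds, and then merges the two DataFrames.</p>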
<pre><code># reproduce the test case
import pandas as pd
data_1 = {'ID': [1, 2, 3],
'Interval': ['annual', 'quarterly', 'semiannual']}
df1 = pd.DataFrame(data_1)
data_2 = {'ID': [1, 1, 1, 2, 2, 3, 3, 3],
'Start': ['AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21'],
'End': ['JAN-FY21', 'OCT-FY21', 'AUG-FY22', 'JAN-FY21', 'OCT-FY21', 'JAN-FY21', 'OCT-FY21', 'AUG-FY22']}
df2 = pd.DataFrame(data_2)
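# compute the absolute gap in days between Start and End
# ('-FY21' is rewritten to ' 2021', so 'JAN-FY21' parses as earlier than 'AUG-FY21'; abs() keeps the gap positive)
start = pd.to_datetime(df2['Start'].str.replace('-FY', ' 20'))
end = pd.to_datetime(df2['End'].str.replace('-FY', ' 20'))
df2['Days_interval'] = (end - start).abs().dt.days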
df2['Interval'] = ''
# assign labels from the day gap; the 100- and 300-day cut-offs are arbitrary
df2.loc[df2['Days_interval'] < 100, 'Interval'] = 'quarterly'
df2.loc[(df2['Days_interval'] >= 100) & (df2['Days_interval'] <= 300), 'Interval'] = 'semiannual'
df2.loc[df2['Days_interval'] > 300, 'Interval'] = 'annual'
# exclude helper columns
df2.drop('Days_interval', axis = 1, inplace = True)
# merge both dfs by ID and interval
output = pd.merge(df1, df2, how='inner', on = ['ID', 'Interval'])
# drop the helper label so df2 is back to its original columns
df2.drop('Interval', axis = 1, inplace = True)
print(output)
# Output:
#    ID    Interval     Start       End
# 0   1      annual  AUG-FY21  AUG-FY22
# 1   2   quarterly  AUG-FY21  OCT-FY21
# 2   3  semiannual  AUG-FY21  JAN-FY21
</code></pre>
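<p>As a possible variation, the three <code>.loc</code> assignments could be collapsed into a single <code>pd.cut</code> call. This is only a sketch under the same arbitrary cut-offs (below 100, 100 to 300, above 300 days); the helper name <code>label_interval</code> is hypothetical and not part of the code above.</p>

```python
import pandas as pd


def label_interval(days: pd.Series) -> pd.Series:
    """Map whole-day gaps to interval labels (hypothetical helper)."""
    # Bins are right-closed, so for integer day counts this gives:
    # (-1, 99] -> quarterly, (99, 300] -> semiannual, (300, inf] -> annual
    labels = pd.cut(
        days,
        bins=[-1, 99, 300, float("inf")],
        labels=["quarterly", "semiannual", "annual"],
    )
    # Convert from categorical to plain strings so the column can be merged
    # against a string-typed 'Interval' column.
    return labels.astype(str)


# The three day gaps that occur in the test case above
days = pd.Series([61, 212, 365])
print(label_interval(days).tolist())
# ['quarterly', 'semiannual', 'annual']
```

<p>With this helper, the labelling step would reduce to something like <code>df2['Interval'] = label_interval(df2['Days_interval'])</code>, and changing the thresholds means editing one list of bin edges.</p>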