Fuzzy join on multiple conditions

User

Reply #1 · edited 2024-09-29 21:20:29

Create the DataFrames and combine A and B:

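import numpy as np
import pandas as pd

A = {'index': range(1, 11), 'A': [300, 0, 400, 0, 0, 0, 0, 0, 100, 0]}
B = {'index': range(1, 11), 'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 87]}
df_A = pd.DataFrame(data=A)
df_B = pd.DataFrame(data=B)
# Place A and B side by side, then drop both duplicated 'index' columns
df_com = pd.concat([df_A, df_B], axis=1).drop('index', axis=1)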

Build the indices:

indexA = list(df_com.A[df_com.A.ne(0)].index + 1)
indexB = np.array(indexA) - 2
indexB = np.append(indexB[1:],(len(df_com)-1))
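
The three index lines above are compact, so here is a small standalone sketch of what they evaluate to for the sample values. The names `a`, `index_a` and `index_b` are only for illustration, and the comments describe my reading of the arithmetic rather than anything stated in the reply.

```python
import numpy as np
import pandas as pd

# Same values as column A in the reply
a = pd.Series([300, 0, 400, 0, 0, 0, 0, 0, 100, 0])

# One-based row numbers of the non-zero loads
index_a = (a[a.ne(0)].index + 1).tolist()

# Zero-based row just before each later load (the first load has no
# preceding block), with the last block closed at the final row
index_b = np.append(np.array(index_a[1:]) - 2, len(a) - 1).tolist()

print(index_a)
print(index_b)
```

For the sample data this gives [1, 3, 9] and [1, 7, 9]. Note that indexA is one-based while indexB works out to the zero-based last row of each block, which is worth keeping in mind when reading the final table.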

Replace the zeros in column A with the previous non-zero value (forward fill):

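# replace(0, method='pad') is deprecated in recent pandas; mask the zeros and forward-fill instead
df_com['A'] = df_com['A'].mask(df_com['A'].eq(0)).ffill().fillna(0).astype(int)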
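
To show the effect of the fill on the sample values, here is a minimal sketch on a standalone Series. The filled column is what the next step groups on, so two separate loads that happen to have the same value would land in the same group. The `block_id` key below is a hypothetical alternative that is not part of the reply; it numbers the blocks by counting non-zero loads instead.

```python
import pandas as pd

# Same values as column A in the reply
s = pd.Series([300, 0, 400, 0, 0, 0, 0, 0, 100, 0])

# Zeros become NaN, the last non-zero value is carried forward, ints restored
filled = s.mask(s.eq(0)).ffill().fillna(0).astype(int)

# Hypothetical alternative group key: a counter that increases at each load
block_id = s.ne(0).cumsum()

print(filled.tolist())
print(block_id.tolist())
```

With these values, `filled` is 300 for the first two rows, 400 for the next six and 100 for the last two, and `block_id` labels the same three blocks 1, 2 and 3.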

Group by A and add the index columns:

df_new =df_com.groupby("A",sort=False).apply(lambda x:x.B.shift(1).sum()).reset_index()
df_new['indexA'] = indexA
df_new['indexB'] = indexB
df_new

User

Reply #2 · edited 2024-09-29 21:20:29

Here is a very verbose solution; I hope it generalizes to your full data. I'm sure you can simplify it.

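# A and B here are DataFrames that share an 'index' column (the reply does not show how they are built)
C = A.join(B.set_index('index'), on='index')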

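# Carry each non-zero load forward over the zeros (replace(..., method='ffill') is deprecated in recent pandas)
C['A_filled'] = C['A'].mask(C['A'].eq(0)).ffill().fillna(0).astype(int)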
C['cumul_load'] = C['A'].cumsum()
C['load_number'] = C.groupby('cumul_load').ngroup() + 1
C['B_accum'] = C.groupby('load_number')['B'].cumsum()
C['A_fully_crushed'] = C['B_accum'] > C['A_filled']
C['first_index_fully_crushed'] = C.groupby('load_number')['A_fully_crushed'].cumsum() == 1

indexA_ = C['index'][C['A'] > 0].tolist()
A_ = C['A'][C['A'] > 0].tolist()
indexB_ = C['index'][C['first_index_fully_crushed'] == True].tolist()
B_accumulate_ = C['B_accum'][C['first_index_fully_crushed'] == True].tolist()
result = pd.DataFrame({'indexA': indexA_, 'A': A_, 'indexB': indexB_, 'B_accumulate': B_accumulate_})

This produces:

   indexA    A  indexB  B_accumulate
0       1  300       4           419
1       6  400       9           464
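
Since the reply invites simplification, here is one possible condensed sketch of the same idea. The input frames are not shown in the reply, so the values of `A` below are a guess chosen to reproduce the table above, and `B` reuses the values from reply #1. The sketch also assumes every load is eventually exceeded by the accumulated B values of its block.

```python
import pandas as pd

# Assumed inputs (not given in the reply): A is a guess that reproduces the
# table above; B reuses the values from reply #1.
A = pd.DataFrame({'index': range(1, 11),
                  'A': [300, 0, 0, 0, 0, 400, 0, 0, 0, 0]})
B = pd.DataFrame({'index': range(1, 11),
                  'B': [102, 103, 94, 120, 145, 114, 126, 117, 107, 87]})

C = A.merge(B, on='index')

# Each non-zero load starts a new block
C['load_number'] = C['A'].ne(0).cumsum()
# Load size repeated on every row of its block
C['A_filled'] = C.groupby('load_number')['A'].transform('first')
# Running total of B inside each block
C['B_accum'] = C.groupby('load_number')['B'].cumsum()

# First row of each block where the running total exceeds the load
crushed = (C['B_accum'] > C['A_filled']).astype(int)
first_hit = crushed.groupby(C['load_number']).cumsum().eq(1)

starts = C[C['A'] > 0]
hits = C[first_hit]
result = pd.DataFrame({
    'indexA': starts['index'].tolist(),
    'A': starts['A'].tolist(),
    'indexB': hits['index'].tolist(),
    'B_accumulate': hits['B_accum'].tolist(),
})
print(result)
```

Like the original, this relies on the B values being positive, so that once the running total passes the load it stays above it and the cumulative count equals 1 on exactly one row per block.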

User

Reply #3 · edited 2024-09-29 21:20:29

I simplified the structure: Series instead of DataFrames, with zero-based indexing. The approach applies cumsum() and searchsorted().

import pandas as pd

Load = pd.Series([300,0,0,400,50,0,0,0,150,0])  # aka 'A'
Rate = pd.Series([102,103,94,120,145,114,126,117,107,100])  # aka 'B'

# Storage for the result:
H=[]    # [ (indexLoad, Load, indexRate, excess) ... ]

# Find the 1st non 0 load:
load1_idx= len(Load)

for lix in range(len(Load)):
    a= Load[lix]
    if a!=0:
        csumser= Rate.cumsum()
        rix= csumser.searchsorted(a)
        excess= csumser[rix]-a
        H.append( (lix,a,rix,excess) )
        load1_idx=lix
        break

# Processing
for lix in range(load1_idx+1,len(Load)):

    a=Load[lix]
    if a==0:
        continue

    last_rix= H[-1][-2]
    csumser[last_rix:]= Rate[last_rix:]
    if lix==last_rix:
        csumser[lix]= H[-1][-1] # excess

    csumser[last_rix:]= csumser[last_rix:].cumsum()

    rix= csumser[last_rix:].searchsorted(a)
    rix+= last_rix
    excess= csumser[rix]-a
    H.append( (lix,a,rix,excess) )       

df= pd.DataFrame(H, columns=["indexLoad","Load","indexRate","rate_excess"])
print(df)

   indexLoad  Load  indexRate  rate_excess
0          0   300          3          119
1          3   400          6          104
2          4    50          6           76
3          8   150          7           93
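
The core step in the loop above is the cumsum() plus searchsorted() pair. As a minimal sketch of just that step for a single load, using NumPy directly: the helper name `first_reach` is hypothetical, and it assumes the rates are non-negative so that the running total is sorted.

```python
import numpy as np

def first_reach(rate, load):
    """Return (position, excess) for the first position where the running
    total of `rate` reaches `load`, or None if it never does."""
    csum = np.cumsum(rate)
    # Default side='left': first position with csum[pos] >= load
    rix = int(np.searchsorted(csum, load))
    if rix == len(csum):
        return None  # the load is larger than the total of all rates
    return rix, int(csum[rix] - load)

print(first_reach([102, 103, 94, 120, 145], 300))
print(first_reach([1, 2], 10))
```

The first call returns (3, 119), which matches the first row of the table above; the second returns None because the rates never add up to the load. The loop in the reply has no such guard, so a load larger than the remaining rates would fail at the `csumser[rix]` lookup.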
