Python: how can I speed up comparing rows of a DataFrame?

Sample data (df_compare):

          new_org               old_org    asn  cc
0    85736 pcizzi   85736 - Pcizzi S .a  23201  PY
1             001              001 Host  40244  US
2             001        001 IT Complex  55734  IN
3  001 hospedagem   001 Hospedagem Ltda  36351  US
4          001web  action.us.001web.net  36351  US

Current approach, a nested Python loop over every pair of rows:

from pandas import DataFrame

matching_dic = []
tuples = [tuple(x) for x in df_compare.values]  # one tuple per row: (new_org, old_org, asn, cc)
for i in range(len(tuples)):
    for j in range(i + 1, len(tuples)):
        # only look at pairs whose old_org differs
        if tuples[i][1] != tuples[j][1]:
            compare = str(tuples[i][0]) + '|' + str(tuples[j][0])
            originals_asn = str(tuples[i][2]) + '|' + str(tuples[j][2])
            originals_cc = str(tuples[i][3]) + '|' + str(tuples[j][3])
            # same new_org and same asn: tag the pair as a match
            if tuples[i][0] == tuples[j][0]:
                if tuples[i][2] == tuples[j][2]:
                    first_tag = 'match'
                    matching_dic.append({'originals_asn': originals_asn, 'originals_cc': originals_cc,
                                         'compare': compare, 'first_tag': first_tag})
dftest = DataFrame(matching_dic)
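A minimal sketch, not from the original post, of the same matching rule (equal new_org and asn, different old_org, each pair i < j kept once) written as a pandas self-merge instead of a nested loop. The helper name match_pairs and the sixth sample row are hypothetical; the extra row is there only so the rule has something to match, since no pair among the five rows above satisfies it.

```python
import pandas as pd


def match_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Pairs of rows (i < j) with equal new_org and asn but different old_org."""
    # Number the rows 0..n-1 so each unordered pair can be kept once, as i < j.
    rows = df.reset_index(drop=True).rename_axis("row").reset_index()
    # Self-join on the columns that must be equal; other columns get _i/_j suffixes.
    pairs = rows.merge(rows, on=["new_org", "asn"], suffixes=("_i", "_j"))
    keep = (pairs["row_i"] < pairs["row_j"]) & (pairs["old_org_i"] != pairs["old_org_j"])
    # Same pair order as the nested loop: by i, then by j.
    pairs = pairs[keep].sort_values(["row_i", "row_j"]).reset_index(drop=True)
    asn = pairs["asn"].astype(str)
    org = pairs["new_org"].astype(str)
    # Same output columns as the loop's matching_dic entries.
    return pd.DataFrame({
        "originals_asn": asn + "|" + asn,
        "originals_cc": pairs["cc_i"].astype(str) + "|" + pairs["cc_j"].astype(str),
        "compare": org + "|" + org,
        "first_tag": "match",
    })


# The question's five rows plus one hypothetical row (index 5) that matches row 1.
df_compare = pd.DataFrame({
    "new_org": ["85736 pcizzi", "001", "001", "001 hospedagem", "001web", "001"],
    "old_org": ["85736 - Pcizzi S .a", "001 Host", "001 IT Complex",
                "001 Hospedagem Ltda", "action.us.001web.net", "001 Host Services"],
    "asn": [23201, 40244, 55734, 36351, 36351, 40244],
    "cc": ["PY", "US", "IN", "US", "US", "US"],
})

dftest = match_pairs(df_compare)
print(dftest)
```

Because the merge only materializes rows that share new_org and asn, its cost grows with the number of candidate pairs rather than with all n(n-1)/2 pairs the loop visits.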

1 answer

User

Answer #1 · posted 2024-10-06 11:21:58

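It's a bit of a hack, but it should be somewhat faster than what you are doing now: instead of comparing every pair of rows in Python, it compares the frame against shifted copies of itself, one vectorized comparison per offset. I added a modified copy of row 0 (as row 2) to check that it works, so below rows 0 and 2 match, as do rows 4 and 5: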

          new_org               old_org    asn  cc
0    85736 pcizzi   85736 - Pcizzi S .a  23201  PY
1             001              001 Host  40244  US
2      85736 blah       85736 - whatevs  23201  PY
3             001        001 IT Complex  55734  IN
4  001 hospedagem   001 Hospedagem Ltda  36351  US
5          001web  action.us.001web.net  36351  US

Anyway, if this approach is useful (and fast enough), it can certainly be cleaned up further.

import pandas as pd

org, asn, cc = [], [], []

# For each offset i, compare row k+i with row k for all k at once
for i in range(1, len(df)):
    df2 = df.iloc[i:].reset_index(drop=True)    # rows i .. n-1
    df3 = df.iloc[:-i].reset_index(drop=True)   # rows 0 .. n-1-i, same 0-based labels
    mask = df2.asn == df3.asn
    if mask.any():
        # keep every match at this offset, not only the first one
        org.extend(zip(df2.new_org[mask], df3.new_org[mask]))
        asn.extend(df2.asn[mask])
        cc.extend(df2.cc[mask])

matches = pd.concat([pd.DataFrame(org, columns=['org1', 'org2']),
                     pd.DataFrame(asn, columns=['asn']),
                     pd.DataFrame(cc,  columns=['cc'])], axis=1)

matches

         org1            org2    asn  cc
0      001web  001 hospedagem  36351  US
1  85736 blah    85736 pcizzi  23201  PY
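If the frame is small enough to hold all n(n-1)/2 row pairs in memory, the offset loop can also be collapsed into a single NumPy comparison over the upper-triangle index pairs. This is a sketch with a hypothetical helper name (asn_pairs), not part of the original answer; it should return the same pairs as the loop, ordered by first row rather than by offset.

```python
import numpy as np
import pandas as pd


def asn_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Every pair of rows (i < j) whose asn values are equal."""
    asn = df["asn"].to_numpy()
    # All index pairs above the diagonal, i.e. each unordered pair exactly once.
    i, j = np.triu_indices(len(asn), k=1)
    hit = asn[i] == asn[j]
    i, j = i[hit], j[hit]
    org = df["new_org"].to_numpy()
    # org1 is the later row and org2 the earlier one, as in the loop's output.
    return pd.DataFrame({
        "org1": org[j],
        "org2": org[i],
        "asn": asn[j],
        "cc": df["cc"].to_numpy()[j],
    })


df = pd.DataFrame({
    "new_org": ["85736 pcizzi", "001", "85736 blah", "001", "001 hospedagem", "001web"],
    "old_org": ["85736 - Pcizzi S .a", "001 Host", "85736 - whatevs", "001 IT Complex",
                "001 Hospedagem Ltda", "action.us.001web.net"],
    "asn": [23201, 40244, 23201, 55734, 36351, 36351],
    "cc": ["PY", "US", "PY", "IN", "US", "US"],
})

matches = asn_pairs(df)
print(matches)
```

The trade-off is memory: the index arrays have n(n-1)/2 entries, so for large frames a self-merge on asn, which only builds the pairs that actually share a value, is the safer choice.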
