Tagging similar products together using Python

import pandas as pd

df = pd.DataFrame({
    'uniqueid':  ['a', 'b', 'b', 'c', 'd', 'd', 'e', 'e', 'g', 'g', 'h', 'l', 'm'],
    'uniqueid2': ['a', 'b', 'b', 'c', 'd', 'd', 'e', 'e', 'g', 'g', 'h', 'l', 'l'],
    'uniqueid3': ['z', 'y', 'x', 'y', 'x', 'v', 'x', 'u', 'h', 'i', 'k', 'k', 'n'],
})

1 answer

User

#1 · Posted 2024-09-30 18:34:31

So you want to build a dict and use two nested loops: the outer one over each row, the inner one over each key and its set of values.

# build a dictionary that contains the new keys and the unique values it refers to
# initialize with the first row
# and use numbers for keys, so we can +=1 later on
newkeys = {1: set(df.iloc[0].values)}
key_col = []
nextkey = 2

# loop df rows without the index
for row in df.itertuples(index=False):
    # and get unique row values
    rowset = set(row)

    # see if the row can be tagged with an existing newkey
    for key, values in newkeys.items():
        # if there is a value that appears in a previous row then the intersection will not be empty
        if rowset & values:
            # exit the for loop and skip the else clause
            # current newkey will be selected for the row
            break

    else:
        # for loop exhausted without breaking
        # none of the rowset values appear under any existing key
        # then create a new key
        key = nextkey
        nextkey += 1

    # add values to the newkey (creating its set first if the key is new) and tag row
    newkeys.setdefault(key, set()).update(rowset)
    key_col.append(key)

# save to df
df['new_key'] = key_col
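
One caveat with the loop above: each row is tagged with the first key it matches, so a row that shares values with two existing keys does not merge them. Below is a minimal sketch of one way around that, assuming a union-find (disjoint-set) approach that is not part of the original answer. The names tag_connected_rows and demo_rows are hypothetical, and the demo data is made up to show the bridging case.

```python
def tag_connected_rows(rows):
    """Return one group label per row; rows that share any value, directly
    or through a chain of other rows, get the same label."""
    row_sets = [set(row) for row in rows]
    parent = list(range(len(row_sets)))  # every row starts as its own group

    def find(i):
        # follow parent links up to the group's root, shortening the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # value -> index of the first row that contained it
    for i, values in enumerate(row_sets):
        for value in values:
            if value in first_seen:
                # merge this row's group with the group that already holds the value
                root_a, root_b = find(i), find(first_seen[value])
                parent[max(root_a, root_b)] = min(root_a, root_b)
            else:
                first_seen[value] = i

    # renumber the roots as 1, 2, 3, ... in order of first appearance
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(row_sets))]


# hypothetical data: row 2 shares 'a' with row 0 and 'y' with row 1
demo_rows = [('a', 'x'), ('b', 'y'), ('a', 'y')]
print(tag_connected_rows(demo_rows))  # [1, 1, 1]
```

With the DataFrame from the question, df['new_key'] = tag_connected_rows(df.itertuples(index=False)) should give the same tags as the loop, since no row there links two otherwise separate groups.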
