Get a row's column value if it matches another column value in a different row

Posted 2024-10-03 04:24:53


This is the same problem as "Get column value if it matches another column value in the same table", but I want to solve it with Python/pandas instead of SQL, because the SQL query takes too long to run.

I have a df:

Id   | replyId | commentID_parentID | usernameChannelId | channelId
1    | NULL    | NULL               | a                 | g
2    | NULL    | NULL               | b                 | k
NULL | 1.k     | 1                  | a                 | p
NULL | 1.p     | 1                  | c                 | i
3    | NULL    | NULL               | d                 | h
NULL | 2.k     | 2                  | g                 | g

There is also a separate table of channels (not shown).

I want to know which user (userChannelId) replied to which user.

So I take each comment row and check:

Id == NULL? Then it's a reply -> get userChannelId where commentID_parentID == Id
Id != NULL? Then it's a main comment -> userChannelId replied to channelId

The result should be:

userChannelId_Source | userChannelId_Target
a                    | g
b                    | k
a                    | a
c                    | a
g                    | b

Comment "d" has no entry where commentID_parentID == Id, so it is ignored.

My code so far:

cm["usernameChannelId_reply"] = None

for row in cm.itertuples():
    if cm.commentID_parentID is None: # comment is a main comment
        cm.at[row.Index, 'usernameChannelId_reply'] = cm.channelId
    else: # comment is a reply comment
        temp = cm.loc[cm.Id == row.commentID_parentID]["usernameChannelId"][0]
        #temp = cm.query("Id == commentID_parentID").head(1).loc[:, 'usernameChannelId']
        print(temp)
        if len(set(temp)) == 0:
            print(0, row.Index)
            #cm.at[row.Index, 'usernameChannelId_reply'] = temp
        else:
            cm.at[row.Index, 'usernameChannelId_reply'] = temp

But I get:

KeyError: 0

If I remove the [0], it prints, for example:

997 UCOYb6iKhuCHKDwvd_iBnIBw Name: usernameChannelId, dtype: object
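
The label 997 in that output points to the likely cause of the KeyError. On a Series with an integer index, `series[0]` looks for the index label 0, not the first element, and the filtered Series only carries the label 997. A minimal sketch that reproduces this, assuming a one-row Series shaped like the printed one (the variable names `s`, `err`, and `first` are illustrative only):

```python
import pandas as pd

# A Series shaped like the printed output: one value, index label 997
s = pd.Series(["UCOYb6iKhuCHKDwvd_iBnIBw"], index=[997], name="usernameChannelId")

err = None
try:
    s[0]  # label-based lookup: there is no label 0, so this raises KeyError
except KeyError as exc:
    err = exc

# Position-based access returns the first element regardless of its label
first = s.iloc[0]
```

Replacing `[0]` with `.iloc[0]` would avoid this particular error, although `.iloc[0]` raises an IndexError when the filter matches no rows, so an emptiness check would still be needed.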


1 answer
User
#1 · Posted 2024-10-03 04:24:53

IIUC, you want to map each value in commentID_parentID to the usernameChannelId of the row with the same Id. You could try:

import numpy as np

# Build the mapper: Id -> usernameChannelId, using only main comments (Id is not 'NULL')
s_map = df.loc[df['Id'].ne('NULL')].set_index('Id')['usernameChannelId']

# For replies (commentID_parentID is not 'NULL'), look up the parent comment's author;
# for main comments, use channelId
df['userChannelId_Target'] = np.where(df['commentID_parentID'].ne('NULL'),
                                      df['commentID_parentID'].map(s_map),
                                      df['channelId'])

# See the result
print(df[['usernameChannelId', 'userChannelId_Target']])
  usernameChannelId userChannelId_Target
0                 a                    g
1                 b                    k
2                 a                    a
3                 c                    a
4                 d                    h
5                 g                    b
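
For anyone who wants to run this end to end, here is a self-contained sketch of the same mapping approach on the sample data from the question. It assumes that missing values are the literal string 'NULL' and that Id and commentID_parentID are both stored as strings. The final filtering step is not part of the code above: it is one possible reading of the question's rule that a main comment with no replies (such as "d") should be left out, and the names `is_reply`, `has_reply`, and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'NULL' is assumed to be a literal string
df = pd.DataFrame({
    "Id": ["1", "2", "NULL", "NULL", "3", "NULL"],
    "replyId": ["NULL", "NULL", "1.k", "1.p", "NULL", "2.k"],
    "commentID_parentID": ["NULL", "NULL", "1", "1", "NULL", "2"],
    "usernameChannelId": ["a", "b", "a", "c", "d", "g"],
    "channelId": ["g", "k", "p", "i", "h", "g"],
})

# Rows with a parent id are replies; the rest are main comments
is_reply = df["commentID_parentID"].ne("NULL")

# Mapper from main-comment Id to the author of that comment
s_map = df.loc[df["Id"].ne("NULL")].set_index("Id")["usernameChannelId"]

# Replies target the parent comment's author, main comments target channelId
df["userChannelId_Target"] = np.where(
    is_reply, df["commentID_parentID"].map(s_map), df["channelId"]
)

# Assumed extra step: keep replies, plus main comments that received a reply
has_reply = df["Id"].isin(df.loc[is_reply, "commentID_parentID"])
result = df.loc[is_reply | has_reply, ["usernameChannelId", "userChannelId_Target"]]
```

On this sample, `result` drops the d/h row and keeps the other five pairs, which matches the expected table in the question. Because the lookup is a single vectorized `map` instead of a per-row `loc` filter, it should also scale much better than the loop on large frames.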
