Conditional replacement in a DataFrame

1 answer

User

#1 · Posted 2024-10-03 09:12:49

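Your first instinct, extracting the visitor IDs of the heavy users, is a good one. Once you have those IDs, though, you don't need to iterate over the DataFrame.

Here is how you can do it: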

import pandas as pd

histdf = pd.DataFrame({'Visitor_ID': [1, 1, 2, 2, 2, 3],
                       'content': ["url" + str(x) for x in range(6)],
                       'time': ["timestamp n° " + str(x) for x in range(6)]})

# At first we consider that no user is a heavy user
histdf['heavy user'] = False

# Then we extract the IDs of heavy users (more than one visit)
user_visits = histdf.Visitor_ID.value_counts()
id_heavy_users = user_visits[user_visits > 1].index

# Finally we consider those users as heavy users in the corresponding column
histdf.loc[histdf['Visitor_ID'].isin(id_heavy_users), 'heavy user'] = True

Output:

   Visitor_ID  content            time  heavy user
0           1     url0  timestamp n° 0        True
1           1     url1  timestamp n° 1        True
2           2     url2  timestamp n° 2        True
3           2     url3  timestamp n° 3        True
4           2     url4  timestamp n° 4        True
5           3     url5  timestamp n° 5       False
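
For comparison, the same flag can be computed in one step with a groupby transform instead of `value_counts` plus `isin`. This is a minimal sketch of an alternative technique, not the method used in the answer above; it assumes the same sample data and the same threshold of more than one visit, and the name `visits_per_row` is only illustrative.

```python
import pandas as pd

histdf = pd.DataFrame({
    'Visitor_ID': [1, 1, 2, 2, 2, 3],
    'content': ["url" + str(x) for x in range(6)],
    'time': ["timestamp n° " + str(x) for x in range(6)],
})

# Number of rows per visitor, broadcast back onto every row of that visitor
visits_per_row = histdf.groupby('Visitor_ID')['Visitor_ID'].transform('size')

# Same threshold as above: more than one visit counts as a heavy user
histdf['heavy user'] = visits_per_row > 1
```

Because `transform` returns a Series aligned with the original index, no intermediate list of IDs is needed. The `value_counts` approach may still be preferable when the list of heavy-user IDs is useful on its own.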

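If you only want to keep the heavy users, as mentioned at the end of your question, you can do that without creating the extra flag column: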

histdf = pd.DataFrame({'Visitor_ID': [1, 1, 2, 2, 2, 3],
                       'content': ["url" + str(x) for x in range(6)],
                       'time': ["timestamp n° " + str(x) for x in range(6)]})

user_visits = histdf.Visitor_ID.value_counts()
id_heavy_users = user_visits[user_visits > 1].index

heavy_users = histdf[histdf['Visitor_ID'].isin(id_heavy_users)]

In [1]: print(heavy_users)
   Visitor_ID  content            time
0           1     url0  timestamp n° 0
1           1     url1  timestamp n° 1
2           2     url2  timestamp n° 2
3           2     url3  timestamp n° 3
4           2     url4  timestamp n° 4
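
If the threshold is likely to change, the filtering step can be wrapped in a small function. The sketch below reuses the `value_counts` and `isin` steps from the answer; the function name `keep_heavy_users` and its parameters `id_col` and `min_visits` are hypothetical and chosen only for illustration.

```python
import pandas as pd


def keep_heavy_users(df, id_col='Visitor_ID', min_visits=2):
    """Return only the rows whose visitor appears at least min_visits times."""
    visits = df[id_col].value_counts()
    heavy_ids = visits[visits >= min_visits].index
    return df[df[id_col].isin(heavy_ids)]


histdf = pd.DataFrame({
    'Visitor_ID': [1, 1, 2, 2, 2, 3],
    'content': ["url" + str(x) for x in range(6)],
    'time': ["timestamp n° " + str(x) for x in range(6)],
})

# Default threshold (at least 2 visits) matches "user_visits > 1" above
heavy_users = keep_heavy_users(histdf)

# A stricter threshold keeps only visitors with at least 3 visits
very_heavy_users = keep_heavy_users(histdf, min_visits=3)
```

With the sample data, the default call keeps the rows of visitors 1 and 2, and `min_visits=3` keeps only the rows of visitor 2. The original `histdf` is left unchanged in both cases.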
