Python, dataframes: how do I generate a dictionary/series/dataframe for each value that shares a duplicated key in a Python dataframe?

+---------+------+-----+-------------+
| VideoID | long | lat | viewerCount |
+---------+------+-----+-------------+
| 123     | -1.1 | 1.1 | 25          |
+---------+------+-----+-------------+
| 123     | -1.1 | 1.1 | 20          |
+---------+------+-----+-------------+

1 answer

User

#1 · Posted on 2024-09-27 22:38:13

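Before I answer your question, one comment. Unless you are dealing with "big data" (where the cost of in-memory join operations outweighs the cost of storage and possible updates), I would recommend splitting the table in two:
- The first table holds the video details: Video_id*, longitude, latitude, location
- The second table holds Video_id, refreshes and Views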
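
As a rough sketch of that split, assuming the column names from the table in the question (`VideoID`, `long`, `lat`, `viewerCount`), the two tables could be derived as below. The `refresh` column name is hypothetical and simply numbers each video's observations in row order.

```python
import pandas as pd

# Sample rows shaped like the table in the question.
df = pd.DataFrame({
    "VideoID": [123, 123],
    "long": [-1.1, -1.1],
    "lat": [1.1, 1.1],
    "viewerCount": [25, 20],
})

# Table 1: one row per video with its static details.
videos = df[["VideoID", "long", "lat"]].drop_duplicates().reset_index(drop=True)

# Table 2: one row per observation; `refresh` (hypothetical name) numbers
# the observations of each video in the order they appear.
views = df[["VideoID", "viewerCount"]].copy()
views["refresh"] = views.groupby("VideoID").cumcount()
```

Joining the two back together with `videos.merge(views, on="VideoID")` should restore the original wide layout when it is needed.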

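That said, there are several ways to reach this final representation. The solution I would use myself is to store Viewers_count as a list. A list is convenient because Num_refresh can be dropped altogether: it can be recomputed from each element's index. Using a dict here would be unnecessarily expensive and complex, but I will include the syntax for it as well.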

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list("aabb"),
                   'location': list("xxyy"),
                   'views': [3, 4, 1, 2]})
#   id location  views
# 0  a        x      3
# 1  a        x      4
# 2  b        y      1
# 3  b        y      2

grouped_df = (df
              .groupby(["id", "location"])["views"] # Create a group for each [id, location] and select views
              .apply(np.hstack)                     # Flatten each group's views into one flat sequence
            # .apply(lambda x: dict(zip(range(len(x)), x))) # Dict alternative: {position: views}
              .reset_index())                       # Move id and location back to regular columns

#   id location   views
# 0  a        x  [3, 4]
# 1  b        y  [1, 2]
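
To illustrate the point that the refresh number can be recomputed from the element index, here is a small sketch that unpacks the grouped result back into one row per observation. It rebuilds the grouped frame by hand so it runs on its own, and the `num_refresh` column name is an assumption for illustration.

```python
import pandas as pd

# Same shape as the grouped result above: one list of views per (id, location).
grouped = pd.DataFrame({
    "id": ["a", "b"],
    "location": ["x", "y"],
    "views": [[3, 4], [1, 2]],
})

# One row per list element; explode repeats the original index for each element.
long_df = grouped.explode("views")

# Position within each original row = refresh number (hypothetical column name).
long_df["num_refresh"] = long_df.groupby(level=0).cumcount()
long_df = long_df.reset_index(drop=True)
```

This only works if the order of the list elements is the refresh order, which is the assumption behind storing the counts as a list in the first place.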

Update:

In the comments you mentioned a problem with nested lists when repeating this over several iterations. You can replace list with np.hstack.

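# Second iteration
iter_2 = pd.DataFrame({'id': list("aabb"),
                       'location': list("xxyy"),
                       'views': [30, 40, 10, 20]})

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
grouped_df = (pd.concat([grouped_df, iter_2], ignore_index=True) # Add the rows of the new dataframe to grouped_df
              .groupby(["id", "location"])["views"]
              .apply(np.hstack)                     # Flatten old sequences and new scalars together
              .reset_index())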

#   id location           views
# 0  a        x  [3, 4, 30, 40]
# 1  b        y  [1, 2, 10, 20]
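
The difference between list and np.hstack can be seen on a single group in isolation. The sketch below builds by hand what one group's views column might look like on the second pass (an already-aggregated list followed by new scalar rows); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A group's "views" column after adding new rows to an already grouped frame:
# the first element is the list from the previous pass, the rest are new scalars.
mixed = pd.Series([[3, 4], 30, 40], dtype=object)

nested = list(mixed)               # keeps the old list as a single element
flat = np.hstack(mixed).tolist()   # flattens one level into a single sequence

print(nested)  # [[3, 4], 30, 40]
print(flat)    # [3, 4, 30, 40]
```

With list, every further iteration would add one more level of nesting, whereas np.hstack keeps the stored sequence flat.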
