基于某些特定列合并数据，如

timestamp ip domain http_status \ 0 1475740500.0 192.168.1.1 example.com 200 1 1475740500.0 192.168.1.1 example.com 200 2 1475740500.0 192.168.1.1 example.com 201 3 1475740500.0 192.168.1.1 example.com 201 4 1475740500.0 192.168.1.1 example.com 202 test b_count b_sum test_count test_sum data1 \ 0 False 46 24742949931480 46 9.250 0 1 True 48 28151237474796 48 9.040 0 2 False 36 21702308613722 36 7.896 0 3 True 24 13112423049120 24 5.602 0 4 False 62 29948023487954 62 12.648 0 data2 0 0 1 0 2 0 3 0 4 0

timestamp ip domain http_status \ 0 1475740500.0 192.168.1.1 example.com 200 1 1475740500.0 192.168.1.1 example.com 200 2 1475740500.0 192.168.1.1 example.com 201 3 1475740500.0 192.168.1.1 example.com 201 4 1475740500.0 192.168.1.1 example.com 202 test b_count b_sum test_count test_sum data1 \ 0 False 44 22349502626302 44 9.410 0 1 True 32 16859760597754 32 5.988 0 2 False 46 23478212117794 46 8.972 0 3 True 36 20956236750016 36 7.124 0 4 False 54 35255787384306 54 9.898 0 data2 0 0 1 0 2 0 3 0 4 0

groupby_fields = ['timestamp', 'ip', 'domain', 'http_status', 'test'] pd.merge(t1h, t2h, on=groupby_fields) timestamp ip domain http_status \ 0 1475740500.0 192.168.1.1 example.com 200 1 1475740500.0 192.168.1.1 example.com 200 2 1475740500.0 192.168.1.1 example.com 201 3 1475740500.0 192.168.1.1 example.com 201 4 1475740500.0 192.168.1.1 example.com 202 test b_count_x b_sum_x test_count_x test_sum_x \ 0 False 46 24742949931480 46 9.250 1 True 48 28151237474796 48 9.040 2 False 36 21702308613722 36 7.896 3 True 24 13112423049120 24 5.602 4 False 62 29948023487954 62 12.648 data1_x data2_x b_count_y b_sum_y \ 0 0 0 44 22349502626302 1 0 0 32 16859760597754 2 0 0 46 23478212117794 3 0 0 36 20956236750016 4 0 0 54 35255787384306 test_count_y test_sum_y data1_y data2_y 0 44 9.410 0 0 1 32 5.988 0 0 2 46 8.972 0 0 3 36 7.124 0 0 4 54 9.898 0 0

timestamp ip domain http_status \ 0 1475740500.0 192.168.1.1 example.com 200 1 1475740500.0 192.168.1.1 example.com 200 2 1475740500.0 192.168.1.1 example.com 201 3 1475740500.0 192.168.1.1 example.com 201 4 1475740500.0 192.168.1.1 example.com 202 test b_count b_sum test_count test_sum \ 0 False 90 47092452557782 90 18.660 1 True 80 45010998072550 80 15.028 2 False 82 45180520731516 82 16.868 3 True 60 34068659799136 60 12.726 4 False 116 65203810872260 116 22.546 data1 data2 \ 0 0 0 1 0 0 2 0 0 3 0 0 4 0 0

1条回答

网友

1楼 · 发布于 2024-09-27 23:18:47

`groupby.agg()`函数的最佳用例

假设t1h和t2h已经存在，并且具有相同的列名

groupby_fields = ['timestamp', 'ip', 'domain', 'http_status', 'test']

df = t2h.append(t2h, ignore_index = True)

agg_dict = {'b_count':'count',
        'b_sum':'sum',
        'test_count':'count',
        'test_sum':'sum',
        'data1':'sum',
        'data2':'sum'}

df.groupby(groupby_fields).agg(agg_dict).reset_index()

`groupby.agg()`函数的最佳用例

相关问题更多 >

编程相关推荐

热门问题

热门文章