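Suppose I have two DataFrames, t1h and t2h. I want to merge them so that, for a specific list of columns, rows with the same values in those columns are combined by adding the contents of the remaining columns.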
t1h
timestamp ip domain http_status \
0 1475740500.0 192.168.1.1 example.com 200
1 1475740500.0 192.168.1.1 example.com 200
2 1475740500.0 192.168.1.1 example.com 201
3 1475740500.0 192.168.1.1 example.com 201
4 1475740500.0 192.168.1.1 example.com 202
test b_count b_sum test_count test_sum data1 \
0 False 46 24742949931480 46 9.250 0
1 True 48 28151237474796 48 9.040 0
2 False 36 21702308613722 36 7.896 0
3 True 24 13112423049120 24 5.602 0
4 False 62 29948023487954 62 12.648 0
data2
0 0
1 0
2 0
3 0
4 0
t2h
timestamp ip domain http_status \
0 1475740500.0 192.168.1.1 example.com 200
1 1475740500.0 192.168.1.1 example.com 200
2 1475740500.0 192.168.1.1 example.com 201
3 1475740500.0 192.168.1.1 example.com 201
4 1475740500.0 192.168.1.1 example.com 202
test b_count b_sum test_count test_sum data1 \
0 False 44 22349502626302 44 9.410 0
1 True 32 16859760597754 32 5.988 0
2 False 46 23478212117794 46 8.972 0
3 True 36 20956236750016 36 7.124 0
4 False 54 35255787384306 54 9.898 0
data2
0 0
1 0
2 0
3 0
4 0
I need the output to be based on the following list of columns:
groupby_fields = ['timestamp', 'ip', 'domain', 'http_status', 'test']
pd.merge(t1h, t2h, on=groupby_fields)
timestamp ip domain http_status \
0 1475740500.0 192.168.1.1 example.com 200
1 1475740500.0 192.168.1.1 example.com 200
2 1475740500.0 192.168.1.1 example.com 201
3 1475740500.0 192.168.1.1 example.com 201
4 1475740500.0 192.168.1.1 example.com 202
test b_count_x b_sum_x test_count_x test_sum_x \
0 False 46 24742949931480 46 9.250
1 True 48 28151237474796 48 9.040
2 False 36 21702308613722 36 7.896
3 True 24 13112423049120 24 5.602
4 False 62 29948023487954 62 12.648
data1_x data2_x b_count_y b_sum_y \
0 0 0 44 22349502626302
1 0 0 32 16859760597754
2 0 0 46 23478212117794
3 0 0 36 20956236750016
4 0 0 54 35255787384306
test_count_y test_sum_y data1_y data2_y
0 44 9.410 0 0
1 32 5.988 0 0
2 46 8.972 0 0
3 36 7.124 0 0
4 54 9.898 0 0
I would like the output to look like this:
Note: every column other than those in groupby_fields is of type int or float.
timestamp ip domain http_status \
0 1475740500.0 192.168.1.1 example.com 200
1 1475740500.0 192.168.1.1 example.com 200
2 1475740500.0 192.168.1.1 example.com 201
3 1475740500.0 192.168.1.1 example.com 201
4 1475740500.0 192.168.1.1 example.com 202
test b_count b_sum test_count test_sum \
0 False 90 47092452557782 90 18.660
1 True 80 45010998072550 80 15.028
2 False 82 45180520731516 82 16.868
3 True 60 34068659799136 60 12.726
4 False 116 65203810872260 116 22.546
data1 data2 \
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
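Please let me know how to achieve this in an optimized way.
This is a good use case for the groupby.agg() function, assuming t1h and t2h already exist and have the same column names.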
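One way this could look is sketched below: stack the two frames with pd.concat, group on groupby_fields, and sum everything else. The make_frame helper and the reduced column set (only b_count and test_sum, first three rows) are assumptions made to keep the example short; the values are taken from the tables in the question.

```python
import pandas as pd

groupby_fields = ["timestamp", "ip", "domain", "http_status", "test"]

def make_frame(b_count, test_sum):
    # Hypothetical helper: builds a cut-down version of t1h / t2h
    # (first three rows, two numeric columns) from the question's data.
    return pd.DataFrame({
        "timestamp": [1475740500.0] * 3,
        "ip": ["192.168.1.1"] * 3,
        "domain": ["example.com"] * 3,
        "http_status": [200, 200, 201],
        "test": [False, True, False],
        "b_count": b_count,
        "test_sum": test_sum,
    })

t1h = make_frame([46, 48, 36], [9.250, 9.040, 7.896])
t2h = make_frame([44, 32, 46], [9.410, 5.988, 8.972])

# Stack the rows of both frames, then add up the numeric columns
# for every distinct combination of the key columns.
combined = (
    pd.concat([t1h, t2h], ignore_index=True)
      .groupby(groupby_fields, as_index=False, sort=False)
      .agg("sum")
)

print(combined)
```

Unlike pd.merge, this does not produce _x / _y suffixed columns, and a key combination that appears in only one of the frames is kept with its own values rather than dropped. If different columns need different aggregations, a dict such as {"b_count": "sum", "test_sum": "sum"} can be passed to agg instead of the single string.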