使用queryhttp://abc:9200/abc/_search?q=aid:123
我可以得到以下结果。我想筛选的第一件事是“aid”,然后只捕获所需的字段(“act\u timestamp”、“act\u type”、“mod\u path”、“mod\u size”)以及“procguid”,并将它们放在一个表中。你知道吗
提供以下准确的输入和输出:
输入:
{
"took": 95,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "xxx",
"_type": "_doc",
"_id": "xxxxxxxxx",
"_score": 1,
"_source": {
"aid": 123,
"activity": [
{
"act_timestamp": 1549437711762,
"act_type": "reg_act",
"reg_action": "writeval",
"reg_action_typeid": 3
},
{
"act_timestamp": 1549437714812,
"act_type": "reg_act",
"reg_action": "writeval",
"reg_action_typeid": 3
},
{
"act_timestamp": 1549437711418,
"act_type": "child_proc",
"child_name": "audiodg.exe"
}
],
"event_timestamp": 1549437712183,
"tid": "demo",
"procguid": "11111",
"doc_id": 1000881655327721
}
},
{
"_index": "xxx",
"_type": "_doc",
"_id": "xxxxxxxxx",
"_score": 1,
"_source": {
"aid": 123,
"activity": [
{
"act_timestamp": 1549437977317,
"act_type": "mod_load",
"mod_path": "path1",
"mod_size": 32768
},
{
"act_timestamp": 1549437977390,
"act_type": "mod_load",
"mod_path": "path2",
"mod_size": 667648
},
{
"act_timestamp": 1549437977399,
"act_type": "reg_act",
"reg_action": "writeval",
"reg_action_typeid": 3
},
{
"act_timestamp": 1549437977407,
"act_type": "child_proc",
"child_name": "conhost.exe"
}
],
"event_timestamp": 1549437977406,
"tid": "demo",
"procguid": "22222",
"doc_id": 1010298962848944
}
},
{
"_index": "xxx",
"_type": "_doc",
"_id": "xxxxxxxxx",
"_score": 1,
"_source": {
"aid": 123,
"activity": [
{
"act_timestamp": 1549437772915,
"act_type": "mod_load",
"mod_path": "path3",
"mod_size": 409600
},
{
"act_timestamp": 1549437772940,
"act_type": "mod_load",
"mod_path": "path4",
"mod_size": 937984
},
{
"act_timestamp": 1549437953405,
"act_type": "child_proc",
"child_name": "conhost.exe"
}
],
"event_timestamp": 1549437953405,
"tid": "demo",
"procguid": "22222",
"doc_id": 1007612603810098
}
},
{
"_index": "xxx",
"_type": "_doc",
"_id": "xxxxxxxxx",
"_score": 1,
"_source": {
"aid": 123,
"activity": [
{
"act_timestamp": 1549437848842,
"act_type": "mod_load",
"mod_path": "path5",
"mod_size": 1679360
},
{
"act_timestamp": 1549437848844,
"act_type": "mod_load",
"mod_path": "path6",
"mod_size": 2121728
},
{
"act_timestamp": 1549437848864,
"act_type": "mod_load",
"mod_path": "path7",
"mod_size": 266240
},
{
"act_timestamp": 1549437849590,
"act_type": "reg_act",
"reg_action": "writeval",
"reg_action_typeid": 3
},
{
"act_timestamp": 1549437953418,
"act_type": "child_proc",
"child_name": "wpscloudsvr.exe"
}
],
"event_timestamp": 1549437953417,
"tid": "demo",
"procguid": "33333",
"doc_id": 1007725853753652
}
}]}}
我希望输出如下:
procguid act_type mod_path mod_size
22222 mod_load path1 32768
22222 mod_load path2 667648
22222 mod_load path3 409600
22222 mod_load path4 937984
33333 mod_load path5 1679360
33333 mod_load path6 2121728
33333 mod_load path7 266240
那么,有没有办法把这些海量的elasticsearch文档转换成这样的数据帧呢?你知道吗
有什么建议/解决方案代码吗?由于我不熟悉elasticsearch/json,我已经被困了很长时间。你知道吗
编辑:有些块在具有“procguid”时也缺少“activity”字段
你可以把你的任务分解成几个部分。一开始你专注于
这有列
现在可以遍历列
_source.activity
,将每个单元格转换为数据帧并收集它们:把所有的东西放在一起(希望如此)就能得到预期的结果:
相关问题 更多 >
编程相关推荐