Updating an Elasticsearch index with complex conditions

Published 2024-10-02 14:19:06

Male | A programmer who likes writing Python code.

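I'm working with data from the 2017 UK general election. I have it both as a CSV file and as an Elasticsearch index. Here is a sample for the Chichester constituency from the Elasticsearch index: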

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 8.03183,
    "hits" : [
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eCtGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "EMERSON",
          "first_name" : "Andrew",
          "party" : "Patria",
          "Party Identifer" : "Patria",
          "votes" : "84"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eStGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "MONCREIFF",
          "first_name" : "Andrew Malcolm",
          "party" : "UK Independence Party (UKIP)",
          "Party Identifer" : "UKIP",
          "votes" : "1650"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eitGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "BARRIE",
          "first_name" : "Heather Margaret",
          "party" : "Green Party",
          "Party Identifer" : "Green Party",
          "votes" : "1992"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eytGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "BROWN",
          "first_name" : "Jonathan",
          "party" : "Liberal Democrats",
          "Party Identifer" : "Liberal Democrats",
          "votes" : "6749"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "fCtGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "FARWELL",
          "first_name" : "Mark Andrew",
          "party" : "Labour Party",
          "Party Identifer" : "Labour",
          "votes" : "13411"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "fStGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "KEEGAN",
          "first_name" : "Gillian",
          "party" : "The Conservative Party Candidate",
          "Party Identifer" : "Conservative",
          "votes" : "36032"
        }
      }
    ]
  }
}

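I want to create a new "column" called "rank", then go through each distinct constituency and give its candidates the appropriate number. So in the example above, the Conservative candidate would have rank 1, the Labour candidate rank 2, and so on.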

The number of candidates is not the same in every constituency.
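One way to do the ranking on the Elasticsearch side is sketched below in Python rather than raw cURL. It is a minimal sketch that assumes the documents have already been fetched into a list shaped like the hits.hits array above. It groups them by the code field and sorts each group by votes converted to an integer. The integer conversion matters because the index stores votes as strings, so a plain string sort would put "84" ahead of "36032". It then emits one partial-update action per document, in the dict format accepted by helpers.bulk from the elasticsearch-py client. The function name rank_updates and the field name rank are illustrative choices, not anything already in the index.

```python
from collections import defaultdict


def rank_updates(hits, index="ge"):
    """Build bulk 'update' actions that set a 'rank' field on every hit.

    hits: a list of search hits, each with '_id' and '_source' keys,
    like the entries of response["hits"]["hits"].
    """
    # One group of hits per constituency, keyed by the constituency code.
    by_seat = defaultdict(list)
    for hit in hits:
        by_seat[hit["_source"]["code"]].append(hit)

    actions = []
    for seat_hits in by_seat.values():
        # votes is stored as a string, so convert to int before sorting;
        # a string sort would rank "84" above "36032".
        seat_hits.sort(key=lambda h: int(h["_source"]["votes"]), reverse=True)
        # Rank 1 = most votes; works for any number of candidates per seat.
        for rank, hit in enumerate(seat_hits, start=1):
            actions.append({
                "_op_type": "update",   # partial update of an existing document
                "_index": index,
                "_id": hit["_id"],
                "doc": {"rank": rank},  # fields merged into the stored _source
            })
    return actions


if __name__ == "__main__":
    # Three of the Chichester hits from the sample response, trimmed to the fields used.
    sample = [
        {"_id": "eCtGCG4BaIAfLxq_V2By", "_source": {"code": "E14000633", "votes": "84"}},
        {"_id": "fCtGCG4BaIAfLxq_V2By", "_source": {"code": "E14000633", "votes": "13411"}},
        {"_id": "fStGCG4BaIAfLxq_V2By", "_source": {"code": "E14000633", "votes": "36032"}},
    ]
    for action in rank_updates(sample):
        print(action["_id"], action["doc"]["rank"])
    # fStGCG4BaIAfLxq_V2By 1
    # fCtGCG4BaIAfLxq_V2By 2
    # eCtGCG4BaIAfLxq_V2By 3
```

To run this against the cluster, one option is to collect every document with helpers.scan(es, index="ge"), because a normal search returns only 10 hits by default. You would then pass rank_updates(...) of that list to helpers.bulk(es, ...). With the default dynamic mapping, the new rank field is added to the mapping automatically on the first update.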

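Some of the end goals are:
1) count the seats won by each party, grouped by party;
2) select the constituencies with the smallest majorities and sort them;
3) write an algorithm that tells a tactical voter what choice to make (depending, of course, on the outcome you want).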
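The first two end goals follow directly once each seat's candidates are ordered by votes. The sketch below assumes a list of dicts with the same field names as the index documents, including "Party Identifer" spelled as it is in the data, and with votes held as strings. A seat is counted for the party whose candidate is first in that constituency, and the majority is the winner's votes minus the runner-up's. The tactical_choice function is only a hypothetical, deliberately naive heuristic for goal 3: back the strongest candidate who is not from the party you want to beat. All function names here are illustrative.

```python
from collections import Counter, defaultdict


def results_by_seat(rows):
    """Group rows by constituency and sort each group by votes, highest first."""
    seats = defaultdict(list)
    for row in rows:
        seats[row["constituency"]].append(row)
    for candidates in seats.values():
        candidates.sort(key=lambda r: int(r["votes"]), reverse=True)
    return seats


def seats_per_party(rows):
    """Goal 1: number of constituencies where each party's candidate came first."""
    return Counter(c[0]["Party Identifer"] for c in results_by_seat(rows).values())


def smallest_majorities(rows, n=10):
    """Goal 2: (constituency, winning party, majority) for the n narrowest wins."""
    margins = []
    for seat, c in results_by_seat(rows).items():
        if len(c) < 2:
            continue  # a seat with a single candidate has no runner-up
        majority = int(c[0]["votes"]) - int(c[1]["votes"])
        margins.append((seat, c[0]["Party Identifer"], majority))
    return sorted(margins, key=lambda m: m[2])[:n]


def tactical_choice(rows, constituency, party_to_beat):
    """Goal 3 (hypothetical heuristic): strongest party other than party_to_beat."""
    for c in results_by_seat(rows).get(constituency, []):
        if c["Party Identifer"] != party_to_beat:
            return c["Party Identifer"]
    return None
```

For Chichester, seats_per_party would credit the seat to Conservative, and its majority would be 36032 - 13411 = 22621 votes.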

I don't know how to go about this (other than manually updating the original spreadsheet).

Should I do this programmatically, sending cURL commands directly to the cluster, or should I process the CSV file with a Python script?
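If you take the Python-on-CSV route, the rank can be added as a new column before the data is loaded into Elasticsearch. This minimal sketch assumes the CSV has a header row with "code" and "votes" columns, named as they are in the index documents. It works on a string so it is easy to try out, but the same code applies to a file opened with open(path, newline=""). The function name add_rank_column is illustrative.

```python
import csv
import io
from collections import defaultdict


def add_rank_column(csv_text):
    """Return csv_text with an extra 'rank' column (1 = most votes in the seat)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    # Group rows by constituency code; the lists hold references to the same dicts.
    by_seat = defaultdict(list)
    for row in rows:
        by_seat[row["code"]].append(row)

    for seat_rows in by_seat.values():
        # Convert votes to int; sorting the raw strings would put "84" above "36032".
        seat_rows.sort(key=lambda r: int(r["votes"]), reverse=True)
        for rank, row in enumerate(seat_rows, start=1):
            row["rank"] = rank

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["rank"])
    writer.writeheader()
    writer.writerows(rows)  # rows keep their original order; only the column is new
    return out.getvalue()
```

Re-importing the ranked file into the index carries the new field across. How the index was originally built from the CSV is not shown in the question, so that step is left open.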

Could someone suggest the best approach and provide a code example?

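My first idea was to sort the returned objects for each distinct constituency, loop through the data using the total number of hits, and update the rank field based on that. I got as far as this: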

curl -X POST "localhost:9200/ge/_search?pretty" -H 'Content-Type: application/json' -d'
{
   "query" : {
      "term" : { "Constituency" : "Aldershot" }
   },
   "sort" : [
      {"votes.keyword" : {"order" : "desc"}}
   ]
}'

This returns an empty result set, so I'm stuck. Thanks for any help.
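Two things in the query above probably explain the empty result. This assumes the index uses Elasticsearch's default dynamic mapping, which the votes.keyword sort field suggests. First, field names are case-sensitive, and the documents use constituency, not Constituency. Second, a term query is not analysed, while the text field constituency is indexed in lower case. "Aldershot" therefore has to be matched against the keyword sub-field constituency.keyword instead. Separately, sorting on votes.keyword orders the values as strings, so "84" would come before "36032". The Python sketch below builds a corrected request body. The Painless script sort that parses votes as an integer is one workaround. Mapping votes as an integer field when the index is built would be the cleaner fix. The function name constituency_query is illustrative.

```python
import json


def constituency_query(name):
    """Search body for all candidates in one constituency, most votes first.

    Assumes the default dynamic mapping: each string field is a 'text' field
    with an exact-match '.keyword' sub-field.
    """
    return {
        # Lower-case field name, and the un-analysed keyword sub-field for an exact match.
        "query": {"term": {"constituency.keyword": name}},
        # votes is a string, so parse it as an integer instead of sorting lexicographically.
        "sort": [{
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    "source": "Integer.parseInt(doc['votes.keyword'].value)",
                },
                "order": "desc",
            }
        }],
        # The default size is 10 hits; 100 is an assumed upper bound on candidates per seat.
        "size": 100,
    }


if __name__ == "__main__":
    print(json.dumps(constituency_query("Aldershot"), indent=2))
```

The printed JSON can be used as the -d body of the original curl command against localhost:9200/ge/_search.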


0 answers
