Get all documents in an ES index using Python

```python
import numpy as np
import pandas as pd
from elasticsearch import Elasticsearch

esClient = Elasticsearch()  # elasticsearch-py 7.x: defaults to localhost:9200
# Without 'size' in the request, search() returns only the first 10 hits.
response = esClient.search(index='news', body={})
#scrollId = response["_scroll_id"]
#print(scrollId)
esDocs = response["hits"]["hits"]
fields = {}
for num, doc in enumerate(esDocs):
    sourceData = doc["_source"]
    #response = esClient.scroll(scroll_id=scrollId, scroll='1m')
    #scrollId = response['_scroll_id']
    #print(scrollId)
    for key, val in sourceData.items():
        # Keep only the tags, text and title fields, one array per field.
        if key in ('tags', 'text', 'title'):
            try:
                fields[key] = np.append(fields[key], val)
            except KeyError:
                fields[key] = np.array([val])
df = pd.DataFrame(fields)
```

2 answers

User

Answer 1 · edited 2024-10-01 19:33:36

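If you are trying to work with an Elasticsearch index through the pandas DataFrame API, I recommend Eland. With it, you can run operations on the documents without first loading all of them into memory.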

<Disclosure: I am a maintainer of Eland and employed by Elastic>
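A minimal sketch of what the Eland approach might look like, assuming a cluster at `http://localhost:9200` plus the `news` index and the `tags`/`text`/`title` fields from the question. `open_news_frame` is a hypothetical helper name, not part of Eland; `eland.DataFrame` is Eland's lazy, Elasticsearch-backed frame, so creating it does not copy the documents.

```python
def open_news_frame(es_url="http://localhost:9200", index="news",
                    columns=("tags", "text", "title")):
    """Return a lazy eland.DataFrame backed by an Elasticsearch index.

    Assumptions: a cluster reachable at `es_url` and the index/field names
    from the question. No documents are fetched when the frame is created;
    operations such as .head() are translated into Elasticsearch queries
    when they are called.
    """
    # Imported inside the function so this helper can be defined (and
    # tested) on a machine where eland is not installed.
    import eland as ed

    # es_client may be a host URL string or an Elasticsearch client instance.
    return ed.DataFrame(es_url, es_index_pattern=index, columns=list(columns))
```

For example, `open_news_frame().head()` should pull only the first few documents. Converting with `eland.eland_to_pandas(frame)` copies the whole index into a pandas DataFrame and brings back the memory cost, so it only makes sense when the result is known to fit in memory.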

User

Answer 2 · edited 2024-10-01 19:33:36

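You need to specify `size`, i.e. the number of documents to return; by default `search` returns only the first 10 hits.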

```python
esClient.search(index='news', body={'size': 44908})
```

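But that is a lot of documents, and the request will most likely fail: with the default `index.max_result_window` of 10,000, Elasticsearch rejects any search where `from + size` exceeds that limit.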
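To read every document without one huge request, the usual alternative is to page through the index. Below is a minimal sketch using the scroll API, which the commented-out lines in the question were heading toward. `iter_all_docs` and `docs_to_frame` are hypothetical helper names, and `client` is assumed to be an elasticsearch-py client (only its `search`, `scroll` and `clear_scroll` methods are called). elasticsearch-py also ships `elasticsearch.helpers.scan`, which wraps the same loop, and for deep pagination recent Elasticsearch versions recommend `search_after` with a point in time over scroll.

```python
import pandas as pd


def iter_all_docs(client, index, batch_size=1000, scroll="2m"):
    """Yield every hit in `index` by paging through the scroll API.

    Assumption: `client` is an elasticsearch-py client; no query is sent,
    so Elasticsearch falls back to match_all.
    """
    # The first request opens a scroll context and returns the first batch.
    response = client.search(index=index, scroll=scroll, size=batch_size)
    scroll_id = response["_scroll_id"]
    try:
        hits = response["hits"]["hits"]
        while hits:
            yield from hits
            # Each follow-up call returns the next batch; an empty batch ends the loop.
            response = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = response["_scroll_id"]
            hits = response["hits"]["hits"]
    finally:
        # Release the server-side scroll context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=scroll_id)


def docs_to_frame(hits, keys=("tags", "text", "title")):
    """Build a DataFrame with one row per document, keeping only `keys`.

    Missing fields become None rather than shifting other columns.
    """
    rows = [{k: hit["_source"].get(k) for k in keys} for hit in hits]
    return pd.DataFrame(rows, columns=list(keys))
```

With a real client this would be something like `docs_to_frame(iter_all_docs(Elasticsearch("http://localhost:9200"), "news"))`. Note that `docs_to_frame` still builds the full DataFrame in memory, so this avoids the `max_result_window` limit but not the memory cost of holding every document.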
