MongoDB: unreasonably slow queries despite indexes and simple documents

Posted 2024-10-03 13:26:43


I am using Python and MongoDB. I need to query documents from the database and save some of the information they contain. My current code is:

for trips in trip.find({},{'latlng_start':1, 'latlng_end':1, 'trip_data':1, 'trip_id':1}).batch_size(500):
    orig_coord = trips['latlng_start']['coordinates']
    dest_coord = trips['latlng_end']['coordinates']
    cell_start = citymap.find({"trips_orig": {"$exists": True},"cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":orig_coord}}}})
    cell_end = citymap.find({"trips_dest": {"$exists": True},"cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":dest_coord}}}})

    # Keep the trip only if each endpoint falls in exactly one cell whose POI is non-empty.
    if cell_start.count() == 1 and cell_end.count() == 1 and cell_start[0]['big_cell8']['POI'] != {} and cell_end[0]['big_cell8']['POI'] != {}:
        try:
            labels_raw.append(purpose_mapping[trips['trip_data']['purpose']])
            user_ids_raw.append(int(trips['trip_id'][:10]))
            venue_feature_start.append([cell_start[0]['big_cell8']['POI'], orig_coord])
            venue_feature_end.append([cell_end[0]['big_cell8']['POI'], dest_coord])
        except Exception:
            # Skip trips with a missing key, an unmapped purpose, or a malformed trip_id.
            continue

    else:
        continue

I have created a 2dsphere index on the citymap collection. The indexes on this collection are:

[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "CitySeg2014.grid750"
    },
    {
        "v" : 1,
        "key" : {
            "latlng" : "2dsphere"
        },
        "name" : "latlng_2dsphere",
        "ns" : "CitySeg2014.grid750",
        "2dsphereIndexVersion" : 2
    },
    {
        "v" : 1,
        "key" : {
            "cell_latlng" : "2dsphere"
        },
        "name" : "cell_latlng_2dsphere",
        "ns" : "CitySeg2014.grid750",
        "2dsphereIndexVersion" : 2
    },
    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "trips_dest_text_trips_orig_text",
        "ns" : "CitySeg2014.grid750",
        "weights" : {
            "trips_dest" : 1,
            "trips_orig" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
    }
]
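
One way to check which of these indexes a query actually uses is to look at its explain output. The helper below is only a sketch: `index_names_in_plan` is a hypothetical name, and `sample_plan` is a hand-written illustration of the explain shape, not output from this database. It assumes a server that reports plans in the `queryPlanner` format (MongoDB 3.0 or later), where index scans carry an `indexName` field. Note that a text index is only used by `$text` queries, so the `trips_dest_text_trips_orig_text` index would not be expected to help the `$exists` conditions.

```python
def index_names_in_plan(plan):
    """Recursively collect every 'indexName' value found in an explain plan."""
    names = []
    if isinstance(plan, dict):
        for key, value in plan.items():
            if key == "indexName" and isinstance(value, str):
                names.append(value)
            else:
                # Descend into nested stages (inputStage, inputStages, ...).
                names.extend(index_names_in_plan(value))
    elif isinstance(plan, list):
        for item in plan:
            names.extend(index_names_in_plan(item))
    return names


# Hand-written, illustrative plan (an assumption, not real server output).
sample_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "cell_latlng_2dsphere"},
}
print(index_names_in_plan(sample_plan))  # prints ['cell_latlng_2dsphere']

# Against a live server, the plan would come from PyMongo's Cursor.explain():
#   plan = citymap.find(query).explain()
#   index_names_in_plan(plan.get("queryPlanner", {}).get("winningPlan", {}))
```

An empty list would suggest a collection scan, in which case the geo filter is not benefiting from `cell_latlng_2dsphere`.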

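The problem is that, although there are only 47,000 trips and citymap contains only 11,600 documents, the query takes about 3,000 seconds! Yet when I ran the same program this morning, it took about 800 seconds. I don't know why this happens. Is there any way to make it more efficient?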
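
One thing that may be worth trying, independent of the indexes: in PyMongo, every `count()` call and every integer index such as `cell_start[0]` on a cursor sends its own request to the server, so the loop above can issue up to six requests per trip instead of two (and `Cursor.count()` no longer exists in PyMongo 4). The sketch below fetches each cell lookup once with `limit(2)` and reuses the result. The function names `geo_point_query`, `unique_cell` and `collect_features` are hypothetical, and whether this accounts for the 800 s versus 3,000 s difference is not something the code can confirm.

```python
def geo_point_query(flag_field, coord):
    """Build the same filter as the loop above for one trip endpoint."""
    return {
        flag_field: {"$exists": True},
        "cell_latlng": {
            "$geoIntersects": {
                "$geometry": {"type": "Point", "coordinates": coord}
            }
        },
    }


def unique_cell(collection, flag_field, coord):
    """Return the matching cell if exactly one exists, otherwise None.

    The cursor is consumed once; limit(2) is enough to tell "exactly one"
    apart from "none" or "several".
    """
    docs = list(collection.find(geo_point_query(flag_field, coord)).limit(2))
    return docs[0] if len(docs) == 1 else None


def collect_features(trip, citymap, purpose_mapping):
    """Same selection logic as the original loop, two lookups per trip."""
    labels_raw, user_ids_raw = [], []
    venue_feature_start, venue_feature_end = [], []
    fields = {"latlng_start": 1, "latlng_end": 1, "trip_data": 1, "trip_id": 1}
    for t in trip.find({}, fields).batch_size(500):
        orig_coord = t["latlng_start"]["coordinates"]
        dest_coord = t["latlng_end"]["coordinates"]
        cell_start = unique_cell(citymap, "trips_orig", orig_coord)
        cell_end = unique_cell(citymap, "trips_dest", dest_coord)
        if cell_start is None or cell_end is None:
            continue
        poi_start = cell_start["big_cell8"]["POI"]
        poi_end = cell_end["big_cell8"]["POI"]
        if poi_start == {} or poi_end == {}:
            continue
        try:
            # Compute both values before appending so the lists stay aligned.
            label = purpose_mapping[t["trip_data"]["purpose"]]
            user_id = int(t["trip_id"][:10])
        except (KeyError, ValueError, TypeError):
            continue
        labels_raw.append(label)
        user_ids_raw.append(user_id)
        venue_feature_start.append([poi_start, orig_coord])
        venue_feature_end.append([poi_end, dest_coord])
    return labels_raw, user_ids_raw, venue_feature_start, venue_feature_end
```

With the collections from the question, this would be called as `collect_features(trip, citymap, purpose_mapping)`. If many trips share endpoints, caching `unique_cell` results per coordinate could reduce the number of lookups further.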

