Firestore DeadlineExceeded exception on large collections

Posted 2024-10-04 01:23:20


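For testing and archiving purposes, I am trying to read larger collections from Google Firestore. When I try to fetch all documents from a collection containing more than 6k documents, I run into some interesting errors.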

Simple Python solution

My first attempt used the Python google-cloud-firestore library (version 0.30.0).

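from datetime import datetime
from google.cloud import firestore

# `app` (the Flask app) and `collection` (the collection name) come from the surrounding application.
source_client = firestore.Client()
source = source_client.collection(collection)
source_data = source.get()  # reads the whole collection in a single query

counter = 0
for f in source_data:
    app.logger.info(f.id)
    counter += 1
    if counter % 100 == 0:
        app.logger.info('%s %d', datetime.now(), counter)

# Logged once, after the loop has finished.
app.logger.info('%s Finally read all %d documents', datetime.now(), counter)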

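Its output (not preserved in this copy of the post) ends in a google.api_core.exceptions.DeadlineExceeded exception.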

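This looks like it is caused by a quota, even though I cannot see one. It seems to be time-based: when I run with a short sleep between elements, throughput drops and the exception still occurs after ~50 seconds.

Using pagination with Python

The library has a pagination section for this problem. Since my application should not care what kind of data it is transferring, I cannot use the start_after interface, but there is still an offset interface that at least lets me read in batches.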

for f in source_collection.offset(last_read_offset).get():
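The batching idea behind this loop can be sketched as follows. This is a minimal illustration, not the code from the post: `read_in_batches`, `fake_fetch` and `firestore_fetch` are hypothetical names, the in-memory list stands in for a real collection, and the `offset(...).limit(...).get()` chain in `firestore_fetch` is an assumed mapping onto the client library that is not exercised here.

```python
from typing import Callable, Iterator, List


def read_in_batches(fetch: Callable[[int, int], List[dict]],
                    batch_size: int = 500) -> Iterator[dict]:
    """Yield every item by requesting one (offset, limit) window at a time."""
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:
            break  # an empty window means everything has been read
        yield from batch
        offset += len(batch)


# In-memory stand-in for a collection (illustration only).
DOCS = [{"id": f"doc{i:05d}"} for i in range(1234)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return DOCS[offset:offset + limit]


def firestore_fetch(collection_ref, offset: int, limit: int) -> list:
    """Assumed Firestore equivalent of fake_fetch; not called in this sketch."""
    return list(collection_ref.offset(offset).limit(limit).get())


docs = list(read_in_batches(fake_fetch, batch_size=500))
print(len(docs))  # 1234
```

Each request stays small, so no single query has to stream the whole collection. With a real collection the fetch argument could be something like `lambda o, n: firestore_fetch(source_collection, o, n)`.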

As long as last_read_offset is below 1001, I get correct results. If I start at offset 1000, I get results until the google.api_core.exceptions.DeadlineExceeded exception from above occurs. But when I start at anything larger, I get:

Traceback (most recent call last):
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
    response = self.handle_exception(e)
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 144, in transfer
    count_collection(source_collection)
  File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 94, in count_collection
    for f in source_collection.offset(1001).get():
  File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/google/cloud/firestore_v1beta1/query.py", line 599, in get
    raise ValueError(msg)
ValueError: Unexpected server response. All responses other than the first must contain a document. The response at index 1 was
read_time {
  seconds: 1541668338
  nanos: 420813000
}
skipped_results: 1

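Looking at the library code, it seems the backend is sending a message that the client interprets as invalid.

Retry with Node.js

OK, maybe something is wrong with my code or with the Python client library. Let's try Node.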

const admin = require('firebase-admin');
admin.initializeApp({
    credential: admin.credential.applicationDefault()
});

var db = admin.firestore();
admin.firestore().settings( { timestampsInSnapshots: true })
var counter = 0

console.log('Read collection')
db.collection(collection).get()
    .then(querySnapshot => {
        querySnapshot.forEach(documentSnapshot => {
            counter++;
        });
        console.log(counter)
    })
    .catch( error => {
        console.log(error)
});

This behaves the same as the Python library, although here the timeout is more clearly 60 seconds:

[2018-11-09T08:36:30.992Z] App listening on port 8080
[2018-11-09T08:36:30.993Z] Press Ctrl+C to quit.
[2018-11-09T08:36:37.390Z] Read collection
[2018-11-09T08:37:37.406Z] { Error: 4 DEADLINE_EXCEEDED: Deadline Exceeded
    at Object.exports.createStatusError (/home/carsten/projects/node_modules/grpc/src/common.js:87:15)
    at ClientReadableStream._emitStatusIfDone (/home/carsten/projects/node_modules/grpc/src/client.js:235:26)
    at ClientReadableStream._readsDone (/home/carsten/projects/node_modules/grpc/src/client.js:201:8)
    at /home/carsten/projects/node_modules/grpc/src/client_interceptors.js:679:15
  code: 4,
  metadata: Metadata { _internal_repr: {} },
  details: 'Deadline Exceeded' }

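Has anyone had a similar experience, or a good suggestion on how to proceed?

Note: the exportDocuments/importDocuments interface alone is not enough, because we sometimes need to adjust the data after reading it. I also do not know what format Firestore uses when exporting to Google Cloud Storage, or how to convert it.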

Edit: Golang

I tried the Golang API.

log.Println("Collecting data")
snapshotIter := client.Collection(collection.(string)).Documents(ctx)
defer snapshotIter.Stop()

if err != nil {
    log.Fatalln(err)
}

i := 0

for {
    _, err := snapshotIter.Next()

    if err == iterator.Done {
        break
    }
    if err != nil {
        log.Fatalln(err)
    }

    if i % 100 == 0{
        log.Println(i)
    }
    i++
}

log.Println("Done")

As expected, it runs into the same timeout.

2018/11/12 15:01:20 Collecting data
2018/11/12 15:01:21 0
2018/11/12 15:01:21 100
2018/11/12 15:01:21 200
2018/11/12 15:01:21 300
2018/11/12 15:01:21 400
2018/11/12 15:01:22 500
2018/11/12 15:01:22 600
2018/11/12 15:01:22 700
....
2018/11/12 15:02:22 29800
2018/11/12 15:02:23 29900
2018/11/12 15:02:23 rpc error: code = DeadlineExceeded desc = The datastore operation timed out, or the data was temporarily unavailable.
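The timestamps in the Node.js and Go logs above can be used to check the timeout estimate. The sketch below only does arithmetic on values copied from those logs; `seconds_between` is a hypothetical helper, and the document count of 29,900 is the last counter the Go program printed, so the real number read may be slightly higher.

```python
from datetime import datetime


def seconds_between(start: str, end: str, fmt: str) -> float:
    """Elapsed seconds between two timestamps given in the same format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Node.js: "Read collection" until the DEADLINE_EXCEEDED error.
node_elapsed = seconds_between(
    "2018-11-09T08:36:37.390Z", "2018-11-09T08:37:37.406Z", "%Y-%m-%dT%H:%M:%S.%fZ"
)

# Go: "Collecting data" until the rpc error.
go_elapsed = seconds_between(
    "2018/11/12 15:01:20", "2018/11/12 15:02:23", "%Y/%m/%d %H:%M:%S"
)

# Approximate Go throughput, based on the last counter value that was logged.
go_docs_per_second = 29900 / go_elapsed

print(f"Node.js failed after {node_elapsed:.3f} s")
print(f"Go failed after {go_elapsed:.0f} s at about {go_docs_per_second:.0f} docs/s")
```

Both runs stop roughly one minute after the read starts, regardless of how many documents were delivered, which is consistent with a time-based limit on a single query rather than a limit on document count.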

But unlike in the other libraries, the offset works fine here:

snapshotIter := client.Collection(collection.(string)).Offset(30000).Documents(ctx)

1 answer
User
Answer 1 · posted 2024-10-04 01:23:20

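With help from the Firebase support team, we found that the Python client API does indeed have a bug. A fix is coming in one of the next releases. It will most likely enable the Python library to order by document ID and therefore to use start_after().

Until then, you have two possible solutions:

  1. Order by another field and use start_after()

  2. Use the Node.js library with pagination, like this: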

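var db = admin.firestore();
admin.firestore().settings({ timestampsInSnapshots: true });
function readNextPage(lastReadDoc) {
  let query = db
    .collection(collection)
    .orderBy(admin.firestore.FieldPath.documentId())
    .limit(100);
  // The snippet in the source stops here; the lines below are an assumed completion.
  if (lastReadDoc) query = query.startAfter(lastReadDoc);
  return query.get().then(snapshot => {
    if (snapshot.empty) return;
    // Process snapshot.docs here, then continue after the last document read.
    return readNextPage(snapshot.docs[snapshot.docs.length - 1]);
  });
}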
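Option 1 above (ordering by another field and using start_after()) could look roughly like this in Python. This is a sketch under stated assumptions: `iterate_with_cursor`, `make_fake_fetch` and `firestore_fetch_page` are hypothetical names, the ordering field is assumed to hold a unique value per document, and the in-memory rows stand in for a real collection. The `order_by(...).limit(...).start_after({...})` chain is an assumed mapping onto the client library and is not exercised here.

```python
from typing import Any, Callable, Iterator, List, Optional


def iterate_with_cursor(
    fetch_page: Callable[[Optional[Any], int], List[Any]],
    cursor_of: Callable[[Any], Any],
    page_size: int = 100,
) -> Iterator[Any]:
    """Yield all items by repeatedly requesting the page after the last item seen."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield from page
        cursor = cursor_of(page[-1])  # resume after the last item of this page


def make_fake_fetch(rows: List[dict], field: str):
    """In-memory stand-in for an ordered query (illustration only)."""
    ordered = sorted(rows, key=lambda r: r[field])

    def fetch(cursor, limit):
        after = [r for r in ordered if cursor is None or r[field] > cursor]
        return after[:limit]

    return fetch


def firestore_fetch_page(collection_ref, order_field, cursor, limit):
    """Assumed Firestore equivalent of the fake fetch; not called in this sketch."""
    query = collection_ref.order_by(order_field).limit(limit)
    if cursor is not None:
        query = query.start_after({order_field: cursor})
    return list(query.get())


rows = [{"created": i} for i in range(250)]
docs = list(iterate_with_cursor(make_fake_fetch(rows, "created"),
                                lambda r: r["created"]))
print(len(docs))  # 250
```

Because each page starts after the last value already seen, every request is a short query of at most page_size documents. If the ordering field can contain duplicate values, documents sharing the boundary value may be skipped, so a field with unique values is the safer choice.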
