Firestore DeadlineExceeded exception for big collections
Posted: 2019-04-13 01:47:17
Question: I'm trying to read a larger collection from Google Firestore for testing and archiving. When I try to fetch all documents from a collection with more than 6k documents, I run into some interesting errors.
Naive Python solution
My first attempt used the Python google-cloud-firestore library (version 0.30.0).
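from datetime import datetime

from google.cloud import firestore

# `app` is the Flask application; `collection` is the name of the collection to read.
source_client = firestore.Client()
source = source_client.collection(collection)
source_data = source.get()  # in 0.30.0, get() yields the documents as a stream
counter = 0
for f in source_data:
    app.logger.info(f.id)
    counter += 1
    if counter % 100 == 0:
        app.logger.info('%s %d', datetime.now(), counter)
app.logger.info('%s Finally read all %d documents', datetime.now(), counter)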
It produces the following output:
...
INFO:flask.app:2018-11-08 09:49:03.923795 6400
INFO:flask.app:2018-11-08 09:49:04.115410 6500
WARNING:flask.app:2018-11-08 09:49:04.128478 copy brocken by exception
Traceback (most recent call last):
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 142, in transfer
count_collection(source_collection)
File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 94, in count_collection
for f in source_collection.offset(1000).get():
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/google/cloud/firestore_v1beta1/query.py", line 588, in get
for index, response_pb in enumerate(response_iterator):
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 83, in next
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
# Permission is hereby granted, free of charge, to any person obtaining a copy
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
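This seems to be caused by a quota, even though I can't find it here. It appears to be time-based: when I add a short sleep between elements, throughput drops, and the exception still appears after about 50 seconds.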
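Since the failure seems tied to how long a single streaming call stays open rather than to a document count, one workaround is to remember the ID of the last document received and, when the deadline hits, restart the query just after it. The sketch below simulates that with an in-memory list. `DeadlineExceeded`, `make_stream` and `read_all_with_resume` are hypothetical stand-ins, not google-cloud-firestore APIs; a real `stream_from` would need a query ordered by document ID with a start-after cursor, which (per the second answer below) the Python client could not do at the time.

```python
from typing import Callable, Iterator, List, Optional


class DeadlineExceeded(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.DeadlineExceeded."""


def make_stream(doc_ids: List[str], max_per_call: int) -> Callable[[Optional[str]], Iterator[str]]:
    """Simulate a server stream that is cut off after max_per_call documents."""
    ordered = sorted(doc_ids)

    def stream_from(after: Optional[str]) -> Iterator[str]:
        # Yield only IDs strictly after the cursor, in document-ID order.
        remaining = [d for d in ordered if after is None or d > after]
        for n, doc_id in enumerate(remaining):
            if n == max_per_call:
                raise DeadlineExceeded("504 Deadline Exceeded")
            yield doc_id

    return stream_from


def read_all_with_resume(
    stream_from: Callable[[Optional[str]], Iterator[str]], max_retries: int = 100
) -> List[str]:
    """Read every document, restarting after the last seen ID on each deadline."""
    seen: List[str] = []
    last_id: Optional[str] = None
    retries = 0
    while True:
        try:
            for doc_id in stream_from(last_id):
                seen.append(doc_id)
                last_id = doc_id  # cursor for the next attempt
            return seen
        except DeadlineExceeded:
            retries += 1
            if retries > max_retries:
                raise  # give up after too many deadline errors


if __name__ == "__main__":
    docs = [f"doc{i:05d}" for i in range(6500)]
    print(len(read_all_with_resume(make_stream(docs, max_per_call=1000))))  # 6500
```

The retry cap matters: if the server failed before yielding even one document, the cursor would never advance and the loop would otherwise spin forever.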
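Pagination with Python
The library documentation has a section on pagination for exactly this problem. Since my application shouldn't care what kind of data it transfers, I can't use the start_after interface, but there is still an offset interface that I can at least use to read in batches.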
for f in source_collection.offset(last_read_offset).get():
This gives me correct results as long as last_read_offset is below 1001. If I start at offset 1000, I get results until I hit the google.api_core.exceptions.DeadlineExceeded exception from above. But when I start with anything larger, I get:
Traceback (most recent call last):
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 144, in transfer
count_collection(source_collection)
File "/home/carsten/projects/transfertool/firestore/transfertool/main.py", line 94, in count_collection
for f in source_collection.offset(1001).get():
File "/home/carsten/projects/transfertool/venv/lib/python3.6/site-packages/google/cloud/firestore_v1beta1/query.py", line 599, in get
raise ValueError(msg)
ValueError: Unexpected server response. All responses other than the first must contain a document. The response at index 1 was
read_time
seconds: 1541668338
nanos: 420813000
skipped_results: 1
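Looking at the library code, the backend seems to send a message that the library treats as invalid: every response after the first is expected to contain a document, but this one carries only read_time and skipped_results.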
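Judging from the error text, a client that treated such document-less responses as progress messages (skipping them) instead of raising would not fail here. A minimal sketch of that handling, assuming responses are modeled as plain dicts; this is not the library's protobuf type, and not necessarily how the eventual fix was written:

```python
from typing import Any, Dict, Iterable, Iterator


def documents_from_responses(responses: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield the documents from a stream of query responses.

    Responses that carry no document (for example only read_time and
    skipped_results while the server skips past an offset) are treated as
    progress messages and ignored instead of raising an error.
    """
    for response in responses:
        document = response.get("document")
        if document is None:
            continue
        yield document


if __name__ == "__main__":
    # Hypothetical dict model of the responses seen in the traceback above.
    responses = [
        {"read_time": 1541668338, "document": {"name": "doc-0001"}},
        {"read_time": 1541668338, "skipped_results": 1},
        {"document": {"name": "doc-0002"}},
    ]
    print([d["name"] for d in documents_from_responses(responses)])  # ['doc-0001', 'doc-0002']
```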
Retrying with node.js
OK, maybe something is wrong with my code or with the Python client library. Let's try Node.
const admin = require('firebase-admin');
admin.initializeApp({
  credential: admin.credential.applicationDefault()
});
var db = admin.firestore();
admin.firestore().settings({ timestampsInSnapshots: true });
// `collection` is the name of the collection to read (defined elsewhere).
var counter = 0;
console.log('Read collection');
db.collection(collection).get()
  .then(querySnapshot => {
    querySnapshot.forEach(documentSnapshot => {
      counter++;
    });
    console.log(counter);
  })
  .catch(error => {
    console.log(error);
  });
This behaves the same as the Python library, although here the timeout is more clearly 60 seconds.
[2018-11-09T08:36:30.992Z] App listening on port 8080
[2018-11-09T08:36:30.993Z] Press Ctrl+C to quit.
[2018-11-09T08:36:37.390Z] Read collection
[2018-11-09T08:37:37.406Z] Error: 4 DEADLINE_EXCEEDED: Deadline Exceeded
at Object.exports.createStatusError (/home/carsten/projects/node_modules/grpc/src/common.js:87:15)
at ClientReadableStream._emitStatusIfDone (/home/carsten/projects/node_modules/grpc/src/client.js:235:26)
at ClientReadableStream._readsDone (/home/carsten/projects/node_modules/grpc/src/client.js:201:8)
at /home/carsten/projects/node_modules/grpc/src/client_interceptors.js:679:15
code: 4,
metadata: Metadata _internal_repr: ,
details: 'Deadline Exceeded'
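Has anyone had a similar experience, or any good tips on how to proceed?
PS: The exportDocument/importDocument interfaces are not sufficient, because we sometimes need to adjust the data after reading it. Also, I don't know what format Firestore exports to Google Cloud Storage or how to convert it.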
Edit: golang
Just for the sake of it, I tried the golang API.
log.Println("Collecting data")
snapshotIter := client.Collection(collection.(string)).Documents(ctx)
defer snapshotIter.Stop()
i := 0
for {
	_, err := snapshotIter.Next()
	if err == iterator.Done {
		break
	}
	if err != nil {
		log.Fatalln(err)
	}
	if i%100 == 0 {
		log.Println(i)
	}
	i++
}
log.Println("Done")
As expected, this runs into the same timeout.
2018/11/12 15:01:20 Collecting data
2018/11/12 15:01:21 0
2018/11/12 15:01:21 100
2018/11/12 15:01:21 200
2018/11/12 15:01:21 300
2018/11/12 15:01:21 400
2018/11/12 15:01:22 500
2018/11/12 15:01:22 600
2018/11/12 15:01:22 700
....
2018/11/12 15:02:22 29800
2018/11/12 15:02:23 29900
2018/11/12 15:02:23 rpc error: code = DeadlineExceeded desc = The datastore operation timed out, or the data was temporarily unavailable.
Apart from that, offset works correctly here, even for large values:
snapshotIter := client.Collection(collection.(string)).Offset(30000).Documents(ctx)
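Comments on the question:
You could report the issue to Firebase support: firebase.google.com/support/contact
Good idea, thanks.
Answer 1: In my case, I got this error when fetching an entire collection. It isn't even that big a collection, but I suspect the documents in it are large. I switched to a paginated update. This is a Node Firebase function: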
let lastReadDoc = false;
let lastDoc: string = '';
const limitRecordCount = 10;
do {
  await db
    .collection('something/' + somethingId + '/somethingcollection')
    .orderBy('id')
    .limit(limitRecordCount)
    .startAfter(lastDoc)
    .get()
    .then((snapshot: any) => {
      let counter = 0;
      snapshot.docs.forEach((doc: any) => {
        const docData = doc.data();
        // Remember the last 'id' value; it is the cursor for the next page.
        if (lastDoc !== docData.id) {
          lastDoc = docData.id;
        }
        counter = counter + 1;
        // ***********************
        // business logic per doc here
        // ***********************
      });
      // A page shorter than the limit means there is nothing left to read.
      if (counter < limitRecordCount) {
        lastReadDoc = true;
      }
    })
    .catch((err: any) => {
      lastReadDoc = true;
      console.log('Error getting booking documents', err);
    });
} while (lastReadDoc === false);
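The loop above is keyset (cursor) pagination: each request is short, so no single call should run into the deadline. A Python sketch of the same control flow under stated assumptions: `fetch_page` is a hypothetical in-memory stand-in for `orderBy('id').startAfter(cursor).limit(n)`, and documents are modeled as `(id, payload)` tuples. It also shows one design caveat: if the ordered field is not unique, documents sharing the cursor value on a page boundary are skipped.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical document shape: (value of the ordered "id" field, payload).
Doc = Tuple[str, str]
FetchPage = Callable[[Optional[str], int], List[Doc]]


def make_fetch_page(docs: List[Doc]) -> FetchPage:
    """Simulate orderBy('id').startAfter(cursor).limit(n) over an in-memory list."""
    ordered = sorted(docs)

    def fetch_page(after: Optional[str], limit: int) -> List[Doc]:
        return [d for d in ordered if after is None or d[0] > after][:limit]

    return fetch_page


def read_all_pages(fetch_page: FetchPage, page_size: int) -> List[Doc]:
    """Fetch page after page until a short page signals the end."""
    out: List[Doc] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        out.extend(page)
        if len(page) < page_size:
            return out
        cursor = page[-1][0]  # last "id" value becomes the next startAfter cursor


if __name__ == "__main__":
    unique = [(f"id{i:03d}", "payload") for i in range(25)]
    print(len(read_all_pages(make_fetch_page(unique), 10)))  # 25
    dupes = [("a", "1"), ("b", "2"), ("b", "3")]
    print(read_all_pages(make_fetch_page(dupes), 2))  # [('a', '1'), ('b', '2')]
```

In the second call the document `('b', '3')` is lost because the cursor `'b'` excludes every remaining document with that value, which is one reason to order by a unique key such as the document ID.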
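Answer 2: With help from the Firebase support team, we found that there is indeed a bug in the Python client API. A fix will come in the next release; most likely it will allow the Python library to order by document ID and therefore to use start_after().
Until then, there are two possible workarounds:
1. Order by another field and use start_after().
2. Use the node.js library with pagination, for example: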
var db = admin.firestore();
admin.firestore().settings({ timestampsInSnapshots: true });
function readNextPage(lastReadDoc) {
  let query = db
    .collection(collection)
    .orderBy(admin.firestore.FieldPath.documentId())
    .limit(100);
  // ... (the remainder of this function is not included here)
}
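The snippet does not show where `lastReadDoc` is used; presumably it is the cursor passed to `startAfter()` on the next request. A Python sketch of that idea under stated assumptions: documents are modeled as hypothetical `(field_value, document_id)` tuples, and `read_next_page` / `read_all` are illustrative names, not Firestore APIs. Ordering by the field and then by document ID gives every document a unique position, so the cursor never skips documents that share a field value (relevant to workaround 1).

```python
from typing import List, Optional, Tuple

# Hypothetical document shape: (value of the sort field, document ID).
Doc = Tuple[str, str]


def read_next_page(docs: List[Doc], last_read_doc: Optional[Doc], limit: int) -> List[Doc]:
    """Simulate orderBy(field).orderBy(documentId).startAfter(last_read_doc).limit(limit)."""
    ordered = sorted(docs)  # by field value first, then by document ID
    return [d for d in ordered if last_read_doc is None or d > last_read_doc][:limit]


def read_all(docs: List[Doc], limit: int = 100) -> List[Doc]:
    """Page through all documents; None as the cursor means start from the beginning."""
    out: List[Doc] = []
    last_read_doc: Optional[Doc] = None
    while True:
        page = read_next_page(docs, last_read_doc, limit)
        out.extend(page)
        if len(page) < limit:
            return out
        last_read_doc = page[-1]  # passed to startAfter on the next call


if __name__ == "__main__":
    docs = [("b", "id2"), ("a", "id1"), ("b", "id3")]
    print(read_all(docs, limit=2))  # [('a', 'id1'), ('b', 'id2'), ('b', 'id3')]
```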
Comments:
Please look at this again: function readNextPage(lastReadDoc) ... where is the lastReadDoc parameter supposed to go?