Odd "Request size is too large" error, Python + Databricks + CosmosDB

Posted 2024-09-28 22:42:02


I have the following script in my Databricks notebook, and when saving the dataframe to Cosmos I run into a "Request size is too large" error:

    writeConfig = {
      "Endpoint": "https://mycosmosdbendpoint:443/",
      "Masterkey": "*****",
      "Database": database,
      "Collection": collection,
      "WritingBatchSize": "1",
      "Upsert": "true", 
    }
    example_df = spark.sql(f'SELECT * FROM {myData}.temp_cosmos_export')
    example_df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).mode("overwrite").save()

This produces the following error:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77285.0 (TID 676380, executor 209): java.lang.Exception: Errors encountered in bulk import API execution. PartitionKeyDefinition: {"paths":["/id"],"kind":"Hash"}, Number of failures corresponding to exception of type: com.microsoft.azure.documentdb.DocumentClientException = 1; FAILURE: com.microsoft.azure.documentdb.DocumentClientException: {"Errors":["Request size is too large"]}
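Since the stack trace points at the bulk import API, one experiment on my list is forcing point writes instead of the bulk path. This is only a sketch: I am assuming the connector accepts a "bulkimport" option to switch bulk writes off, which I have not verified against the connector's documentation.

    # Hypothetical variation of writeConfig: same connector, but with the bulk
    # import path (assumed option name "bulkimport") switched off so each
    # document is sent as an individual request. NOT verified.
    writeConfig_no_bulk = {
        "Endpoint": "https://mycosmosdbendpoint:443/",
        "Masterkey": "*****",
        "Database": database,
        "Collection": collection,
        "WritingBatchSize": "1",
        "Upsert": "true",
        "bulkimport": "false",  # assumed key; the connector may spell this differently
    }

    example_df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig_no_bulk).mode("overwrite").save()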

While debugging this, I also reduced the dataframe to just a single item, but it still fails! What's really strange is that if I add a display(example_df) line, for example:

    example_df = spark.sql(f'SELECT * FROM {myData}.temp_cosmos_export')
    display(example_df)
    example_df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).mode("overwrite").save()
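My guess is that display() forces the query to be fully evaluated before the write starts. A minimal sketch of how I could test that guess without display() is below; cache() and count() are standard DataFrame calls, but whether materializing the dataframe actually changes the write behaviour is only my assumption.

    example_df = spark.sql(f'SELECT * FROM {myData}.temp_cosmos_export')

    # Force full evaluation and keep the rows in memory; my guess is that
    # display() has a similar side effect (assumption, not verified).
    example_df.cache()
    example_df.count()

    example_df.write.format("com.microsoft.azure.cosmosdb.spark").options(**writeConfig).mode("overwrite").save()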

then the single record is sent to Cosmos successfully! When I check the size of the item in Cosmos DB, it is only around 250 KB, well under Cosmos's 2 MB limit.

Why does displaying the dataframe make it work? And how can my request size be too large for a 250 KB document?
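For reference, this is roughly how I measure the serialized size per row on the Spark side to double-check the 250 KB figure against the 2 MB limit. It is only a sketch: the JSON size here is an estimate, and the payload the connector actually sends (especially when rows are batched by the bulk import path) may differ.

    # Rough JSON-serialized size of each row, in bytes. Only an estimate of
    # what gets sent; the connector's actual request body may be larger.
    sizes = (
        example_df.toJSON()
        .map(lambda doc: len(doc.encode("utf-8")))
        .collect()
    )
    print(f"largest row: {max(sizes)} bytes, total: {sum(sizes)} bytes")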

Has anyone else run into something like this? Any help or suggestions would be greatly appreciated.

