How do I resolve a 403 error that limits the number of workers Google Cloud Dataflow can scale up to?

Posted 2024-10-05 12:58:03


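I know there are related questions, but I have been working on this for hours. I am trying to process and download a cleaned version of a public web-crawl dataset called C4 through tensorflow datasets, which uses an Apache Beam pipeline and Google Cloud Dataflow to distribute the workload across hundreds of workers. In keeping with the instructions for generating big datasets with Apache Beam, I followed the Google Cloud Dataflow Quickstart instructions: I set up my project, billing, credentials, and so on through the Google Cloud console, then created a virtual environment, installed tensorflow and the Google Cloud SDK, and set the credentials. After setting my MY_BUCKET, MY_PROJECT, and MY_REGION variables, I essentially ran the instructions: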

pip install tfds-nightly[c4]
echo 'tfds-nightly[c4]' > /tmp/beam_requirements.txt
python -m tensorflow_datasets.scripts.download_and_prepare \
  --datasets=c4/en \
  --data_dir=gs://$MY_BUCKET/tensorflow_datasets \
  --beam_pipeline_options="project=$MY_PROJECT,job_name=c4,staging_location=gs://$MY_BUCKET/binaries,temp_location=gs://$MY_BUCKET/temp,runner=DataflowRunner,requirements_file=/tmp/beam_requirements.txt,experiments=shuffle_mode=service,region=$MY_REGION"

It starts running, but I get a "Failed: Resize Instance Group Manager" message reporting a 403 error roughly every 20 seconds.


While the console output keeps saying it is trying to scale to 1000, I am capped at 2 workers:

.
.
.
I1014 15:43:05.446238 140556195309312 dataflow_runner.py:248] 2020-10-14T21:43:01.141Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
I1014 15:43:05.446516 140556195309312 dataflow_runner.py:248] 2020-10-14T21:43:01.171Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
I1014 15:49:42.444391 140556195309312 dataflow_runner.py:248] 2020-10-14T21:49:40.042Z: JOB_MESSAGE_BASIC: Autoscaling: Resizing worker pool from 1 to 2.
I1014 15:49:47.653243 140556195309312 dataflow_runner.py:248] 2020-10-14T21:49:45.542Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 2 based on the rate of progress in the currently running stage(s).
I1014 15:51:37.070624 140556195309312 dataflow_runner.py:248] 2020-10-14T21:51:36.931Z: JOB_MESSAGE_BASIC: Autoscaling: Resizing worker pool from 2 to 1000.
I1014 16:39:49.413619 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 16:39:49.448023 140556195309312 client.py:795] Refreshing access_token
I1014 17:39:53.158122 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 17:39:53.191963 140556195309312 client.py:795] Refreshing access_token
I1014 18:39:54.347596 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 18:39:54.377913 140556195309312 client.py:795] Refreshing access_token
I1014 19:39:59.015963 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 19:39:59.051589 140556195309312 client.py:795] Refreshing access_token
.
.
.

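Judging from the 403 errors and the related questions, I think this is a permissions problem, but I followed the instructions to create a service account and made it an "Owner". So if I set my credentials to the JSON file obtained from the GC console, why would I be missing permissions, and how can I verify my credentials, etc., so that I stop getting these 403 errors and successfully scale up to hundreds of workers?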

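Finally, I thought this might be a quota problem, but when I checked the quotas in the Google Cloud console, the job did not appear to be hitting any of them. Each one has a green check mark and is nowhere near full.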


1 answer
User
#1 · Posted 2024-10-05 12:58:03

QUOTA_FOR_INSTANCES refers to this quota, which is not visible on the quotas page. To raise it, increase your CPU quota.

If you need quota for more VM instances, request more CPUs, because having more CPUs raises this quota.
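To get a feel for how the CPU quota bounds the worker count, here is a rough sketch. The function name, the quota of 24 CPUs, and the 1 vCPU per worker figure are hypothetical placeholders, and the real instance quota is computed by Google and may not match this estimate exactly, so treat the result only as a ballpark.

```python
def max_workers_within_cpu_quota(cpu_quota, cpus_in_use, vcpus_per_worker):
    """Estimate how many more worker VMs fit under a regional CPU quota.

    All three inputs are values you look up yourself; nothing here
    queries Google Cloud.
    """
    if vcpus_per_worker <= 0:
        raise ValueError("vcpus_per_worker must be positive")
    # CPUs still available in the region, never negative.
    free_cpus = max(cpu_quota - cpus_in_use, 0)
    # Each worker VM consumes vcpus_per_worker CPUs of that remainder.
    return free_cpus // vcpus_per_worker


# Hypothetical numbers: a 24-CPU regional quota, nothing else running,
# and workers with 1 vCPU each.
print(max_workers_within_cpu_quota(cpu_quota=24, cpus_in_use=0, vcpus_per_worker=1))
```

With numbers like these, a request to scale to 1000 workers would be far beyond what the quota allows, which would be consistent with the resize failures in the question.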

You can also set max_num_workers to keep the number of VMs within your quota.
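As a sketch of what that could look like for the command in the question, the helper below rebuilds the comma-separated string passed to --beam_pipeline_options with max_num_workers appended. The helper name, the example project, bucket, and region values, and the cap of 20 workers are all hypothetical; pick a cap that fits your own quota.

```python
def build_beam_pipeline_options(project, bucket, region, max_num_workers):
    """Return the key=value,key=value string for --beam_pipeline_options."""
    options = {
        "project": project,
        "job_name": "c4",
        "staging_location": f"gs://{bucket}/binaries",
        "temp_location": f"gs://{bucket}/temp",
        "runner": "DataflowRunner",
        "requirements_file": "/tmp/beam_requirements.txt",
        "experiments": "shuffle_mode=service",
        "region": region,
        # Upper bound on autoscaling so the job stays within quota.
        "max_num_workers": str(max_num_workers),
    }
    return ",".join(f"{key}={value}" for key, value in options.items())


# Hypothetical values; substitute your own project, bucket, and region.
print(build_beam_pipeline_options("my-project", "my-bucket", "us-central1", 20))
```

The printed string can be pasted into the --beam_pipeline_options flag of the original download_and_prepare command.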

The 403 may mean that permission was denied because you do not have enough quota, not because your SA is missing some other permission.
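If you still want to rule out the credentials side, one quick sanity check is to confirm which service account the key file actually belongs to. The sketch below assumes the credentials were set through the GOOGLE_APPLICATION_CREDENTIALS environment variable (the question does not say so explicitly) and that the file is a standard service-account JSON key; the function name is hypothetical. It only reads the file locally and does not verify any IAM permissions.

```python
import json
import os


def describe_service_account_key(path):
    """Read a service-account JSON key file and return its identifying fields."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    return {
        "type": key.get("type"),
        "project_id": key.get("project_id"),
        "client_email": key.get("client_email"),
    }


# Assumption: the key path is exported in GOOGLE_APPLICATION_CREDENTIALS.
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if key_path and os.path.isfile(key_path):
    print(describe_service_account_key(key_path))
else:
    print("GOOGLE_APPLICATION_CREDENTIALS is not set or does not point to a file")
```

If project_id and client_email match the project and the Owner service account you created, the key itself is probably not the cause, which points back to quota.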
