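I know there are related questions, but I have been working on this for hours. I am trying to use tensorflow datasets to download and process C4, a cleaned version of the public Common Crawl dataset. It uses an Apache Beam pipeline and Google Cloud Dataflow to distribute the workload across hundreds of workers. In keeping with the instructions for generating big datasets with Apache Beam, I followed the Google Cloud Dataflow Quickstart instructions to set up my project, billing, credentials, etc. through the Google Cloud Console. I then created a virtual environment, installed tensorflow and the Google Cloud SDK, and ran the following commands: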
pip install tfds-nightly[c4]
echo 'tfds-nightly[c4]' > /tmp/beam_requirements.txt
python -m tensorflow_datasets.scripts.download_and_prepare \
--datasets=c4/en \
--data_dir=gs://$MY_BUCKET/tensorflow_datasets \
--beam_pipeline_options="project=$MY_PROJECT,job_name=c4,staging_location=gs://$MY_BUCKET/binaries,temp_location=gs://$MY_BUCKET/temp,runner=DataflowRunner,requirements_file=/tmp/beam_requirements.txt,experiments=shuffle_mode=service,region=$MY_REGION"
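It starts running, but I get a "Failed: Resize Instance Group Manager" message with a 403 error about every 20 seconds.
I am limited to 2 workers, even though the console output keeps saying it is trying to scale to 1000: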
.
.
.
I1014 15:43:05.446238 140556195309312 dataflow_runner.py:248] 2020-10-14T21:43:01.141Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
I1014 15:43:05.446516 140556195309312 dataflow_runner.py:248] 2020-10-14T21:43:01.171Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
I1014 15:49:42.444391 140556195309312 dataflow_runner.py:248] 2020-10-14T21:49:40.042Z: JOB_MESSAGE_BASIC: Autoscaling: Resizing worker pool from 1 to 2.
I1014 15:49:47.653243 140556195309312 dataflow_runner.py:248] 2020-10-14T21:49:45.542Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 2 based on
the rate of progress in the currently running stage(s).
I1014 15:51:37.070624 140556195309312 dataflow_runner.py:248] 2020-10-14T21:51:36.931Z: JOB_MESSAGE_BASIC: Autoscaling: Resizing worker pool from 2 to 1000.
I1014 16:39:49.413619 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 16:39:49.448023 140556195309312 client.py:795] Refreshing access_token
I1014 17:39:53.158122 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 17:39:53.191963 140556195309312 client.py:795] Refreshing access_token
I1014 18:39:54.347596 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 18:39:54.377913 140556195309312 client.py:795] Refreshing access_token
I1014 19:39:59.015963 140556195309312 transport.py:179] Refreshing due to a 401 (attempt 1/2)
I1014 19:39:59.051589 140556195309312 client.py:795] Refreshing access_token
.
.
.
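From the 403 errors and the related questions, I think this is a permissions problem, but I followed the instructions to create a service account and gave it the "Owner" role. If I have set my credentials to the JSON file obtained from the GC console, why would I be missing permissions, and how can I verify my credentials, etc., so that I stop getting these 403 errors and successfully scale up to hundreds of workers?
Finally, I thought it might be a quota problem, but when I checked the quotas in the Google Cloud Console, the job did not appear to be hitting any of them. Every one has a green check mark and is nowhere near full.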
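QUOTA_FOR_INSTANCES refers to this quota, which is not visible on the quotas page. To raise it, increase your CPU quota: if you need quota for more VM instances, request more CPUs, because having more CPUs increases this quota.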
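As a rough way to see the regional limits from the command line, the sketch below asks the Google Cloud SDK to describe the region and print each quota's metric, limit, and usage. It assumes `gcloud` is installed and authenticated. The default values `my-project` and `us-central1` are placeholders and do not come from the question.

```shell
# Placeholder defaults (assumptions); replace with your own project and region.
MY_PROJECT="${MY_PROJECT:-my-project}"
MY_REGION="${MY_REGION:-us-central1}"

if command -v gcloud >/dev/null 2>&1; then
  # Print one row per regional quota (CPUS is among the listed metrics).
  gcloud compute regions describe "$MY_REGION" \
    --project "$MY_PROJECT" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)" \
    || echo "gcloud call failed; check authentication and the project ID" >&2
else
  echo "gcloud not found; install the Google Cloud SDK first" >&2
fi
```

If a row's usage is close to its limit while the job is trying to scale up, that quota is a likely candidate for the resize failures.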
You can also set max_num_workers to keep the number of VMs within your quota.
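One way to apply that cap, sketched under assumptions: append `max_num_workers` to the same comma-separated options string the question passes to `--beam_pipeline_options`. The value 50 and the default bucket, project, and region names are placeholders. Pick a worker count that fits within your regional CPU quota.

```shell
# Placeholder defaults (assumptions); replace with your own values.
MY_BUCKET="${MY_BUCKET:-my-bucket}"
MY_PROJECT="${MY_PROJECT:-my-project}"
MY_REGION="${MY_REGION:-us-central1}"
MAX_WORKERS=50  # hypothetical cap; choose one that fits your CPU quota

# Same options as in the question, with max_num_workers appended.
BEAM_OPTS="project=$MY_PROJECT,job_name=c4,staging_location=gs://$MY_BUCKET/binaries,temp_location=gs://$MY_BUCKET/temp,runner=DataflowRunner,requirements_file=/tmp/beam_requirements.txt,experiments=shuffle_mode=service,region=$MY_REGION,max_num_workers=$MAX_WORKERS"

echo "$BEAM_OPTS"

# Then pass the string to the original command:
# python -m tensorflow_datasets.scripts.download_and_prepare \
#   --datasets=c4/en \
#   --data_dir=gs://$MY_BUCKET/tensorflow_datasets \
#   --beam_pipeline_options="$BEAM_OPTS"
```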
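The 403 can mean that the request was denied because you do not have enough quota, not because your service account is missing some other permission.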