How do I connect AML to ADLS Gen2?

Posted 2024-05-20 09:10:07


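I want to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). Since the Python SDK documentation does not list the service principal information as required for .register_azure_data_lake_gen2(), I was able to register ADLS Gen2 as a datastore with the following code: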

import os

from azureml.core import Datastore

# ws is an existing azureml.core.Workspace object
adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name = os.environ['account_name']  # ADLS Gen2 account name
file_system = os.environ['filesystem']

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system
)

However, when I try to register a dataset using

from azureml.core import Dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))

I get the following error:

Cannot load any data from the specified path. Make sure the path is accessible and contains data. ScriptExecutionException was caused by StreamAccessException. StreamAccessException was caused by AuthenticationException. 'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED] | session_id=<SESSION_ID>

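Do I need to set up a service principal for this to work? In the ML Studio UI, it appears that a service principal is required even to register the datastore.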

Another thing I noticed is that AMLS tries to access the dataset at https://adls_gen2_account_name.**dfs**.core.windows.net/container/folder/data.csv, whereas the actual URI in ADLS Gen2 is https://adls_gen2_account_name.**blob**.core.windows.net/container/folder/data.csv


1 answer
User
#1 · Posted 2024-05-20 09:10:07

According to this documentation, you need to enable a service principal.

1. Register your application and grant the service principal the Storage Blob Data Reader role.


2. Try the following code:

from azureml.core import Dataset, Datastore

# tenant_id, client_id and client_secret belong to the service principal from step 1
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

# Fetch the registered datastore and load the CSV as a tabular dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds, 'sample.csv'))
print(dataset.to_pandas_dataframe())

Result: [screenshot of the printed DataFrame, not reproduced here]
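
The code in step 2 uses tenant_id, client_id and client_secret without showing where they come from. One possible way to supply them, in the same spirit as the os.environ lookups in the question, is a small helper like the sketch below. The environment variable names (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET) and the helper name load_sp_credentials are assumptions made for illustration; they are not part of azureml-core.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
_REQUIRED_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def load_sp_credentials(env=None):
    """Return the service principal keyword arguments read from `env`.

    `env` defaults to os.environ; any mapping can be passed instead.
    Raises ValueError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in _REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in _REQUIRED_VARS.items()}


# Usage (needs azureml-core and an existing Workspace object `ws`):
# adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
#     workspace=ws,
#     datastore_name=adlsgen2_datastore_name,
#     account_name=account_name,
#     filesystem=file_system,
#     **load_sp_credentials(),
# )
```

Keeping the secret out of the script avoids committing it to source control, and the explicit check reports a missing variable up front rather than leaving it to surface later as an authentication failure.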
