How do I use Python to fetch files from a Windows shared network drive and upload them to an Azure Data Lake storage location?

Posted 2024-10-03 21:35:50


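I need to fetch files (with the .xml extension) from a Windows shared network drive location and upload them to ADL (Azure Data Lake Storage) using a Python script (in PyCharm).

I tried the following code: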

import os
import subprocess

file_src = os.listdir('\\\\<Shared Dir Server>\\<Directory>')
local_directory="F:\\Files\\*"
sasToken="<SAS Token>"

endpoint="https://<storageAccount>.blob.core.windows.net/<container>/<target directory>"
copyscript= str(file_src) + " copy " + "\""+ local_directory + "\"" + "\""+endpoint+sasToken + "\"" + " --recursive"

print(copyscript)
subprocess.call(copyscript)

But it fails:

['temp1.xml', 'temp2.xml', 'abc1.xml', 'desf2.xml', 'file.txt'] copy "F:\Files\*""https://<storageAccount>.blob.core.windows.net/<container>/<Target Directory>/sasToken" --recursive
Traceback (most recent call last):
  File "C:\Program Files\PycharmProjects\pythonProject\venv\Upload_SharedDrive_Files.py", line 17, in <module>
    subprocess.call(myscript)
  File "C:\Program Files\Python39\lib\subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Program Files\Python39\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python39\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Process finished with exit code 1

I'm new to Python. Please help.

Thanks


1 answer
User
#1 · Posted 2024-10-03 21:35:50

I was able to get this done with the code below (not sure whether it's the best approach):

import os
import shutil

from azure.storage.filedatalake import DataLakeServiceClient

connect_str = "DefaultEndpointsProtocol=https;AccountName=<storageAccount>;AccountKey=<storageAccountKey>;EndpointSuffix=core.windows.net"
myfilesystem = "<adlsContainer>"
myfolder = "F:\\Files"  # local staging folder
trgt_dir = "<adlsTargetDirectory>"
src = '\\\\<hostServer>\\<sourceDirectory>'  # UNC path of the shared directory

datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)


def upload_file_to_directory(trgt, src, filename, filesystem):
    # Create the file in the target ADLS directory.
    file_system_client = datalake_service_client.get_file_system_client(file_system=filesystem)
    directory_client = file_system_client.get_directory_client(trgt)
    file_client = directory_client.create_file(filename)

    # Read as bytes so the XML is uploaded unchanged; the with-block closes the file.
    with open(os.path.join(src, filename), 'rb') as local_file:
        file_contents = local_file.read()

    file_client.upload_data(file_contents, overwrite=True)


# Copy the .xml files from the share to the local staging folder.
for file in os.listdir(src):
    if file.endswith('.xml'):
        print(os.path.join(src, file))
        shutil.copy2(os.path.join(src, file), myfolder)

# Upload each staged file, then delete the local copy.
for fsrc in os.listdir(myfolder):
    print(f"Now uploading {fsrc}")
    upload_file_to_directory(trgt_dir, myfolder, fsrc, myfilesystem)
    print(f"Now removing {fsrc}")
    os.remove(os.path.join(myfolder, fsrc))
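
If the local staging folder is not actually needed, the same thing can probably be done by reading straight from the share. This is only a rough sketch, not something I ran against a real storage account: `list_xml_files`, `upload_xml_files` and `make_adls_uploader` are names I made up for illustration, and the Azure import sits inside `make_adls_uploader` so the file-handling part can be tried without the SDK installed.

```python
import os


def list_xml_files(src_dir):
    """Return the sorted names of the .xml files directly inside src_dir."""
    # Note: unlike the code above, the extension check here ignores case.
    return sorted(
        name for name in os.listdir(src_dir)
        if name.lower().endswith(".xml")
        and os.path.isfile(os.path.join(src_dir, name))
    )


def upload_xml_files(src_dir, upload):
    """Call upload(name, data) for each .xml file in src_dir; return the names."""
    uploaded = []
    for name in list_xml_files(src_dir):
        # Read as bytes so the content is passed on unchanged.
        with open(os.path.join(src_dir, name), "rb") as fh:
            upload(name, fh.read())
        uploaded.append(name)
    return uploaded


def make_adls_uploader(connect_str, filesystem, target_dir):
    """Build an upload(name, data) callable backed by azure-storage-file-datalake."""
    # Imported lazily so the helpers above work without the Azure SDK.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient.from_connection_string(connect_str)
    file_system_client = service.get_file_system_client(file_system=filesystem)
    directory_client = file_system_client.get_directory_client(target_dir)

    def upload(name, data):
        # overwrite=True replaces a file of the same name if it already exists.
        directory_client.get_file_client(name).upload_data(data, overwrite=True)

    return upload
```

Usage would then be something like `upload_xml_files('\\\\<hostServer>\\<sourceDirectory>', make_adls_uploader(connect_str, myfilesystem, trgt_dir))`. Nothing is copied to or deleted from `F:\Files` in this variant.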

Feel free to share your thoughts.
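
As for the original attempt in the question: judging from the printed command, the string passed to `subprocess.call` starts with the list returned by `os.listdir` rather than with the name of a program, so Windows has nothing to execute, which would explain `[WinError 2]`. The ` copy ... --recursive` shape looks like it was meant for AzCopy, but that is my assumption. Under that assumption, and assuming `azcopy` is installed and on the PATH, a sketch of building the call could look like this (`build_azcopy_command` and `run_azcopy` are hypothetical helper names):

```python
import subprocess


def build_azcopy_command(source, destination_url, sas_token, azcopy_exe="azcopy"):
    """Return the argument list for an `azcopy copy` call (hypothetical helper)."""
    # Assumption: the SAS token goes on the destination URL as its query string.
    separator = "" if sas_token.startswith("?") else "?"
    return [
        azcopy_exe,
        "copy",
        source,
        destination_url + separator + sas_token,
        "--recursive",
    ]


def run_azcopy(source, destination_url, sas_token):
    # Passing a list lets subprocess handle the quoting of each argument;
    # check=True raises CalledProcessError if azcopy exits with a non-zero code.
    subprocess.run(build_azcopy_command(source, destination_url, sas_token), check=True)


# Example call (placeholders as in the question, not real values):
# run_azcopy("F:\\Files\\*",
#            "https://<storageAccount>.blob.core.windows.net/<container>/<target directory>",
#            "<SAS Token>")
```

The program name comes first and every argument is its own list element, so there is no manual quoting and no missing space between the source and the destination as in the printed command.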
