A convenience wrapper for connecting to AWS S3 and Redshift
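Nordata
Author:
Nick Buker
Introduction:
Nordata is a small collection of utility functions for accessing AWS S3 and AWS Redshift. It was written by a data scientist on the Nordstrom Analytics Team. The goal of Nordata is to be a simple, robust package that eases data workflows. It is not intended to handle every possible need (for example, credential management is largely left to the user), but it is designed to streamline common tasks.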
Table of contents:
Installing Nordata:
Setting up credentials for Nordata:
How to use Nordata:
Redshift:
- Importing nordata Redshift functions
- Reading a SQL script into Python as a string
- Executing a SQL query that does not return data
- Executing a SQL query that returns data
- Executing a SQL query that returns data for pandas
- Creating a connection object (experienced users)
S3:
- Importing S3 functions
- Downloading a single file from S3
- Downloading with a profile name
- Downloading a list of files from S3
- Downloading files matching a pattern from S3
- Downloading all files in a directory from S3
- Uploading a single file to S3
- Uploading with a profile name
- Uploading a list of files to S3
- Uploading files matching a pattern to S3
- Uploading all files in a directory to S3
- Deleting a single file in S3
- Deleting with a profile name
- Deleting a list of files in S3
- Deleting files matching a pattern in S3
- Deleting all files in a directory in S3
- Creating a bucket object (experienced users)
Boto3 (experienced users):
Transferring data between Redshift and S3:
Testing:
Installing Nordata:
Nordata can be installed via pip. As always, use of a project-level virtual environment is recommended.
Nordata requires Python >= 3.6.
$ pip install nordata
Setting up credentials for Nordata:
Redshift:
Nordata is designed to ingest your Redshift credentials as an environment variable in the below format. This method allows the user freedom to handle credentials in a number of ways. As always, best practices are advised. Your credentials should never be placed in the code of your project.
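A minimal sketch of the environment-variable approach, using only the standard library. The variable name REDSHIFT_CREDS is taken from the examples later in this document; the helper load_redshift_creds is hypothetical and is not part of Nordata. The sketch only reads the value and fails loudly when it is missing, and it makes no assumption about the format of the credential string.

```python
import os


def load_redshift_creds(env_var: str = "REDSHIFT_CREDS") -> str:
    """Return the credential string stored in the given environment variable.

    Hypothetical helper for illustration; not part of Nordata.
    """
    value = os.environ.get(env_var)
    if not value:
        # Fail early with a clear message rather than a confusing connection error later.
        raise KeyError(f"Environment variable {env_var!r} is not set or is empty")
    return value
```

Keeping the value in the environment (for example, exported from a shell profile or injected by a secrets manager) keeps it out of version control.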
S3:
If the user is running locally, their ^{
Note the profile name in brackets. If the profile name differs in your credentials file, you will likely need to pass this profile name to the S3 functions as an argument.
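To check which profile names a credentials file defines, the file can be read with the standard library's configparser, since the AWS shared credentials file uses INI syntax with one bracketed section per profile. The sketch below is an illustration only: list_profiles is a hypothetical helper, and the sample contents are placeholders.

```python
import configparser


def list_profiles(credentials_text: str) -> list:
    """Return the profile names (section headers) found in credentials-file text."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


# Placeholder contents in the shape of an AWS shared credentials file.
SAMPLE = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET

[analytics]
aws_access_key_id = OTHER_KEY_ID
aws_secret_access_key = OTHER_SECRET
"""

profiles = list_profiles(SAMPLE)
```

To inspect a real file, read its text and pass it to the helper. Any name other than the one the S3 functions expect by default would then be passed as the profile name argument.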
How to use Nordata:
Redshift:
Importing nordata Redshift functions:
Reading a SQL script into Python as a string:
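For reference, reading a SQL script into a string needs nothing beyond the standard library. The sketch below is a plain-Python equivalent, not Nordata's own function; the name read_sql_file is hypothetical.

```python
from pathlib import Path


def read_sql_file(sql_filename: str, encoding: str = "utf-8") -> str:
    """Read a SQL script from disk and return its contents as a single string."""
    return Path(sql_filename).read_text(encoding=encoding)
```

The returned string can then be passed as the sql argument of a query function.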
Executing a SQL query that does not return data:
Executing a SQL query that returns data as a list of tuples and column names as a list of strings:
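The return shape described here (rows as a list of tuples, column names as a list of strings) is what a Python DB-API cursor provides through fetchall() and description. The sketch below uses sqlite3 as a stand-in so it runs without a Redshift cluster; run_query is a hypothetical helper and is not Nordata's implementation.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a query and return (rows, column_names)."""
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()  # list of tuples, one per row
    columns = [desc[0] for desc in cur.description]  # names in select order
    cur.close()
    return rows, columns


# In-memory SQLite database standing in for Redshift.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (col1 integer, col2 text)")
conn.executemany("insert into my_table values (?, ?)", [(1, "a"), (2, "b")])

data, columns = run_query(conn, "select col1, col2 from my_table order by col1")
conn.close()
```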
Executing a SQL query that returns data as a dict for easy ingestion into a pandas DataFrame:
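A dict is convenient for pandas when its keys match the keyword arguments of the DataFrame constructor. The sketch below assumes the keys are data and columns, which this document does not confirm; as_frame_kwargs is a hypothetical helper. The pandas import is optional so the example still runs where pandas is not installed.

```python
def as_frame_kwargs(rows, columns):
    """Pack query results into a dict whose keys match pandas.DataFrame keywords."""
    return {"data": rows, "columns": columns}


result = as_frame_kwargs([(1, "a"), (2, "b")], ["col1", "col2"])

try:
    import pandas as pd
except ImportError:
    pd = None  # pandas is optional for this sketch

if pd is not None:
    # Unpacking works because DataFrame accepts `data` and `columns` keywords.
    df = pd.DataFrame(**result)
```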
Creating a connection object that can be manipulated directly by experienced users:
S3:
Importing S3 functions:
Downloading a single file from S3:
Downloading with a profile name:
Downloading a list of files from S3 (will not download contents of subdirectories):
Downloading files matching a pattern from S3 (will not download contents of subdirectories):
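To make the "matching a pattern, without subdirectories" behavior concrete, the sketch below filters a list of S3 keys by directory and file-name pattern. It is an illustration under assumptions: match_keys is hypothetical, shell-style wildcards are assumed for the pattern syntax, and the key list is made up rather than fetched from a bucket.

```python
from fnmatch import fnmatchcase


def match_keys(keys, directory, pattern):
    """Return keys directly under `directory` whose file name matches `pattern`.

    Keys inside subdirectories of `directory` are skipped.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    matched = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if not name or "/" in name:
            # An empty name is the directory placeholder; a slash means a subdirectory.
            continue
        if fnmatchcase(name, pattern):
            matched.append(key)
    return matched


keys = [
    "tmp/a.csv",
    "tmp/b.csv",
    "tmp/notes.txt",
    "tmp/archive/c.csv",
    "other/d.csv",
]
selected = match_keys(keys, "tmp", "*.csv")
```

Here selected contains only tmp/a.csv and tmp/b.csv: the text file fails the pattern, and tmp/archive/c.csv is excluded because it sits in a subdirectory.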
Downloading all files in a directory from S3 (will not download contents of subdirectories):
Uploading a single file to S3:
Uploading with a profile name:
Uploading a list of files to S3 (will not upload contents of subdirectories):
Uploading files matching a pattern to S3 (will not upload contents of subdirectories):
Uploading all files in a directory to S3 (will not upload contents of subdirectories):
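The non-recursive behavior for directory uploads can be sketched with pathlib: list only the regular files directly inside the local directory and pair each with a destination key. Both helpers are hypothetical and the upload call itself is left out, since this document does not show it.

```python
from pathlib import Path


def files_in_directory(local_directory):
    """Return the regular files directly inside `local_directory`, sorted by name."""
    return sorted(p for p in Path(local_directory).iterdir() if p.is_file())


def plan_uploads(local_directory, s3_directory):
    """Pair each local file with the S3 key it would be uploaded to."""
    prefix = s3_directory.strip("/")
    return [
        (str(path), f"{prefix}/{path.name}" if prefix else path.name)
        for path in files_in_directory(local_directory)
    ]
```

Because iterdir() does not descend into subdirectories and is_file() drops the subdirectory entries themselves, nested files never appear in the plan.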
Deleting a single file in S3:
Deleting with a profile name:
Deleting a list of files in S3:
Deleting files matching a pattern in S3:
Deleting all files in a directory in S3:
Creating a bucket object that can be manipulated directly by experienced users:
Boto3:
Retrieving Boto3 credentials as a string for use in Redshift unload and copy statements:
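Redshift's credentials clause accepts a string of the form aws_access_key_id=...;aws_secret_access_key=..., with an optional ;token=... part for temporary credentials. The sketch below shows how such a string can be assembled. format_redshift_creds is a hypothetical helper, and whether Nordata produces exactly this string is an assumption this document does not confirm.

```python
from typing import Optional


def format_redshift_creds(
    access_key: str, secret_key: str, token: Optional[str] = None
) -> str:
    """Build a credentials string in the form Redshift's credentials clause accepts."""
    parts = [
        f"aws_access_key_id={access_key}",
        f"aws_secret_access_key={secret_key}",
    ]
    if token:
        # Temporary (session) credentials carry an extra token component.
        parts.append(f"token={token}")
    return ";".join(parts)


# Placeholder values; real keys should come from a credentials provider.
creds = format_redshift_creds("EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
```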
Creating a boto3 session object that can be manipulated directly by experienced users:
Transferring data between Redshift and S3:
Transferring data from Redshift to S3 using an unload statement:

    from nordata import boto_get_creds, redshift_execute_sql

    # Credentials string to embed in the unload statement
    creds = boto_get_creds(profile_name='default', region_name='us-west-2', session=None)

    sql = f'''
        unload (
            'select
                col1
                ,col2
            from
                my_schema.my_table'
        )
        to
            's3://mybucket/unload/my_table/'
        credentials
            '{creds}'
        parallel off header gzip allowoverwrite;
    '''

    # The unload statement returns no rows, so no data is requested back
    redshift_execute_sql(sql=sql, env_var='REDSHIFT_CREDS', return_data=False, return_dict=False)

Transferring data from S3 to Redshift using a copy statement:

    from nordata import boto_get_creds, redshift_execute_sql

    # Credentials string to embed in the copy statement
    creds = boto_get_creds(profile_name='default', region_name='us-west-2', session=None)

    sql = f'''
        copy
            my_schema.my_table
        from
            's3://mybucket/unload/my_table/'
        credentials
            '{creds}'
        ignoreheader 1 gzip;
    '''

    # The copy statement returns no rows, so no data is requested back
    redshift_execute_sql(sql=sql, env_var='REDSHIFT_CREDS', return_data=False, return_dict=False)

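Testing:
For those interested in contributing to Nordata or forking and editing the project, pytest is the testing framework used. To run the tests, create a virtual environment, install the contents of dev-requirements.txt, and run the following command from the root directory of the project. The testing scripts can be found in the test/ directory.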
$ pytest