Apache Airflow operators to export AWS Cost Explorer data to a local file or S3

airflow-aws-cost-explorer

Airflow AWS Cost Explorer Plugin

A plugin for Apache Airflow that lets you export AWS Cost Explorer data and S3 metrics to a local file or S3, in Parquet, JSON, or CSV format.
System Requirements

- Apache Airflow 1.10.3 or newer
- pyarrow or fastparquet (optional, for writing Parquet files)
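Since the Parquet dependency is optional, a plugin like this typically probes for whichever engine happens to be installed. A minimal sketch of such a fallback, assuming a hypothetical helper name (this is not the plugin's actual API):

```python
import importlib.util

def pick_parquet_engine():
    """Return the first available Parquet engine name, or None if neither
    pyarrow nor fastparquet is installed (hypothetical helper)."""
    for engine in ("pyarrow", "fastparquet"):
        # find_spec checks installability without importing the package
        if importlib.util.find_spec(engine) is not None:
            return engine
    return None
```

If neither engine is available, only the json and csv file formats remain usable.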
Deployment Instructions

1. Install the plugin

   pip install airflow-aws-cost-explorer

2. Optional, for writing Parquet files: install pyarrow or fastparquet

   pip install pyarrow

   or

   pip install fastparquet

3. Restart the Airflow Web Server

4. Configure the AWS connection (Conn type = 'aws')

5. Optional, for writing to S3: configure the S3 connection (Conn type = 's3')
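Besides the Airflow UI, Airflow can also pick up a connection from an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a connection URI. A sketch of that approach, with placeholder credentials (replace them with your own):

```python
import os

# Airflow resolves the connection id "aws_default" from this environment
# variable; the login/password parts of the URI are placeholders.
os.environ["AIRFLOW_CONN_AWS_DEFAULT"] = (
    "aws://my_access_key_id:my_secret_access_key@"
)
```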
Operators

AWSCostExplorerToS3Operator
- day (str, date or datetime): Date to be exported, as a string in YYYY-MM-DD format or a date/datetime instance (default: yesterday)
- aws_conn_id (str): Cost Explorer AWS connection id (default: aws_default)
- region_name (str): Cost Explorer AWS Region
- s3_conn_id (str): Destination S3 connection id (default: s3_default)
- s3_bucket (str): Destination S3 bucket
- s3_key (str): Destination S3 key
- file_format (str or FileFormat): Destination file format: parquet, json or csv (default: parquet)
- metrics (list): Metrics (default: UnblendedCost, BlendedCost)
AWSCostExplorerToLocalFileOperator

- day (str, date or datetime): Date to be exported, as a string in YYYY-MM-DD format or a date/datetime instance (default: yesterday)
- aws_conn_id (str): Cost Explorer AWS connection id (default: aws_default)
- region_name (str): Cost Explorer AWS Region
- destination (str): Complete path of the destination file
- file_format (str or FileFormat): Destination file format: parquet, json or csv (default: parquet)
- metrics (list): Metrics (default: UnblendedCost, BlendedCost)
AWSBucketSizeToS3Operator
- day (str, date or datetime): Date to be exported, as a string in YYYY-MM-DD format or a date/datetime instance (default: yesterday)
- aws_conn_id (str): Cost Explorer AWS connection id (default: aws_default)
- region_name (str): Cost Explorer AWS Region
- s3_conn_id (str): Destination S3 connection id (default: s3_default)
- s3_bucket (str): Destination S3 bucket
- s3_key (str): Destination S3 key
- file_format (str or FileFormat): Destination file format: parquet, json or csv (default: parquet)
- metrics (list): Metrics (default: bucket_size, number_of_objects)
AWSBucketSizeToLocalFileOperator
- day (str, date or datetime): Date to be exported, as a string in YYYY-MM-DD format or a date/datetime instance (default: yesterday)
- aws_conn_id (str): Cost Explorer AWS connection id (default: aws_default)
- region_name (str): Cost Explorer AWS Region
- destination (str): Complete path of the destination file
- file_format (str or FileFormat): Destination file format: parquet, json or csv (default: parquet)
- metrics (list): Metrics (default: bucket_size, number_of_objects)
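All four operators accept `day` as either a YYYY-MM-DD string or a date/datetime instance, defaulting to yesterday. A sketch of how such an argument might be normalized, using a hypothetical helper (not the plugin's actual code):

```python
from datetime import date, datetime, timedelta

def normalize_day(day=None):
    """Return the target day as a YYYY-MM-DD string (hypothetical helper)."""
    if day is None:
        # Default: yesterday, matching the operators' documented default
        day = date.today() - timedelta(days=1)
    if isinstance(day, (date, datetime)):  # datetime is a subclass of date
        return day.strftime("%Y-%m-%d")
    return str(day)  # assume the string is already in YYYY-MM-DD format
```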
Example
#!/usr/bin/env python
import airflow
from airflow import DAG
from airflow_aws_cost_explorer import AWSCostExplorerToLocalFileOperator
from datetime import timedelta
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': airflow.utils.dates.days_ago(1),
'email': ['airflow@example.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=30)
}
dag = DAG('cost_explorer',
default_args=default_args,
schedule_interval=None,
concurrency=1,
max_active_runs=1,
catchup=False
)
aws_cost_explorer_to_file = AWSCostExplorerToLocalFileOperator(
task_id='aws_cost_explorer_to_file',
day='{{ yesterday_ds }}',
destination='/tmp/{{ yesterday_ds }}.parquet',
file_format='parquet',
dag=dag)
if __name__ == "__main__":
dag.cli()
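For the json and csv formats the export is plain tabular data, one row per day and metric. A stdlib sketch of producing such a file (the column names here are assumptions for illustration, not the plugin's actual output schema):

```python
import csv
import io

# Hypothetical rows; the real export's columns may be named differently.
rows = [
    {"day": "2019-07-01", "metric": "UnblendedCost", "amount": "12.34"},
    {"day": "2019-07-01", "metric": "BlendedCost", "amount": "12.01"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["day", "metric", "amount"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()  # header line plus one line per row
```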
Links

- Apache Airflow - https://github.com/apache/airflow
- Apache Arrow - https://github.com/apache/arrow
- fastparquet - https://github.com/dask/fastparquet
- AWS Cost Explorer - https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- S3 CloudWatch Metrics - https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html