I have a Redshift UNLOAD query that produces the data I need. I would like to add a header row via the Python script that writes each file. Thanks in advance.
Header row: it should be exactly this: `~id ~from ~to ~label starttime:Date endtime:Date accountid:String srcaddr:String dstaddr:String srcport:String dstport:String protocol:String packets:Int bytes:Int action:String`
from datetime import datetime, timedelta

# `conn` is assumed to be an open Redshift connection (e.g. created with
# psycopg2) that is set up elsewhere in the script.

def handler(event, context):
    now = datetime.utcnow().replace(second=0, microsecond=0)
    past = now - timedelta(minutes=120)
    # UNLOAD takes its SELECT as a quoted string literal; single quotes
    # inside the query are escaped by doubling them.
    cmd = """unload ('SELECT ''pk'' AS id,
        ''version'' AS version,
        ''accountid'' AS accountid,
        ''interfaceid'' AS interfaceid,
        ''srcaddr'' AS srcaddr,
        ''dstaddr'' AS dstaddr,
        ''srcport'' AS srcport,
        ''dstport'' AS dstport,
        ''protocol'' AS protocol,
        ''packets'' AS packets,
        ''bytes'' AS bytes,
        ''starttime'' AS starttime,
        ''endtime'' AS endtime,
        ''action'' AS action,
        ''logstatus'' AS logstatus,
        ''fromeni'' AS fromeni,
        ''toeni'' AS toeni
        UNION ALL SELECT
        cast(pk AS varchar(255)) AS id,
        cast(version AS varchar(255)) AS version,
        cast(accountid AS varchar(255)) AS accountid,
        cast(interfaceid AS varchar(255)) AS interfaceid,
        cast(srcaddr AS varchar(255)) AS srcaddr,
        cast(dstaddr AS varchar(255)) AS dstaddr,
        cast(srcport AS varchar(255)) AS srcport,
        cast(dstport AS varchar(255)) AS dstport,
        cast(protocol AS varchar(255)) AS protocol,
        cast(packets AS varchar(255)) AS packets,
        cast(bytes AS varchar(255)) AS bytes,
        cast(starttime AS varchar(255)) AS starttime,
        cast(endtime AS varchar(255)) AS endtime,
        action,
        logstatus,
        cast(fromeni AS varchar(255)) AS fromeni,
        cast(toeni AS varchar(255)) AS toeni
        FROM dimension.vpc_flow_logs')
        to 's3://xxxxxxxxxx/'
        iam_role 'arn:aws:iam::xxxxxxxxx:role/task'
        delimiter as ','
        PARALLEL OFF
        ESCAPE
        manifest allowoverwrite"""
    cursor = conn.cursor()
    try:
        cursor.execute(cmd)
        conn.commit()
    finally:
        cursor.close()
        conn.close()

if __name__ == '__main__':
    handler({}, {})
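If the goal is the exact Neptune-style header row quoted above, one option is to prepend it client-side after the files are unloaded. This is only a minimal sketch of the string manipulation involved, assuming the file bodies have already been read into memory (fetching and rewriting the S3 objects, e.g. with boto3, is left out):

```python
# The exact header row requested in the question. This constant and the
# helper below are illustrative names, not part of the original script.
HEADER = ("~id,~from,~to,~label,starttime:Date,endtime:Date,accountid:String,"
          "srcaddr:String,dstaddr:String,srcport:String,dstport:String,"
          "protocol:String,packets:Int,bytes:Int,action:String")

def prepend_header(body: str, header: str = HEADER) -> str:
    """Return the file body with the header row prepended.

    Idempotent: if the body already starts with the header, it is
    returned unchanged, so re-running the script is safe.
    """
    if body.startswith(header):
        return body
    return header + "\n" + body
```

Each unloaded file would be passed through `prepend_header` before being written back to its destination.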
Good news! As of October 2, 2018, Redshift has started rolling out support for a HEADER option on the UNLOAD command, which writes the column names as a header row in the CSV output. Here is the announcement: https://forums.aws.amazon.com/ann.jspa?annID=6136
The documentation has also been updated: https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
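With HEADER, the UNION ALL scaffolding in the question is no longer needed. A sketch of the simplified command, using the same placeholder bucket and IAM role as the question (`conn` is assumed to be an open Redshift connection):

```python
# Same UNLOAD as in the question, but letting Redshift emit the header
# row itself via the HEADER option instead of a UNION ALL of literals.
# Bucket and role ARN are placeholders copied from the question.
cmd = """unload ('SELECT * FROM dimension.vpc_flow_logs')
    to 's3://xxxxxxxxxx/'
    iam_role 'arn:aws:iam::xxxxxxxxx:role/task'
    delimiter as ','
    HEADER
    PARALLEL OFF
    ESCAPE
    manifest allowoverwrite"""

# cursor = conn.cursor(); cursor.execute(cmd); conn.commit()
```

Because every output file gets its own header row, this also works when multiple files are produced.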
I just tried it on a database with 100-300 columns and it works well. Gone are the days of UNION-ing in the column definitions! This saves a lot of time for any data scientist who uses Redshift and needs to export data to third-party platforms. You can also keep the default PARALLEL ON when multiple output files are produced.