I want to zero-pad a column value in a Pig Latin script using a Python UDF

Posted 2024-05-19 11:03:20

I followed the steps below. zeropad.py is my Python script:

#!/usr/bin/python

from org.apache.pig.scripting import *

@outputSchema('time:int')

def zero():
    time.zfill(4)

======================================

grunt> register 'zeropad.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs

===============================

 airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero.DepTime_actual AS DepTime_actual_new,myfuncs.zero.CRSDeptime AS CRSDeptime_new,myfuncs.zero.Arrtime_actual AS Arrtime_actual_new,myfuncs.zero.CRSArrtime AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;

I get the following error:

2017-02-26 19:37:19,606 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:

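Invalid field projection. Projected field [myfuncs] does not exist in schema: Year:bytearray,Month:bytearray,DayofMonth:bytearray,DayofWeek:bytearray,DepTime_actual:int,CRSDeptime:int,Arrtime_actual:int,CRSArrtime:int,UniqueCarrier:bytearray,FlightNum:bytearray,TailNum_Plane:bytearray,ActualElapsedTime:bytearray,CRSElapsedTime:bytearray,Airtime:bytearray,Arrdelay:bytearray,Depdelay:bytearray,Origin:bytearray,Dest:bytearray,Distance:bytearray,Taxiin:bytearray,Taxiout:bytearray,Cancelled:bytearray,CancellationCode:bytearray,Diverted:bytearray,CarrierDelay:bytearray,WeatherDelay:bytearray,NASDelay:bytearray,SecurityDelay:bytearray,LateAircraftDelay:bytearray.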

Why can't I use the Python function to manipulate the column values?


2 Answers

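Try the following syntax. A UDF has to be called with its argument in parentheses; written as myfuncs.zero.DepTime_actual, Pig reads myfuncs as a field name, which is what ERROR 1025 is complaining about: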

airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek, myfuncs.zero(DepTime_actual) AS DepTime_actual_new,myfuncs.zero(CRSDeptime) AS CRSDeptime_new,myfuncs.zero(Arrtime_actual) AS Arrtime_actual_new,myfuncs.zero(CRSArrtime) AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;

It's working now! Here are the small corrections:

#!/usr/bin/python

@outputSchema("num:int")
def zero(time):
    return time.zfill(4)


REGISTER '/home/Jig13517/zeropad.py' using jython AS func ;


airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,func.zero(Airlines_data_schema.DepTime_actual) AS DepTime_actual_new:int,func.zero(Airlines_data_schema.CRSDeptime) AS CRSDeptime_new:int,func.zero(Airlines_data_schema.Arrtime_actual) AS Arrtime_actual_new:int,func.zero(Airlines_data_schema.CRSArrtime) AS CRSArrtime_new:int,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;
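
For anyone who wants to try the padding logic outside of Pig first, here is a minimal sketch of a zeropad.py variant. It is not the exact script from the answer above: the `str()` conversion, the `None` check, the `padded:chararray` output schema, and the stand-in `outputSchema` decorator are all my own assumptions. The stand-in is only there so the file also runs in a plain Python interpreter; as far as I know, Pig's Jython engine supplies the real decorator when the script is registered.

```python
#!/usr/bin/python

# Outside Pig there is no outputSchema decorator, so fall back to a no-op
# stand-in (an assumption for local testing only, not part of Pig itself).
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


# Declared as chararray (assumption) because leading zeros only survive
# in a string; an int result would drop them again.
@outputSchema("padded:chararray")
def zero(time):
    # Null fields reach the UDF as None; pass them through unchanged.
    if time is None:
        return None
    # str() lets the same function accept int or string input.
    return str(time).zfill(4)
```

If the padded value is meant to stay zero-padded in the output, the `:int` casts on the `AS` aliases in the FOREACH would presumably have to become `:chararray` as well.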