How to convert a value to hhmm format in a Pig Latin script when it has no leading zero

Posted 2024-09-24 06:23:19


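I am trying to find the difference between two time fields in a Pig relation. I can use Pig's ToDate() function, but it needs the values in hhmm format, and mine have no leading zero. For example, if the two fields hold 1245 and 1425, I can convert them with ToDate and compute the difference. If the values are 945 and 823, however, ToDate cannot convert them because the leading zero is missing.

So I wrote a Python UDF that tries to left-pad the value with a 0. Please find the code below.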

@outputSchema("time:bytearray")
def zero(time):
    time = str(time)
    if len(time) <= 3:
        return '0' + time
    else:
        return time

Step 1: register the Python function

REGISTER '/home/Jig13517/zeropad.py' using jython AS myfuncs ;

Please find the relation below

Airlines_data_schema = LOAD '/user/Jig13517/pigsample/Airlines_data.csv' USING PigStorage('\t') AS (Year,Month,DayofMonth,DayofWeek,DepTime_actual,CRSDeptime,Arrtime_actual,CRSArrtime,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay);

=============================================

Then I tried to zero-pad the column values

airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero($4) AS DepTime_actual_new,myfuncs.zero($5) AS CRSDeptime_new,myfuncs.zero($6) AS Arrtime_actual_new,myfuncs.zero($7) AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;

=======================================

Sample data after applying the Python UDF

 (2008,1,3,4,617,615,652,650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA,,,,None,None,None,None,,,,,,,,,,,,,,,,,,,,,)

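As the record above shows, the column values were not converted; I get the same fields back unchanged. Please let me know what is wrong with my UDF, or whether there is a Pig-native way to accomplish this.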
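The padding logic itself can be checked outside Pig. The sketch below copies the function from the question without the decorator and runs it in plain Python. It assumes that a null field reaches a Jython UDF as None, which would be consistent with the None values in the sample record above.

```python
def zero(time):
    # Same logic as the UDF in the question, minus the Pig decorator.
    time = str(time)
    if len(time) <= 3:
        return '0' + time
    else:
        return time


print(zero(617))   # 0617
print(zero(None))  # None  (str(None) is the four-character string 'None')
print(zero(45))    # 045   (only a single zero is ever added)
```

So the function does pad three-digit values, but a None input passes through as the string 'None', and values shorter than three digits still end up shorter than four characters.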


2 Answers

The zfill() function may help.

input.txt

1245
1425
945
823

pig_udfs.py

@outputSchema('time:chararray')
def lpad_time(time):
    return time.zfill(4)

time_formatter.pig

register pig_udfs.py using jython as myfuncs;
A = LOAD 'input.txt' USING PigStorage();
B = FOREACH A GENERATE myfuncs.lpad_time((chararray) $0);
\d B

Output

(1245)
(1425)
(0945)
(0823)
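A slightly more defensive variant of the UDF above might look like the following. This is only a sketch: the try/except stub for outputSchema is an assumption added here so the file can also be run with plain Python for testing (Pig supplies the real decorator when the script is registered with jython), and the null handling is not part of the original answer.

```python
try:
    outputSchema  # supplied by Pig when the script is registered with jython
except NameError:
    # Stand-in (assumption) so the file can be run with plain Python for testing.
    def outputSchema(schema):
        return lambda func: func


@outputSchema('time:chararray')
def lpad_time(time):
    # Pass nulls through instead of failing on None.zfill.
    if time is None:
        return None
    # str() lets the function accept numbers as well as strings.
    return str(time).strip().zfill(4)


print(lpad_time('945'))  # 0945
print(lpad_time(823))    # 0823
print(lpad_time(None))   # None
```

Returning None for a null input keeps missing times as nulls in the relation rather than turning them into strings.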

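Obviously, you could also have Python do the whole ToDate conversion itself...

Also, it isn't clear to me from your question whether the minutes are zero-padded.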
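As a rough sketch of letting Python do the whole conversion, the helpers below turn two hhmm values into minutes and subtract them. The function names are hypothetical, and the code assumes that the minutes are always two digits, that both times fall on the same day, and that time zones can be ignored.

```python
def hhmm_to_minutes(value):
    # Left-pad to four digits, then split into hours and minutes.
    text = str(value).strip().zfill(4)
    hours, minutes = int(text[:2]), int(text[2:])
    return hours * 60 + minutes


def hhmm_diff_minutes(start, end):
    # Difference in minutes; negative if end is earlier than start.
    return hhmm_to_minutes(end) - hhmm_to_minutes(start)


print(hhmm_diff_minutes(1245, 1425))    # 100
print(hhmm_diff_minutes('823', '945'))  # 82
```

To use this from Pig, the function would still need an @outputSchema decorator (for example with an int field) and would be registered in the same way as lpad_time.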


Edit

airlines.csv

2008,1,3,4,617,615,652,650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA,,,,None,None,None,None,,,,,,,,,,,,,,,,,,,,,

Pig code

register pig_udfs.py using jython as myfuncs;
A = LOAD 'airlines.csv' USING PigStorage(',');
B = FOREACH A GENERATE $0 AS Year, $1 AS Month, $2 AS DayofMonth, $3 AS DayofWeek, myfuncs.lpad_time((chararray) $4) AS DepTime_actual_new, myfuncs.lpad_time((chararray) $5) AS CRSDeptime_new, myfuncs.lpad_time((chararray) $6) AS Arrtime_actual_new, myfuncs.lpad_time((chararray) $7) AS CRSArrtime_new, $8 AS UniqueCarrier, $9 AS FlightNum, $10 AS TailNum_Plane, $11 AS ActualElapsedTime, $12 AS CRSElapsedTime, $13 AS Airtime, $14 AS Arrdelay, $15 AS Depdelay, $16 AS Origin, $17 AS Dest, $18 AS Distance, $19 AS Taxiin, $20 AS Taxiout, $21 AS Cancelled, $22 AS CancellationCode, $23 AS Diverted, $24 AS CarrierDelay, $25 AS WeatherDelay, $26 AS NASDelay, $27 AS SecurityDelay, $28 AS LateAircraftDelay;
\d B

Output

(2008,1,3,4,0617,0615,0652,0650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA)

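Hey @cricket_007, I got it working. I was passing the column fields as bytearray, which was my mistake. Once I changed the schema to chararray, it started padding the zeros. Thanks. Please see the corrected records below:

(2008,1,3,4,0617,0615,0652,0650,WN,11,N689SW,95,95,70,2,2,IND,MCI,451,6,19,0,,0,NA,NA,NA,NA,NA)
(2008,1,3,4,0628,0620,0804,0750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA)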
