Converting different text representations of durations to seconds in Pandas

Posted 2024-10-05 10:42:14


I have a dataframe that stores the duration of a trip as a text value, as shown in the driving_duration_text column below.

print(df)

                                              yelp_id driving_duration_text  \
0                    alexander-rubin-photography-napa        1 hour 43 mins   
1                             jumas-automotive-napa-2        1 hour 32 mins   
2                       larson-brothers-painting-napa        1 hour 30 mins   
3                            preferred-limousine-napa        1 hour 32 mins   
4                            cardon-y-el-tirano-miami        1 day  16 hours   
5                                    sweet-dogs-miami        1 day  3  hours 

As you can see, some durations are written in hours and some in days. How can I convert this format to seconds?


2 answers

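Given that the text appears to follow a standard format, this is relatively straightforward. We need to split the string, pair each count with its unit, and then convert each pair to seconds.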

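def parse_duration(duration):
    # Tokens alternate between a count and a unit, e.g. "1 hour 43 mins"
    items = duration.split()
    counts = items[::2]
    words = items[1::2]
    seconds = 0
    for count, word in zip(counts, words):
        # split() yields strings, so convert the count before multiplying
        seconds += get_seconds(word, int(count))
    return seconds

def get_seconds(word, count):
    units = {
        'second': 1,
        'minute': 60,
        'min': 60,  # the data abbreviates minutes as "mins"
        'hour': 3600,
        'day': 86400
        # and so on
    }
    # Handle plurals: try the word without its trailing "s" first,
    # then the word as-is; unknown units contribute 0
    base = units.get(word[:-1], units.get(word, 0))
    return base * count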
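
As a variation on the split-and-pair approach above, here is a minimal sketch that uses a regular expression to pick out each count/unit pair. The names `UNIT_SECONDS`, `PAIR` and `duration_to_seconds` are made up for illustration, and the unit table only covers spellings that seem likely for this data. Unlike the version above, it raises an error on an unknown unit instead of silently counting it as 0.

```python
import re

# Hypothetical unit table (an assumption); extend it if other spellings appear.
UNIT_SECONDS = {
    "sec": 1, "second": 1,
    "min": 60, "minute": 60,
    "hour": 3600,
    "day": 86400,
}

# A count followed by a unit word, tolerating repeated spaces ("1 day  3  hours").
PAIR = re.compile(r"(\d+)\s*([a-zA-Z]+)")

def duration_to_seconds(text):
    total = 0
    for count, word in PAIR.findall(text):
        unit = word.lower()
        # Fall back to the singular form: "hours" -> "hour", "mins" -> "min"
        if unit not in UNIT_SECONDS and unit.endswith("s"):
            unit = unit[:-1]
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown unit: {word!r}")
        total += int(count) * UNIT_SECONDS[unit]
    return total

print(duration_to_seconds("1 hour 43 mins"))   # 6180
print(duration_to_seconds("1 day  3  hours"))  # 97200
```

With pandas, a function like this could be applied to the column with `df['driving_duration_text'].apply(duration_to_seconds)`.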

Update:

In [150]: df['seconds'] = (pd.to_timedelta(df['driving_duration_text']
   .....:                                    .str.replace(' ', '')
   .....:                                    .str.replace('mins', 'min'))
   .....:                    .dt.total_seconds())

In [151]: df
Out[151]:
                            yelp_id driving_duration_text   seconds
0  alexander-rubin-photography-napa        1 hour 43 mins    6180.0
1           jumas-automotive-napa-2        1 hour 32 mins    5520.0
2     larson-brothers-painting-napa        1 hour 30 mins    5400.0
3          preferred-limousine-napa        1 hour 32 mins    5520.0
4          cardon-y-el-tirano-miami       1 day  16 hours  144000.0
5                  sweet-dogs-miami       1 day  3  hours   97200.0
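
The values in the seconds column can be checked by hand. The following is a small sketch using only the standard library's `datetime.timedelta`, with the durations typed in manually from the table (the `expected` dictionary is purely illustrative):

```python
from datetime import timedelta

# Durations transcribed by hand from the driving_duration_text column.
expected = {
    "1 hour 43 mins": timedelta(hours=1, minutes=43),
    "1 hour 32 mins": timedelta(hours=1, minutes=32),
    "1 hour 30 mins": timedelta(hours=1, minutes=30),
    "1 day  16 hours": timedelta(days=1, hours=16),
    "1 day  3  hours": timedelta(days=1, hours=3),
}

for text, delta in expected.items():
    # total_seconds() returns a float, matching the float column above
    print(text, delta.total_seconds())
```

For example, 1 hour 43 mins is 3600 + 43 * 60 = 6180 seconds, and 1 day 16 hours is 86400 + 16 * 3600 = 144000 seconds.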

Old answer:

You can do it like this:

^{pr2}$

Output:

In [64]: df
Out[64]:
                            yelp_id driving_duration_text  seconds
0  alexander-rubin-photography-napa        1 hour 43 mins     6180
1           jumas-automotive-napa-2        1 hour 32 mins     5520
2     larson-brothers-painting-napa        1 hour 30 mins     5400
3          preferred-limousine-napa        1 hour 32 mins     5520
4          cardon-y-el-tirano-miami       1 day  16 hours   144000
5                  sweet-dogs-miami       1 day  3  hours    97200
