How can I efficiently extract hours and minutes from a column of integers formatted as HHMM, HMM, MM, and M?

from pandas import read_csv, to_datetime

url = lambda year: f'ftp://sidads.colorado.edu/pub/DATASETS/NOAA/G00807/IIP_{year}IcebergSeason.csv'
df = read_csv(url(2011))

def convert_float_column_to_int_column(df, *column_names):
    # Cast each float column to int; if NaNs block the cast, drop those rows first.
    for column_name in column_names:
        try:
            df[column_name] = df[column_name].astype(int)
        except ValueError:
            df = df.dropna(subset=[column_name]).reset_index(drop=True)
            df[column_name] = df[column_name].astype(int)
    return df

df2 = convert_float_column_to_int_column(df, 'ICEBERG_NUMBER', 'SIGHTING_TIME')
# Only four-digit HHMM strings fit '%H%M' unambiguously; the M, MM and HMM values do not.
df2['SIGHTING_TIME'] = to_datetime(df2['SIGHTING_TIME'].astype(str), format='%H%M')

1 answer

User

#1 · Posted 2024-09-27 21:32:32

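No `if` statements are needed. `Series.str.zfill` left-pads each string with zeros to four characters, so every value fits the `HHMM` format. Then parse with `pd.to_datetime` and subtract 1900-01-01, which is the date pandas fills in when the format contains no date fields: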
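A quick way to see why the padding matters, using Python's standard `datetime.strptime` as a stand-in. I'm assuming pandas' `format='%H%M'` parsing splits unpadded strings the same way; `parse_unpadded` is a hypothetical helper for illustration only:

```python
from datetime import datetime

import pandas as pd

raw = [1, 12, 123, 1234]  # M, MM, HMM and HHMM values


def parse_unpadded(value):
    """Hypothetical helper: parse str(value) with '%H%M' and no padding."""
    try:
        return datetime.strptime(str(value), '%H%M').strftime('%H:%M')
    except ValueError:
        return None  # the string cannot be split into an hour and a minute


# %H accepts one or two digits, so short strings get split in the wrong place.
unpadded = [parse_unpadded(v) for v in raw]
print(unpadded)  # [None, '01:02', '12:03', '12:34']

# zfill(4) makes every string exactly HHMM, so the split is always 2 + 2 digits.
padded = pd.Series(raw).astype(str).str.zfill(4)
parsed = pd.to_datetime(padded, format='%H%M')

# The format has no date fields, so every timestamp lands on 1900-01-01;
# subtracting that date leaves only the time of day as a Timedelta.
time_of_day = parsed - pd.Timestamp('1900-01-01')
print(time_of_day.tolist())
```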

Input data

import pandas as pd
df = pd.DataFrame({'Time': [1, 12, 123, 1234]})
#   Time
#0     1
#1    12
#2   123
#3  1234

`pd.to_datetime`

df['Time'] = (pd.to_datetime(df.Time.astype(str).str.zfill(4), format='%H%M') 
              - pd.to_datetime('1900-01-01'))

#0   00:01:00
#1   00:12:00
#2   01:23:00
#3   12:34:00
#Name: Time, dtype: timedelta64[ns]
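A different technique, not the one used in this answer: since the values are integers, plain arithmetic separates hours from minutes without any string handling. This is a minimal sketch; the `Hour`, `Minute` and `Delta` column names are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': [1, 12, 123, 1234]})

# For an HHMM integer, the hour is everything above the last two digits and
# the minute is the last two digits, whatever the number of digits.
hours = df['Time'] // 100
minutes = df['Time'] % 100

df['Hour'] = hours
df['Minute'] = minutes
df['Delta'] = pd.to_timedelta(hours, unit='h') + pd.to_timedelta(minutes, unit='min')

# The same arithmetic works on a float column that still contains NaN
# (like SIGHTING_TIME before the question's dropna); NaN becomes NaT.
raw = pd.Series([1.0, 12.0, np.nan, 1234.0])
delta = pd.to_timedelta(raw // 100, unit='h') + pd.to_timedelta(raw % 100, unit='min')
```

One trade-off: this does no validation. A value such as 1275 becomes 12 h 75 min (13:15), whereas I would expect `%H%M` parsing to reject it.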

`pd.to_timedelta`

also works, but because it has no `format` argument, the values must first be rewritten as `HH:MM:SS` strings:

df['Time'] = df.Time.astype(str).str.zfill(4)

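# pandas .str methods are slow; a list comprehension is faster here.
# Equivalent .str version:
#df['Time'] = df.Time.str[0:2] + ':' + df.Time.str[2:4] + ':00'
csize = 2
# Split each 'HHMM' string into 2-character chunks, join them with ':' and append seconds: '0123' -> '01:23:00'
df['Time'] = [':'.join(x[i:i+csize] for i in range(0, len(x), csize)) + ':00' for x in df.Time.values]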

df['Time'] = pd.to_timedelta(df.Time)

#0   00:01:00
#1   00:12:00
#2   01:23:00
#3   12:34:00
#Name: Time, dtype: timedelta64[ns]
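Because every padded string is exactly four characters, the chunking comprehension above can be replaced with fixed slices. A minimal equivalent sketch; the intermediate names `padded` and `as_text` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Time': [1, 12, 123, 1234]})

# After zfill(4) every string is exactly 'HHMM', so two fixed slices
# replace the generic two-character chunking loop.
padded = df['Time'].astype(str).str.zfill(4)
as_text = [f'{s[:2]}:{s[2:]}:00' for s in padded]

# pd.to_timedelta understands 'HH:MM:SS' strings directly.
df['Time'] = pd.to_timedelta(as_text)
```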
