A faster way to create a DataFrame from another DataFrame

df = pd.DataFrame(columns=['id', 'active_years'])
ix = 0
for _, row in raw_dataset.iterrows():
    st_yr = int(row['start_date'].split('-')[0])  # because dates are in the format yyyy-mm-dd
    end_yr = int(row['end_date'].split('-')[0])
    for year in range(st_yr, end_yr+1):
        df.loc[ix, 'id'] = row['ID']
        df.loc[ix, 'active_years'] = year
        ix = ix + 1

raw_dataset = pd.DataFrame({'ID':['a121','b142','cd3'],'start_date':['2019-10-09','2017-02-06','2012-12-05'],'end_date':['2020-01-30','2019-08-23','2016-06-18']})
print(raw_dataset)
     ID  start_date    end_date
0  a121  2019-10-09  2020-01-30
1  b142  2017-02-06  2019-08-23
2   cd3  2012-12-05  2016-06-18

# the desired dataframe should look like this
print(desired_df)
     id  active_years
0  a121          2019
1  a121          2020
2  b142          2017
3  b142          2018
4  b142          2019
5   cd3          2012
6   cd3          2013
7   cd3          2014
8   cd3          2015
9   cd3          2016

1 Answer

User

Answer #1 · Posted on 2024-09-30 02:24:32

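Dynamically growing a Python list is much faster than dynamically growing a NumPy array (NumPy arrays are the underlying data structure of pandas DataFrames). See here for a brief explanation. With that in mind: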

import pandas as pd

# Initialize input dataframe
raw_dataset = pd.DataFrame({
    'ID':['a121','b142','cd3'],
    'start_date':['2019-10-09','2017-02-06','2012-12-05'],
    'end_date':['2020-01-30','2019-08-23','2016-06-18'],
})

# Create integer columns for start year and end year
raw_dataset['start_year'] = pd.to_datetime(raw_dataset['start_date']).dt.year
raw_dataset['end_year'] = pd.to_datetime(raw_dataset['end_date']).dt.year

# Iterate over input dataframe rows and individual years
id_list = []
active_years_list = []
for row in raw_dataset.itertuples():
    for year in range(row.start_year, row.end_year+1):
        id_list.append(row.ID)
        active_years_list.append(year)

# Create result dataframe from lists
desired_df = pd.DataFrame({
    'id': id_list,
    'active_years': active_years_list,
})

print(desired_df)
# Output:
#     id  active_years
# 0  a121          2019
# 1  a121          2020
# 2  b142          2017
# 3  b142          2018
# 4  b142          2019
# 5   cd3          2012
# 6   cd3          2013
# 7   cd3          2014
# 8   cd3          2015
# 9   cd3          2016
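
As a possible alternative to the explicit row loop above, the same result can be sketched with `DataFrame.explode`, which expands a column of lists into one row per element. This is not part of the original answer; the names `year_ranges` and `exploded_df` are made up for illustration, and the sketch assumes every `end_date` falls in the same year as its `start_date` or later (an empty range would produce a missing value and break the integer cast).

```python
import pandas as pd

# Same input as in the question
raw_dataset = pd.DataFrame({
    'ID': ['a121', 'b142', 'cd3'],
    'start_date': ['2019-10-09', '2017-02-06', '2012-12-05'],
    'end_date': ['2020-01-30', '2019-08-23', '2016-06-18'],
})

# Extract the start and end years as integer Series
start_year = pd.to_datetime(raw_dataset['start_date']).dt.year
end_year = pd.to_datetime(raw_dataset['end_date']).dt.year

# Build one list of active years per input row (hypothetical helper name)
year_ranges = [list(range(s, e + 1)) for s, e in zip(start_year, end_year)]

# Put the lists in a column, then expand each list into separate rows
exploded_df = (
    pd.DataFrame({'id': raw_dataset['ID'], 'active_years': year_ranges})
    .explode('active_years')
    .reset_index(drop=True)
)

# explode leaves the column with object dtype, so cast back to integers
exploded_df['active_years'] = exploded_df['active_years'].astype(int)

print(exploded_df)
```

Whether this is faster than the list-append loop depends on the data size, so it is worth timing both on a realistic dataset before choosing one.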
