Rename multiple columns using the week of the header date and the contents of a data row

Posted 2024-05-10 05:42:40


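I have a wide-format DataFrame whose column names are date ranges and empty names (shown here as Unnamed 1, Unnamed 2, ...), while the first row holds the actual column headers. I need code that infers the week from each date-range header, then takes the column names from the first row and renames the columns accordingly (i.e. week1_quantity, week1_amount, week1_profit).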

import pandas as pd

# Row 0 holds the real labels; each "Sales: (...)" header marks the start of one week's block
df = pd.DataFrame([
    {'Related Fields':'Description', 'Unnamed 1':'barcode',
        'Unnamed 2':'department', 'Unnamed 3':'section',
        'Unnamed 4':'reference', 'Sales: (06/07/2020,12/07/2020)':'Quantity',
        'Unnamed 6':'amount', 'Unnamed 7':'cost',
        'Unnamed 8':'% M/S', 'Unnamed 9': 'profit',
        'Sales: (29/06/2020,05/07/2020)': 'Quantity',
        'Unnamed 11':'amount', 'Unnamed 12':'cost',
        'Unnamed 13':'% M/S', 'Unnamed 14':'profit'},
    {'Related Fields':'cornflakes', 'Unnamed 1':'0001198',
        'Unnamed 2':'grocery', 'Unnamed 3':'breakefast',
        'Unnamed 4': '0001198', 'Sales: (06/07/2020,12/07/2020)': 60,
        'Unnamed 6': 6000, 'Unnamed 7':3000, 'Unnamed 8':50,
        'Unnamed 9':3000, 'Sales: (29/06/2020,05/07/2020)': 120,
        'Unnamed 11':12000, 'Unnamed 12':6000, 'Unnamed 13':50,
        'Unnamed 14':6000}
])

Expected result

df2 = pd.DataFrame([
    {'Description':'cornflakes', 'barcode':'0001198',
        'department':'grocery', 'section':'breakefast',
        'reference':'0001198', 'week28_quantity':60,
        'week28_amount':6000, 'week28_cost':3000,
        'week28_% M/S':50, 'week28_profit':3000,
        'week29_quantity':120, 'week29_amount':12000,
        'week29_cost':6000, 'week29_% M/S':50,
        'week29_profit':6000}
])

I tried renaming the columns by hand, but I would like an automated solution.


1 answer
User
Answer #1 · Posted 2024-05-10 05:42:40

You can solve this by parsing the start date of each Sales header with datetime.strptime and taking the ISO week number from datetime.isocalendar.
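As a quick check of those two calls on the start dates in the sample headers (written day/month/year, hence the '%d/%m/%Y' format): 06/07/2020 falls in ISO week 28 and 29/06/2020 in ISO week 27, so ISO numbering labels the second block week27 rather than the week29 shown in the expected result.

```python
from datetime import datetime

# Start dates taken from the two "Sales: (start,end)" headers, written day/month/year
starts = ['06/07/2020', '29/06/2020']

# isocalendar() returns (ISO year, ISO week, ISO weekday); index 1 is the week
weeks = {s: datetime.strptime(s, '%d/%m/%Y').isocalendar()[1] for s in starts}
print(weeks)  # {'06/07/2020': 28, '29/06/2020': 27}
```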

import pandas as pd
from datetime import datetime

# Build a "weekNN_" prefix for each column whose header starts with "Sales";
# every other column gets None for now
wknums = [
    'week' + str(
        # "Sales: (06/07/2020,12/07/2020)" -> "06/07/2020" (day/month/year)
        datetime.strptime(colname.split()[1][1:11], '%d/%m/%Y')
        .isocalendar()[1]  # ISO week number
    ) + '_'
    if colname.startswith('Sales')
    else None
    for colname in df.columns
]

wknums = (
    pd.Series(wknums).ffill().fillna('')  # carry each prefix forward; columns before the first "Sales" get ''
    + df.loc[0].to_numpy()  # append the label from the first row
).str.lower()  # optional: lowercase the new labels

df.columns = wknums  # replace the column labels
df = df.iloc[1:].reset_index(drop=True)  # drop the label row
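If the header text can vary (extra spaces, or a prefix other than Sales), slicing colname.split()[1][1:11] can break. A minimal sketch of a regex-based variant, assuming every week block's first column header contains a dd/mm/yyyy start date; the helper name rename_week_columns and the DATE_RE pattern are hypothetical, not part of pandas:

```python
import re
from datetime import datetime

import pandas as pd

# Hypothetical pattern: the first dd/mm/yyyy date in a header such as
# "Sales: (06/07/2020,12/07/2020)"
DATE_RE = re.compile(r'(\d{2}/\d{2}/\d{4})')


def rename_week_columns(df: pd.DataFrame, lower: bool = True) -> pd.DataFrame:
    """Hypothetical helper: build 'weekNN_<label>' names from the headers and row 0."""
    prefixes = []
    for col in df.columns:
        match = DATE_RE.search(str(col))
        if match:
            week = datetime.strptime(match.group(1), '%d/%m/%Y').isocalendar()[1]
            prefixes.append(f'week{week}_')
        else:
            prefixes.append(None)  # filled from the previous dated column below
    # Carry each week prefix forward over the undated columns that follow it;
    # columns before the first dated header get no prefix.
    prefixes = pd.Series(prefixes, dtype=object).ffill().fillna('')
    labels = prefixes + df.iloc[0].astype(str).to_numpy()
    if lower:
        labels = labels.str.lower()
    out = df.iloc[1:].reset_index(drop=True)  # drop the label row
    out.columns = labels
    return out


# Usage on a trimmed-down version of the question's frame
sample = pd.DataFrame([
    {'Related Fields': 'Description', 'Sales: (06/07/2020,12/07/2020)': 'Quantity',
     'Unnamed 6': 'amount', 'Sales: (29/06/2020,05/07/2020)': 'Quantity',
     'Unnamed 11': 'amount'},
    {'Related Fields': 'cornflakes', 'Sales: (06/07/2020,12/07/2020)': 60,
     'Unnamed 6': 6000, 'Sales: (29/06/2020,05/07/2020)': 120,
     'Unnamed 11': 12000},
])
result = rename_week_columns(sample)
print(list(result.columns))
```

Returning a new frame instead of reassigning df in place keeps the original labels available if the renaming needs to be checked.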

Output

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   description      1 non-null      object
 1   barcode          1 non-null      object
 2   department       1 non-null      object
 3   section          1 non-null      object
 4   reference        1 non-null      object
 5   week28_quantity  1 non-null      object
 6   week28_amount    1 non-null      object
 7   week28_cost      1 non-null      object
 8   week28_% m/s     1 non-null      object
 9   week28_profit    1 non-null      object
 10  week27_quantity  1 non-null      object
 11  week27_amount    1 non-null      object
 12  week27_cost      1 non-null      object
 13  week27_% m/s     1 non-null      object
 14  week27_profit    1 non-null      object
dtypes: object(15)
memory usage: 248.0+ bytes
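Every column is still object dtype because each one originally mixed a text label from the first row with the numbers. If numeric dtypes are needed, one option is pd.to_numeric on the measure columns. This sketch assumes only the week-prefixed columns are numeric, so identifiers such as barcode stay strings and keep their leading zeros; the small stand-in frame is for illustration only:

```python
import pandas as pd

# Stand-in for the renamed frame: after dropping the label row every column is object
df = pd.DataFrame(
    {'barcode': ['0001198'], 'week28_quantity': [60], 'week28_% m/s': [50]},
    dtype=object,
)

# Assumption: only week-prefixed columns hold numbers; convert them one by one
week_cols = [c for c in df.columns if c.startswith('week')]
for col in week_cols:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```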
