How do I convert yyyymmdd format to mm-dd-yyyy in a pandas DataFrame?

Posted 2024-09-29 02:20:41


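I am trying to reformat the yyyymmdd date in the column incident history (e.g. Class II:O:20181119) to mm-dd-yyyy in a DataFrame. The catch is that the cells are not uniform: some of them contain more than one class.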


I tried using split and apply, but I could not get the data into a shape that I can clean up.

import pandas as pd
df = pd.read_excel('C:/Users/blablabla')

I tried this, but it only outputs NaN:

[code block not preserved in the source]

I also tried the following, but got TypeError: ("'float' object is not iterable", 'occurred at index 0')

def foo(c):
   for x in c['incident history']:
       return x        
df['incident history reformed'] = df.apply(foo, axis=1)
print (df['incident history reformed'])



3 answers

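Assuming you are working with strings in incident_history, that the date is formatted as YYYYMMDD, and that it always appears at the end of the string after the last ":", you can do this: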

import pandas as pd

df = pd.DataFrame(data={
    'incident_history': [
        'Class II:R:20180920',
        'Class II:O:20181119',
        'Class II:O:20181119',
        'Class O:D1:20170601',
        'Class O:D1:20190219',
    ],
})

def get_date(s):
    i = s.rfind(":")  # find the last occurrence of ":" in the string
    date_string = s[i+1:]  # everything after it is the YYYYMMDD date
    return pd.to_datetime(date_string, format="%Y%m%d")

df.incident_history.apply(get_date)
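
The TypeError in the question ('float' object is not iterable) suggests that some cells are empty (NaN), in which case get_date would fail on them too. Below is a minimal sketch of a vectorized variant that tolerates missing cells. The None row and the column name incident_date are made up for illustration, and this is one possible approach, not necessarily what the other answers used.

```python
import pandas as pd

df = pd.DataFrame({
    "incident_history": [
        "Class II:R:20180920",
        "Class II:O:20181119",
        None,  # hypothetical empty cell, standing in for a blank Excel cell
    ],
})

# Split once from the right on ":" and keep the last piece of each row.
# Missing cells simply stay missing instead of raising an error.
date_part = df["incident_history"].str.rsplit(":", n=1).str[-1]

# errors="coerce" turns anything that is not a valid YYYYMMDD date into NaT.
df["incident_date"] = pd.to_datetime(
    date_part.str.strip(), format="%Y%m%d", errors="coerce"
)
print(df)
```

Rows that could not be parsed end up as NaT, so you can find them afterwards with df["incident_date"].isna().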

You can use the following one-liner:

[code block not preserved in the source]

It looks like you were already close. The following worked for me:

import pandas as pd

data = ['Class II: R : 20180920','Class II: O : 20181119','Class II: D1: 20170601','Class O: D1: 20190219']

df = pd.DataFrame({"incident_history":data})

def extract_dt(dt_str):
    out_str = dt_str[dt_str.rfind(":")+1:].strip()
    return pd.to_datetime(out_str, format="%Y%m%d")

df['incident_history_reformed'] = df["incident_history"].apply(extract_dt)

Here incident_history_reformed will have dtype datetime64[ns], which opens the door to all of the datetime functionality pandas provides.
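
To get the mm-dd-yyyy text the question title asks for, a datetime column can be formatted with .dt.strftime. A minimal sketch, using two sample dates from the question; the variable names are only for illustration:

```python
import pandas as pd

# Two sample dates in YYYYMMDD form, parsed into datetime64 values.
dates = pd.Series(pd.to_datetime(["20180920", "20181119"], format="%Y%m%d"))

# Format each date as mm-dd-yyyy. The result is a column of strings.
formatted = dates.dt.strftime("%m-%d-%Y")
print(formatted.tolist())  # ['09-20-2018', '11-19-2018']
```

Keep in mind that the formatted column holds plain strings, so sorting it sorts by month first. If you need to sort or filter by date, do that on the datetime column and format only for display.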

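I have tried to keep this as readable as possible. Your dates always seem to be the last 8 characters of the Incident history column. You can select them the way you did; I used negative indexing.

Then I use to_datetime to convert the string column to datetime.

To sort the DataFrame, use sort_values and specify the column to sort by.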

import pandas as pd

df = pd.DataFrame([
                   ["Class II: R: 20180920"],
                   ["Class II: O: 20181109"],
                   ["Class O: D1: 20170601"],
                   ["Class O: D1: 20190219"]],
                  columns=["Incident history"])

print(df)
#        Incident history
# 0  Class II: R: 20180920
# 1  Class II: O: 20181109
# 2  Class O: D1: 20170601
# 3  Class O: D1: 20190219

# Create a string variable containing the date
df["date"] = df["Incident history"].str[-8:]
print(df)
#         Incident history      date
# 0  Class II: R: 20180920    20180920
# 1  Class II: O: 20181109    20181109
# 2  Class O: D1: 20170601    20170601
# 3  Class O: D1: 20190219    20190219

# Transform the date column to the type "date"
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
print(df)
#         Incident history       date
# 0  Class II: R: 20180920 2018-09-20
# 1  Class II: O: 20181109 2018-11-09
# 2  Class O: D1: 20170601 2017-06-01
# 3  Class O: D1: 20190219 2019-02-19

# Sort according to date
df = df.sort_values(by='date')
print(df)
#         Incident history       date
# 2  Class O: D1: 20170601 2017-06-01
# 0  Class II: R: 20180920 2018-09-20
# 1  Class II: O: 20181109 2018-11-09
# 3  Class O: D1: 20190219 2019-02-19

# Optional : remove the date from "Incident history"
df["Incident history"] = df["Incident history"].str[:-10]
print(df)
# Incident history       date
# 2    Class O: D1  2017-06-01
# 0    Class II: R  2018-09-20
# 1    Class II: O  2018-11-09
# 3    Class O: D1  2019-02-19
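
The question mentions that some cells contain more than one class, and taking the last 8 characters only catches the final date in such a cell. Since the screenshot is not available, the layout of those cells is unknown. The sketch below assumes a made-up layout (entries separated by ";") and rewrites every standalone 8-digit run in place with a regular expression. The column name reformatted is also hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Incident history": [
        "Class II : R : 20180920",
        # Hypothetical cell holding two classes; the real separator is unknown.
        "Class II : O : 20181109; Class O : D1 : 20170601",
    ],
})

# Capture year, month and day from each standalone run of 8 digits.
pattern = r"\b(\d{4})(\d{2})(\d{2})\b"

# Reorder the captured groups to mm-dd-yyyy everywhere they occur in the cell.
df["reformatted"] = df["Incident history"].str.replace(
    pattern, r"\2-\3-\1", regex=True
)
print(df["reformatted"].tolist())
```

This only reorders digits and does not check that they form a real date, so a value such as 20181340 would be rewritten as well. If validation matters, parse the extracted pieces with pd.to_datetime and errors="coerce" instead.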
