How to add 1 to every value equal to 0 in a DataFrame column

Posted 2024-10-03 13:20:46

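I am currently trying to calculate the length of each disaster in days, i.e. the difference between the start date and the end date, and then use that column with groupby (I think) to work out the disaster length per year, since my dataset runs from 1960 to the present. Eventually I would also like to group by disaster type to see how the duration of particular disasters has changed over time, but one step at a time.

So far I have converted the dates to datetime with pd.to_datetime and then used the code below to create a column holding the difference between the two dates: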


# Create new column: Disaster_Length (end minus start, so lengths are non-negative)
df_time['Disaster_Length'] = (df_time.End_Date_A - df_time.Start_Date_A)

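Two-part question:

A. How would I write a loop along the lines of: for i in column Start_Date_A == 0, add +1? Apologies, I am fairly new to this. I need it so that a disaster that starts and ends on the same day counts as 1 day rather than 0.

B. What is the best way to change the Disaster_Length column from a Series to integers so that I can do calculations with it?

Full code: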

import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns 

# Import dataset
df = pd.read_csv('database.csv')

# Work on an explicit copy so later in-place changes do not touch df
df_time = df[['County', 'Disaster Type', 'Start Date', 'End Date']].copy()


# Number of NaN values per column
df_nan = df[['County', 'Disaster Type', 'Start Date', 'End Date']].isna().sum()


# Total NaN count across those columns; the percentage itself is hard-coded
df_nan_percent = df_nan.sum(axis=0)
NAN_percentage = ['0.0116%']


# Remove rows with NaN in County or End Date
df_time.dropna(subset=['County', 'End Date'], inplace=True)


# Set date format
df_time['Start_Date_A'] = pd.to_datetime(df_time['Start Date'], format='%m/%d/%Y')
df_time['End_Date_A'] = pd.to_datetime(df_time['End Date'], format='%m/%d/%Y')

# Create new column: Disaster_Length (end minus start, so lengths are non-negative)
df_time['Disaster_Length'] = (df_time.End_Date_A - df_time.Start_Date_A)


# Drop the old string-formatted date columns from df_time
df_time = df_time.drop(columns=['Start Date', 'End Date'])

# Make County the index, as dropping NaN rows left gaps in the default index
df_time.set_index('County', inplace=True)

Reproducible df:

County,Disaster Type,Start_Date_A,End_Date_A,Disaster_Length
Clay County,Flood,1959-01-29,1959-01-29,0 days
Alpine County,Flood,1964-12-24,1964-12-24,0 days
Amador County,Flood,1964-12-24,1964-12-24,0 days
Butte County,Flood,1964-12-24,1964-12-24,0 days
Colusa County,Flood,1964-12-24,1964-12-24,0 days
Del Norte County,Flood,1964-12-24,1964-12-24,0 days
El Dorado County,Flood,1964-12-24,1964-12-24,0 days
Glenn County,Flood,1964-12-24,1964-12-24,0 days
Humboldt County,Flood,1964-12-24,1964-12-24,0 days
Lake County,Flood,1964-12-24,1964-12-24,0 days
Lassen County,Flood,1964-12-24,1964-12-24,0 days
Marin County,Flood,1964-12-24,1964-12-24,0 days
Mendocino County,Flood,1964-12-24,1964-12-24,0 days
Modoc County,Flood,1964-12-24,1964-12-24,0 days
Napa County,Flood,1964-12-24,1964-12-24,0 days
Nevada County,Flood,1964-12-24,1964-12-24,0 days
Placer County,Flood,1964-12-24,1964-12-24,0 days
Plumas County,Flood,1964-12-24,1964-12-24,0 days
Sacramento County,Flood,1964-12-24,1964-12-24,0 days
San Joaquin County,Flood,1964-12-24,1964-12-24,0 days
Shasta County,Flood,1964-12-24,1964-12-24,0 days
Sierra County,Flood,1964-12-24,1964-12-24,0 days
Siskiyou County,Flood,1964-12-24,1964-12-24,0 days
Solano County,Flood,1964-12-24,1964-12-24,0 days
Sonoma County,Flood,1964-12-24,1964-12-24,0 days
Stanislaus County,Flood,1964-12-24,1964-12-24,0 days
Sutter County,Flood,1964-12-24,1964-12-24,0 days
Tehama County,Flood,1964-12-24,1964-12-24,0 days
Trinity County,Flood,1964-12-24,1964-12-24,0 days
Tuolumne County,Flood,1964-12-24,1964-12-24,0 days
Yolo County,Flood,1964-12-24,1964-12-24,0 days
Yuba County,Flood,1964-12-24,1964-12-24,0 days
Baker County,Flood,1964-12-24,1964-12-24,0 days
Benton County,Flood,1964-12-24,1964-12-24,0 days
Clackamas County,Flood,1964-12-24,1964-12-24,0 days
Clatsop County,Flood,1964-12-24,1964-12-24,0 days
Columbia County,Flood,1964-12-24,1964-12-24,0 days
Coos County,Flood,1964-12-24,1964-12-24,0 days
Crook County,Flood,1964-12-24,1964-12-24,0 days
Curry County,Flood,1964-12-24,1964-12-24,0 days
Deschutes County,Flood,1964-12-24,1964-12-24,0 days
Douglas County,Flood,1964-12-24,1964-12-24,0 days
Gilliam County,Flood,1964-12-24,1964-12-24,0 days
Grant County,Flood,1964-12-24,1964-12-24,0 days
Harney County,Flood,1964-12-24,1964-12-24,0 days
Hood River County,Flood,1964-12-24,1964-12-24,0 days
Jackson County,Flood,1964-12-24,1964-12-24,0 days
Jefferson County,Flood,1964-12-24,1964-12-24,0 days
Josephine County,Flood,1964-12-24,1964-12-24,0 days
Klamath County,Flood,1964-12-24,1964-12-24,0 days
Lake County,Flood,1964-12-24,1964-12-24,0 days
Lane County,Flood,1964-12-24,1964-12-24,0 days
Lincoln County,Flood,1964-12-24,1964-12-24,0 days
Linn County,Flood,1964-12-24,1964-12-24,0 days
Malheur County,Flood,1964-12-24,1964-12-24,0 days
Marion County,Flood,1964-12-24,1964-12-24,0 days
Morrow County,Flood,1964-12-24,1964-12-24,0 days
Multnomah County,Flood,1964-12-24,1964-12-24,0 days
Polk County,Flood,1964-12-24,1964-12-24,0 days
Sherman County,Flood,1964-12-24,1964-12-24,0 days
Tillamook County,Flood,1964-12-24,1964-12-24,0 days
Umatilla County,Flood,1964-12-24,1964-12-24,0 days
Union County,Flood,1964-12-24,1964-12-24,0 days
Wallowa County,Flood,1964-12-24,1964-12-24,0 days
Wasco County,Flood,1964-12-24,1964-12-24,0 days
Washington County,Flood,1964-12-24,1964-12-24,0 days
Wheeler County,Flood,1964-12-24,1964-12-24,0 days
Yamhill County,Flood,1964-12-24,1964-12-24,0 days
Asotin County,Flood,1964-12-29,1964-12-29,0 days
Benton County,Flood,1964-12-29,1964-12-29,0 days
Clark County,Flood,1964-12-29,1964-12-29,0 days
Columbia County,Flood,1964-12-29,1964-12-29,0 days
Cowlitz County,Flood,1964-12-29,1964-12-29,0 days
Garfield County,Flood,1964-12-29,1964-12-29,0 days
Grays Harbor County,Flood,1964-12-29,1964-12-29,0 days
King County,Flood,1964-12-29,1964-12-29,0 days
Kittitas County,Flood,1964-12-29,1964-12-29,0 days
Klickitat County,Flood,1964-12-29,1964-12-29,0 days
Lewis County,Flood,1964-12-29,1964-12-29,0 days
Mason County,Flood,1964-12-29,1964-12-29,0 days


3 answers

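First change the column-creation code to:

df_time['Disaster_Length'] = (df_time.End_Date_A - df_time.Start_Date_A).dt.days

Then this should work:

# Look up the column position once so each write is a single .iloc assignment;
# chained indexing (df_time.iloc[i]['Disaster_Length'] = 1) typically writes to
# a temporary copy and leaves df_time unchanged.
col = df_time.columns.get_loc('Disaster_Length')
for i in range(len(df_time)):
    if df_time.iloc[i, col] == 0:
        df_time.iloc[i, col] = 1

This way, any disaster that lasts a single day will equal 1 instead of 0.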

Use Series.dt.days to convert the Timedelta values to integers, then use replace:

df_time['Disaster_Length'] = (df_time.End_Date_A - df_time.Start_Date_A).dt.days

df_time['Disaster_Length'] = df_time['Disaster_Length'].replace({0: 1})

Or set the values to 1 with a mask:

df_time['Disaster_Length'] = (df_time.End_Date_A - df_time.Start_Date_A).dt.days

df_time.loc[df_time['Disaster_Length'].eq(0), 'Disaster_Length'] = 1
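
To check both variants of this answer on something small, here is a minimal self-contained sketch. The three sample rows are made up for illustration; only the column names come from the question. The subtraction is written as end minus start, on the assumption that lengths should come out non-negative.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the question.
df_time = pd.DataFrame({
    "Start_Date_A": pd.to_datetime(["1964-12-24", "1964-12-24", "1965-01-01"]),
    "End_Date_A": pd.to_datetime(["1964-12-24", "1964-12-26", "1965-01-01"]),
})

# Subtracting two datetime columns gives timedeltas;
# .dt.days converts them to whole-day integers.
days = (df_time["End_Date_A"] - df_time["Start_Date_A"]).dt.days

# Variant 1: replace every 0 with 1.
via_replace = days.replace({0: 1})

# Variant 2: assign 1 through a boolean mask with .loc.
df_time["Disaster_Length"] = days
df_time.loc[df_time["Disaster_Length"].eq(0), "Disaster_Length"] = 1

print(df_time)
```

Both variants should leave same-day disasters at 1 and keep longer durations unchanged, and the resulting column is an integer column that can be used directly in arithmetic or aggregation.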

You can use a lambda function with the apply method, as shown below (this assumes Disaster_Length already holds integer days):

df_time['Disaster_Length'] = df_time['Disaster_Length'].apply(lambda x: 1 if x == 0 else x)
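
The question also mentions grouping the lengths per year and per disaster type. One possible way to do that once the column holds integer days is sketched below. The sample rows, the helper column name Year, and the choice of the mean as the aggregate are all assumptions made for illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question.
df_time = pd.DataFrame({
    "Disaster Type": ["Flood", "Flood", "Fire", "Fire"],
    "Start_Date_A": pd.to_datetime(
        ["1964-12-24", "1964-12-29", "1965-07-01", "1965-08-10"]),
    "End_Date_A": pd.to_datetime(
        ["1964-12-24", "1964-12-31", "1965-07-04", "1965-08-10"]),
})

# Integer day counts, with same-day disasters counted as 1 day.
days = (df_time["End_Date_A"] - df_time["Start_Date_A"]).dt.days
df_time["Disaster_Length"] = days.replace({0: 1})

# Assumed helper column: the year in which each disaster started.
df_time["Year"] = df_time["Start_Date_A"].dt.year

# Mean length per year, then per year and disaster type.
mean_by_year = df_time.groupby("Year")["Disaster_Length"].mean()
mean_by_year_and_type = (
    df_time.groupby(["Year", "Disaster Type"])["Disaster_Length"].mean()
)

print(mean_by_year)
print(mean_by_year_and_type)
```

Swapping .mean() for .sum() or .median() gives other per-year summaries; which one is appropriate depends on what "disaster length per year" is meant to measure.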
