How to fill blank cells in a df based on the strings in consecutive rows

Posted 2024-09-26 18:06:03

I have a df with data that looks like this:

  Time      Value
            60.8
  Jul 2019  58.1
            58.8
            56.9
  Oct 2019  51.8
            54.6
            56.8
  Jan 2020  58.8
            54.2
            51.3
  Apr 2020  52.2
I want to fill in the blank cells in the Time variable according to the calendar year, so that it becomes:

  Time      Value
  Jun 2019  60.8
  Jul 2019  58.1
  Aug 2019  58.8
  Sep 2019  56.9
  Oct 2019  51.8
  Nov 2019  54.6
  Dec 2019  56.8
  Jan 2020  58.8
  Feb 2020  54.2
  Mar 2020  51.3
  Apr 2020  52.2

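I saw a post showing that pandas can be used to fill in numeric values, but since my variable isn't necessarily defined numerically, I'm not sure how to apply it in this case.

As I see it, there are two ways to do this: 1) modify the list before writing it to the df; 2) modify the df.

I'd prefer the first solution, but I'm not sure whether it's feasible.

Thanks.

My script: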

import pandas as pd

totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
totalvalue=['60.8', '58.1', '58.8', '56.9', '51.8', '54.6', '56.8', '58.8', '54.2', '51.3', '52.2', '48.7']

df = pd.DataFrame({'Time': totalmonth,
                   'Value': totalvalue})

3 answers

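First use pd.to_datetime to convert the Time column to a datetime series t. Then use pd.period_range to generate a period range with monthly frequency, whose starting period is the computed period and whose number of periods equals the length of t. Finally, use strftime with the format specifier %b %Y to get the string representation of the period range in the desired format: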

t = pd.to_datetime(df['Time'])
df['Time'] = pd.period_range(
    t.min().to_period('M') - t.idxmin(), periods=len(t), freq='M').strftime('%b %Y')

Details:

# print(t)
0           NaT
1    2019-07-01
2           NaT
3           NaT
4    2019-10-01
5           NaT
6           NaT
7    2020-01-01
8           NaT
9           NaT
10   2020-04-01
11          NaT
Name: Time, dtype: datetime64[ns]

# print(t.min(), t.idxmin())
Timestamp('2019-07-01 00:00:00'), 1

# print(t.min().to_period('M') - t.idxmin())
Period('2019-06', 'M') # starting period of the period range
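
The starting period relies on Period arithmetic: subtracting an integer from a monthly Period moves it back that many months, rolling over year boundaries as needed. A small sketch of that behaviour (the variable names here are illustrative and not part of the answer):

```python
import pandas as pd

# Earliest known month from the Time column, as a monthly Period
earliest = pd.Timestamp('2019-07-01').to_period('M')

# Subtracting its row position (1) steps back one month to the first row's month
start = earliest - 1
print(start)

# The same arithmetic crosses a year boundary without special handling
before_january = pd.Period('2020-01', freq='M') - 1
print(before_january)
```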

Result:

# print(df)

        Time Value
0   Jun 2019  60.8
1   Jul 2019  58.1
2   Aug 2019  58.8
3   Sep 2019  56.9
4   Oct 2019  51.8
5   Nov 2019  54.6
6   Dec 2019  56.8
7   Jan 2020  58.8
8   Feb 2020  54.2
9   Mar 2020  51.3
10  Apr 2020  52.2
11  May 2020  48.7
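
The approach above trusts that the labels already present are consistent with a gap-free monthly sequence. A possible sketch that wraps the same steps in a function and checks that assumption; fill_month_labels is a hypothetical helper name, and it assumes blanks are empty strings and that month abbreviations follow an English locale:

```python
import pandas as pd

def fill_month_labels(labels, fmt='%b %Y'):
    """Hypothetical helper: rebuild consecutive monthly labels from a sparse list."""
    s = pd.Series(labels, dtype='object')
    # Blank strings become NaN first, so they parse to NaT
    t = pd.to_datetime(s.where(s != ''), format=fmt)
    # Step back from the earliest known month by its row position
    start = t.min().to_period('M') - int(t.idxmin())
    filled = pd.period_range(start, periods=len(t), freq='M').strftime(fmt)
    # Every label that was already present must agree with the generated one
    for i in t.dropna().index:
        if s[i] != filled[i]:
            raise ValueError(f'row {i}: found {s[i]!r}, expected {filled[i]!r}')
    return list(filled)

totalmonth = ['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
print(fill_month_labels(totalmonth))
```

Raising on a mismatch is one design choice among several; silently overwriting the existing labels, as the original snippet does, may be preferable when the source labels are known to be unreliable.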

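Well, that took me longer than I'd like to admit. I solved it for your first option: modifying the list before writing it to the df.

Output: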

***********************BEFORE********************************

['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
        Time Value
0             60.8
1   Jul 2019  58.1
2             58.8
3             56.9
4   Oct 2019  51.8
5             54.6
6             56.8
7   Jan 2020  58.8
8             54.2
9             51.3
10  Apr 2020  52.2
11            48.7

***********************AFTER********************************

['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
        Time Value
0   Jun 2019  60.8
1   Jul 2019  58.1
2   Aug 2019  58.8
3   Sep 2019  56.9
4   Oct 2019  51.8
5   Nov 2019  54.6
6   Dec 2019  56.8
7   Jan 2020  58.8
8   Feb 2020  54.2
9   Mar 2020  51.3
10  Apr 2020  52.2
11  May 2020  48.7

Code:

from datetime import datetime
from dateutil.relativedelta import relativedelta
totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]
index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))
new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]
print(new_totalmonth)

Breakdown

This line builds a list of all the valid dates, parsed into a form that min() can be run on:

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]

Here is what that prints:

print(new_totalmonth)

[datetime.datetime(2019, 7, 1, 0, 0), datetime.datetime(2019, 10, 1, 0, 0), datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 4, 1, 0, 0)]

This creates the variable index and assigns it the position in totalmonth of the minimum date:

index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))

min(new_totalmonth)  # this is finding the minimum date in new_totalmonth
print(min(new_totalmonth))

2019-07-01 00:00:00
min(new_totalmonth).strftime('%b %Y')  # This is putting that minimum in a format that matches what is in totalmonth so the function totalmonth.index() can get the correct index

print(min(new_totalmonth).strftime('%b %Y'))

Jul 2019

This uses a list comprehension:

new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]

I use the index of the minimum date in totalmonth to shift the range of values (numbers of months) that get added to the minimum month in totalmonth:

range(-index,len(totalmonth) - index)

print(list(range(-index,len(totalmonth) - index)))
[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Since the minimum month (Jul 2019) is at index 1, I need to add -1 months to get the month before it, Jun 2019.

So it can be broken down as:

(min(new_totalmonth) + relativedelta(months=-1)).strftime('%b %Y') = Jun 2019
(min(new_totalmonth) + relativedelta(months=0)).strftime('%b %Y') = Jul 2019
(min(new_totalmonth) + relativedelta(months=1)).strftime('%b %Y') = Aug 2019
...
(min(new_totalmonth) + relativedelta(months=10)).strftime('%b %Y') = May 2020

All of these values are collected into the list of months:

print(new_totalmonth)
['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
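
If installing dateutil is not an option, the same idea can probably be expressed with the standard library alone by counting months as year * 12 + month. This is only a sketch of that variant: fill_months is a hypothetical name, and it assumes blanks are empty strings and that at least one label is present.

```python
from datetime import datetime

def fill_months(labels, fmt='%b %Y'):
    """Hypothetical stdlib-only helper: fill blank month labels consecutively."""
    # (row position, parsed date) for every label that is present
    known = [(i, datetime.strptime(x, fmt)) for i, x in enumerate(labels) if x != '']
    pos, earliest = min(known, key=lambda pair: pair[1])
    # Month count of row 0: the earliest month, moved back by its row position
    base = earliest.year * 12 + (earliest.month - 1) - pos
    filled = []
    for k in range(len(labels)):
        year, month0 = divmod(base + k, 12)  # month0 is zero-based
        filled.append(datetime(year, month0 + 1, 1).strftime(fmt))
    return filled

totalmonth = ['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
new_totalmonth = fill_months(totalmonth)
print(new_totalmonth)
```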

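The minimum of the Time column minus one month is the start month, and the maximum plus two months is the end bound; the target column is then overwritten with date_range() to get consecutive values. These offsets fit this particular data, where the earliest label sits in row 1 and the latest in the second-to-last row.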

df['Time'] = pd.to_datetime(df['Time'])
# pd.DateOffset rolls over year boundaries, unlike month-1 / month+2 arithmetic
startM = df['Time'].min() - pd.DateOffset(months=1)
endM = df['Time'].max() + pd.DateOffset(months=2)
# MonthEnd() yields month-end dates; the 'M' alias is deprecated since pandas 2.2
df['Time'] = pd.date_range(startM, endM, freq=pd.offsets.MonthEnd())
df
    Time    Value
0   2019-06-30  60.8
1   2019-07-31  58.1
2   2019-08-31  58.8
3   2019-09-30  56.9
4   2019-10-31  51.8
5   2019-11-30  54.6
6   2019-12-31  56.8
7   2020-01-31  58.8
8   2020-02-29  54.2
9   2020-03-31  51.3
10  2020-04-30  52.2
11  2020-05-31  48.7
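
The dates above are month-end Timestamps rather than the 'Jun 2019' style labels the question asks for. A minimal sketch of converting them back, assuming the column already holds datetimes (the frame below just reuses the first three rows of the result, and %b assumes an English locale):

```python
import pandas as pd

df = pd.DataFrame({'Time': pd.to_datetime(['2019-06-30', '2019-07-31', '2019-08-31']),
                   'Value': ['60.8', '58.1', '58.8']})

# Format each Timestamp as abbreviated month name plus four-digit year
df['Time'] = df['Time'].dt.strftime('%b %Y')
print(df)
```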
