How to fill blank cells in a df based on the strings in consecutive rows

3 answers

User

#1 · edited 2024-09-26 18:06:03

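First use pd.to_datetime to convert the Time column to a datetime series t. Then use pd.period_range to generate a period range with monthly frequency, whose starting period is the calculated period and whose number of periods equals the length of t. Finally, use strftime with the format specifier %b %Y to return the string representation of the period range in the desired format: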

import pandas as pd

t = pd.to_datetime(df['Time'])
# t.idxmin() doubles as a month offset, which assumes the default 0..n-1 index
df['Time'] = pd.period_range(
    t.min().to_period('M') - t.idxmin(), periods=len(t), freq='M').strftime('%b %Y')

Details:

# print(t)
0           NaT
1    2019-07-01
2           NaT
3           NaT
4    2019-10-01
5           NaT
6           NaT
7    2020-01-01
8           NaT
9           NaT
10   2020-04-01
11          NaT
Name: Time, dtype: datetime64[ns]

# print(t.min(), t.idxmin())
Timestamp('2019-07-01 00:00:00'), 1

# print(t.min().to_period('M') - t.idxmin())
Period('2019-06', 'M') # starting period of the period range
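
The starting period above relies on the fact that subtracting an integer from a monthly Period steps back that many months. A minimal sketch of that arithmetic on its own (the variable names first_known and start are illustrative, not from the answer):

```python
import pandas as pd

# Subtracting an integer from a monthly Period moves back that many months.
first_known = pd.Period('2019-07', freq='M')
start = first_known - 1  # the earliest known month sits at row 1
print(start)

# The same arithmetic rolls over year boundaries without special handling.
rolled = pd.Period('2020-01', freq='M') - 2
print(rolled)
```

This is why the answer can use the row label returned by idxmin() directly as a month offset, provided the labels are the default 0, 1, 2, ... positions.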

Result:

# print(df)

        Time Value
0   Jun 2019  60.8
1   Jul 2019  58.1
2   Aug 2019  58.8
3   Sep 2019  56.9
4   Oct 2019  51.8
5   Nov 2019  54.6
6   Dec 2019  56.8
7   Jan 2020  58.8
8   Feb 2020  54.2
9   Mar 2020  51.3
10  Apr 2020  52.2
11  May 2020  48.7
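
For anyone who wants to reproduce this end to end, here is a self-contained sketch. The sample frame is reconstructed from the output shown in this thread; passing format='%b %Y' and errors='coerce' is an added assumption to make the parsing explicit, not part of the original answer. It also assumes an English locale for the month abbreviations.

```python
import pandas as pd

# Sample frame reconstructed from the output shown in this thread.
df = pd.DataFrame({
    'Time': ['', 'Jul 2019', '', '', 'Oct 2019', '', '',
             'Jan 2020', '', '', 'Apr 2020', ''],
    'Value': [60.8, 58.1, 58.8, 56.9, 51.8, 54.6,
              56.8, 58.8, 54.2, 51.3, 52.2, 48.7],
})

# Blank strings cannot be parsed, so errors='coerce' turns them into NaT.
t = pd.to_datetime(df['Time'], format='%b %Y', errors='coerce')

# Earliest known month, shifted back by its row position (default index assumed).
start = t.min().to_period('M') - int(t.idxmin())

# One consecutive monthly period per row, rendered back as 'Mon YYYY'.
df['Time'] = pd.period_range(start, periods=len(t), freq='M').strftime('%b %Y')
print(df)
```

If the frame has a non-default index, calling reset_index(drop=True) on t before idxmin() should give the positional offset instead.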

User

#2 · edited 2024-09-26 18:06:03

Okay, this took me longer than I would like to admit. I worked out a solution for the first part of your question.

Output:

***********************BEFORE********************************

['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
        Time Value
0             60.8
1   Jul 2019  58.1
2             58.8
3             56.9
4   Oct 2019  51.8
5             54.6
6             56.8
7   Jan 2020  58.8
8             54.2
9             51.3
10  Apr 2020  52.2
11            48.7

***********************AFTER********************************

['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
        Time Value
0   Jun 2019  60.8
1   Jul 2019  58.1
2   Aug 2019  58.8
3   Sep 2019  56.9
4   Oct 2019  51.8
5   Nov 2019  54.6
6   Dec 2019  56.8
7   Jan 2020  58.8
8   Feb 2020  54.2
9   Mar 2020  51.3
10  Apr 2020  52.2
11  May 2020  48.7

Code:

from datetime import datetime
from dateutil.relativedelta import relativedelta
totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]
index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))
new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]
print(new_totalmonth)

Breakdown

This line builds a list of all the valid dates and parses them into datetime objects, so that I can run min() on them:

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]

This is what it prints:

print(new_totalmonth)

[datetime.datetime(2019, 7, 1, 0, 0), datetime.datetime(2019, 10, 1, 0, 0), datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 4, 1, 0, 0)]

This creates the variable index and assigns it the position of the earliest date in totalmonth:

index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))

min(new_totalmonth)  # this is finding the minimum date in new_totalmonth
print(min(new_totalmonth))

2019-07-01 00:00:00

min(new_totalmonth).strftime('%b %Y')  # This is putting that minimum in a format that matches what is in totalmonth so the function totalmonth.index() can get the correct index

print(min(new_totalmonth).strftime('%b %Y'))

Jul 2019

This uses a list comprehension:

new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]

I use the position of the earliest date in totalmonth to shift the range of values (the number of months) that I add to that earliest month:

range(-index,len(totalmonth) - index)

print(list(range(-index,len(totalmonth) - index)))
[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Since the earliest month (Jul 2019) is at index 1, I need to add -1 months to get the month before it, Jun 2019.

So it can be broken down as:

(min(new_totalmonth) + relativedelta(months=-1)).strftime('%b %Y') = Jun 2019
(min(new_totalmonth) + relativedelta(months=0)).strftime('%b %Y') = Jul 2019
(min(new_totalmonth) + relativedelta(months=1)).strftime('%b %Y') = Aug 2019
...
(min(new_totalmonth) + relativedelta(months=10)).strftime('%b %Y') = May 2020

All of these values are collected into the new_totalmonth list:

print(new_totalmonth)
['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
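
The same idea can be wrapped in a reusable function. The sketch below is a variant, not the code above: it uses only the standard library (plain month arithmetic with divmod instead of relativedelta), and it anchors on the first non-blank entry rather than on min(), which gives the same result as long as every row is one consecutive month. The function name fill_months is hypothetical.

```python
from datetime import datetime


def fill_months(labels, fmt='%b %Y'):
    """Fill blank entries, assuming each row is one consecutive month."""
    # Anchor on the first non-blank entry: its position and its parsed date.
    # Raises StopIteration if every entry is blank.
    pos, first = next((i, datetime.strptime(s, fmt))
                      for i, s in enumerate(labels) if s)
    # Express the anchor as a month count, then shift it back to row 0.
    base = first.year * 12 + (first.month - 1) - pos
    filled = []
    for i in range(len(labels)):
        year, month0 = divmod(base + i, 12)
        filled.append(datetime(year, month0 + 1, 1).strftime(fmt))
    return filled


totalmonth = ['', 'Jul 2019', '', '', 'Oct 2019', '', '',
              'Jan 2020', '', '', 'Apr 2020', '']
print(fill_months(totalmonth))
```

With a DataFrame, something like df['Time'] = fill_months(df['Time'].tolist()) should write the result back to the column.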

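User

#3 · edited 2024-09-26 18:06:03

The minimum of the Time column minus one month is the start of the range, and the maximum of the Time column plus two months is the end of the range. The target column is then overwritten with date_range() to get consecutive values. Note that these offsets match this particular data, where one blank row precedes the first date and one follows the last.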

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# DateOffset rolls over year boundaries; month-1 / month+2 fails for Jan, Nov and Dec
startM = df['Time'].min() - pd.DateOffset(months=1)
endM = df['Time'].max() + pd.DateOffset(months=2)
# 'ME' is the month-end alias in pandas >= 2.2; use 'M' on older versions
df['Time'] = pd.date_range(startM, endM, freq='ME')
df
    Time    Value
0   2019-06-30  60.8
1   2019-07-31  58.1
2   2019-08-31  58.8
3   2019-09-30  56.9
4   2019-10-31  51.8
5   2019-11-30  54.6
6   2019-12-31  56.8
7   2020-01-31  58.8
8   2020-02-29  54.2
9   2020-03-31  51.3
10  2020-04-30  52.2
11  2020-05-31  48.7
