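I have a DataFrame in which the dates sit inside another column (Task). I want to pull those dates out and store them in a new Date column. Here is my sample data: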
import numpy as np
import pandas as pd
df=[['Monday, 13 January 2020','',''],['Task 1',13588,'Jack'],['','','Address 1'],['','','City 1'],['Task 2',13589,'Ammie'],['','','Address 2'],['','','City'],['Task 3',13589,'Amanda'],['','','Address 3'],['','','City 3'],['Tuesday, 14 January 2020','',''],['Task 4',13587,'Chelsea'],['','','Address 4'],['','','City 4'],['Task 5','13586','Ibrahim'],['','','Address 5'],['','','City 5'],['Task 6',13585,'Kate'],['','','Address 6'],['','','City 6']]
df=pd.DataFrame(df)
df.columns = ['Task','ID','Supervisor']
df=df.replace(np.nan,'')
df
                        Task     ID  Supervisor
0    Monday, 13 January 2020
1                     Task 1  13588        Jack
2                                     Address 1
3                                        City 1
4                     Task 2  13589       Ammie
5                                     Address 2
6                                          City
7                     Task 3  13589      Amanda
8                                     Address 3
9                                        City 3
10  Tuesday, 14 January 2020
11                    Task 4  13587     Chelsea
12                                    Address 4
13                                       City 4
14                    Task 5  13586     Ibrahim
15                                    Address 5
16                                       City 5
17                    Task 6  13585        Kate
18                                    Address 6
19                                       City 6
I want to get the following output:
                       Date    Task     ID                Supervisor
0   Monday, 13 January 2020  Task 1  13588     Jack Address 1 City 1
1   Monday, 13 January 2020  Task 2  13589      Ammie Address 2 City
2   Monday, 13 January 2020  Task 3  13589   Amanda Address 3 City 3
3  Tuesday, 14 January 2020  Task 4  13587  Chelsea Address 4 City 4
4  Tuesday, 14 January 2020  Task 5  13586  Ibrahim Address 5 City 5
5  Tuesday, 14 January 2020  Task 6  13585     Kate Address 6 City 6
This is my attempt:
def rowMerger(a,b):
    try:
        # A row starts a new record when its first three columns are all non-empty.
        rule1 = lambda x: x not in ['']
        u = a.loc[a.iloc[:,0].apply(rule1) & a.iloc[:,1].apply(rule1) & a.iloc[:,2].apply(rule1)].index
        print(u)
        findMergerindexs = list(u)
        findMergerindexs.sort()
        a = pd.DataFrame(a)
        tabcolumns = pd.DataFrame(a.columns)
        totalcolumns = len(tabcolumns)
        b = pd.DataFrame(columns = list(tabcolumns))
        if (len(findMergerindexs) > 0):
            for m in range(len(findMergerindexs)):
                # Each record runs from its start row up to the next start row (or the end).
                if not (m == (len(findMergerindexs)-1)):
                    startLoop = findMergerindexs[m]
                    endLoop = findMergerindexs[m+1]
                else:
                    startLoop = findMergerindexs[m]
                    endLoop = len(a)
                listValues = []
                for i in range(totalcolumns):
                    # Join the values of column i across the rows of this record.
                    value = ' '
                    for n in range(startLoop,endLoop):
                        value = value + ' ' + str(a.iloc[n,i])
                    listValues.insert(i,(value.strip()))
                # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
                b = pd.concat([b, pd.DataFrame([listValues])], ignore_index = True)
        else:
            print("File is not having a row for merging instances - Please check the file manually for instance - ")
        return b
    except Exception:
        print("Error - While merging the rows")
        return b
This code gives the output below:
rowMerger(df,0)
                                 0      1                         2
0                           Task 1  13588     Jack Address 1 City 1
1                           Task 2  13589      Ammie Address 2 City
2  Task 3 Tuesday, 14 January 2020  13589   Amanda Address 3 City 3
3                           Task 4  13587  Chelsea Address 4 City 4
4                           Task 5  13586  Ibrahim Address 5 City 5
5                           Task 6  13585     Kate Address 6 City 6
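The problem is that this code only merges the rows. I am not sure how to copy each date down across the rows that belong to it, as shown in the desired output, and put it in a separate column. Can anyone help me achieve this?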
You can try the following:
Output:
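Explanation:
1. Filter the Task column to separate the dates from the task ids. The task id mask, task_mask, comes from a regular expression match on "Task\s+\d", which means Task + any whitespace + a digit.
2. From this mask we can extract Date and Tasks. The tasks are simply df.Task[task_mask]. Extracting the dates is slightly harder: keep the Task value on date rows and NaN elsewhere, then convert the resulting array into a Series and handle the NaN rows.
3. Drop the rows that are empty in every column, using how="all".
4. Aggregate the rows with groupby. The aggregation is chosen per column: one rule for "Task", "ID" and "Date", and lambda x: " ".join(x) for the text that has to be concatenated.
5. Reset the index left over from the groupby.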
Hope that is clear.
Code + illustration
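The steps above could be sketched roughly as follows. This is not the answerer's original code: it is a minimal sketch on a shortened copy of the sample data, and it assumes that any non-empty Task cell that does not match "Task\s+\d" is a date header. The names date_mask, body and group are introduced here for illustration, and the date column is built with Series.where plus ffill, which may differ from the method the answer used.

```python
import pandas as pd

rows = [
    ["Monday, 13 January 2020", "", ""],
    ["Task 1", 13588, "Jack"],
    ["", "", "Address 1"],
    ["", "", "City 1"],
    ["Task 2", 13589, "Ammie"],
    ["", "", "Address 2"],
    ["", "", "City"],
    ["Tuesday, 14 January 2020", "", ""],
    ["Task 4", 13587, "Chelsea"],
    ["", "", "Address 4"],
    ["", "", "City 4"],
]
df = pd.DataFrame(rows, columns=["Task", "ID", "Supervisor"])

# Rows whose Task cell looks like "Task <digit>" start a new record.
task_mask = df["Task"].str.match(r"Task\s+\d")
# Assumption: every other non-empty Task cell is a date header.
date_mask = df["Task"].ne("") & ~task_mask

# Header text on date rows, NaN elsewhere, then carried down to the rows below.
df["Date"] = df["Task"].where(date_mask).ffill()

# Drop the header rows; their text now lives in the Date column.
body = df.loc[~date_mask].copy()

# The group id increases by one at every task row, so continuation
# rows (address, city) share the id of the task row above them.
body["group"] = task_mask[~date_mask].cumsum()

result = (
    body.groupby("group")
    .agg({
        "Date": "first",
        "Task": "first",
        "ID": "first",
        "Supervisor": lambda x: " ".join(x),
    })
    .reset_index(drop=True)
)
print(result)
```

Taking "first" for Date, Task and ID works here because the task row is always the first row of its group; only the Supervisor column needs its pieces joined.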
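@Alexandre's answer is good. Here is an alternative that avoids the regex extraction and the shift:
So essentially we use a lambda to separate the dates from the task numbers, and use pd.Series.fillna(method='ffill') to fill in the last valid date. So we add the following lines: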
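A minimal sketch of what such a step might look like, on a shortened copy of the sample data. The lambda rule (a non-empty Task cell that does not start with "Task" is a date) and the names header and tasks are assumptions made for illustration, not the answerer's code. Series.ffill() is used as the direct equivalent of fillna(method='ffill'), whose method argument is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

rows = [
    ["Monday, 13 January 2020", "", ""],
    ["Task 1", 13588, "Jack"],
    ["", "", "Address 1"],
    ["", "", "City 1"],
    ["Tuesday, 14 January 2020", "", ""],
    ["Task 4", 13587, "Chelsea"],
    ["", "", "Address 4"],
    ["", "", "City 4"],
]
df = pd.DataFrame(rows, columns=["Task", "ID", "Supervisor"])

# Keep the text of date rows, put NaN everywhere else,
# then forward-fill so each row carries the last valid date.
df["Date"] = (
    df["Task"]
    .apply(lambda s: s if s and not s.startswith("Task") else np.nan)
    .ffill()
)

# A header row is one whose Task text is the date itself; drop those rows.
header = df["Task"].eq(df["Date"])
tasks = df[~header].reset_index(drop=True)
print(tasks)
```

After this step every remaining row carries its own date, so the row-merging logic only has to keep one Date value per record instead of concatenating it.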