Calculating task completion dates with pandas

Posted 2024-09-29 03:37:43


I have student data like this:

date        student_name  tasks  remarks
2012-12-01  sarita        -100   Complete 100 tasks
2013-12-04  manu          -35    complete 35 taks
2013-01-15  sarita        10     completed 10 tasks
2013-02-13  sarita        -25    Complete 25 more tasks
2013-03-13  sarita        30     completed 30 taks
2013-03-12  manu          10     completed 10 tasks

How can I work out whether each assignment has been completed, and the date on which it was completed?

The final result should look like this:

date        student_name  tasks  remarks                 Completed  Completion Date
2012-12-01  sarita        -100   Complete 100 tasks      Yes        2013-04-12
2013-12-04  manu          -35    complete 35 taks        No         'Not Completed Yet'
2013-01-15  sarita        10     completed 10 tasks      NaN
2013-02-13  sarita        -25    Complete 25 more tasks  No         'Not Completed Yet'
2013-03-13  sarita        30     completed 30 taks       NaN
2013-03-12  manu          10     completed 10 tasks      NaN
2013-04-12  sarita        70     completed 70 tasks      NaN
2013-05-16  sarita        8      completed 8 tasks       NaN

I want to compute the Completed and Completion Date columns. Should I create a separate DataFrame for this?

Completed should be calculated from the number of positive tasks the student has completed so far.

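So far sarita has completed 118 tasks (10 + 30 + 70 + 8). So whenever I run this on the DataFrame, Completed for the row dated 2012-12-01 should be set to Yes and Completion Date should be set to 2013-04-12, because that is the date on which the 100 assigned tasks (-100) were fully covered.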

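For the row dated 2013-02-13, Completed should be set to No, because she has completed only 18 of the next 25 tasks. Once later positive entries add the remaining 7 tasks, Completed should be set to Yes and Completion Date should be set accordingly.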

I hope this makes things clearer.


1 answer

Answer #1 · posted 2024-09-29 03:37:43

OK, this should do what you want:

import pandas as pd

# data
d = {'date':['2012-12-01', '2013-12-04', '2013-01-15', '2013-02-13', '2013-03-13', '2013-03-12', '2013-04-12', '2013-05-16'],
 'student_name':['sarita', 'manu', 'sarita', 'sarita', 'sarita', 'manu', 'sarita', 'sarita'],
 'tasks':[-100, -35, 10, -25, 30, 10, 70, 8],
 'remarks':['Complete 100 tasks', 'complete 35 taks', 'completed 10 tasks', 'Complete 25 more tasks', 'completed 30 taks', 'completed 10 tasks', 'completed 70 tasks', 'completed 8 tasks']}

# create dataframe
df = pd.DataFrame(data=d)
# convert string to date (the strings use '-' as the separator)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# create new empty columns (object dtype, because they hold both strings and dates)
df['Completed'] = pd.Series('', index=df.index, dtype=object)
df['Completion Date'] = pd.Series('', index=df.index, dtype=object)

# get list of students
students = df['student_name'].unique().tolist()

# loop over students
for student in students:
    # get this student's records in date order, keeping the original row labels
    studentRecords = df.loc[df['student_name'] == student].sort_values(by=['date'])
    # split into assigned / completed tasks (remarks containing 'completed' are completed work)
    isCompleted = studentRecords['remarks'].str.contains('completed')
    assignedTasks = studentRecords.loc[~isCompleted]
    completedTasks = studentRecords.loc[isCompleted].reset_index(drop=True)
    # loop over assigned tasks, oldest first; idx is the row label in df
    for idx, row in assignedTasks.iterrows():
        # number of tasks needed, as a positive count
        tasks = -row['tasks']
        # running total of completed tasks
        completedTasks['cumsum'] = completedTasks['tasks'].cumsum()
        # flag rows where the running total covers this assignment
        completedTasks['finishedAssignment'] = (completedTasks['cumsum'] >= tasks).astype(int)
        # first row, if any, at which the assignment is finished
        neededinfo = completedTasks[completedTasks['finishedAssignment'] == 1].head(1)
        # if there is no such row, the assignment has not been completed
        if len(neededinfo) == 0:
            # update records
            df.loc[idx, 'Completed'] = 'No'
            df.loc[idx, 'Completion Date'] = 'Not Completed Yet'
        # if completed
        else:
            # date of completion
            onDate = neededinfo.iloc[0]['date']
            # running total reached on that date
            tasksTillDate = neededinfo.iloc[0]['cumsum']
            # drop the completed rows that were used up before the completion date
            completedTasks = completedTasks.loc[completedTasks['finishedAssignment'] == 1].reset_index(drop=True)
            # keep only the surplus on the completion row, so it counts toward the next assignment
            completedTasks.loc[(completedTasks['cumsum'] == tasksTillDate) & (completedTasks['date'] == onDate), 'tasks'] = tasksTillDate - tasks
            # update records
            df.loc[idx, 'Completed'] = 'Yes'
            df.loc[idx, 'Completion Date'] = onDate

print(df)

Output:

date        student_name  tasks  remarks                 Completed  Completion Date
2012-12-01  sarita        -100   Complete 100 tasks      Yes        2013-04-12 00:00:00
2013-12-04  manu          -35    complete 35 taks        No         Not Completed Yet
2013-01-15  sarita        10     completed 10 tasks
2013-02-13  sarita        -25    Complete 25 more tasks  No         Not Completed Yet
2013-03-13  sarita        30     completed 30 taks
2013-03-12  manu          10     completed 10 tasks
2013-04-12  sarita        70     completed 70 tasks
2013-05-16  sarita        8      completed 8 tasks
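
If the loop feels heavy, the same oldest-first bookkeeping can probably be expressed with cumulative sums. The following is only a sketch, not a drop-in replacement: `completion_dates` is a hypothetical helper name, and it assumes that negative `tasks` values are assignments, positive values are completed work, and completed work is credited to a student's assignments in date order. An assignment counts as finished on the first date where the running total of completed tasks reaches the running total of assigned tasks.

```python
import numpy as np
import pandas as pd


def completion_dates(df):
    """Return completion dates indexed like df (NaT where nothing was finished).

    Hypothetical helper. Assumes negative `tasks` are assignments, positive
    `tasks` are completed work, credited to assignments oldest-first.
    """
    result = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")
    for _, grp in df.sort_values("date").groupby("student_name"):
        assigned = grp[grp["tasks"] < 0]
        done = grp[grp["tasks"] > 0]
        # running totals: tasks required so far vs. tasks finished so far
        required = (-assigned["tasks"]).cumsum().to_numpy()
        finished = done["tasks"].cumsum().to_numpy()
        # first position where the finished total reaches each required total
        pos = np.searchsorted(finished, required, side="left")
        ok = pos < len(finished)
        if ok.any():
            result.loc[assigned.index[ok]] = done["date"].to_numpy()[pos[ok]]
    return result


df = pd.DataFrame({
    "date": ["2012-12-01", "2013-12-04", "2013-01-15", "2013-02-13",
             "2013-03-13", "2013-03-12", "2013-04-12", "2013-05-16"],
    "student_name": ["sarita", "manu", "sarita", "sarita",
                     "sarita", "manu", "sarita", "sarita"],
    "tasks": [-100, -35, 10, -25, 30, 10, 70, 8],
})
df["date"] = pd.to_datetime(df["date"])

dates = completion_dates(df)
# sarita's 100-task assignment is covered on 2013-04-12; every other row stays NaT
print(dates)
```

The Yes/No column can then be derived from the result, for example by marking rows with negative `tasks` as Yes where the date is not NaT and No otherwise.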
