Conditionally fill a column up to a certain date

Posted 2024-09-27 19:14:59

I have two DataFrames that look like this:

import pandas as pd

data = {'BugCatcher': ['Fred', 'Fred', 'George', 'George'],
        'Date': ['1/13/2020', '1/14/2020', '1/20/2020', '1/26/2020'],
        'BugsCaught': ['Spider','Spider', 'Butterfly', 'Butterfly'],
        'BugsFound': [1, 4, 5, 8],
        'BugsFoundRunningTotal': [1, 5, 5, 13]
        }

bug_df = pd.DataFrame(data, columns = ['BugCatcher', 'Date', 'BugsCaught', 'BugsFound', 'BugsFoundRunningTotal'])
bug_df

   BugCatcher       Date  BugsCaught  BugsFound  BugsFoundRunningTotal
0        Fred  1/13/2020      Spider          1                      1
1        Fred  1/14/2020      Spider          4                      5
2      George  1/20/2020   Butterfly          5                      5
3      George  1/26/2020   Butterfly          8                     13



data2 = {'Name': ['Fred', 'Fred', 'George', 'George'],
        'Date': ['1/5/2020', '1/6/2020', '1/17/2020', '1/30/2020'],
        'NumberOfBooksReadOnCatchingBugs': [2, 3, 1, 3],
        }

book_df = pd.DataFrame(data2, columns = ['Name', 'Date', 'NumberOfBooksReadOnCatchingBugs'])
book_df


     Name       Date  NumberOfBooksReadOnCatchingBugs
0    Fred   1/5/2020                                2
1    Fred   1/6/2020                                3
2  George  1/17/2020                                1
3  George  1/30/2020                                3

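I'm looking for a way to conditionally fill or join the two frames so that I get the output below. The idea is to combine them on each catcher's dates: bug counts should be 0 on dates that only appear in the book data, the running total should carry forward from the last bug date, and the book count should carry forward from the last book date. Ultimately I want to draw a combined line-and-bar chart that shows the running total of bugs as a line and the number of books read as bars. I tried a join, but it didn't give the correct result.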

desired = {'BugCatcher': ['Fred', 'Fred', 'Fred', 'Fred', 'George', 'George', 'George', 'George'],
        'Date': ['1/5/2020', '1/6/2020','1/13/2020', '1/14/2020', '1/17/2020', '1/20/2020', '1/26/2020', '1/30/2020'],
        'NumberOfBooksReadOnCatchingBugs': [2, 3, 3, 3, 1, 1, 1, 3],
        'BugsType': ['Spider','Spider', 'Spider', 'Spider', 'Butterfly', 'Butterfly', 'Butterfly', 'Butterfly'],
        'QuantityFound': [0, 0, 1, 4, 0, 5, 8, 0],
        'BugsFoundRunningTotal': [0, 0, 1, 5, 0, 5, 13, 13]
        }

output = pd.DataFrame(desired, columns = ['BugCatcher', 'Date', 'NumberOfBooksReadOnCatchingBugs', 'BugsType', 'QuantityFound', 'BugsFoundRunningTotal'])
output

   BugCatcher       Date  NumberOfBooksReadOnCatchingBugs   BugsType  QuantityFound  BugsFoundRunningTotal
0        Fred   1/5/2020                                2     Spider              0                      0
1        Fred   1/6/2020                                3     Spider              0                      0
2        Fred  1/13/2020                                3     Spider              1                      1
3        Fred  1/14/2020                                3     Spider              4                      5
4      George  1/17/2020                                1  Butterfly              0                      0
5      George  1/20/2020                                1  Butterfly              5                      5
6      George  1/26/2020                                1  Butterfly              8                     13
7      George  1/30/2020                                3  Butterfly              0                     13

Any help would be appreciated. Thanks!


1 answer
User
#1 · Posted 2024-09-27 19:14:59

Try this:

bug_df['Date'] = pd.to_datetime(bug_df['Date'])
book_df['Date'] = pd.to_datetime(book_df['Date'])

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead.
# Renaming 'Name' gives both frames the same (BugCatcher, Date) index.
final_df = pd.concat([
    bug_df.set_index(['BugCatcher', 'Date']),
    book_df.rename(columns={'Name': 'BugCatcher'}).set_index(['BugCatcher', 'Date']),
]).sort_index()

# Fill gaps within each catcher only (fillna(method=...) is deprecated in recent pandas).
final_df['BugsCaught'] = final_df.groupby(level=0)['BugsCaught'].transform(lambda s: s.bfill().ffill())
final_df['BugsFound'] = final_df['BugsFound'].fillna(0).astype(int)
final_df['BugsFoundRunningTotal'] = final_df.groupby(level=0)['BugsFoundRunningTotal'].ffill().fillna(0).astype(int)
final_df['NumberOfBooksReadOnCatchingBugs'] = final_df.groupby(level=0)['NumberOfBooksReadOnCatchingBugs'].transform(lambda s: s.ffill().bfill()).astype(int)

final_df = final_df.reset_index()

Output:

   BugCatcher        Date  BugsCaught  BugsFound  BugsFoundRunningTotal  NumberOfBooksReadOnCatchingBugs
0        Fred  2020-01-05      Spider          0                      0                                2
1        Fred  2020-01-06      Spider          0                      0                                3
2        Fred  2020-01-13      Spider          1                      1                                3
3        Fred  2020-01-14      Spider          4                      5                                3
4      George  2020-01-17   Butterfly          0                      0                                1
5      George  2020-01-20   Butterfly          5                      5                                1
6      George  2020-01-26   Butterfly          8                     13                                1
7      George  2020-01-30   Butterfly          0                     13                                3
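
For comparison, here is a self-contained sketch of an alternative that is not from the answer above: an outer merge on catcher and date, followed by per-catcher fills. It assumes the sample data from the question and parses the dates as month/day/year; the helper name `combine_bugs_and_books` is hypothetical, made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
bug_df = pd.DataFrame({
    'BugCatcher': ['Fred', 'Fred', 'George', 'George'],
    'Date': ['1/13/2020', '1/14/2020', '1/20/2020', '1/26/2020'],
    'BugsCaught': ['Spider', 'Spider', 'Butterfly', 'Butterfly'],
    'BugsFound': [1, 4, 5, 8],
    'BugsFoundRunningTotal': [1, 5, 5, 13],
})
book_df = pd.DataFrame({
    'Name': ['Fred', 'Fred', 'George', 'George'],
    'Date': ['1/5/2020', '1/6/2020', '1/17/2020', '1/30/2020'],
    'NumberOfBooksReadOnCatchingBugs': [2, 3, 1, 3],
})


def combine_bugs_and_books(bugs, books):
    """Hypothetical helper: outer-merge both frames, then fill gaps per catcher."""
    # Assumption: dates are month/day/year strings, as in the sample data.
    bugs = bugs.assign(Date=pd.to_datetime(bugs['Date'], format='%m/%d/%Y'))
    books = books.rename(columns={'Name': 'BugCatcher'})
    books = books.assign(Date=pd.to_datetime(books['Date'], format='%m/%d/%Y'))

    # The outer merge keeps every (catcher, date) pair found in either frame.
    merged = (
        bugs.merge(books, on=['BugCatcher', 'Date'], how='outer')
        .sort_values(['BugCatcher', 'Date'])
        .reset_index(drop=True)
    )

    grouped = merged.groupby('BugCatcher')
    return merged.assign(
        # Bug type: take it from the catcher's nearest row that has one.
        BugsCaught=grouped['BugsCaught'].transform(lambda s: s.bfill().ffill()),
        # No bug record on a date means nothing was found that day.
        BugsFound=merged['BugsFound'].fillna(0).astype(int),
        # Running total carries forward; it is 0 before the first catch.
        BugsFoundRunningTotal=grouped['BugsFoundRunningTotal'].ffill().fillna(0).astype(int),
        # Book count carries forward from the last book date.
        NumberOfBooksReadOnCatchingBugs=grouped['NumberOfBooksReadOnCatchingBugs']
        .transform(lambda s: s.ffill().bfill())
        .astype(int),
    )


result = combine_bugs_and_books(bug_df, book_df)
print(result)
```

Both fills are done inside each catcher's group, so one person's values are never carried into another person's rows. If a catcher and date ever appeared in both frames, the merge would put them on a single row, whereas stacking the frames would leave two rows for that date.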
