Cross join with Pandas

Posted 2024-06-25 22:32:51


I have data like the following:

^{tb1}$

And a second, related set of data:

^{tb2}$

I want to use Pandas to perform an operation that produces the following result:

^{tb3}$

I know how to do this with a cross join in Redshift, but I can't work out the syntax using pd.merge or np.dot. Here are the sample dataframes:

import pandas as pd

revenue = pd.DataFrame({
    'Date': [
        '2021-01-01',
        '2021-01-15',
        '2021-01-01',
        '2021-01-01',
        '2021-02-01',
        '2021-02-01',
        '2021-02-01',
        ],
    'Vendor': [
        'Mickey Mouse',
        'Mickey Mouse',
        'Donald Duck',
        'Goofy',
        'Mickey Mouse',
        'Donald Duck',
        'Goofy',
    ],
    'Revenue': [100,150,100,100,200,200,200,]
        })
breakdown = pd.DataFrame({
    'Month': [
        'January 2021',
        'January 2021',
        'January 2021',
        'January 2021',
        'January 2021',
        'January 2021',
    ],
    'Vendor': [
        'Mickey Mouse',
        'Mickey Mouse',
        'Mickey Mouse',
        'Goofy',
        'Goofy',
        'Goofy',
    ],
    'Snack': [
        'Churros',
        'Funnel Cake',
        'Apples',
        'Churros',
        'Funnel Cake',
        'Water',
    ],
    'Percentage': [0.5,0.25,0.25,0.34,0.33,0.33]
})

2 answers

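Try this:

  • Convert the dates in revenue to match the string format of the Month column in breakdown
  • Merge the two frames on Month and Vendor (an outer merge, to get the cross-join effect within each month and vendor)
  • Compute the split revenue by multiplying the existing revenue by the percentage (rows with no breakdown keep their full revenue)
  • Then select the relevant columns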
out = (
    revenue
    .assign(
        Month=pd.to_datetime(revenue["Date"]).dt.strftime("%B %Y")
    )
    .merge(breakdown, on=["Month", "Vendor"], how="outer")
    .assign(Revenue=lambda d: d["Revenue"] * d["Percentage"].fillna(1))
    .filter(["Date", "Vendor", "Snack", "Revenue"])
)
print(out)
          Date        Vendor        Snack  Revenue
0   2021-01-01  Mickey Mouse      Churros     50.0
1   2021-01-01  Mickey Mouse  Funnel Cake     25.0
2   2021-01-01  Mickey Mouse       Apples     25.0
3   2021-01-15  Mickey Mouse      Churros     75.0
4   2021-01-15  Mickey Mouse  Funnel Cake     37.5
5   2021-01-15  Mickey Mouse       Apples     37.5
6   2021-01-01   Donald Duck          NaN    100.0
7   2021-01-01         Goofy      Churros     34.0
8   2021-01-01         Goofy  Funnel Cake     33.0
9   2021-01-01         Goofy        Water     33.0
10  2021-02-01  Mickey Mouse          NaN    200.0
11  2021-02-01   Donald Duck          NaN    200.0
12  2021-02-01         Goofy          NaN    200.0
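
As a sanity check on the result above, the split should neither create nor lose revenue, because each vendor's percentages in breakdown sum to 1. The following is a minimal sketch under that assumption. It rebuilds the question's two frames in compact form so it runs on its own, and the helper name `split_revenue` is hypothetical, introduced only for illustration.

```python
import pandas as pd

# The question's sample frames, rebuilt in compact form.
revenue = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-15", "2021-01-01", "2021-01-01",
             "2021-02-01", "2021-02-01", "2021-02-01"],
    "Vendor": ["Mickey Mouse", "Mickey Mouse", "Donald Duck", "Goofy",
               "Mickey Mouse", "Donald Duck", "Goofy"],
    "Revenue": [100, 150, 100, 100, 200, 200, 200],
})
breakdown = pd.DataFrame({
    "Month": ["January 2021"] * 6,
    "Vendor": ["Mickey Mouse"] * 3 + ["Goofy"] * 3,
    "Snack": ["Churros", "Funnel Cake", "Apples",
              "Churros", "Funnel Cake", "Water"],
    "Percentage": [0.5, 0.25, 0.25, 0.34, 0.33, 0.33],
})


def split_revenue(revenue, breakdown):
    """Hypothetical helper wrapping the merge-and-multiply steps above."""
    return (
        revenue
        # Month strings such as "January 2021" (assumes an English locale).
        .assign(Month=pd.to_datetime(revenue["Date"]).dt.strftime("%B %Y"))
        .merge(breakdown, on=["Month", "Vendor"], how="outer")
        # Rows without a breakdown keep 100% of their revenue.
        .assign(Revenue=lambda d: d["Revenue"] * d["Percentage"].fillna(1))
        .filter(["Date", "Vendor", "Snack", "Revenue"])
    )


out = split_revenue(revenue, breakdown)

# Compare the total before and after the split; they should agree
# up to floating-point rounding.
print(revenue["Revenue"].sum(), out["Revenue"].sum())
```

If the percentages for some vendor did not sum to 1, the two totals would differ, which makes this a cheap way to catch a bad breakdown table.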

Try:

breakdown["Month"] = pd.to_datetime(breakdown["Month"])
revenue["Date"] = pd.to_datetime(revenue["Date"])

x = pd.merge(
    breakdown.assign(
        y=breakdown["Month"].dt.year, m=breakdown["Month"].dt.month
    ),
    revenue.assign(y=revenue["Date"].dt.year, m=revenue["Date"].dt.month),
    on=["y", "m", "Vendor"],
    how="outer",
)
x["Revenue"] *= x["Percentage"].fillna(1)
print(x[["Date", "Vendor", "Snack", "Revenue"]].fillna(""))

Prints:

         Date        Vendor        Snack  Revenue
0  2021-01-01  Mickey Mouse      Churros     50.0
1  2021-01-15  Mickey Mouse      Churros     75.0
2  2021-01-01  Mickey Mouse  Funnel Cake     25.0
3  2021-01-15  Mickey Mouse  Funnel Cake     37.5
4  2021-01-01  Mickey Mouse       Apples     25.0
5  2021-01-15  Mickey Mouse       Apples     37.5
6  2021-01-01         Goofy      Churros     34.0
7  2021-01-01         Goofy  Funnel Cake     33.0
8  2021-01-01         Goofy        Water     33.0
9  2021-01-01   Donald Duck                 100.0
10 2021-02-01  Mickey Mouse                 200.0
11 2021-02-01   Donald Duck                 200.0
12 2021-02-01         Goofy                 200.0
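
Both answers rely on fillna(1) to pass unmatched revenue rows through unchanged. To see which rows those are, one option is the indicator parameter of merge, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. This is a sketch rather than part of either answer. It uses the string Month key for brevity and rebuilds the question's frames so it is self-contained.

```python
import pandas as pd

# The question's sample frames, rebuilt in compact form.
revenue = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-15", "2021-01-01", "2021-01-01",
             "2021-02-01", "2021-02-01", "2021-02-01"],
    "Vendor": ["Mickey Mouse", "Mickey Mouse", "Donald Duck", "Goofy",
               "Mickey Mouse", "Donald Duck", "Goofy"],
    "Revenue": [100, 150, 100, 100, 200, 200, 200],
})
breakdown = pd.DataFrame({
    "Month": ["January 2021"] * 6,
    "Vendor": ["Mickey Mouse"] * 3 + ["Goofy"] * 3,
    "Snack": ["Churros", "Funnel Cake", "Apples",
              "Churros", "Funnel Cake", "Water"],
    "Percentage": [0.5, 0.25, 0.25, 0.34, 0.33, 0.33],
})

merged = (
    revenue
    # Month strings such as "January 2021" (assumes an English locale).
    .assign(Month=pd.to_datetime(revenue["Date"]).dt.strftime("%B %Y"))
    # indicator=True adds a "_merge" column describing each row's origin.
    .merge(breakdown, on=["Month", "Vendor"], how="outer", indicator=True)
)

# Revenue rows that found no matching breakdown for their month and vendor.
unmatched = merged.loc[
    merged["_merge"] == "left_only", ["Date", "Vendor", "Revenue"]
]
print(unmatched)
```

Rows marked `right_only` would be breakdown entries with no revenue at all. The sample data has none, but on real data they would show up with a missing Date and Revenue.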
