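I have tried several pandas methods, such as rank, qcut, and quantile, but could not reproduce
the SQL cume_dist() function. How can I get the following result in pandas?
The full problem, solved in SQL, is available at: https://www.windowfunctions.com/questions/ranking/4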
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'name': ['Molly', 'Ashes', 'Felix', 'Smudge', 'Tigger', 'Alfie', 'Oscar', 'Millie', 'Misty', 'Puss', 'Smokey', 'Charlie'],
    'breed': ['Persian', 'Persian', 'Persian', 'British Shorthair', 'British Shorthair', 'Siamese', 'Siamese', 'Maine Coon', 'Maine Coon', 'Maine Coon', 'Maine Coon', 'British Shorthair'],
    'weight': [4.2, 4.5, 5.0, 4.9, 3.8, 5.5, 6.1, 5.4, 5.7, 5.1, 6.1, 4.8],
    'color': ['Black', 'Black', 'Tortoiseshell', 'Black', 'Tortoiseshell', 'Brown', 'Black', 'Tortoiseshell', 'Brown', 'Tortoiseshell', 'Brown', 'Black'],
    'age': [1, 5, 2, 4, 2, 5, 1, 5, 2, 2, 4, 4],
})
select name, weight, cast(cume_dist() over (order by weight) * 100 as integer) as percent from cats order by weight
name weight percent
Tigger 3.8 8
Molly 4.2 17
Ashes 4.5 25
Charlie 4.8 33
Smudge 4.9 42
Felix 5.0 50
Puss 5.1 58
Millie 5.4 67
Alfie 5.5 75
Misty 5.7 83
Oscar 6.1 100
Smokey 6.1 100
Is there a way to do this using only numpy and pandas?
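One possible approach, offered as a minimal sketch rather than a definitive answer: SQL's cume_dist() is the number of rows whose value is less than or equal to the current row's value, divided by the total row count. In pandas, `Series.rank(method='max', pct=True)` appears to compute the same quantity, since tied rows all receive the highest rank in their group. The example below assumes the `percent` column in the expected output is rounded to the nearest integer, and it uses only the `name` and `weight` columns from the question's data. The `np.searchsorted` line is an optional numpy-only cross-check.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: only the two columns the query needs.
df = pd.DataFrame({
    'name': ['Molly', 'Ashes', 'Felix', 'Smudge', 'Tigger', 'Alfie',
             'Oscar', 'Millie', 'Misty', 'Puss', 'Smokey', 'Charlie'],
    'weight': [4.2, 4.5, 5.0, 4.9, 3.8, 5.5, 6.1, 5.4, 5.7, 5.1, 6.1, 4.8],
})

# Equivalent of "order by weight"; a stable sort keeps tied rows in input order.
out = df.sort_values('weight', kind='stable').reset_index(drop=True)

# cume_dist(): (rows with weight <= current weight) / (total rows).
# method='max' gives tied rows the highest rank in their group,
# and pct=True divides that rank by the number of rows.
cume_dist = out['weight'].rank(method='max', pct=True)

# Assumption: the expected output rounds the percentage to the nearest integer.
out['percent'] = (cume_dist * 100).round().astype(int)

# numpy-only cross-check: count of values <= each weight, divided by n.
w = out['weight'].to_numpy()
cume_dist_np = np.searchsorted(np.sort(w), w, side='right') / len(w)

print(out.to_string(index=False))
```

To apply the same idea per group (the equivalent of `partition by breed`, for example), the rank call could be run through `groupby('breed')['weight'].rank(method='max', pct=True)` on a frame that still has the `breed` column.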
Here is a Python (PySpark) version:
Create the Spark DataFrame
Apply the SQL function to the Spark DataFrame
Output