How do I calculate conditional probability values in a pandas DataFrame in Python?

  company    model rating   type
0    ford  mustang      A  coupe
1   chevy   camaro      B  coupe
2    ford   fiesta      C  sedan
3    ford    focus      A  sedan
4    ford   taurus      B  sedan
5  toyota    camry      B  sedan

Prob(rating=A) = 0.333333
Prob(rating=B) = 0.500000
Prob(rating=C) = 0.166667

Prob(type=coupe|rating=A) = 0.500000
Prob(type=sedan|rating=A) = 0.500000
Prob(type=coupe|rating=B) = 0.333333
Prob(type=sedan|rating=B) = 0.666667
Prob(type=coupe|rating=C) = 0.000000
Prob(type=sedan|rating=C) = 1.000000

3 Answers

User

Answer 1 · edited 2024-09-22 20:27:24

To get 0 values for the missing pairs, you need to add reindex:

mux = pd.MultiIndex.from_product([df['rating'].unique(), df['type'].unique()])
s = (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
s = s.reindex(mux, fill_value=0)
print(s)
A  coupe    0.500000
   sedan    0.500000
B  coupe    0.333333
   sedan    0.666667
C  coupe    0.000000
   sedan    1.000000
Name: model, dtype: float64

Another solution, thanks to Zero:

s.unstack(fill_value=0).stack()

User

Answer 2 · edited 2024-09-22 20:27:24

You can use .groupby() together with the built-in .size() and .div() methods:

rating_probs = df.groupby('rating').size().div(len(df))

rating
A    0.333333
B    0.500000
C    0.166667

And for the conditional probabilities:

df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')

coupe  A         0.500000
       B         0.333333
sedan  A         0.500000
       B         0.666667
       C         1.000000

User

Answer 3 · edited 2024-09-22 20:27:24

You can use groupby:

In [2]: df = pd.DataFrame({'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
                     'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
                     'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
                     'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan']})

In [3]: df.groupby('rating').count()['model'] / len(df)
Out[3]:
rating
A    0.333333
B    0.500000
C    0.166667
Name: model, dtype: float64

In [4]: (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
Out[4]:
rating  type
A       coupe    0.500000
        sedan    0.500000
B       coupe    0.333333
        sedan    0.666667
C       sedan    1.000000
Name: model, dtype: float64
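If the zero for the missing (C, coupe) pair matters, one option that is not used in any of the answers above is pd.crosstab with normalize='index', which divides each row by its row total. This is a minimal sketch on the question's data; the name table is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the two relevant columns)
df = pd.DataFrame({
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Rows are ratings, columns are types; normalize='index' makes every row
# sum to 1, so each cell is an estimate of Prob(type | rating).
# Combinations that never occur show up as 0 instead of being dropped.
table = pd.crosstab(df['rating'], df['type'], normalize='index')

print(table)
```

A single value can then be looked up with table.loc['B', 'sedan'], and table.stack() turns the table back into a Series indexed by (rating, type) if that shape is preferred.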
