How to relabel categories based on value_counts and then plot the data

,uuid,name,brand,street,house_number,post_code,city,latitude,longitude,first_active,openingtimes_json,state,istFrei
0,0e18d0d3-ed38-4e7f-a18e-507a78ad901d,OIL! Tankstelle München,OIL!,EVERSBUSCHSTRASSE,33,80999,MÜNCHEN,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":192,""periods"":[{""startp"":""07:00"",""endp"":""20:00""}]},{""applicable_days"":63,""periods"":[{""startp"":""06:00"",""endp"":""22:00""}]}]}",Bayern,True
1,44e2bdb7-13e3-4156-8576-8326cdd20459,bft Tankstelle,BFT TANKSTELLE,SCHELLENGASSE ,53,36304,ALSFELD,50.7520089,9.2790394,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":63,""periods"":[{""startp"":""06:00"",""endp"":""22:00""}]},{""applicable_days"":64,""periods"":[{""startp"":""07:00"",""endp"":""21:00""}]}]}",Hessen,True
2,ad812258-94e7-473d-aa80-d392f7532218,bft Bonn-Bad Godesberg,BFT,GODESBERGER ALLEE,55,53175,BONN,50.6951,7.14276,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""22:00""}]},{""applicable_days"":32,""periods"":[{""startp"":""07:00"",""endp"":""22:00""}]},{""applicable_days"":64,""periods"":[{""startp"":""08:00"",""endp"":""22:00""}]}]}",Nordrhein-Westfalen,True

1 Answer

User

#1 · Posted 2024-09-22 20:28:41

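Use df.brand.value_counts() together with .merge to add a 'total_count' column to df.
Use Boolean indexing to rename any 'brand' whose 'total_count' is less than (.lt) 20.
Get the new .value_counts for 'brand' and draw horizontal bars with pandas.DataFrame.plot and kind='barh'. If there are not many brands, use kind='bar' and change figsize. kind='pie' also works but, while I like pi and pie, I do not like or recommend pie charts.
- "The main purpose of using a pie chart, rather than a bar graph, is to visually indicate that a set of values are fractions or percentages that add up to a whole. This message comes at a considerable cost: Comparing values is more difficult with a pie chart than with a bar chart because it is harder for the viewer to compare the angles subtended by two arcs than to compare the heights of two bars." (Bergstrom, Carl T.; West, Jevin D. Calling Bullshit, p. 179. Random House Publishing Group. Kindle Edition.)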

Using pandas v1.2.4 and matplotlib v3.4.2

import pandas as pd
import numpy as np  # for sample data

# sample data
data = ['Aloha Petroleum', 'Alon', 'American Gas', 'Amoco', 'ARCO', 'Billups', 'BP', "Buc-ee's", "Casey's General Stores", 'CEFCO', 'CENEX', 'Chevron', 'Circle K', 'Citgo', 'Clark Brands', 'Conoco', 'Costco', 'Crown', 'Cumberland Farms', 'Delta Sonic - Buffalo New York']

# probabilities for each brand
prob = [0.099, 0.099, 0.099, 0.0501, 0.0501, 0.0501, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.0009, 0.0009, 0.0009]

# sample dataframe
np.random.seed(2)
df = pd.DataFrame({'brand': np.random.choice(data, size=(16000,), p=prob)})

# add a column to the dataframe called total_count
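# the counts are renamed before the merge so the new column is called total_count
# regardless of how the installed pandas version names the value_counts result
df = df.merge(df.brand.value_counts().rename('total_count'), left_on='brand', right_index=True)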

# any brand with a total_count less than 20 is renamed
df.loc[df.total_count.lt(20), 'brand'] = 'Freie Tankstellen'

# plot the new value count with the updated brand name
df.brand.value_counts().plot(kind='barh', figsize=(7, 10))
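
The answer mentions switching to kind='bar' with a different figsize when there are only a few brands. A minimal sketch of that variant follows; the six-row sample, the figsize, and the axis labels are assumptions chosen for illustration, not values from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# small made-up sample, not the data from the question
brands = pd.Series(["BP", "BP", "BP", "Chevron", "Chevron", "Costco"], name="brand")
counts = brands.value_counts()

# vertical bars stay readable when there are only a few categories
ax = counts.plot(kind="bar", figsize=(6, 4), rot=45)
ax.set_xlabel("brand")
ax.set_ylabel("number of stations")
ax.figure.tight_layout()
```

With many brands the tick labels on the x-axis start to overlap, which is why the horizontal kind='barh' layout is the safer default.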

Compared with `kind='pie'`

df.brand.value_counts().plot(kind='pie', figsize=(7, 10))
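
As a variation on the relabeling step, the total_count helper column can be skipped by mapping the counts straight onto the 'brand' column. This is a sketch of a swapped-in technique (Series.map plus Series.where), not the merge approach used above; the sample rows and the threshold of 2 are made up so the effect is visible on a tiny dataframe.

```python
import pandas as pd

# small made-up sample; the threshold of 2 stands in for the answer's 20
df = pd.DataFrame(
    {"brand": ["BP", "BP", "BP", "Chevron", "Chevron", "Costco", "Crown"]}
)
threshold = 2

# look up each row's brand count without adding a helper column
counts = df["brand"].map(df["brand"].value_counts())

# keep brands at or above the threshold, relabel the rest
df["brand"] = df["brand"].where(counts.ge(threshold), "Freie Tankstellen")

relabeled = df["brand"].value_counts()
print(relabeled)
```

The resulting counts can be passed to .plot(kind='barh') exactly as in the answer's code.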
