Python pandas value_counts not working properly

Posted 2024-05-09 01:54:08


Based on this post on Stack Overflow (append-columns-based-on-other-column-values-to-pandas-dataframe), I tried the value_counts function like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

It works, except that my data has 22 unique genres and after splitting I get 42 columns, which are clearly not all unique. Sample of the data:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0   nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

(I only pasted the header and the first row.)

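I have a feeling the problem is caused by my original data. My column (genres) is a list of lists, and the values include the square brackets.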

Example: [Action,Indie]. So when Python reads it, it treats "[Action", "Action" and "Action]" as different values, and the result is 303 different values. So what I did is:

^{pr2}$
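The effect described above can be reproduced on a few made-up rows. This is only a sketch: the `genres` strings below are hypothetical (the real column is not shown in the post), and it assumes the values may carry both brackets and a space after each comma.

```python
import pandas as pd

# Hypothetical sample data; the real genres column is not shown in the post.
genres = pd.Series(["[Action, Indie]", "[Indie]", "[Action]"])

# Splitting on "," without cleaning keeps brackets and spaces inside the labels,
# so the same genre shows up under several different spellings.
raw_labels = sorted({label for row in genres.str.split(",") for label in row})
print(raw_labels)

# Stripping the brackets and removing spaces first collapses them to the real genres.
cleaned = genres.str.strip("[]").str.replace(" ", "", regex=False)
clean_labels = sorted({label for row in cleaned.str.split(",") for label in row})
print(clean_labels)
```

Two real genres turn into four distinct labels before cleaning, which is the same kind of inflation as 22 genres becoming 42 columns.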

1 Answer
User
#1 · Posted 2024-05-09 01:54:08

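You have to strip the leading [ and trailing ] from the genres column (the code below uses str.strip), and then replace the spaces with empty strings: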

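import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")

# Strip the leading '[' and trailing ']' from each genres string
df['genres'] = df['genres'].str.strip('[]')
# Remove all spaces so that ' Indie' and 'Indie' become the same label
df['genres'] = df['genres'].str.replace(' ', '')

# Split on commas, count each genre per row, and join the counts as new columns
df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# Temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
    # printed DataFrame omitted for clarity
print(df.columns)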
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
       u'initial_price', u'is_free', u'metacritic', u'release_date',
       u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
       u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
       u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
       u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
       u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
       u'WebPublishing'],
      dtype='object')
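The same idea can be tried without the CSV file. The sketch below is not the answer's exact code. The rows and the `appid` values are hypothetical. It calls `pd.Series(row).value_counts()` instead of `pd.value_counts`, which recent pandas versions deprecate. It also splits on a comma followed by optional whitespace rather than deleting every space, so a multi-word genre such as "Free to Play" keeps its internal spaces instead of becoming "FreetoPlay".

```python
import pandas as pd

# Hypothetical rows standing in for the real CSV, which is not available here.
df = pd.DataFrame({
    "appid": [10, 20, 30],
    "genres": ["[Action, Indie]", "[Indie, Free to Play]", "[Action]"],
})

# Strip the surrounding brackets, then split on a comma plus any spaces after it.
# A multi-character pattern is treated as a regular expression by default.
tokens = df["genres"].str.strip("[]").str.split(r",\s*")

# Count genres per row; a genre missing from a row becomes 0.
counts = (
    tokens.apply(lambda row: pd.Series(row).value_counts())
    .fillna(0)
    .astype(int)
)

# Attach one count column per genre to the original frame.
result = df.join(counts)
print(result)
```

Each genre now appears as exactly one column, so the number of added columns matches the number of unique genres in the data.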
