Counting values in a DataFrame column separated by commas

User

Reply #1 · edited 2024-09-30 18:16:32

If you are using pandas (a reasonable guess, even though the OP doesn't say so), you can do something like this:

from collections import Counter

# ... code that creates the `trending` DataFrame ...

genreCount = Counter()
for row in trending.itertuples():
    # itertuples() yields the index at position 0; row[1] assumes genre is the first column, adjust as needed
    genreCount.update(row[1].split(","))

print(genreCount)  # Works like a dict: keys are the genres, values are how often each appears
print(dict(genreCount))  # You can convert it to a plain dict, but a Counter already behaves like one
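For reference, here is a self-contained sketch of the same `Counter` idea. The `trending` DataFrame and its `genre` column below are hypothetical sample data; the sketch iterates over the column directly instead of using `itertuples()`, skips missing values, and strips spaces in case items are separated by `", "`.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: one comma-separated genre string per row
trending = pd.DataFrame({"genre": ["Action,Drama", "Comedy", "Drama, Comedy", "Drama"]})

genre_count = Counter()
for value in trending["genre"].dropna():  # skip rows with no genre at all
    # Split on commas and strip surrounding spaces before counting
    genre_count.update(item.strip() for item in value.split(","))

print(genre_count.most_common())  # [('Drama', 3), ('Comedy', 2), ('Action', 1)]
```

Iterating over the column avoids having to know its position inside the tuples that `itertuples()` returns.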

User

Reply #2 · edited 2024-09-30 18:16:32

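The code below assumes you already know the maximum number of items in a single row. That means you need to read the file once to find this value (here, based on your example, we assume it is 3).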

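import numpy as np
import pandas as pd

max_num_of_items_in_one_row = 3
cols = list(range(max_num_of_items_in_one_row))
# One column per comma-separated item; skiprows=1 skips the header line
df = pd.read_csv('genre.txt', names=cols, engine='python', skiprows=1)
# Missing items in shorter rows are read as NaN (not None); replace them with the placeholder 'NA'
df = df.fillna('NA')
all_ = df.values.flatten()
genres = np.unique(all_)
for y in genres:
    if y == 'NA':
        continue  # skip the placeholder for missing items
    # Count the cells equal to the current genre
    print(y, (df == y).values.sum())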

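The code reads the file into a DataFrame, replaces the missing values with a placeholder, finds all unique values in the DataFrame, and counts how many times each one occurs.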
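The first pass mentioned above (finding the maximum number of items per row) can be folded into the same script. This is a sketch under assumptions: `io.StringIO` with hypothetical text stands in for `genre.txt`, the file is assumed to have one header line followed by one comma-separated row per line, and `pd.unique` plus `pd.isna` replace the `'NA'` placeholder.

```python
import io

import pandas as pd

# Hypothetical stand-in for genre.txt: a header line, then one row of genres per line
text = "genre\nAction,Drama\nComedy\nDrama,Comedy,Action\nDrama\n"

# First pass: find the maximum number of comma-separated items in a data row
data_lines = text.splitlines()[1:]
max_items = max(line.count(",") + 1 for line in data_lines)

# Second pass: one column per item; shorter rows are padded with NaN
df = pd.read_csv(io.StringIO(text), names=list(range(max_items)), skiprows=1)

counts = {}
for genre in pd.unique(df.values.ravel()):
    if pd.isna(genre):
        continue  # skip the NaN padding
    counts[genre] = int((df == genre).values.sum())

print(counts)  # {'Action': 2, 'Drama': 3, 'Comedy': 2}
```

With a real file, the first pass would read `genre.txt` line by line instead of splitting a string.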

User

Reply #3 · edited 2024-09-30 18:16:32

I found the answer:

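import pandas as pd

# Split each string into columns, then stack them into a single long `genre` column
genres = pd.DataFrame(genres.genre.str.split(',', expand=True).stack(), columns=['genre'])
genres = genres.reset_index(drop=True)
# Count the rows for each genre
genre_count = pd.DataFrame(genres.groupby(by=['genre']).size(), columns=['count'])
genre_count = genre_count.reset_index()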
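A more compact alternative (a different technique from the code above, not the OP's) uses `Series.str.split`, `Series.explode` (available since pandas 0.25) and `value_counts`. The sample `genres` DataFrame is hypothetical; the result has the same `genre` and `count` columns, but is sorted by count rather than alphabetically.

```python
import pandas as pd

# Hypothetical sample data with a comma-separated `genre` column
genres = pd.DataFrame({"genre": ["Action,Drama", "Comedy", "Drama, Comedy", "Drama"]})

# Split each string into a list, give every list item its own row, then count
genre_count = (
    genres["genre"]
    .str.split(",")
    .explode()
    .str.strip()  # drop spaces left over from ", " separators
    .value_counts()
    .rename_axis("genre")
    .reset_index(name="count")
)

print(genre_count)
```

`explode` skips the intermediate wide DataFrame that `expand=True` followed by `stack()` creates.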
