Plotting a DataFrame based on a column hierarchy and another column's values

Posted 2024-09-29 21:36:09

Male | A programmer who likes writing Python code.

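I'm trying to plot an X/Y bar chart of the top n words per site, where the words are chosen by their frequency/impact score (tf_idf) but the y-axis shows each word's wordCount rather than the tf_idf value. Assume tf_idf identifies the most important words; some of the words are codes, but they still matter. Sample data: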

SITEID  word    wordCount   wordTotal   tf      SiteID  idf         tf_idf
CAK     hpci    328         187653      0.001   1       1.098       0.001920272

Using the following function (which I did not write, but fully understand), I can get a nice chart of the top tf_idf words:

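def pretty_plot_top_n(series, top_n=5, index_level=0):
    # Keep the top_n largest values within each group of the chosen index level.
    r = (
        series
        .groupby(level=index_level)
        .nlargest(top_n)
        # nlargest prepends the group key as an extra level; drop that copy.
        .reset_index(level=index_level, drop=True)
    )
    r.plot.bar()
    return r.to_frame()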

pretty_plot_top_n(tf_idf['tf_idf'])
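
For context, the function above only groups per site when the Series carries SITEID and word as index levels. The following is a minimal sketch of that setup, not the original data: only the CAK/hpci row comes from the sample above, the other rows and the XYZ site are made up for illustration, and plotting is left out so the selection step can be inspected on its own.

```python
import pandas as pd

# Hypothetical rows for illustration; only CAK/hpci is from the sample data.
tf_idf = pd.DataFrame(
    {
        "SITEID": ["CAK", "CAK", "CAK", "XYZ", "XYZ", "XYZ"],
        "word": ["hpci", "to", "valve", "of", "pump", "rcic"],
        "wordCount": [328, 9000, 150, 8000, 210, 95],
        "tf_idf": [0.0019, 0.0, 0.0012, 0.0, 0.0015, 0.0021],
    }
).set_index(["SITEID", "word"])

# Same chain as in pretty_plot_top_n, without the plotting call.
top = (
    tf_idf["tf_idf"]
    .groupby(level=0)
    .nlargest(2)
    .reset_index(level=0, drop=True)
)
print(top)
```

The result is indexed by (SITEID, word), which is what produces the two-level labels on the x-axis.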

This chart has exactly the x-axis I want: SITEID and word. Now I want to "layer" it so that the y-axis shows each word's wordCount instead of tf_idf.

I've tried a couple of different approaches, including (but not limited to):

#plot without function 
tf_idf.plot(x=['SITEID', 'tf_idf'], y='wordCount', kind="bar")

Adjusting the function:

# second layer grouping by tf_idf

def pretty_plot_top_n(series, top_n=5, index_level=0, level2=7):
    r = series\
    .groupby(level=index_level)\
    .groupby(level=level2)\
    .nlargest(top_n)\
    .reset_index(level=index_level, drop=True)
    r.plot.bar()
    return r.to_frame()

And running the function once per column (it's obvious why this doesn't work, but I'm including it for context):

pretty_plot_top_n(tf_idf['tf_idf'])
pretty_plot_top_n(tf_idf['wordCount'])

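My output always ends up blank, or I get the words with the highest wordCount. However, the whole point of the tf_idf calculation is to filter out stop words.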

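I don't want "to" and "of". I want the words from the chart above. How can I layer this so that the chart is built from the top tf_idf words first, and the y-axis then shows the wordCount for those words? I would strongly prefer to just adjust my existing function.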
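
One possible direction, offered as a sketch rather than a confirmed answer: pass the whole DataFrame instead of a single column, rank rows by tf_idf within each site, and only then read wordCount for the rows that survive. The helper name `select_top_n` and the parameters `rank_col` and `value_col` are assumptions introduced here, the sample rows other than CAK/hpci are made up, and the ranking uses sort plus `groupby(...).head()` in place of the original `nlargest` chain so that the (SITEID, word) index is kept unchanged.

```python
import pandas as pd


def select_top_n(df, rank_col="tf_idf", value_col="wordCount",
                 top_n=5, index_level=0):
    """Return value_col for the top_n rows of each group, ranked by rank_col."""
    group_name = df.index.names[index_level]
    # Sort by site, then by descending rank, so head() takes the best rows.
    ranked = df.sort_values([group_name, rank_col], ascending=[True, False])
    top = ranked.groupby(level=index_level, sort=False).head(top_n)
    return top[value_col]


def pretty_plot_top_n(df, **kwargs):
    """Plot the selected values as bars; requires matplotlib."""
    r = select_top_n(df, **kwargs)
    r.plot.bar()
    return r.to_frame()


# Hypothetical rows for illustration; only CAK/hpci is from the sample data.
tf_idf = pd.DataFrame(
    {
        "SITEID": ["CAK", "CAK", "CAK", "XYZ", "XYZ", "XYZ"],
        "word": ["hpci", "to", "valve", "of", "pump", "rcic"],
        "wordCount": [328, 9000, 150, 8000, 210, 95],
        "tf_idf": [0.0019, 0.0, 0.0012, 0.0, 0.0015, 0.0021],
    }
).set_index(["SITEID", "word"])

# The stop words "to" and "of" have the highest counts but are not selected.
counts = select_top_n(tf_idf, top_n=2)
print(counts)
```

Calling `pretty_plot_top_n(tf_idf, top_n=5)` would then draw the bars with SITEID and word on the x-axis and wordCount on the y-axis. This assumes each (SITEID, word) pair appears once and that SITEID exists only as an index level, not also as a column.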

