comparativeExtraction Python package - PyPI

Keyword extraction

Detailed description of the comparativeExtraction Python project

Introduction

This module helps you extract key terms and topics from a corpus using a comparative approach.

Installation

pip install --upgrade comparativeExtraction

Usage

Import the package

^{pr2}$

Load the sample data

import pandas as pd
import numpy as np

PATH = "/Users/xiaoma/Desktop/gitrepo/associate-term-search/data/switch_reviews.csv"
data = pd.read_csv(PATH)

# True for reviews with 3 stars or fewer, False otherwise
label = [x <= 3 for x in data['stars']]

data.columns

Index(['stars', 'titles', 'reviews', 'dates'], dtype='object')

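Here we use Amazon reviews of the Nintendo Switch to show how the module works.

The module takes two inputs: a corpus and a set of binary labels. How you define the labels depends on the question you want to answer. The label set must have the same length as the corpus.

Suppose we want to know why people dislike the product and to find the keywords associated with that. To answer this question, we create the labels as a binary variable that indicates whether the reviewer gave 3 stars or fewer.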
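The label rule and the length requirement can be sketched in plain Python. This is a minimal illustration and is not part of comparativeExtraction. make_binary_labels and check_inputs are hypothetical helpers, and the 3-star threshold comes from this example.

```python
from typing import Sequence


def make_binary_labels(stars: Sequence[float], threshold: float = 3) -> list:
    """Hypothetical helper: True for reviews rated at or below `threshold` stars."""
    return [s <= threshold for s in stars]


def check_inputs(corpus: Sequence[str], labels: Sequence[bool]) -> None:
    """Hypothetical helper: raise ValueError if corpus and labels are not aligned."""
    if len(corpus) != len(labels):
        raise ValueError(
            f"corpus has {len(corpus)} documents but labels has {len(labels)} entries"
        )
    # Accepts True/False (and numpy booleans, which compare equal to them)
    if not set(labels) <= {True, False}:
        raise ValueError("labels must be binary")


stars = [5, 1, 3, 4, 2]
reviews = ["great", "broken", "ok", "fun", "wifi is bad"]
labels = make_binary_labels(stars)
check_inputs(reviews, labels)
print(labels)  # [False, True, True, False, True]
```

Checking the lengths before fitting turns a silent misalignment between reviews and labels into an immediate error.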

Initialize the module with the review corpus and the labels

kw_init = comparative_keyword_extraction(corpus=data['reviews'], labels=label)

Extract keywords

kw = kw_init.get_distinguishing_terms(ngram_range=(1, 3), top_n=10)

# Keywords mentioned significantly more often in reviews with 3 stars or fewer
kw.incre_df

# Keywords mentioned significantly less often in reviews with 3 stars or fewer
kw.decline_df
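The README does not say how the package decides that a term is mentioned "significantly" more or less often. The sketch below is only a rough illustration. It assumes a pooled two-proportion z-test on document frequencies, which is a swapped-in technique and not necessarily what get_distinguishing_terms does. Documents are assumed to be pre-tokenized into sets of terms. compare_terms, z_crit and the 1.96 cutoff are all hypothetical.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count how many documents contain each term (document frequency)."""
    counts = Counter()
    for terms in docs:
        counts.update(terms)  # each set adds at most 1 per term
    return counts


def compare_terms(docs, labels, z_crit=1.96, top_n=10):
    """Hypothetical: split terms into increased/declined in the labeled group."""
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    n1, n2 = len(pos), len(neg)
    if n1 == 0 or n2 == 0:
        raise ValueError("both label groups must be non-empty")
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)
    incre, decline = [], []
    for term in sorted(set(df_pos) | set(df_neg)):
        p1 = df_pos[term] / n1  # share of labeled docs containing the term
        p2 = df_neg[term] / n2  # share of other docs containing the term
        pooled = (df_pos[term] + df_neg[term]) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:  # term appears in every document: nothing to compare
            continue
        z = (p1 - p2) / se
        if z >= z_crit:
            incre.append((term, z))
        elif z <= -z_crit:
            decline.append((term, z))
    incre.sort(key=lambda item: item[1], reverse=True)
    decline.sort(key=lambda item: item[1])
    return incre[:top_n], decline[:top_n]


docs = [
    {"wifi", "game"}, {"wifi", "game", "bad"}, {"wifi", "game"}, {"wifi", "game"},
    {"fun", "game"}, {"fun", "game"}, {"fun", "game", "bad"}, {"fun", "game"},
]
labels = [True] * 4 + [False] * 4
incre, decline = compare_terms(docs, labels)
print([t for t, _ in incre])    # ['wifi']
print([t for t, _ in decline])  # ['fun']
```

Terms that occur in every document are skipped because their standard error is zero.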

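If we need more context for a word, or more interpretable topics, we can:

- Output the reviews that contain the term
- Change the ngram_range
- Use the supplementary functions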

Output reviews

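Suppose we want to know more about the important term "work". We can directly output all the reviews that contain it.

The output object kw contains a binary (one-hot encoded) document-term matrix, binary_dtm, that covers every term found in the corpus. We can use it to find the reviews that contain each term.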
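The small stdlib-only sketch below makes the idea of a binary document-term matrix concrete. The regex tokenizer and the helper names (tokenize, build_binary_dtm, docs_with_term) are assumptions made for illustration. The package's own kw.binary_dtm is a pandas DataFrame, and its tokenization may differ.

```python
import re


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_binary_dtm(corpus):
    """Return (vocab, matrix) where matrix[i][j] is 1 if vocab[j] occurs in corpus[i]."""
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    vocab = sorted(set().union(*doc_terms))
    matrix = [[1 if term in terms else 0 for term in vocab] for terms in doc_terms]
    return vocab, matrix


def docs_with_term(corpus, vocab, matrix, term):
    """Return the documents whose row has a 1 in the column for `term`."""
    col = vocab.index(term)  # raises ValueError if the term is not in the vocabulary
    return [doc for doc, row in zip(corpus, matrix) if row[col] == 1]


corpus = [
    "Don't expect it to just work.",
    "Great console for casual stuff",
    "It does not work offline",
]
vocab, matrix = build_binary_dtm(corpus)
print(docs_with_term(corpus, vocab, matrix, "work"))
# ["Don't expect it to just work.", 'It does not work offline']
```

The matrix records only whether a term occurs in a document, not how often. This is why filtering on `== 1` selects every review that mentions the term.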

# binary_dtm provides a convenient way to extract reviews that contain specific terms
print(kw.binary_dtm[['work', 'not']])

      work  not
0        0    0
1        0    0
2        0    0
3        0    0
4        0    0
...    ...  ...
4995     1    0
4996     0    1
4997     0    0
4998     0    0
4999     0    0

[5000 rows x 2 columns]

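# Keep only the reviews whose row in binary_dtm has a 1 in the 'work' column
reviews_contain_term_work = data['reviews'][[x == 1 for x in kw.binary_dtm['work']]]
len(reviews_contain_term_work)

# Print one randomly sampled review that contains "work"
for x in pd.Series(reviews_contain_term_work).sample(1):
    print(x)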

It's alright, only got it to give Nintendo another chance. It's a neat concept. Overall, it's aggressively mediocre, good for casual stuff, but will never get as much use as my ps4.Wi-Fi is god awful though. The worst I've dealt with. It's connection capabilities are atrocious compared with any other wireless device. Don't expect it to just work. Honestly, this singular problem is enough for me to rate it 1 star. I suppose they had to cut corners somewhere.

Change the n-gram range to exclude unigrams

kw = kw_init.get_distinguishing_terms(ngram_range=(2, 4), top_n=10)
kw.incre_df

kw.decline_df
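This sketch assumes that ngram_range follows the inclusive (min_n, max_n) convention of scikit-learn's CountVectorizer. Under that assumption, (2, 4) keeps bigrams through 4-grams and drops unigrams. The hypothetical ngrams helper below shows the difference on a phrase adapted from the sample review.

```python
def ngrams(tokens, ngram_range=(1, 1)):
    """Return every n-gram with min_n <= n <= max_n, each joined with spaces."""
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i : i + n]))
    return grams


tokens = "wifi is god awful".split()
print(len(ngrams(tokens, (1, 3))))  # 9
print(ngrams(tokens, (2, 4)))
# ['wifi is', 'is god', 'god awful', 'wifi is god', 'is god awful', 'wifi is god awful']
```

Dropping unigrams tends to surface phrases such as "god awful" instead of the individual words.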

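Use the supplementary functions

Sometimes we want to drill down into a specific term. In that case, we can use the built-in supplementary function get_ngrams_on_term to find related n-grams that contain the term.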

from comparativeExtraction.supplement_funcs import get_ngrams_on_term

target_term = "work"
reviews_contain_term_work = data['reviews'][[x == 1 for x in kw.binary_dtm[target_term]]]
related_ngrams = get_ngrams_on_term(target_term, reviews_contain_term_work, filter_by_extreme=False)

related_ngrams.related_ngrams.head()

Here, too, the count is a document frequency: the number of reviews in which each n-gram appears.
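The sketch below shows how a document-frequency count of n-grams that contain a term might work. ngrams_on_term is a hypothetical stand-in and not the package's get_ngrams_on_term. It does not reproduce options such as filter_by_extreme. The (2, 3) default range and the regex tokenizer are assumptions.

```python
import re
from collections import Counter


def ngrams_on_term(target, docs, ngram_range=(2, 3)):
    """Hypothetical: document frequency of every n-gram in range that contains `target`."""
    min_n, max_n = ngram_range
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z']+", doc.lower())
        found = set()  # n-grams seen in this document
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i : i + n]
                if target in gram:  # exact token match, so "works" does not match "work"
                    found.add(" ".join(gram))
        counts.update(found)  # each document adds at most 1 per n-gram
    return counts


docs = ["Does not work", "Does not work, does not work!", "Works great"]
counts = ngrams_on_term("work", docs)
print(counts["not work"], counts["does not work"], counts["work does"])  # 2 2 1
```

The second review repeats "does not work", but it still contributes only 1 to that n-gram's count. That is the difference between document frequency and raw term frequency.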


comparativeExtraction 0.0.7
