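Python urlcounter package - PyPI

A set of functions for counting URLs in an event-based corpus. It assumes that you have divided your data into a series of event-based periods with community-detected modules/hubs. It also assumes that you have already unspooled and cleaned your URL data. For help, see Deen Freelon's unspooler module: https://github.com/dfreelon/unspooler.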

Detailed project description for urlcounter

urlcounter

Author: Chris Lindgren (chris.a.lindgren@gmail.com). Distributed under the BSD 3-clause license. See LICENSE.txt or http://opensource.org/licenses/BSD-3-Clause for details.

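Overview

urlcounter is a set of functions that tallies full URLs and domain URLs for periodic, event-defined social media posting data. It assumes that you are seeking answers to the following questions about link sharing:

- What are the top x full URLs and domain URLs from each group during each period?
- What are the top x full URLs and domain URLs from each group's modules (detected communities) during each period?

To use the module, import it and follow this example: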

import urlcounter as urlc

dict_url_counts = urlc.top_urls(
    df=cdf,  # DataFrame of full corpus
    periods=(1, 10),  # Tuple providing range of numbered periods
    hubs=(1, 10),  # Tuple providing range of numbered hubs
    period_dates=period_dates,  # Dict of Lists with dates per period
    list_of_regex=[htg_btw, htg_fbt, htg_anti],  # List of regex patterns defined for each group
    hl=hub_lists,  # Dict with keyed lists of hub usernames per period
    columns=['cleaned_urls', 'retweets_count', 'hashtags', 'username', 'mentions'],  # Provide a List of column names to use for search and counting
    url_sample_size=50,  # Desired sample size limit, e.g., Top 50
    verbose=True  # Boolean. True prints out status messages, False prints nothing
)
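The period_dates argument in the call above is a Dict of Lists with dates per period. One way to build such a Dict is sketched below. This is only an illustration, not part of urlcounter: the helper date_range, the period_bounds boundaries, and the use of string keys with ISO-formatted dates are assumptions based on the pd['1'] example in the argument list.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Return ISO date strings for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=offset)).isoformat() for offset in range(days + 1)]


# Hypothetical period boundaries; replace them with the events in your own corpus.
period_bounds = {
    "1": (date(2018, 1, 1), date(2018, 1, 3)),
    "2": (date(2018, 1, 4), date(2018, 1, 5)),
}

# Dict of Lists with dates per period, keyed by the period number as a string.
period_dates = {
    key: date_range(start, end) for key, (start, end) in period_bounds.items()
}
```

Whether top_urls() expects exactly this date format is not stated here, so check it against your own data before relying on it.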

Example output

It returns a Dict keyed by the user-defined group names, the range of periods, and the range of modules:

{'1': #Start period 1
    'fbt_domains_per_period': [ #start period 1 totals for group keyed as 'fbt'
        ('twitter.com', 3003),
        ('instagram.com', 1001),
        ('facebook.com', 202)
    ],
    'fbt_urls_per_period': [
        ('https://twitter.com/user/status/example', 202),
        ('https://www.instagram.com/p/example/', 202),
        ...
    ]}, #end period 1 totals for group keyed as 'fbt'
    {'fbt': { #start period 1, module/hub 1
        '1': {
            'hub_domain_counts': [
                ('example.com', 178),
                ('example2.go.lc', 14),
                ('example3.com', 10),
                ...
            ],
            'hub_sample_size': 103,
            'hub_tweet_sample_size': 486,
            'hub_url_counts': [
                ('https://example.com/politics/story-title-1/', 120),
                ('https://example.com/politics/story-title-2/', 58),
                ...
            ]
        }
    }}, #end period 1, module/hub 1
    ...
}, #end period
...
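Each hub entry in the sample output holds lists of (name, count) tuples. A minimal sketch of reading such a list follows. It works on a hand-built stand-in that copies the three domain counts shown for module/hub 1, and the helper top_n_with_share is hypothetical, not a urlcounter function. Because the sample output elides further entries, the shares are computed over the listed entries only.

```python
# Hand-built stand-in for one hub entry, copied from the sample output.
# This is an assumption for illustration; real output may hold more entries.
hub_entry = {
    "hub_domain_counts": [
        ("example.com", 178),
        ("example2.go.lc", 14),
        ("example3.com", 10),
    ],
    "hub_sample_size": 103,
    "hub_tweet_sample_size": 486,
}


def top_n_with_share(counts, n):
    """Return the n largest (name, count, share) triples from (name, count) pairs."""
    total = sum(count for _, count in counts)
    ranked = sorted(counts, key=lambda pair: pair[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked[:n]]


top_domains = top_n_with_share(hub_entry["hub_domain_counts"], 2)
for name, count, share in top_domains:
    print(f"{name}: {count} ({share:.1%})")
```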

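top_urls()

Counts URLs in the corpus.

Arguments:

df= DataFrame. The corpus to query.
columns= List of 5 column names (String) to reference in the DF corpus. !IMP: the order matters:
1. Column with URLs (String), which contains the list of URLs included in the post/content:
   - Example: ['https://time.com', 'https://and-time-again.com']. The list can also be the String '[]', since the function can convert the literal.
2. Column with the number of times a post was shared (Integer), such as retweets on Twitter.
3. Column with group data (String), such as hashtags from tweets.
4. Column with usernames (String), such as tweet usernames.
5. Column with targeted content data (String), such as tweets from a module's targeted users, or a stringified list of targeted users, such as tweet mentions.

url_sample_size= Integer. Desired sample limit.
periods= Tuple. Contains 2 Integers that define the range of periods, e.g., (1,10)
hubs= Tuple. Contains 2 Integers that define the range of modules/hubs, e.g., (1,10)
period_dates= Dict of Lists with dates per period: pd['1'] => ['2018-01-01', '2018-01-01', ...]
list_of_regex= List. Contains:
1. List of regex patterns with group identifiers, such as hashtags
2. String. Key identifier for the group.

hl= Dict. Contains lists of community-detected usernames
verbose= Boolean. True prints out status messages (recommended), False prints nothing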
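The URL column may hold either a real list or its String form, such as '[]'. A minimal sketch of that conversion, using ast.literal_eval from the standard library, is shown below. The helper parse_url_cell is hypothetical, and how urlcounter performs the conversion internally is an assumption here.

```python
import ast


def parse_url_cell(cell):
    """Return a list of URLs from a cell that holds a list or its String form."""
    if isinstance(cell, list):
        return cell
    if isinstance(cell, str) and cell.strip():
        # literal_eval only evaluates Python literals, so it is safer than eval().
        parsed = ast.literal_eval(cell)
        return list(parsed) if isinstance(parsed, (list, tuple)) else []
    return []


urls = parse_url_cell("['https://time.com', 'https://and-time-again.com']")
empty = parse_url_cell("[]")
```

A malformed String raises ValueError or SyntaxError, so clean the column before counting.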

Returns:

See the documentation for details about the output and how to access the data.

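url_counter()

Helper function for top_urls(). It converts an incoming list of strings into a regex string for searching.

Arguments:

df: DataFrame. Array of strings to write as a regex string.
columns: List of 4 column names to use from the corpus, although only the first two are used in this function:
1. Name of the URL column, which contains the list of URLs included in the post/content.
2. Integer. The number of times a post was shared, such as retweets on Twitter.

Returns:

A List that includes:
- sorted_totals: List of Tuples with 2 items:
  - String. Full URL
  - Integer. Total URL instances (including RTs).
- sorted_domain_totals:
  - String. Domain URL
  - Integer. Total URL instances (including RTs).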
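Going by the return values listed above, the tally can be sketched with the standard library alone. This is not the package's implementation: count_urls is a hypothetical name, it takes plain (urls, retweets) pairs instead of a DataFrame, and weighting each post as 1 plus its retweet count is an assumed reading of "including RTs".

```python
from collections import Counter
from urllib.parse import urlparse


def count_urls(rows):
    """Tally full URLs and domains from (list_of_urls, retweet_count) pairs."""
    url_totals = Counter()
    domain_totals = Counter()
    for urls, retweets in rows:
        weight = 1 + retweets  # assumption: the original post plus each retweet
        for url in urls:
            url_totals[url] += weight
            domain_totals[urlparse(url).netloc] += weight
    # most_common() returns (item, total) tuples sorted from largest to smallest.
    return [url_totals.most_common(), domain_totals.most_common()]


# Made-up rows for illustration only.
rows = [
    (["https://example.com/a"], 2),
    (["https://example.com/b", "https://other.org/x"], 0),
]
sorted_totals, sorted_domain_totals = count_urls(rows)
```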

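regex_lister()

Helper function for top_urls(), which can also be used on its own to create the group regex search parameters. It converts an incoming list of strings into a regex string for searching.

Arguments:

the_list: List. Array of strings to write as a regex string.
key: String. Denotes the group name.

Returns:

keyed: Tuple:
- 'key' (String), which denotes the group name
- 'listicle' (regex String) to be used for searching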
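The general idea of turning a list of strings into a keyed regex string can be sketched as follows. This is an illustration rather than the regex_lister() source: build_group_regex is a hypothetical name, the hashtags are invented, and joining escaped terms with "|" is an assumption about how the pattern is built.

```python
import re


def build_group_regex(terms, key):
    """Return (key, pattern), where pattern matches any of the given terms."""
    # re.escape keeps characters such as '#' or '.' from acting as regex syntax.
    pattern = "|".join(re.escape(term) for term in terms)
    return (key, pattern)


# Hypothetical hashtags for a group keyed as 'fbt'.
keyed = build_group_regex(["#example_tag", "#another_tag"], "fbt")
key, listicle = keyed

has_match = re.search(listicle, "a post with #another_tag in it") is not None
no_match = re.search(listicle, "a post without group hashtags") is None
```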

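urlcounter functions only work with Python 3.x and are not backwards-compatible (although a 2.x port could probably be branched off with minimal effort).

Warning: urlcounter performs no custom error handling, so make sure your inputs are formatted properly! If you have any questions, please let me know via email.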

System requirements

pandas

Installation

pip install urlcounter

Distribution update terminal commands

urlcounter 0.0.3
