Cube for news storage and analysis

Detailed description of the cubicweb-semnews Python project


Summary

Cube for news storage and analysis.

This cube provides an implementation of semnews. It is meant to:

  • Store news articles and tweets.
  • Extract and synthesize information.
  • Provide semantically useful and original visualisations.
  • Provide analytics tools and data mining/machine learning processing.

Installation

Creating an instance:

  • Create an instance using: cubicweb-ctl create semnews <name-of-instance>
  • Create the instance’s database using: cubicweb-ctl db-create <name-of-instance>

Adding article sources

Article sources can be created as follows:

  • Blogs/RSS feeds:

    session.create_entity('CWSource', name=<name of the source>, type=u'datafeed',
                          parser=u'rss-parser', lang=<lang of the source>,
                          url=<url of the blog/rss feed>,
                          config=u'synchronization-interval=120min')
    
  • Tweet:

    session.create_entity('CWSource', name=<name of the source>, type=u'datafeed',
                          parser=u'tweet-parser', lang=<lang of the source>,
                          url=<url of the blog/rss feed>,
                          config=u'synchronization-interval=120min')
    

The synchronization interval can be set to a more specific value, or to “no” for manual synchronization only.
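
The two calls above differ only in the parser name, so their keyword arguments can be assembled by a small helper. What follows is a minimal sketch, not part of semnews: the helper name `datafeed_source_kwargs` and its parameters are hypothetical, while the attribute names, the parser names and the `synchronization-interval=<N>min` config format are copied from the examples above.

```python
def datafeed_source_kwargs(name, url, lang, parser='rss-parser',
                           interval_minutes=120):
    """Build keyword arguments for session.create_entity('CWSource', ...).

    Hypothetical helper: it only reuses the attributes and values that
    appear in the examples above.
    """
    if parser not in ('rss-parser', 'tweet-parser'):
        raise ValueError('unknown parser: %r' % (parser,))
    if interval_minutes <= 0:
        raise ValueError('interval_minutes must be positive')
    return {
        'name': name,
        'type': 'datafeed',
        'parser': parser,
        'lang': lang,
        'url': url,
        'config': 'synchronization-interval=%dmin' % interval_minutes,
    }


# Placeholder values for illustration only. Inside a CubicWeb shell one would
# then call: session.create_entity('CWSource', **kwargs)
kwargs = datafeed_source_kwargs('example-blog', 'http://example.org/feed', 'en')
print(kwargs['config'])
```

Keeping the interval as a number and formatting it in one place avoids typos in the config string when many sources are declared.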

semnews ships with some predefined blogs, tweets and RSS feeds:

  • Some French political blogs. You can add them using:

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_blogs_fr.py
    
  • Some international English-language newspapers. You can add them using:

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers.py
    
  • Some French newspapers. You can add them using:

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers_fr.py
    
  • Tweets from some French politicians. You can add them using:

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_twitters_fr.py
    

Adding named entity sources

semnews relies on a named entity recognition process, which you must define:

session.create_entity('NerProcess', name=<name of process>, host=<appid or sparql endpoint url>,
                      type=<rql or sparql>, lang=<optional lang of the ner source>,
                      request=<request to be performed>)

For details, see the documentation of the ner cube. Example source:

session.create_entity('NerProcess', name=u'dbpedia38-en', host=u'ner',
                      type=u'rql', lang=u'en',
                      request=u'Any U WHERE X label %(token)s, X cwuri U, '
                               'X ner_source NS, NS name "dbpedia38-en"')
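
The fields of a NerProcess can be checked before the entity is created. The sketch below is only an illustration under stated assumptions: `ner_process_kwargs` is a hypothetical helper, and the check that the request contains a `%(token)s` placeholder is inferred from the example above, not taken from the ner cube's documentation.

```python
def ner_process_kwargs(name, host, type_, request, lang=None):
    """Build keyword arguments for session.create_entity('NerProcess', ...).

    Hypothetical helper. The allowed types ('rql', 'sparql') come from the
    template above; the placeholder check is an assumption.
    """
    if type_ not in ('rql', 'sparql'):
        raise ValueError("type must be 'rql' or 'sparql', got %r" % (type_,))
    if '%(token)s' not in request:
        raise ValueError('request is expected to contain %(token)s')
    kwargs = {'name': name, 'host': host, 'type': type_, 'request': request}
    if lang is not None:
        # lang is optional according to the template above
        kwargs['lang'] = lang
    return kwargs


# Values taken from the dbpedia38-en example above.
example = ner_process_kwargs(
    'dbpedia38-en', 'ner', 'rql',
    'Any U WHERE X label %(token)s, X cwuri U, '
    'X ner_source NS, NS name "dbpedia38-en"',
    lang='en')
print(sorted(example))
```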

Commands

semnews provides two commands:

  • A command to extract named entities from articles:

    cubicweb-ctl process-ner <name-of-instance>
    
  • A command to clean up recognized entities according to some DBpedia categories (see entities/external_resources.py):

    cubicweb-ctl cleanup-ner <name-of-instance>
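
The two commands can be chained from a script. The following is a minimal sketch under assumptions: the function names and the `myinstance` instance name are hypothetical, and running the extraction before the cleanup is an assumed order, not something the text above states. By default it only prints the command lines (dry run).

```python
import subprocess


def ner_commands(instance):
    """Return the two command lines listed above, as argument lists."""
    return [
        ['cubicweb-ctl', 'process-ner', instance],
        ['cubicweb-ctl', 'cleanup-ner', instance],
    ]


def run_ner_pipeline(instance, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for argv in ner_commands(instance):
        if dry_run:
            print(' '.join(argv))
        else:
            # check=True stops the pipeline if a command fails
            subprocess.run(argv, check=True)


run_ner_pipeline('myinstance')
```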
    
