Pointwise Hilbert-Schmidt Independence Criterion (PHSIC)

Detailed description of the phsic-cli Python project



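PHSIC computes the co-occurrence between two objects by using similarities between them.

For example, given these consistent sentence pairs: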

X                                                                 | Y
They had breakfast at the hotel.                                  | They are full now.
They had breakfast at ten.                                        | I'm full.
She had breakfast with her friends.                               | She felt happy.
They had breakfast with their friends at the Japanese restaurant. | They felt happy.
He have trouble with his homework.                                | He cries.
I have trouble associating with others.                           | I cry.

Based on the given pairs, PHSIC assigns high scores to new pairs that are consistent with them:

X                                            | Y                     | score
They had breakfast at the hotel.             | They are full now.    | 0.1134
They had breakfast at an Italian restaurant. | They are stuffed now. | 0.0023
I have dinner.                               | I have dinner again.  | 0.0023
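To make the idea of "co-occurrence via similarities" concrete, here is a minimal sketch of a centered-kernel pointwise score in the spirit of PHSIC. It is not the phsic-cli implementation. It assumes each object is already encoded as a numeric vector, it assumes a Gaussian kernel of the form exp(-||u - v||^2 / (2 * sigma^2)), and all function names are hypothetical. A pair (x, y) scores high when x is similar to the training X items whose partners are similar to y.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian similarity between two equal-length vectors (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def centered_similarities(query, train, kernel):
    """Similarity of `query` to every training item, centered in feature space."""
    n = len(train)
    to_query = [kernel(query, t) for t in train]
    gram = [[kernel(s, t) for t in train] for s in train]
    row_means = [sum(row) / n for row in gram]
    query_mean = sum(to_query) / n
    grand_mean = sum(row_means) / n
    return [
        to_query[i] - query_mean - row_means[i] + grand_mean
        for i in range(n)
    ]


def phsic_score(x, y, train_x, train_y, kernel=gaussian_kernel):
    """Average product of centered similarities over the training pairs."""
    kx = centered_similarities(x, train_x, kernel)
    ly = centered_similarities(y, train_y, kernel)
    return sum(a * b for a, b in zip(kx, ly)) / len(train_x)


# Toy 1-D "encodings" standing in for sentence vectors (made-up numbers).
train_x = [[0.0], [0.1], [2.0], [2.1]]
train_y = [[0.0], [0.1], [2.0], [2.1]]

consistent = phsic_score([0.0], [0.0], train_x, train_y)
inconsistent = phsic_score([0.0], [2.0], train_x, train_y)
print(consistent > inconsistent)
```

This sketch builds the full Gram matrix, so it is quadratic in the number of training pairs; it is meant only to show what the score measures.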

Installation

$ pip install phsic

This installs the phsic command in your environment:

$ phsic --help

Basic Usage

Download pre-trained word vectors (e.g., fastText):

$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
$ unzip crawl-300d-2M.vec.zip
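The unzipped .vec file is plain text: a header line with the vocabulary size and dimension, then one word per line followed by its vector components. The sketch below reads only the first few entries, which is handy for a quick sanity check before a long run. The function name `load_vec` is hypothetical, and phsic's own loader may behave differently.

```python
import io


def load_vec(stream, limit=None):
    """Read up to `limit` word vectors from a fastText-style .vec text stream."""
    header = stream.readline().split()
    dim = int(header[1])  # header line is "<vocab_size> <dim>"
    vectors = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for crawl-300d-2M.vec (made-up numbers).
sample = io.StringIO("3 2\nthe 0.1 0.2\nbreakfast 0.3 0.4\nhotel 0.5 0.6\n")
vectors = load_vec(sample, limit=2)
print(sorted(vectors))
```

With the real file, the same function could be called as `load_vec(open("crawl-300d-2M.vec", encoding="utf-8"), limit=10000)`.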

Prepare the dataset:

$ TAB="$(printf '\t')"
$ cat << EOF > train.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at ten.${TAB}I'm full.
She had breakfast with her friends.${TAB}She felt happy.
They had breakfast with their friends at the Japanese restaurant.${TAB}They felt happy.
He have trouble with his homework.${TAB}He cries.
I have trouble associating with others.${TAB}I cry.
EOF
$ cut -f 1 train.txt > train_X.txt
$ cut -f 2 train.txt > train_Y.txt
$ cat << EOF > test.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at an Italian restaurant.${TAB}They are stuffed now.
I have dinner.${TAB}I have dinner again.
EOF
$ cut -f 1 test.txt > test_X.txt
$ cut -f 2 test.txt > test_Y.txt
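The `cut` commands above split each tab-separated line into an X file and a Y file. If the pairs are produced by a Python script, the same split can be sketched as follows; `split_pairs` is a hypothetical helper, not part of phsic.

```python
def split_pairs(lines):
    """Split tab-separated "X<TAB>Y" lines into two parallel lists."""
    xs, ys = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        x, y = line.split("\t", 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


test_lines = [
    "They had breakfast at the hotel.\tThey are full now.\n",
    "I have dinner.\tI have dinner again.\n",
]
test_x, test_y = split_pairs(test_lines)
print(test_x[1], "->", test_y[1])
```

Writing `test_x` and `test_y` out one item per line gives files equivalent to test_X.txt and test_Y.txt.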

Then train and predict:

$ phsic train_X.txt train_Y.txt --kernel1 Gaussian 1.0 --encoder1 SumBov FasttextEn --emb1 crawl-300d-2M.vec --kernel2 Gaussian 1.0 --encoder2 SumBov FasttextEn --emb2 crawl-300d-2M.vec --limit_words1 10000 --limit_words2 10000 --dim1 3 --dim2 3 --out_prefix toy --out_dir out --X_test test_X.txt --Y_test test_Y.txt
$ cat toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic
1.134489336180434238e-01
2.320408776101631244e-03
2.321869174772554344e-03
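The output file holds one score per line, in the same order as the test pairs. A small sketch for joining the scores back to the sentences and ranking them is shown below; `rank_pairs` is a hypothetical helper, and the score strings are copied from the output above.

```python
def rank_pairs(xs, ys, score_lines):
    """Attach one score per line to its (x, y) pair and sort by score, highest first."""
    scores = [float(s) for s in score_lines if s.strip()]
    if not (len(xs) == len(ys) == len(scores)):
        raise ValueError("xs, ys and scores must have the same length")
    return sorted(zip(scores, xs, ys), reverse=True)


xs = [
    "They had breakfast at the hotel.",
    "They had breakfast at an Italian restaurant.",
    "I have dinner.",
]
ys = [
    "They are full now.",
    "They are stuffed now.",
    "I have dinner again.",
]
score_lines = [
    "1.134489336180434238e-01",
    "2.320408776101631244e-03",
    "2.321869174772554344e-03",
]

ranked = rank_pairs(xs, ys, score_lines)
for score, x, y in ranked:
    print(f"{score:.4f}\t{x}\t{y}")
```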

Citation

@InProceedings{D18-1203,
  author = 	"Yokoi, Sho
        and Kobayashi, Sosuke
        and Fukumizu, Kenji
        and Suzuki, Jun
        and Inui, Kentaro",
  title = 	"Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions",
  booktitle = 	"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1763--1775",
  location = 	"Brussels, Belgium",
  url = 	"http://aclweb.org/anthology/D18-1203"
}
