A Python converter from UD to the enhanced UD representation

ud2ude-aryehgigi: detailed description of the Python project


UD2UDE

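This (badly named) UD2UDE project (short for Universal Dependencies to Universal Dependencies Enhanced) is the main project in a series of "u2dude" projects, all of which relate to the main goal of my thesis.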

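  1. Converter: (the current project) a Universal Dependencies to enhanced UD converter, aimed at porting coreNLP's Java converter to a Python (3.6) converter and embedding the add-ons I have researched ("extra", or "Aryeh's enhancements").
  2. Model: a spaCy model trained on UD datasets (and on PENN data converted to UD).
  3. Demo: {will be moved to public once stable} JS and Python code that uses the converter. Just check it out here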

Converter

The converter converts UD (v1.4) to enhanced UD, enhanced UD++, and extra enhancements (discovered as part of my thesis). It supports the CoNLL-U and Odin formats (and some conversions between them).

Generally, I tried to maintain the same behavior (as described here, and as implemented by {a8}) as far as reasonably possible.

The converter includes the following conversions:

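Each conversion is listed with its status in the paper (or here), the UD formal guidelines (v2), the coreNLP code, and this Converter, followed by notes ("-": not included; "?": unclear).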

nmod/acl/advcl case info
Paper: eUD; UD guidelines (v2): eUD (under 'obl' for v2); coreNLP code: eUD; Converter: eUD
1. Even though multi-word prepositions are processed only under eUD++, they are still handled under eUD so that they appear in the case information.
2. Lowercased (and not lemmatized, which is important for multi-word prepositions).
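A minimal sketch of the case-info step, assuming a hypothetical representation that is not ud2ude's actual API: words[i] is the form of token i (words[0] is a dummy ROOT) and edges is a list of (head, relation, dependent) triples from the basic UD v1 tree. Taking 'case' children for nmod and 'mark' children for acl/advcl, plus their UD v1 'mwe' parts, is an assumption of this sketch.

```python
# Hypothetical representation and names (not ud2ude's API):
# words[i] is the form of token i (words[0] is a dummy ROOT) and edges is a
# list of (head, relation, dependent) triples from the basic UD v1 tree.

# Assumption: which child relation carries the marker for each base relation.
MARKER_REL = {"nmod": "case", "acl": "mark", "advcl": "mark"}


def children(edges, head, rel):
    """Dependents of `head` attached with relation `rel`, in sentence order."""
    return sorted(d for h, r, d in edges if h == head and r == rel)


def marker_string(words, edges, dep, marker_rel):
    """Lowercased (not lemmatized) marker of `dep`, with 'mwe' parts joined by '_'."""
    parts = []
    for m in children(edges, dep, marker_rel):
        parts.append(m)
        parts.extend(children(edges, m, "mwe"))
    return "_".join(words[i].lower() for i in sorted(parts))


def add_case_info(words, edges):
    """Return edges whose nmod/acl/advcl labels carry their marker, e.g. nmod:because_of."""
    enhanced = []
    for head, rel, dep in edges:
        marker_rel = MARKER_REL.get(rel)
        if marker_rel:
            marker = marker_string(words, edges, dep, marker_rel)
            if marker:
                rel = f"{rel}:{marker}"
        enhanced.append((head, rel, dep))
    return enhanced


# "Because of rain he stayed": "of" is an 'mwe' part of the case marker "Because".
words = ["ROOT", "Because", "of", "rain", "he", "stayed"]
edges = [(0, "root", 5), (5, "nsubj", 4), (5, "nmod", 3), (3, "case", 1), (1, "mwe", 2)]
print(add_case_info(words, edges))
# [(0, 'root', 5), (5, 'nsubj', 4), (5, 'nmod:because_of', 3), (3, 'case', 1), (1, 'mwe', 2)]
```

Lowercasing the surface forms, rather than lemmatizing them, keeps multi-word markers such as "because of" intact.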

Passive agent
Paper: -; UD guidelines (v2): -; coreNLP code: eUD; Converter: eUD
Only if the nmod has both a "by" child and an 'auxpass' sibling is nmod:by replaced with nmod:agent.
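The passive-agent condition can be sketched in the same hypothetical representation (mark_passive_agents is an illustrative name, not ud2ude's API). Both conditions must hold, so a locative "by" without an auxpass sibling keeps its nmod:by label.

```python
# Hypothetical representation and names (not ud2ude's API): words[0] is a dummy
# ROOT, edges are (head, relation, dependent) triples.

def children(edges, head, rel):
    """Dependents of `head` attached with relation `rel`."""
    return [d for h, r, d in edges if h == head and r == rel]


def mark_passive_agents(words, edges):
    """Relabel an nmod as nmod:agent only if it has a "by" case child AND its
    head has an 'auxpass' child (i.e. the nmod has an auxpass sibling)."""
    result = []
    for head, rel, dep in edges:
        if rel in ("nmod", "nmod:by"):
            has_by = any(words[c].lower() == "by" for c in children(edges, dep, "case"))
            has_auxpass_sibling = bool(children(edges, head, "auxpass"))
            if has_by and has_auxpass_sibling:
                rel = "nmod:agent"
        result.append((head, rel, dep))
    return result


# "The ball was kicked by John" versus "He sat by the door" (no auxpass).
passive = ["ROOT", "The", "ball", "was", "kicked", "by", "John"]
passive_edges = [(0, "root", 4), (2, "det", 1), (4, "nsubjpass", 2), (4, "auxpass", 3),
                 (4, "nmod:by", 6), (6, "case", 5)]
active = ["ROOT", "He", "sat", "by", "the", "door"]
active_edges = [(0, "root", 2), (2, "nsubj", 1), (2, "nmod:by", 5), (5, "case", 3), (5, "det", 4)]
print(mark_passive_agents(passive, passive_edges)[4])  # (4, 'nmod:agent', 6)
print(mark_passive_agents(active, active_edges)[2])    # (2, 'nmod:by', 5)
```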

conj case info
Paper: eUD; UD guidelines (v2): eUD; coreNLP code: eUD; Converter: eUD
1. Adds the type of conjunction to all conjunct relations.
2. Some multi-word coordination markers are collapsed to conj:and or conj:negcc.
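A sketch of adding the conjunction type, in the same hypothetical representation. Because UD v1 attaches 'cc' to the first conjunct, the closest preceding coordinator among the head's children is used. The MULTIWORD_CC table is an illustrative stand-in, not the converter's actual list.

```python
# Hypothetical representation and names (not ud2ude's API). UD v1 attaches 'cc'
# to the first conjunct, so coordinators are looked up among the head's children.

# Hypothetical collapse table; the converter's actual multi-word list is not
# reproduced here. The ("but", "not") entry assumes "not" is an 'mwe' of "but".
MULTIWORD_CC = {("as", "well", "as"): "and", ("but", "not"): "negcc"}


def children(edges, head, rel):
    """Dependents of `head` attached with relation `rel`, in sentence order."""
    return sorted(d for h, r, d in edges if h == head and r == rel)


def add_conj_info(words, edges):
    """Return edges where each 'conj' label carries its coordinator, e.g. conj:and."""
    result = []
    for head, rel, dep in edges:
        if rel == "conj":
            # Closest coordinator that precedes this conjunct.
            ccs = [c for c in children(edges, head, "cc") if c < dep]
            if ccs:
                cc = ccs[-1]
                span = sorted([cc] + children(edges, cc, "mwe"))
                parts = tuple(words[i].lower() for i in span)
                rel = "conj:" + MULTIWORD_CC.get(parts, "_".join(parts))
        result.append((head, rel, dep))
    return result


# "tea as well as coffee": the multi-word coordinator collapses to conj:and.
words = ["ROOT", "tea", "as", "well", "as", "coffee"]
edges = [(0, "root", 1), (1, "cc", 2), (2, "mwe", 3), (2, "mwe", 4), (1, "conj", 5)]
print(add_conj_info(words, edges)[-1])  # (1, 'conj:and', 5)
```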

Process multi-word prepositions
Paper: eUD++; UD guidelines (v2): eUD (?); coreNLP code: eUD++; Converter: eUD++
Predetermined lists of two-word and three-word prepositions.
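A sketch of the list-lookup part only: surface tokens are matched against the predetermined lists, trying three-word entries before two-word ones. The list contents and function names are hypothetical excerpts, and rewiring the tree around a detected span is not shown.

```python
# Hypothetical excerpts of the predetermined lists (the real lists are longer)
# and hypothetical function names (not ud2ude's API).
TWO_WORD_PREPS = {("because", "of"), ("instead", "of"), ("next", "to")}
THREE_WORD_PREPS = {("in", "front", "of"), ("on", "behalf", "of")}


def find_multiword_preps(words):
    """Return inclusive (start, end) spans of known multi-word prepositions.

    words[0] is a dummy ROOT; 3-word matches are tried before 2-word ones.
    """
    spans = []
    i = 1
    while i < len(words):
        tri = tuple(w.lower() for w in words[i:i + 3])
        bi = tri[:2]
        if len(tri) == 3 and tri in THREE_WORD_PREPS:
            spans.append((i, i + 2))
            i += 3
        elif len(bi) == 2 and bi in TWO_WORD_PREPS:
            spans.append((i, i + 1))
            i += 2
        else:
            i += 1
    return spans


def prep_label(words, span):
    """Relation suffix for a detected span, e.g. 'in_front_of'."""
    start, end = span
    return "_".join(w.lower() for w in words[start:end + 1])


words = ["ROOT", "He", "stood", "in", "front", "of", "the", "house"]
spans = find_multiword_preps(words)
print(spans, "nmod:" + prep_label(words, spans[0]))  # [(3, 5)] nmod:in_front_of
```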

Demote quantificational modifiers (a.k.a. partitives and light noun constructions)
Paper: eUD++; UD guidelines (v2): (see here); coreNLP code: eUD++; Converter: eUD++
Predetermined list of quantifiers and light nouns.

Conjoined prepositions and prepositional phrases
Paper: eUD++; UD guidelines (v2): -; coreNLP code: eUD++; Converter: eUD++

Propagated governors and dependents
Paper: eUD (A, B, C); UD guidelines (v2): eUD (A, B, C, D); coreNLP code: eUD (A, B, C); Converter: eUD (A, B, C)
1. This includes: (A) conjoined noun phrases, (B) conjoined adjectival phrases, (C) subjects of conjoined verbs, and (D) objects of conjoined verbs.
2. Note that (D) is theoretically relevant but was omitted due to practical uncertainty (see section 4.2 of the paper).
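A sketch of cases (A) to (C) under simplifying assumptions: the names are hypothetical, UPOS tags decide whether a conjunct is nominal/adjectival or verbal, and (D) is left out, in line with note 2. Running (C) after (A) lets a verbal conjunct inherit subjects that were themselves propagated.

```python
# Hypothetical representation and names (not ud2ude's API): upos[i] is token i's
# UPOS tag (index 0 is ROOT), edges are (head, relation, dependent) triples, UD v1 style.

NOMINAL_OR_ADJ = {"NOUN", "PROPN", "PRON", "ADJ"}


def propagate_conjuncts(upos, edges):
    new = set(edges)
    # Pass 1, cases (A) and (B): a nominal/adjectival conjunct gets the same
    # incoming relation as the first conjunct ("John and Mary left" -> left nsubj Mary).
    for head, rel, dep in edges:
        if rel.startswith("conj") and upos[dep] in NOMINAL_OR_ADJ:
            for gov, gov_rel, first in edges:
                if first == head and gov_rel != "root":
                    new.add((gov, gov_rel, dep))
    # Pass 2, case (C): a verbal conjunct without a subject inherits every subject
    # of the first conjunct, including those added in pass 1.
    # Case (D), objects of conjoined verbs, is deliberately not propagated.
    after_pass1 = set(new)
    for head, rel, dep in edges:
        if rel.startswith("conj") and upos[dep] == "VERB":
            if not any(h == dep and r.startswith("nsubj") for h, r, _ in after_pass1):
                for h, r, d in after_pass1:
                    if h == head and r.startswith("nsubj"):
                        new.add((dep, r, d))
    return sorted(new)


# "John and Mary sang and danced"
upos = ["ROOT", "PROPN", "CONJ", "PROPN", "VERB", "CONJ", "VERB"]
edges = [(0, "root", 4), (4, "nsubj", 1), (1, "cc", 2), (1, "conj", 3),
         (4, "cc", 5), (4, "conj", 6)]
for edge in propagate_conjuncts(upos, edges):
    print(edge)
```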

Subjects of controlled verbs
Paper: eUD; UD guidelines (v2): eUD; coreNLP code: eUD; Converter: eUD
1. Includes the special case of 'to' with no following verb ("he decided not to").
2. Heuristic for choosing the propagated subject (according to the coreNLP documentation): if the control verb has an object, that object is propagated as the subject of the controlled verb; otherwise the subject of the control verb is used.

Subjects of controlled verbs, when the 'to' marker is missing
Paper: ?; UD guidelines (v2): ?; coreNLP code: -; Converter: extra
1. Example: "I started reading the book".
2. For some reason this is not included in the coreNLP code; it is unclear why.
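Both controlled-subject conversions can be sketched with one hypothetical function: with require_to=True it keeps the eUD behavior of requiring a 'to' marker on the xcomp, and with require_to=False it also covers the extra case ("I started reading the book"). The nsubj:xsubj label follows coreNLP's naming; the special case "he decided not to" is not handled in this sketch.

```python
# Hypothetical representation and names (not ud2ude's API): words[0] is a dummy
# ROOT, edges are (head, relation, dependent) triples in UD v1 style.

def propagate_controlled_subjects(words, edges, require_to=True):
    """Give each subjectless xcomp a subject: the control verb's direct object if
    it has one, otherwise the control verb's subject."""
    new = list(edges)
    for verb, rel, comp in edges:
        if rel != "xcomp":
            continue
        marks = {words[d].lower() for h, r, d in edges if h == comp and r == "mark"}
        if require_to and "to" not in marks:
            continue  # eUD behavior: only 'to' complements are handled
        if any(h == comp and r.startswith("nsubj") for h, r, _ in edges):
            continue  # the controlled verb already has a subject
        objs = [d for h, r, d in edges if h == verb and r == "dobj"]
        subjs = [d for h, r, d in edges if h == verb and r.startswith("nsubj")]
        for controller in (objs or subjs)[:1]:
            new.append((comp, "nsubj:xsubj", controller))
    return new


# "I persuaded him to leave" (the object controls) and "I started reading the book".
persuade_words = ["ROOT", "I", "persuaded", "him", "to", "leave"]
persuade_edges = [(0, "root", 2), (2, "nsubj", 1), (2, "dobj", 3), (2, "xcomp", 5), (5, "mark", 4)]
start_words = ["ROOT", "I", "started", "reading", "the", "book"]
start_edges = [(0, "root", 2), (2, "nsubj", 1), (2, "xcomp", 3), (3, "dobj", 5), (5, "det", 4)]
print(propagate_controlled_subjects(persuade_words, persuade_edges)[-1])  # (5, 'nsubj:xsubj', 3)
print(propagate_controlled_subjects(start_words, start_edges, require_to=False)[-1])  # (3, 'nsubj:xsubj', 1)
```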

Relative pronouns
Paper: eUD++; UD guidelines (v2): eUD (?); coreNLP code: eUD++; Converter: eUD++
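A sketch of the eUD++ relative-pronoun step in the same hypothetical representation: the pronoun's relation inside the relative clause is moved to the antecedent, and the antecedent points to the pronoun with 'ref'. The pronoun set and function name are illustrative assumptions.

```python
# Hypothetical representation and names (not ud2ude's API); the pronoun set is
# an illustrative subset ("whose" is excluded because it needs different handling).
RELATIVE_PRONOUNS = {"who", "whom", "which", "that"}


def resolve_relative_pronouns(words, edges):
    """For each acl:relcl, reattach the relative pronoun's relation to the
    antecedent noun and link the antecedent to the pronoun with 'ref'."""
    new = list(edges)
    for noun, rel, verb in edges:
        if rel != "acl:relcl":
            continue
        for h, r, d in edges:
            if h == verb and words[d].lower() in RELATIVE_PRONOUNS:
                new.remove((h, r, d))         # drop  verb -r-> pronoun
                new.append((verb, r, noun))   # add   verb -r-> antecedent
                new.append((noun, "ref", d))  # add   antecedent -ref-> pronoun
                break  # only the first relative pronoun of the clause
    return new


# "the man who left"
words = ["ROOT", "the", "man", "who", "left"]
edges = [(0, "root", 2), (2, "det", 1), (2, "acl:relcl", 4), (4, "nsubj", 3)]
print(resolve_relative_pronouns(words, edges))
# [(0, 'root', 2), (2, 'det', 1), (2, 'acl:relcl', 4), (4, 'nsubj', 2), (2, 'ref', 3)]
```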

Reduced relative clause
Paper: -; UD guidelines (v2): eUD (?); coreNLP code: -; Converter: extra

Subjects of adverbial clauses
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Heuristic for choosing the propagated entity:
1. If the marker is "to", the object of the main clause (if it is animate; for now this is not enforced) is propagated as the subject; otherwise, the subject of the main clause is propagated.
2. Else, if the marker is not one of "as/so/when/if" (this includes having no marker at all, which is mostly equivalent to the "while" marker), both the subject and the object of the main clause are equally valid options (unless no object is found, in which case the subject is propagated).
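A sketch of this heuristic that returns candidate subjects rather than committing to one. Two readings are assumptions of the sketch, not confirmed from the converter: the "otherwise" in step 1 is taken to mean "if the main clause has no object", and clauses marked by as/so/when/if receive no candidate. Names and representation are hypothetical.

```python
# Hypothetical representation and names (not ud2ude's API): words[0] is a dummy
# ROOT, edges are (head, relation, dependent) triples.
# Assumptions: step 1's "otherwise" means "if the main clause has no object",
# and clauses marked by as/so/when/if get no propagated subject.
NO_PROPAGATION_MARKERS = {"as", "so", "when", "if"}


def advcl_subject_candidates(words, edges):
    """Map each subjectless advcl to the main-clause entities proposed as its subject."""
    candidates = {}
    for main, rel, clause in edges:
        if not rel.startswith("advcl"):
            continue
        if any(h == clause and (r.startswith("nsubj") or r == "expl") for h, r, _ in edges):
            continue  # the adverbial clause already has a subject
        markers = {words[d].lower() for h, r, d in edges if h == clause and r == "mark"}
        subj = [d for h, r, d in edges if h == main and r.startswith("nsubj")]
        obj = [d for h, r, d in edges if h == main and r == "dobj"]
        if "to" in markers:
            # 1. Prefer the main clause's object (animacy is not checked), else its subject.
            candidates[clause] = obj[:1] or subj[:1]
        elif not markers & NO_PROPAGATION_MARKERS:
            # 2. Other markers (or none): subject and object are equal options;
            #    with no object, only the subject is proposed.
            candidates[clause] = subj[:1] + obj[:1]
    return candidates


# "She hit him while running": both "She" (1) and "him" (3) are proposed.
words = ["ROOT", "She", "hit", "him", "while", "running"]
edges = [(0, "root", 2), (2, "nsubj", 1), (2, "dobj", 3), (2, "advcl", 5), (5, "mark", 4)]
print(advcl_subject_candidates(words, edges))  # {5: [1, 3]}
```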

Noun-modifying participles
Paper: (see here); UD guidelines (v2): -; coreNLP code: -; Converter: extra

Correct possible subject of noun-modifying participles
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
1. This corrects the subject decision of the previous conversion.
2. If the noun being modified is an object or modifier of a verb that has a subject, then that subject might also be the subject of the noun-modifying participle (this is uncertain, and seems to be correct only for more abstract nouns, but that is just a first impression).

Propagated modifiers (in conjunction constructions)
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Heuristics and assumptions:
1. Modifiers that appear after both conjuncts may (the ratio should be researched) refer to both of them. Moreover, if the modifier's head is not the conjunct immediately preceding it, then all the conjuncts between the head and the modifier are (most probably) modified by it as well.
2. If the modifier's head is the conjunct immediately preceding it, we propagate the modifier backward only if the new head does not have any modifier children (this somewhat restricts the number of false positives).
3. We don't propagate modifiers forward (that is, if a conjunct appears after the modifier, we assume the modifier does not refer to it).
4. Should be tested for cost-effectiveness, as it may bring many false positives.

Locative and temporal adverbial modifier propagation (indexicals)
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
1. Rationale: if a locative or temporal adverbial modifier is attached away from the verb, through a subject, object, or modifier (nmod), it should also apply to the verb itself.
2. Example: "He was running around, in these woods here".
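A sketch of indexical propagation on the example sentence, assuming a small hypothetical word list and considering only 'advmod' dependents that hang off a subject, object, or nmod. The names, the list, and the representation are illustrative, not ud2ude's.

```python
# Hypothetical representation and names (not ud2ude's API); the word list is an
# illustrative guess at locative/temporal indexicals, not the converter's list.
INDEXICALS = {"here", "there", "now", "then", "today", "yesterday", "tomorrow"}
CARRIERS = ("nsubj", "dobj", "iobj", "nmod")


def propagate_indexicals(words, edges):
    """If an indexical advmod hangs off a subject/object/nmod, also attach it to
    that dependent's governor (normally the verb)."""
    new = list(edges)
    for holder, rel, mod in edges:
        if rel != "advmod" or words[mod].lower() not in INDEXICALS:
            continue
        for gov, gov_rel, dep in edges:
            if dep == holder and gov_rel.startswith(CARRIERS):
                new.append((gov, "advmod", mod))
    return new


# "He was running around, in these woods here"
words = ["ROOT", "He", "was", "running", "around", ",", "in", "these", "woods", "here"]
edges = [(0, "root", 3), (3, "nsubj", 1), (3, "aux", 2), (3, "advmod", 4), (3, "punct", 5),
         (3, "nmod:in", 8), (8, "case", 6), (8, "det", 7), (8, "advmod", 9)]
print(propagate_indexicals(words, edges)[-1])  # (3, 'advmod', 9)
```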

Subject propagation of 'dep'
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Rationale: 'dep' is already problematic, as the parser did not know which relation to assign. If the secondary clause does not have a subject, the subject most probably comes from the main clause. The 'dep' is probably an advcl, conj, parataxis, or similar relation that was missing a marker, cc, punctuation, etc.
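A sketch of 'dep' subject propagation in a hypothetical representation. Restricting it to verbal 'dep' dependents (via UPOS) is an assumption standing in for "secondary clause".

```python
# Hypothetical representation and names (not ud2ude's API): upos[i] is token i's
# UPOS tag (index 0 is ROOT), edges are (head, relation, dependent) triples.

def propagate_subject_to_dep(upos, edges):
    """Copy the main clause's subject(s) to a subjectless verbal 'dep' dependent."""
    new = list(edges)
    for main, rel, clause in edges:
        if rel != "dep" or upos[clause] != "VERB":
            continue  # assumption: only verbal 'dep' dependents count as clauses
        if any(h == clause and r.startswith("nsubj") for h, r, _ in edges):
            continue  # the secondary clause already has a subject
        new.extend((clause, r, d) for h, r, d in edges if h == main and r.startswith("nsubj"))
    return new


# "I arrived late , missed the bus" with "missed" attached as 'dep'
upos = ["ROOT", "PRON", "VERB", "ADV", "PUNCT", "VERB", "DET", "NOUN"]
edges = [(0, "root", 2), (2, "nsubj", 1), (2, "advmod", 3), (2, "punct", 4),
         (2, "dep", 5), (5, "dobj", 7), (7, "det", 6)]
print(propagate_subject_to_dep(upos, edges)[-1])  # (5, 'nsubj', 1)
```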

Apposition propagation
Paper: (see here); UD guidelines (v2): -; coreNLP code: -; Converter: extra

nmod propagation through subj/obj/nmod
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
For now we propagate only modifiers whose case is 'like' or 'such_as' (as they imply reflexivity), and we copy their heads' relation (that is, obj for obj, subj for subj, and nmod for nmod with its corresponding case).
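A sketch assuming the case-info step has already produced labels such as nmod:like and nmod:such_as; the head's own incoming subject, object, or nmod relation is then copied to the modifier. Names and representation are hypothetical.

```python
# Hypothetical representation and names (not ud2ude's API): edges are
# (head, relation, dependent) triples whose nmod labels already carry their case.
PROPAGATING_NMODS = {"nmod:like", "nmod:such_as"}
COPIED_RELATIONS = ("nsubj", "dobj", "iobj", "nmod")


def propagate_like_nmods(edges):
    """Give a 'like'/'such_as' modifier the same incoming relation as its head."""
    new = list(edges)
    for head, rel, dep in edges:
        if rel not in PROPAGATING_NMODS:
            continue
        # Copy the head's own subj/obj/nmod:<case> relation to the modifier.
        for gov, gov_rel, target in edges:
            if target == head and gov_rel.startswith(COPIED_RELATIONS):
                new.append((gov, gov_rel, dep))
    return new


# "I like fruits such as apples": apples also becomes an object of "like".
edges = [(0, "root", 2), (2, "nsubj", 1), (2, "dobj", 3), (3, "nmod:such_as", 6),
         (6, "case", 4), (4, "mwe", 5)]
print(propagate_like_nmods(edges)[-1])  # (2, 'dobj', 6)
```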

Possessive
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Share possessive modifiers through conjunctions (e.g. "My father and mother went home" -> "My father and (my) mother ...").
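A sketch of possessive sharing in a hypothetical representation. Leaving alone a conjunct that already has its own possessor ("my father and his mother") is an assumption of this sketch.

```python
# Hypothetical representation and names (not ud2ude's API): edges are
# (head, relation, dependent) triples; UD v1 labels possessors nmod:poss.

def share_possessives(edges):
    """Copy the first conjunct's nmod:poss to later conjuncts lacking their own."""
    new = list(edges)
    for head, rel, dep in edges:
        if not rel.startswith("conj"):
            continue
        if any(h == dep and r == "nmod:poss" for h, r, _ in edges):
            continue  # assumption: an explicit possessor on the conjunct wins
        new.extend((dep, "nmod:poss", d) for h, r, d in edges if h == head and r == "nmod:poss")
    return new


# "My father and mother went home"
edges = [(0, "root", 5), (5, "nsubj", 2), (2, "nmod:poss", 1), (2, "cc", 3),
         (2, "conj:and", 4), (5, "nsubj", 4), (5, "advmod", 6)]
print(share_possessives(edges)[-1])  # (4, 'nmod:poss', 1)
```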

Expanding multi-word prepositions
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Add an nmod relation when advmod+nmod is observed, concatenating the advmod and the preposition into the new modifier's preposition (this expands the closed set of eUD's 'Process multi-word prepositions').
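A sketch assuming the case-info step already labels the advmod's dependent as nmod:<preposition>, and that the new nmod attaches to the advmod's own head. Skipping the poss, tmod, and npmod subtypes is an assumption, since they carry no preposition. Names and representation are hypothetical.

```python
# Hypothetical representation and names (not ud2ude's API): words[0] is a dummy
# ROOT, edges are (head, relation, dependent) triples with case-enhanced nmods.
# Assumption: these nmod subtypes carry no preposition and are skipped.
EXCLUDED = {"poss", "tmod", "npmod"}


def expand_multiword_preps(words, edges):
    """For verb -advmod-> adv -nmod:prep-> noun, add verb -nmod:adv_prep-> noun."""
    new = list(edges)
    for verb, rel, adv in edges:
        if rel != "advmod":
            continue
        for h, r, noun in edges:
            if h == adv and r.startswith("nmod:"):
                prep = r.split(":", 1)[1]
                if prep not in EXCLUDED:
                    new.append((verb, f"nmod:{words[adv].lower()}_{prep}", noun))
    return new


# "He stood close to the door"
words = ["ROOT", "He", "stood", "close", "to", "the", "door"]
edges = [(0, "root", 2), (2, "nsubj", 1), (2, "advmod", 3), (3, "nmod:to", 6),
         (6, "case", 4), (6, "det", 5)]
print(expand_multiword_preps(words, edges)[-1])  # (2, 'nmod:close_to', 6)
```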

Active-passive alteration
Paper: (see here); UD guidelines (v2): -; coreNLP code: -; Converter: extra
Invert the subject and object of a passive construction (while keeping the original relations).
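A sketch of the alternation that keeps the passive edges and adds active ones. The added labels (dobj for the passive subject, nsubj for the nmod:agent) are hypothetical choices, not confirmed from the converter.

```python
# Hypothetical representation and names (not ud2ude's API): edges are
# (head, relation, dependent) triples; the added labels are illustrative choices.

def add_active_alternation(edges):
    """Keep the passive edges and add their active counterparts."""
    new = list(edges)
    for head, rel, dep in edges:
        if rel == "nsubjpass":
            new.append((head, "dobj", dep))  # the passive subject is the logical object
            # The agent ("by John") becomes the logical subject.
            new.extend((head, "nsubj", d) for h, r, d in edges if h == head and r == "nmod:agent")
    return new


# "The ball was kicked by John"
edges = [(0, "root", 4), (2, "det", 1), (4, "nsubjpass", 2), (4, "auxpass", 3),
         (4, "nmod:agent", 6), (6, "case", 5)]
print(add_active_alternation(edges)[-2:])  # [(4, 'dobj', 2), (4, 'nsubj', 6)]
```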

Copula alteration
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Add a verb placeholder and reconstruct the tree as if the verb were there.

Hyphen alteration
Paper: -; UD guidelines (v2): -; coreNLP code: -; Converter: extra
Add subject and modifier relations to the verb in the middle of a hyphenated noun-verb adjectival modifier of another noun (e.g. "a Miami-based company").
