How do I extract substrings from each line of a text file?

Posted 2024-10-02 22:36:20


I have a text file in the following format (one line shown):

[NN] ||| transplant ||| transplantation ||| PPDB2.0Score=5.24981 PPDB1.0Score=3.295900 -logp(LHS|e1)=0.18597 -logp(LHS|e2)=0.14031 -logp(e1|LHS)=11.83583 -logp(e1|e2)=1.80507 -logp(e1|e2,LHS)=1.46728 -logp(e2|LHS)=11.47593 -logp(e2|e1)=1.49083 -logp(e2|e1,LHS)=1.10738 AGigaSim=0.63439 Abstract=0 Adjacent=0 CharCountDiff=5 CharLogCR=0.40547 ContainsX=0 Equivalence=0.371472 Exclusion=0.000344 GlueRule=0 GoogleNgramSim=0.03067 Identity=0 Independent=0.078161 Lex(e1|e2)=9.64663 Lex(e2|e1)=59.48919 Lexical=1 LogCount=4.67283 MVLSASim=NA Monotonic=1 OtherRelated=0.372735 PhrasePenalty=1 RarityPenalty=0 ForwardEntailment=0.177287 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.98821 TargetFormality=0.98464 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=5.00000 WordLogCR=0 ||| 0-0 ||| OtherRelated

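What I want is to extract "transplant" and "transplantation", i.e. the second and third fields. How would you do that? The values between the ||| delimiters have a different length on every line of the text file. Here is a second example: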

[VBZ] ||| reflects ||| understand ||| PPDB2.0Score=3.50769 PPDB1.0Score=21.844910 -logp(LHS|e1)=0.01251 -logp(LHS|e2)=10.87470 -logp(e1|LHS)=6.91653 -logp(e1|e2)=11.53225 -logp(e1|e2,LHS)=4.29729 -logp(e2|LHS)=16.55913 -logp(e2|e1)=10.31266 -logp(e2|e1,LHS)=13.93988 AGigaSim=0.54532 Abstract=0 Adjacent=0 CharCountDiff=2 CharLogCR=0.22314 ContainsX=0 Equivalence=0.006535 Exclusion=0.022332 GlueRule=0 GoogleNgramSim=0 Identity=0 Independent=0.456621 Lex(e1|e2)=62.90141 Lex(e2|e1)=62.90141 Lexical=1 LogCount=0 MVLSASim=NA Monotonic=1 OtherRelated=0.404562 PhrasePenalty=1 RarityPenalty=0.36788 ForwardEntailment=0.109950 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.99354 TargetFormality=1.00000 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=2.00000 WordLogCR=0 ||| 0-0 ||| Independent

Here the target words are "reflects" and "understand".
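
One possible approach, offered as a minimal sketch rather than a definitive answer: since the fields are separated by the literal string |||, splitting each line on that delimiter and stripping whitespace gives the two words regardless of how long each field is. The function names extract_pair and extract_pairs and the file name ppdb.txt are hypothetical, chosen only for illustration, and the sketch assumes the two words are always the second and third fields, as in both sample lines.

```python
def extract_pair(line):
    """Return the 2nd and 3rd '|||'-separated fields of one line, stripped."""
    # maxsplit=3 stops after the fields we need, so the long feature
    # list is never split further. Single '|' characters inside it
    # (e.g. "-logp(LHS|e1)") do not match the three-character delimiter.
    fields = line.split("|||", 3)
    if len(fields) < 3:
        raise ValueError("expected at least three '|||'-separated fields")
    return fields[1].strip(), fields[2].strip()


def extract_pairs(path):
    """Yield one (word1, word2) tuple per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield extract_pair(line)


if __name__ == "__main__":
    # "ppdb.txt" is a placeholder file name (assumption).
    for source_word, target_word in extract_pairs("ppdb.txt"):
        print(source_word, target_word)
```

Reading the file line by line keeps memory use flat even for large files, and the length check makes a malformed line fail loudly instead of silently returning the wrong field.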

