How do I extract substrings from each line of a text file?

Posted 2024-10-02 22:36:20


I have a text file in the following format (one line shown):

[NN] ||| transplant ||| transplantation ||| PPDB2.0Score=5.24981 PPDB1.0Score=3.295900 -logp(LHS|e1)=0.18597 -logp(LHS|e2)=0.14031 -logp(e1|LHS)=11.83583 -logp(e1|e2)=1.80507 -logp(e1|e2,LHS)=1.46728 -logp(e2|LHS)=11.47593 -logp(e2|e1)=1.49083 -logp(e2|e1,LHS)=1.10738 AGigaSim=0.63439 Abstract=0 Adjacent=0 CharCountDiff=5 CharLogCR=0.40547 ContainsX=0 Equivalence=0.371472 Exclusion=0.000344 GlueRule=0 GoogleNgramSim=0.03067 Identity=0 Independent=0.078161 Lex(e1|e2)=9.64663 Lex(e2|e1)=59.48919 Lexical=1 LogCount=4.67283 MVLSASim=NA Monotonic=1 OtherRelated=0.372735 PhrasePenalty=1 RarityPenalty=0 ForwardEntailment=0.177287 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.98821 TargetFormality=0.98464 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=5.00000 WordLogCR=0 ||| 0-0 ||| OtherRelated

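I want to extract transplant and transplantation. How would you do that? The values between the ||| delimiters have a different length on every line of the file. Here is a second example: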

[VBZ] ||| reflects ||| understand ||| PPDB2.0Score=3.50769 PPDB1.0Score=21.844910 -logp(LHS|e1)=0.01251 -logp(LHS|e2)=10.87470 -logp(e1|LHS)=6.91653 -logp(e1|e2)=11.53225 -logp(e1|e2,LHS)=4.29729 -logp(e2|LHS)=16.55913 -logp(e2|e1)=10.31266 -logp(e2|e1,LHS)=13.93988 AGigaSim=0.54532 Abstract=0 Adjacent=0 CharCountDiff=2 CharLogCR=0.22314 ContainsX=0 Equivalence=0.006535 Exclusion=0.022332 GlueRule=0 GoogleNgramSim=0 Identity=0 Independent=0.456621 Lex(e1|e2)=62.90141 Lex(e2|e1)=62.90141 Lexical=1 LogCount=0 MVLSASim=NA Monotonic=1 OtherRelated=0.404562 PhrasePenalty=1 RarityPenalty=0.36788 ForwardEntailment=0.109950 SourceTerminalsButNoTarget=0 SourceWords=1 TargetComplexity=0.99354 TargetFormality=1.00000 TargetTerminalsButNoSource=0 TargetWords=1 UnalignedSource=0 UnalignedTarget=0 WordCountDiff=0 WordLenDiff=2.00000 WordLogCR=0 ||| 0-0 ||| Independent

Here the target words are reflects and understand.


1 answer

User

#1 · Posted 2024-10-02 22:36:20

Have you tried splitting on "|||"?

your_text.split(' ||| ') gives you a list of the fields that were separated by " ||| ".

So, for the second example line,

your_text.split(' ||| ')[1:3] returns ['reflects', 'understand'].
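To apply the same idea to every line of the file, something along these lines should work. It is only a sketch: the helper names extract_pair and extract_pairs and the file name ppdb.txt are made up for illustration, and it assumes every non-blank line has at least three "|||"-separated fields. It splits on "|||" without the surrounding spaces and strips each field, so uneven spacing around the delimiter does not matter.

```python
def extract_pair(line, sep="|||"):
    """Return the 2nd and 3rd '|||'-separated fields of one line."""
    # Split on the bare delimiter and strip, so spacing around it is irrelevant.
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    return fields[1], fields[2]


def extract_pairs(lines, sep="|||"):
    """Yield a (source, target) tuple for each non-blank line."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield extract_pair(line, sep)


if __name__ == "__main__":
    # Shortened version of the first example line from the question.
    sample = "[NN] ||| transplant ||| transplantation ||| PPDB2.0Score=5.24981 ||| 0-0 ||| OtherRelated"
    print(extract_pair(sample))  # ('transplant', 'transplantation')

    # To process a whole file (the file name is a placeholder):
    # with open("ppdb.txt", encoding="utf-8") as handle:
    #     for source, target in extract_pairs(handle):
    #         print(source, target)
```

Because extract_pairs accepts any iterable of strings, you can pass it an open file object directly and the file is read one line at a time rather than loaded into memory all at once.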
