How can I fix this regular expression to capture specific parts of a string?

very_largeString = '''
Hola hola I 1
compis compis NCMS000 0.500006
! ! Fat 1
esta este DD0FS0 0.986779
y y CC 0.999962
es ser VSIP3S0 1
que que CS 0.437483
es ser VSIP3S0 1
muy muy RG 1
sencilla sencillo AQ0FS0 1
de de SPS00 0.999984
utilizar utilizar VMN0000 1
, , Fc 1
que que CS 0.437483
si si CS 0.99954
nos nos PP1CP000 0.935743
ponen poner VMIP3P0 1
facilidad facilidad NCFS000 1
con con SPS00 1
las el DA0FP0 0.970954
tareas tarea NCFP000 1
de de SPS00 0.999984
la el DA0FS0 0.972269
casa casa NCFS000 0.979058
pues pues CS 0.998047
mejor mejor AQ0CS0 0.873665
que que PR0CN000 0.562517
mejor mejor AQ0CS0 0.873665
, , Fc 1
pero pero CC 0.999764
tan tan RG 1
antigua antiguo AQ0FS0 0.953488
que que CS 0.437483
según según SPS00 0.995943
mi mi DP1CSS 0.999101
madre madre NCFS000 1
era ser VSII1S0 0.491262
de de SPS00 0.999984
carga carga NCFS000 0.952569
superior superior AQ0CS0 0.992424
'''

2 Answers

User

#1 · Edited 2024-09-28 21:33:52

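I guess you are doing some natural language processing and want to extract pairs made up of a noun and a qualifier from a Spanish corpus. Tools already exist for this kind of task.

I suggest you take a look at the Python Natural Language Toolkit (NLTK).

Also, I have to say that performing these operations on a corpus is not a common task; they are usually performed on fully natural text. I think you should explain your intent, because the solution you are trying to reach may not be the best one for your actual problem.

Help us help you.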

User

#2 · Edited 2024-09-28 21:33:52

from pprint import pprint
import re

# Flags are passed as an argument: Python 3.11 and later reject inline
# global flags such as (?mx) unless they open the pattern.
result = re.findall(r'''
    ^                  # Align to beginning of a line
    (\S+)\s+           # Grab first word
    \S+\s+             # Don't care about 2nd word
    (NC\S+)\s+         # 3rd word must start with NC
    \S+\n              # End of first line
    ^                  # Next line is identical in form
    (\S+)\s+           # to the first line
    \S+\s+
    (AQ\S+)\s+         # except 3rd word must start with AQ
    \S+\n
''', very_largeString, re.MULTILINE | re.VERBOSE)
pprint(result)
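
For comparison, here is a minimal sketch of the same extraction without a regular expression. It assumes that every token sits on its own line with four whitespace-separated fields (word, lemma, tag, probability). The function name `noun_adjective_pairs` and the three-line `sample` are made up for illustration; they are not part of the question or of any library.

```python
def noun_adjective_pairs(tagged_text):
    """Return (noun, noun_tag, adjective, adjective_tag) for every noun line
    that is immediately followed by an adjective line."""
    rows = [line.split() for line in tagged_text.splitlines()]
    pairs = []
    # Compare each line with the one right after it.
    for first, second in zip(rows, rows[1:]):
        # Assumption: a well-formed line has exactly four fields.
        if len(first) != 4 or len(second) != 4:
            continue
        word1, _, tag1, _ = first
        word2, _, tag2, _ = second
        if tag1.startswith("NC") and tag2.startswith("AQ"):
            pairs.append((word1, tag1, word2, tag2))
    return pairs


# Hypothetical three-line excerpt in the same format as the question's data.
sample = """\
de de SPS00 0.999984
carga carga NCFS000 0.952569
superior superior AQ0CS0 0.992424
"""
print(noun_adjective_pairs(sample))
# [('carga', 'NCFS000', 'superior', 'AQ0CS0')]
```

Working line by line avoids relying on `\s+` and `\n` lining up exactly, so it may be easier to extend, for example to skip punctuation between the noun and the adjective.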
