How to write a regex that matches exact words in a text

Posted 2024-10-01 22:37:21


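I want to find a pattern (for re.compile) that matches words like these.

Imagine words like [aether, plateau, ancient, west].

The pattern should capture the word, or the word followed by punctuation, in a way that I can use in spaCy. I tried the following, but it does not work: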






regex_patterns = [

re.compile(r'aether?,|altitude?,|aphelion?,|apside?,|apsis?,|ascension?,|autumnal equinox?,|east?.|eastward?,|eclipse?,|ecliptic?,|elliptical?,|epicycle?,|equinoctical?,|exquinox?,|fixed star?,|latitude?,|longitude?s|mean ecliptic?,|meridian?,|mobile star?,|node?,|nodes?,|north?,|octant?,|orbit?,|\borbital?,|\bparallax?,|\brays?,|\bretrograde?,|rise?,|sidereal?,|sidereal position?,|solstice?,|south?,|star?,|vernal equinox?,|west?,')
]

It would be great if the regex captured both "word" and "word," (word + punctuation). For example, for this sentence:

"west, can be seen"

the result should be

west
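A quick check, assuming Python 3's standard re module, suggests why the pattern above does not behave as intended: in an alternative such as west?, the ? makes only the preceding t optional, while the comma stays mandatory. The short sketch below uses a few alternatives taken from the question; the sample strings are made up for illustration.

```python
import re

# In 'west?,' the ? applies only to the 't'; the comma is still required.
pattern = re.compile(r'west?,')

with_comma = pattern.search('west, can be seen')
without_comma = pattern.search('go west today')
missing_t = pattern.search('wes, typo')

print(with_comma)      # matches 'west,'
print(without_comma)   # None, because no comma follows the word
print(missing_t)       # matches 'wes,' because the 't' is optional

# 'east?.' contains an unescaped dot, so any character may follow 'eas'.
loose_dot = re.search(r'east?.', 'easy')
print(loose_dot)       # matches 'easy'
```

So the pattern only finds words that are followed by a comma, and it also accepts truncated spellings.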


3 answers

If we want to match a specific set of words, we can start with an expression like the following:

(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west),?


Then modify it by listing the punctuation we want to allow in a character class:

[,:;\.]?

Our expression then becomes:

(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west)[,:;\.]?
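One caveat with a long hand-written alternation is ordering: the regex engine takes the first alternative that matches, so east would win over eastward, and star could match inside a longer word. A minimal sketch of one way around this, assuming Python 3 and a hypothetical shortened word list (the names words, pattern, and find_terms are illustrative, not from the original answer), builds the pattern from a list sorted longest-first and adds word boundaries:

```python
import re

# Hypothetical shortened word list; the full list from the question works the same way.
words = ["east", "eastward", "sidereal", "sidereal position", "star", "west"]

# Longest alternatives first, so "eastward" is tried before "east".
alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# \b on both sides keeps "star" from matching inside "started".
# The punctuation class sits outside the capture group.
pattern = re.compile(r"\b(" + alternatives + r")\b[,:;.]?")


def find_terms(text):
    """Return (word, word_with_punctuation) pairs for every match in text."""
    return [(m.group(1), m.group(0)) for m in pattern.finditer(text)]


print(find_terms("west, can be seen"))
print(find_terms("moving eastward; the star started at its sidereal position."))
```

Group 1 holds the bare word and group 0 the word plus any trailing punctuation, so either form can be picked up downstream.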


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west),?"

test_str = ("aether\n"
    "altitude\n"
    "aphelion\n"
    "apside\n"
    "apsis\n"
    "ascension\n"
    "autumnal equinox\n"
    "east?.\n"
    "eastward\n"
    "eclipse\n"
    "ecliptic\n"
    "elliptical\n"
    "epicycle\n"
    "equinoctical\n"
    "exquinox\n"
    "fixed star\n"
    "latitude\n"
    "longitude\n"
    "longitudes\n"
    "mean ecliptic\n"
    "meridian\n"
    "mobile star\n"
    "node\n"
    "nodes\n"
    "north\n"
    "octant\n"
    "orbit\n"
    "orbital\n"
    "parallax\n"
    "rays\n"
    "retrograde\n"
    "rise\n"
    "sidereal\n"
    "sidereal position\n"
    "solstice\n"
    "south\n"
    "star\n"
    "vernal equinox\n"
    "west")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    # Report the full match with its start and end offsets.
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    # Report each capture group (groups are numbered from 1).
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Try this regex:

'(word|other|foo|bar)+[\,\.]?'

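This matches any of the listed words, including repetitions such as wordword or foofoo, with or without a trailing punctuation character (, or ., plus any others you add to the class).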
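As a rough illustration of how that pattern behaves, assuming Python 3's re module, the sketch below checks a few made-up inputs. It also shows a variant with word boundaries (the name bounded is hypothetical), since the pattern as written can match inside a longer word.

```python
import re

pattern = re.compile(r'(word|other|foo|bar)+[\,\.]?')

single = pattern.fullmatch('word,')
repeated = pattern.fullmatch('wordword.')
inside = pattern.search('swordfish')

print(single)          # matches 'word,'
print(repeated)        # matches: the + allows repeated alternatives
print(inside.group())  # 'word', found inside a longer word

# Variant with word boundaries and without the repetition.
bounded = re.compile(r'\b(?:word|other|foo|bar)\b[,.]?')
bounded_inside = bounded.search('swordfish')
bounded_all = bounded.findall('foo, bar. other word')

print(bounded_inside)  # None
print(bounded_all)
```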

import re

label = "PLAN"
# `texts` is assumed to be defined elsewhere (the documents to search).
regex_patterns = [
    # Duplicate alternatives removed; longer variants come before shorter ones
    # (e.g. "Cor Leonis,7" before "Cor Leonis", "Ursa Major" before "Ursa").
    re.compile(r'\b(Aldebaran|Alphard|Antares|Arcturus|Back of Leo|Beta Leonis|Beta Scorpii|Beta Tauri|Betelgeuse|canis|Canis Minor|Cor Leonis,7|Cor Leonis|Cor Scorpii,10|Cor Scorpii|Denebola|dog|Epsilon Virginis|Erichthonius|Heart of Hydra,8|Heart of Hydra|Hydrae|Kappa Geminorum|Lambda Leonis|Neck of Leo|Orion|Palilicium|Polaris|Pollux|Procyon|Regulus|Spica Virginis|Tail of Leo|Ursa Major|Ursa|Vindemiatrix|Zeta Leonis)\b[:,]?')
]

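When I ran into double punctuation (or punctuation that was inconsistent before and after the word), I added the whole string as its own alternative, like this: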

\bCor Scorpii,10
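A possible alternative to listing every such variant is to make the suffix optional in the pattern itself. This is only a sketch under the assumption that the extra part is always a comma followed by digits, as in Cor Scorpii,10; the shortened list names and the sample strings are hypothetical.

```python
import re

# Hypothetical short list of names; extend as needed.
names = ["Cor Leonis", "Cor Scorpii", "Heart of Hydra"]
alternatives = "|".join(re.escape(n) for n in names)

# Name, then an optional ",<digits>" suffix (assumed form), then an optional ':' or ','.
pattern = re.compile(r"\b(" + alternatives + r")\b(?:,\d+)?[:,]?")

results = []
for text in ["Cor Scorpii,10", "Cor Leonis,7: rising", "Heart of Hydra, then"]:
    m = pattern.search(text)
    # group(0) is the full match, group(1) the bare name.
    results.append((m.group(0), m.group(1)))
    print(m.group(0), "->", m.group(1))
```

The bare name stays available in group 1, while the full match keeps whatever suffix and punctuation were present.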
