Why does re.findall behave this way? (Python regular expressions)

idx = 1
while True:
    try:
        hxp1 = "(//h3[@class='entry-title td-module-title']/a)[" + str(idx) + "]"
        text = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, hxp1)))
        # info = eg) 'Michael Jackson - Beat it [FLAC, MP3, WAV]'
        info = text.get_attribute('title')  # get 'info' as string
        # ARTIST = eg) 'Michael Jackson'
        regex = ARTIST + ' - '
        match = re.findall(regex, info)  # or use re.search
        # do something with 'match'...
        idx += 1
    except:
        # do something...
        break

1 Answer

User

Answer 1 · Posted 2024-05-19 08:11:10

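It looks like you need to make sure you match:

Any Unicode whitespace (\s in Python 3.x, or (?u)\s in Python 2.x; see the re documentation: "Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages).")
Any Unicode hyphen (see "Searching for all Unicode variation of hyphens in Python").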
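
A quick way to see both points, as a minimal sketch assuming Python 3. The sample characters are chosen for illustration only and are not taken from the page in the question:

```python
import re
import unicodedata

# Illustrative whitespace characters (assumed examples, not from the scraped page).
spaces = {
    "SPACE": " ",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
}

# In Python 3, \s in a str pattern matches each of them,
# while a literal " " in the pattern matches only the ASCII space.
matches_backslash_s = {name: bool(re.fullmatch(r"\s", ch)) for name, ch in spaces.items()}
matches_literal = {name: bool(re.fullmatch(" ", ch)) for name, ch in spaces.items()}

# Hyphen, en dash, and em dash all belong to Unicode category "Pd" (dash punctuation).
dash_categories = {ch: unicodedata.category(ch) for ch in "-\u2013\u2014"}

print(matches_backslash_s)
print(matches_literal)
print(dash_categories)
```

A title that looks like 'Michael Jackson - Beat it' on screen can therefore still fail to match ARTIST + ' - ' if the page uses any of these look-alike characters.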

Combining all of this into your regex:

Minami\s[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]\s
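
The pattern above could be exercised roughly as follows. This is a sketch assuming Python 3; the name DASHES and the sample titles are made up for illustration:

```python
import re

# Dash character class copied from the pattern above. A raw string is used so that
# the re module itself interprets the \uXXXX escapes (supported for str patterns).
DASHES = (
    r"[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2E17\u2E1A\u2E3A\u2E3B"
    r"\u2E40\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]"
)
pattern = re.compile(r"Minami\s" + DASHES + r"\s")

# Hypothetical titles using different space and dash characters.
samples = [
    "Minami - Song",                # ASCII space and hyphen
    "Minami\u00a0\u2014\u00a0Song",  # no-break spaces around an em dash
    "Minami \uff0d Song",           # fullwidth hyphen-minus
    "Minami Song",                  # no dash at all
]
results = [bool(pattern.match(s)) for s in samples]
print(results)
```

Only the last sample, which has no dash, should fail to match.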

In your case, if you only need to support the hyphen, en dash, and em dash characters plus any Unicode whitespace, you can use:

Minami\s[-—–]\s
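
Applied to the loop in the question, the shorter pattern might be wrapped like this. It is only a sketch assuming Python 3: title_after_artist is a hypothetical helper, not part of the original code, and re.escape is an added precaution for artist names that contain regex metacharacters:

```python
import re

def title_after_artist(artist, info):
    """Return the text following '<artist><space><dash><space>', or None if absent.

    Hypothetical helper for illustration; 'artist' and 'info' correspond to
    ARTIST and info in the question's loop.
    """
    # Hyphen, em dash (U+2014), or en dash (U+2013), surrounded by any Unicode whitespace.
    pattern = re.escape(artist) + r"\s[-\u2014\u2013]\s"
    match = re.match(pattern, info)
    return info[match.end():] if match else None

# Sample title using a no-break space and an en dash (assumed input).
print(title_after_artist("Michael Jackson",
                         "Michael Jackson\u00a0\u2013 Beat it [FLAC, MP3, WAV]"))
```

re.match is used here instead of re.findall because the artist name is expected at the start of the title; re.search would be the choice if it can appear anywhere.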
