Replacing a specific string pattern in a Python string

import re

pattern = re.compile(r"^U0001")
sentence = 'U0001f308 U0001f64b The dark clouds disperse the hail subsides and one neon lit rainbow with a faint second arches across the length of the A u2026'
print(pattern.match(sentence).group())  # this prints U0001 every time, but what I want is ['U0001f308']

matches = re.findall(r"^\w+", sentence)
print(matches)  # This only prints the first match, which is 'U0001f308'

1 answer

User

#1 · Posted on 2024-10-04 01:36:27

'U0001f308' is not an emoji code point! It is a 9-character string that starts with the letter "U".

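To enter a Unicode code point that needs more than 4 hex digits, write it as \U0001f308 (\U followed by exactly 8 hex digits). Similarly, a code point with 4 hex digits is written as \u0001.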
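A quick check makes the difference concrete. This is only a minimal sketch; the variable names `literal`, `emoji`, and `control` are illustrative and do not come from the question.

```python
# A plain string that merely looks like an escape: nine ordinary characters.
literal = 'U0001f308'

# A real escape: \U takes exactly 8 hex digits and yields a single code point.
emoji = '\U0001f308'

# \u takes exactly 4 hex digits.
control = '\u0001'

print(len(literal))       # 9
print(len(emoji))         # 1
print(hex(ord(emoji)))    # 0x1f308
print(hex(ord(control)))  # 0x1
```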

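But you cannot search for code points that "start with 0001" the way you would search an ordinary string. It looks to me like you may be after either the 4-hex-digit code point \u0001 or anything in the range \U00010000 - \U0001FFFF: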

import re

sentence = '\U0001f308 \U0001f64b The dark clouds disperse the hail subsides and one neon lit rainbow with a faint second arches across the length of the A \u2026'

matches = re.findall('[\u0001\U00010000-\U0001FFFF]', sentence)
print(matches)

matches -> ['\U0001f308', '\U0001f64b']

If, for some reason, you really do have strings that start with "U" rather than actual code points, then:

matches = re.findall('U0001(?:[0-9a-fA-F]{4})?', sentence)
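For completeness, here is that pattern run against the sentence exactly as it appears in the question. The second half is an assumption about what "replacing" might mean here: it converts each literal token into the real character. The helper `to_char` and the stricter pattern that requires all four trailing hex digits are illustrative, not part of the original answer.

```python
import re

# The sentence as given in the question: plain text, no real escapes.
sentence = ('U0001f308 U0001f64b The dark clouds disperse the hail subsides '
            'and one neon lit rainbow with a faint second arches across '
            'the length of the A u2026')

# 'U0001' optionally followed by four hex digits, as in the answer.
matches = re.findall(r'U0001(?:[0-9a-fA-F]{4})?', sentence)
print(matches)  # ['U0001f308', 'U0001f64b']


def to_char(m):
    # Hypothetical helper: drop the leading 'U' and parse the rest as hex.
    return chr(int(m.group()[1:], 16))


# Require the four trailing hex digits so a bare 'U0001' is left alone.
converted = re.sub(r'U0001[0-9a-fA-F]{4}', to_char, sentence)
print(converted[:2] == '\U0001f308 ')  # True
```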

I am also assuming that emojis can appear anywhere in the string and can be adjacent to any other character.
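That assumption is why the patterns here carry no `^` anchor. A small sketch of the difference, using a made-up `text` in which the emojis sit in the middle of words:

```python
import re

# Made-up input: real code points embedded directly inside other text.
text = 'rain\U0001f308bow then \U0001f64b!'

# Anchored at the start of the string: nothing matches, since text starts with 'r'.
anchored = re.findall('^[\U00010000-\U0001FFFF]', text)

# Unanchored: every code point in the range is found, wherever it sits.
unanchored = re.findall('[\U00010000-\U0001FFFF]', text)

print(anchored)         # []
print(len(unanchored))  # 2
```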
