Splitting a Python string with re on Unicode and text

Posted 2024-09-29 21:39:18



I am trying to split a long string on both Unicode (Chinese) and plain-text punctuation marks. How can I do this?

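import re

def split1(s):
    # First pass: split on ASCII and full-width punctuation, keeping the delimiters
    temp1 = re.split(r"(;|:|•|。|；|：)", s)
    # Problem: temp1 is a list, so this call raises TypeError; "|" inside [] is a literal pipe
    temp = re.split(u"([\u3002|\uFF01|\uFF1F])", temp1)
    i = iter(temp)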

Update: I want to split the string on both regular (ASCII) punctuation and Unicode punctuation.


1 answer

Answer #1 · Posted 2024-09-29 21:39:18

You can use:

import re
def split1(s):
    # The ur prefix is Python 2 only (it is a SyntaxError on Python 3)
    return re.split(ur"([\u3002\uFF01\uFF1F;:•。；：])", s)

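There is no point in keeping the two patterns separate: their only purpose is to tokenize the string into the characters that match the regex and the text that does not, so a single character class covering all the delimiters does the same job in one pass.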

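The captured delimiters also become part of the resulting list, because the whole pattern is wrapped in a capturing group. See the re.split docs: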

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list
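
The effect of the capturing group can be seen in a short sketch. It is written for Python 3, which is an assumption, and the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the question.
sample = "a;b。c"

# With a capturing group, the delimiters are kept in the result.
with_group = re.split(r"([;。])", sample)

# Without the group, the delimiters are dropped.
without_group = re.split(r"[;。]", sample)

print(with_group)     # ['a', ';', 'b', '。', 'c']
print(without_group)  # ['a', 'b', 'c']
```

If the delimiters are not needed, dropping the parentheses is enough.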

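Note the u prefix: it tells Python 2.x to treat the pattern as a Unicode string, so the \uXXXX escapes and the non-ASCII characters are handled correctly. On Python 3 every str literal is already Unicode, and the combined ur prefix is no longer valid syntax.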
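
For readers on Python 3, a minimal sketch of the same idea might look like the following. It assumes Python 3.3 or later, where the re module itself understands \uXXXX escapes in str patterns. The sentences helper and the sample string are hypothetical additions for illustration, not part of the answer.

```python
import re

# Same delimiters as in the answer. "。" is U+3002, so it is listed once.
# On Python 3.3+ the re module interprets \uXXXX inside a raw str pattern.
PUNCT = re.compile(r"([\u3002\uFF01\uFF1F;:•；：])")

def split1(s):
    """Split s on ASCII and CJK punctuation, keeping the delimiters."""
    return PUNCT.split(s)

def sentences(s):
    """Hypothetical helper: glue each delimiter back onto the text before it."""
    parts = split1(s)
    # parts alternates text, delimiter, text, ... and always ends with text,
    # so pad the delimiter list with "" to make the pairs line up.
    pairs = zip(parts[0::2], parts[1::2] + [""])
    return [text + delim for text, delim in pairs if text or delim]

print(split1("你好！世界。ok;done"))
# ['你好', '！', '世界', '。', 'ok', ';', 'done']
print(sentences("你好！世界。ok;done"))
# ['你好！', '世界。', 'ok;', 'done']
```

Compiling the pattern once is a design choice that only matters if split1 is called many times; calling re.split directly with the same pattern gives the same result.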
