Using regex in Python to extract @user mentions and URL links from Twitter text data

tweet_text = ['@galaxy5univ I like you', "RT @BestOfGalaxies: Let's sit under the stars ...", '@jonghyun__bot .........((thanks)', 'RT @yosizo: thanks.ddddd <https://yahoo.com>', 'RT @LDH_3_yui: #fam, ccccc https://msn.news.com']

2 answers

User

Answer 1 · edited 2024-10-01 15:38:03

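Note that the regular expression pn = re.compile(r'@(\S+)') captures one or more non-whitespace characters after the @, so a trailing : right after the username ends up in the match.

To exclude the : from the match, convert the shorthand class \S into its negated character class equivalent, [^\s], and add : to it: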

pn = re.compile(r'@([^\s:]+)')

Now the capture stops at the first :, so only the non-whitespace characters before it are captured. See the regex demo.
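To make the difference concrete, here is a minimal sketch that runs both patterns over the tweets from the question. The variable names (greedy, no_colon, greedy_users, clean_users) are illustrative and do not come from the original answer.

```python
import re

# Sample data from the question (second item uses double quotes
# because the text itself contains an apostrophe).
tweet_text = [
    "@galaxy5univ I like you",
    "RT @BestOfGalaxies: Let's sit under the stars ...",
    "@jonghyun__bot .........((thanks)",
    "RT @yosizo: thanks.ddddd <https://yahoo.com>",
    "RT @LDH_3_yui: #fam, ccccc https://msn.news.com",
]

greedy = re.compile(r'@(\S+)')        # any run of non-whitespace after @
no_colon = re.compile(r'@([^\s:]+)')  # same, but also stops at ':'

# Collect every captured username across all tweets.
greedy_users = [m for t in tweet_text for m in greedy.findall(t)]
clean_users = [m for t in tweet_text for m in no_colon.findall(t)]

print(greedy_users)  # retweet handles keep their trailing ':'
print(clean_users)   # trailing ':' is gone
```

With \S+, the retweet handles come back as 'BestOfGalaxies:', 'yosizo:' and 'LDH_3_yui:'; with [^\s:]+ the colon is dropped.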

If you need to capture everything up to the last : instead, you can add a : after the capturing group: pn = re.compile(r'@(\S+):').
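The two variants behave differently in ways that are easy to miss, so a small sketch may help. The string "RT @a:b: hi" is a made-up input used only to show the "last colon" behaviour; the variable names are illustrative as well.

```python
import re

up_to_last_colon = re.compile(r'@(\S+):')       # ':' must follow the capture
up_to_first_colon = re.compile(r'@([^\s:]+)')   # capture stops before any ':'

sample = "RT @yosizo: thanks.ddddd <https://yahoo.com>"  # from the question
tricky = "RT @a:b: hi"                # hypothetical input with two colons
plain = "@galaxy5univ I like you"     # from the question, no ':' after the name

last_sample = up_to_last_colon.findall(sample)
last_tricky = up_to_last_colon.findall(tricky)
first_tricky = up_to_first_colon.findall(tricky)
last_plain = up_to_last_colon.findall(plain)

print(last_sample)   # ['yosizo']
print(last_tricky)   # ['a:b']  (backtracks to the last ':' in the token)
print(first_tricky)  # ['a']
print(last_plain)    # []       (no ':' after the name, so no match at all)
```

Because r'@(\S+):' requires a colon, it skips mentions that are not followed by one, such as '@galaxy5univ'. If every mention should be extracted, the [^\s:]+ version is the safer choice.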

As for a regex that matches URLs, there are many on the Web; just choose the one that suits you best.

The answer's example code is not preserved in this copy.
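As a stand-in, here is a minimal sketch of URL extraction on the question's data. The pattern r'https?://[^\s<>]+' is an assumption chosen for this sample (it stops at whitespace and angle brackets); it is not the pattern from the original answer, and it is far from a complete URL matcher.

```python
import re

tweet_text = [
    "@galaxy5univ I like you",
    "RT @BestOfGalaxies: Let's sit under the stars ...",
    "@jonghyun__bot .........((thanks)",
    "RT @yosizo: thanks.ddddd <https://yahoo.com>",
    "RT @LDH_3_yui: #fam, ccccc https://msn.news.com",
]

# Assumed pattern: http or https scheme, then everything up to the next
# whitespace or angle bracket, so '<https://yahoo.com>' loses its brackets.
url_pattern = re.compile(r'https?://[^\s<>]+')

urls = [u for t in tweet_text for u in url_pattern.findall(t)]
print(urls)  # ['https://yahoo.com', 'https://msn.news.com']
```

A pattern this simple will also swallow trailing punctuation (for example a full stop directly after a link), so real data may need a stricter expression or a cleanup step.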

User

Answer 2 · edited 2024-10-01 15:38:03

If the usernames do not contain special characters, you can use:

@([\w]+)

See the live demo.
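A short sketch of this pattern applied to the question's data follows; word_users is an illustrative name. Since \w matches letters, digits and the underscore, the match ends at the : without any extra handling.

```python
import re

tweet_text = [
    "@galaxy5univ I like you",
    "RT @BestOfGalaxies: Let's sit under the stars ...",
    "@jonghyun__bot .........((thanks)",
    "RT @yosizo: thanks.ddddd <https://yahoo.com>",
    "RT @LDH_3_yui: #fam, ccccc https://msn.news.com",
]

# \w+ stops at the first character that is not a letter, digit or underscore.
word_users = [name for t in tweet_text for name in re.findall(r'@(\w+)', t)]
print(word_users)
```

Underscores are kept, so 'jonghyun__bot' and 'LDH_3_yui' come through whole.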
