Extracting a file name from a URL with a regex that needs to exclude some characters

{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}

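3 Answers

User

#1 · Edited 2024-10-03 13:30:15

Based on the string you provided, you can iterate over the dictionaries, take the value of "url", and apply the following regex: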

([^\/]*)$

Explanation:

() - defines a capturing group
[^\/] - matches any single character that is not listed after the ^ (here, anything except /)
\/ - matches the character / literally (in Python the backslash is optional, since / has no special meaning)
* - quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ - asserts position at the end of the string, or before the line terminator right at the end of the string (if any)

For example:

[code sample missing from the source]

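This example relies on the fact that the file name appears at the end of the string. The $ anchor ensures that the only valid match is the one that ends the string.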
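
The end-anchored pattern described above can be sketched in Python as follows. This is a minimal sketch rather than the answer's own example: the names `records`, `LAST_SEGMENT`, and `filename_from_url` are hypothetical, and it assumes every URL ends with the file name (no query string or fragment).

```python
import re

# Sample records copied from the question.
records = [
    {"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"},
    {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"},
]

# Capture the run of non-slash characters that ends the string.
LAST_SEGMENT = re.compile(r"([^/]*)$")


def filename_from_url(url):
    """Return everything after the last '/' (empty string if url ends with '/')."""
    # The pattern can match the empty string, so search() always finds a match.
    return LAST_SEGMENT.search(url).group(1)


filenames = [filename_from_url(record["url"]) for record in records]
print(filenames)
```

Because `[^/]*` may match nothing, a URL that ends in a slash yields an empty string rather than an error, which may or may not be what you want.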

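If you want to do this on the dictionary after it has been converted to a string, you can change the terminating condition, like this: ([^\/]*?)\", so that ", now terminates the match (note the \ that escapes the "). See https://regex101.com/r/k9VwC6/25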
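
A minimal sketch of the string-based variant, assuming the "string conversion" is JSON (so that values are wrapped in double quotes); the variable names are hypothetical. Note that Python's own `str()` of a dict uses single quotes, so this pattern would not match that form.

```python
import json
import re

# Sample records copied from the question.
records = [
    {"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"},
    {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"},
]

# Assumption: the whole list is serialized as one JSON string.
text = json.dumps(records, ensure_ascii=False)

# Lazily capture non-slash characters up to the '",' that closes the value.
# In a single-quoted raw string neither '/' nor '"' needs a backslash.
pattern = re.compile(r'([^/]*?)",')

filenames = pattern.findall(text)
print(filenames)
```

This only works here because "url" is followed by another key and no other value sits directly before a `",` sequence. For anything less regular, parsing the data and matching each URL separately is probably sturdier.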

Finally, if we are not lucky enough to have the capturing group at the end of the string (which means we cannot use $), we can use a negative lookbehind.
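
When the file name is not at the end of the string, lookarounds can delimit it instead of `$`. The sketch below is only a guess at what such a pattern might look like: it uses a positive lookbehind for `/` and a positive lookahead for the closing `"`, not the negative lookbehind the answer mentions, and the `line` string is a made-up input.

```python
import re

# Hypothetical input: the URL sits in the middle of a longer string.
line = (
    'video "http://res1.icourses.cn/share/process17//mp4/2017/3/17/'
    '6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4" downloaded'
)

# (?<=/)  the match must be preceded by a slash (not included in the match)
# [^/"]+  one or more characters that are neither a slash nor a double quote
# (?=")   the match must be followed by the closing quote (not included)
pattern = re.compile(r'(?<=/)[^/"]+(?=")')

match = pattern.search(line)
filename = match.group(0) if match else None
print(filename)
```

Only the last path segment is followed directly by a quote, so the earlier segments are skipped even though they are also preceded by a slash.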

User

#2 · Edited 2024-10-03 13:30:15

You can use the shorter regex [^/]*$

Code:
import re

s = [
    {"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"},
    {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"},
]
filenames = [re.findall('[^/]*$', i['url'])[0] for i in s]
print(filenames)
Output:
['6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4', 'a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4']
Check the regex: https://regex101.com/r/k9VwC6/30

User

#3 · Edited 2024-10-03 13:30:15

You can use re.findall:

import re
s = [{"url": "http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4", "name": "1-课程导学"}, {"url": "http://res2.icourses.cn/share/process17//mp4/2017/3/17/a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4", "name": "2-计算机网络的定义与分类"}]
# Raw string avoids invalid-escape warnings; \w already covers the underscore.
filenames = [re.findall(r'(?<=/)[\w-]+\.mp4', i['url'])[0] for i in s]
print(filenames)

Output:

['6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4', 'a21902b6-8680-4bdf-8f47-4f99d1354475_SD.mp4']
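
One caveat with `re.findall(...)[0]` is that it raises `IndexError` when a URL contains no `.mp4` file name. A minimal sketch of a more defensive variant follows; the helper name `mp4_name` and the choice to return `None` on failure are assumptions, not part of the answer.

```python
import re

# Same idea as the pattern above: a slash before the name, then word
# characters or hyphens, then a literal ".mp4".
MP4_NAME = re.compile(r"(?<=/)[\w-]+\.mp4")


def mp4_name(url):
    """Return the .mp4 file name found in url, or None if there is none."""
    match = MP4_NAME.search(url)
    return match.group(0) if match else None


print(mp4_name("http://res1.icourses.cn/share/process17//mp4/2017/3/17/6332c641-28b5-43a0-894c-972bd804f4e1_SD.mp4"))
print(mp4_name("http://res1.icourses.cn/share/"))
```

The second call has nothing to match, so the helper returns `None` instead of raising.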
