Finding links with bs4

Posted 2024-09-30 22:16:32


I'm trying to use bs4 to get a link from a script tag.

This is the tag I want to scrape the links from:

html = """<script type="text/javascript">var player = new Clappr.Player({
    sources: ["https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkdqatbvqbc5axyv4dpuq/v.mp4","https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkyyarbvqbc5dtluomera/v.mp4"]

    poster: "image.jpg",
    width: "100%",
height: "100%",
disableVideoTagContextMenu: true,
    parentId: "#vplayer",
    events: {
    onReady: function() {  },
    }"""

How can I get the links in "sources"?

link1 = "https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkdqatbvqbc5axyv4dpuq/v.mp4"

link2 = "https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkyyarbvqbc5dtluomera/v.mp4"

Both links point to the same video, so I only need one of them.

Note: the domain name changes every time, so I can't search for example.com.


1 answer
User
Reply 1 · posted 2024-09-30 22:16:32
import re
html = """<script type="text/javascript">var player = new Clappr.Player({
    sources: ["https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkdqatbvqbc5axyv4dpuq/v.mp4","https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkyyarbvqbc5dtluomera/v.mp4"]

    poster: "image.jpg",
    width: "100%",
height: "100%",
disableVideoTagContextMenu: true,
    parentId: "#vplayer",
    events: {
    onReady: function() {  },
    }"""

# Find every substring that starts with "https" and ends with "mp4" (non-greedy match).
match = re.findall(r"https.+?mp4", html)

print(match)

Output:

['https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkdqatbvqbc5axyv4dpuq/v.mp4', 'https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkyyarbvqbc5dtluomera/v.mp4']
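Since the question only needs one link and the domain changes on every request, the same idea can be narrowed to a single match. The snippet below is a minimal sketch, not the original answer's code: the shortened `html` string, the `.test` hosts, and the `MP4_URL` name are made up for illustration, and it assumes the URL contains no quotes or whitespace.

```python
import re

# Shortened stand-in for the page source (hypothetical hosts and paths).
html = 'sources: ["https://cdn-one.test/abc/v.mp4","https://cdn-two.test/def/v.mp4"]'

# Match any http(s) URL ending in ".mp4" without naming a domain.
# The character class stops at quotes and whitespace, so a match
# cannot run past the end of one quoted string.
MP4_URL = re.compile(r"https?://[^\"'\s]+?\.mp4")

# search() returns only the first match, or None if there is none.
match = MP4_URL.search(html)
link = match.group(0) if match else None
print(link)
```

Checking for `None` before calling `group()` avoids an `AttributeError` on pages where the player script is missing.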

# Alternatively, capture the whole array that follows "sources:" as a string.
match = re.search(r"sources: (\[.+\])", html).group(1)

print(match)

Output:

["https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkdqatbvqbc5axyv4dpuq/v.mp4","https://example.com/zx5x4vxkb52dxcne4zwsbbn6rpafhxnsodnlcjifkyyarbvqbc5dtluomera/v.mp4"]
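The captured array is still a single string. If the array happens to be valid JSON (double-quoted strings, no trailing comma), as it is in the sample, it can probably be turned into a Python list with `json.loads`. This is a sketch under that assumption; the `first_source` helper and the shortened URLs are hypothetical and not part of the original answer.

```python
import json
import re

# Shortened stand-in for the page source (hypothetical paths).
html = '''<script type="text/javascript">var player = new Clappr.Player({
    sources: ["https://example.com/a/v.mp4","https://example.com/b/v.mp4"]
    poster: "image.jpg",
'''

def first_source(page):
    """Return the first URL in the `sources` array, or None if absent."""
    # Capture the bracketed array after "sources:"; non-greedy so the
    # match ends at the first closing bracket.
    m = re.search(r"sources:\s*(\[.*?\])", page, re.DOTALL)
    if m is None:
        return None
    # Assumption: the array literal is valid JSON.
    links = json.loads(m.group(1))
    return links[0] if links else None

print(first_source(html))
```

Because the pattern keys on `sources:` rather than on the host, it does not depend on the domain name. If the page ever emits single-quoted strings, `json.loads` will raise `json.JSONDecodeError`, and a plain regex over the captured text would be the fallback.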
