I have some image tags like these:
images =
<img class="wni-logo" src="https://smtgvs.weathernews.jp/s/topics/img/wnilogo_kana@2x.png"/>
<img alt="top" id="top_img" src="//smtgvs.weathernews.jp/s/topics/img/201808/201808170115_top_img_A.jpg?1534474260" style="width: 100%;"/>
<img alt="box0" id="box_img0" src="//smtgvs.weathernews.jp/s/topics/img/201808/201808170115_box_img0_A.png?1534474573" style="width:100%"/>
<img alt="box1" class="lazy" data-original="https://smtgvs.weathernews.jp" id="box_img1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" style="width: 100%; display: none;"/>
<img alt="recommend thumb0" height="70" src="https://smtgvs.weathernews.jp/s/topics/thumb/article/201808080245_top_img_A_320x240.jpg?1534473603" width="100px"/>
I want to get the following result:
['https://smtgvs.weathernews.jp/s/topics/img/201808/201808170115_top_img_A.jpg']
['https://smtgvs.weathernews.jp/s/topics/img/201808/201808170115_box_img0_A.png']
I tried this code:
import re
from urllib.parse import urljoin

for image in images:
    imageURL = re.findall('https://smtgvs.weathernews.jp/s/topics/img/.+', urljoin(baseURL, image['src']))
    if imageURL:
        print(imageURL)
These are the results I got. Could you help me correct it?
['https://smtgvs.weathernews.jp/s/topics/img/201808/201808170115_top_img_A.jpg?1534474260']
['https://smtgvs.weathernews.jp/s/topics/img/201808/201808170115_box_img0_A.png?1534474573']
['https://smtgvs.weathernews.jp/s/topics/img/dummy.png']
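You can simply change the regex to use a capturing group, so that findall returns only the part of the URL before the query string.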
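A minimal sketch of the capturing-group approach, assuming the images are given as a plain list of src strings rather than parsed tags, and assuming a hypothetical BASE_URL that is only needed to resolve the scheme-relative "//..." links. The names BASE_URL, SRC_VALUES, PATTERN and extract_image_urls are illustrative and do not come from the question. Because the pattern requires a literal "?" after the captured part, URLs without a query string (the logo and dummy.png) are skipped.

```python
import re
from urllib.parse import urljoin

# Assumption: hypothetical base URL, used only to resolve "//host/..." links.
BASE_URL = "https://weathernews.jp/"

# The src values of the <img> tags shown in the question.
SRC_VALUES = [
    "https://smtgvs.weathernews.jp/s/topics/img/wnilogo_kana@2x.png",
    "//smtgvs.weathernews.jp/s/topics/img/201808/201808170115_top_img_A.jpg?1534474260",
    "//smtgvs.weathernews.jp/s/topics/img/201808/201808170115_box_img0_A.png?1534474573",
    "https://smtgvs.weathernews.jp/s/topics/img/dummy.png",
    "https://smtgvs.weathernews.jp/s/topics/thumb/article/201808080245_top_img_A_320x240.jpg?1534473603",
]

# The group captures everything before the "?"; dots in the host are escaped.
PATTERN = re.compile(r"(https://smtgvs\.weathernews\.jp/s/topics/img/.+)\?")


def extract_image_urls(src_values, base_url=BASE_URL):
    """Return one findall() result per src that matches the pattern."""
    results = []
    for src in src_values:
        found = PATTERN.findall(urljoin(base_url, src))
        if found:
            results.append(found)
    return results


for image_url in extract_image_urls(SRC_VALUES):
    print(image_url)
```

With a single group in the pattern, re.findall returns the captured text instead of the whole match, which is why the query string disappears from the output.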
Edit: to read the data-original attribute instead of the src field:
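One possible way to do this, sketched under a few assumptions: each image is represented here as a plain dict of attributes (parsed tag objects that offer the same .get() and [] access should behave alike), BASE_URL is a hypothetical base URL, and pick_source and extract_url are illustrative helper names. The data-original value is reproduced exactly as it appears in the question.

```python
import re
from urllib.parse import urljoin

# Assumption: hypothetical base URL, used only to resolve "//host/..." links.
BASE_URL = "https://weathernews.jp/"

# Same capturing-group pattern: keep only the part before the "?".
PATTERN = re.compile(r"(https://smtgvs\.weathernews\.jp/s/topics/img/.+)\?")


def pick_source(image):
    """Prefer data-original (lazy-loaded images); fall back to src."""
    return image.get("data-original") or image["src"]


def extract_url(image, base_url=BASE_URL):
    """Return the URL without its query string, or None if it does not match."""
    match = PATTERN.search(urljoin(base_url, pick_source(image)))
    return match.group(1) if match else None


# Attribute dicts mirroring two of the tags in the question.
top_img = {
    "id": "top_img",
    "src": "//smtgvs.weathernews.jp/s/topics/img/201808/201808170115_top_img_A.jpg?1534474260",
}
box_img1 = {
    "id": "box_img1",
    "data-original": "https://smtgvs.weathernews.jp",
    "src": "https://smtgvs.weathernews.jp/s/topics/img/dummy.png",
}

print(pick_source(box_img1))  # the data-original value, not dummy.png
print(extract_url(top_img))
```

Reading data-original first means the placeholder dummy.png in src is never considered for lazy-loaded images.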