How to find URLs between two dates in Python

Posted 2024-09-30 01:23:51


I use imageURL = re.findall("(https://smtgvs.weathernews.jp/s/topics/img/[0-9]+/.+)\?[0-9]+", urljoin(baseURL, image['src'])) to parse URLs such as the following:

<img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201807/201807240125_box_img1_A.jpg?1534992203">
<img id="top_img" alt="top" style="width: 100%;" src="//smtgvs.cdn.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg?1534994171">
<img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg?1534992203">

https://smtgvs.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg

If I only want to parse the URLs dated between 20180801 and 20180831, how should I modify the re.findall() call above?


1 answer
User
#1 · Posted 2024-09-30 01:23:51

Original case

>>> import re
>>> links = '''
    <img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201807/201807240125_box_img1_A.jpg?1534992203">
    <img id="top_img" alt="top" style="width: 100%;" src="//smtgvs.cdn.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg?1534994171">
    <img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg?1534992203">
    '''
>>> re.findall(r"(https://smtgvs.weathernews.jp/s/topics/img/[0-9]+/.+)\?[0-9]+", links)

['https://smtgvs.weathernews.jp/s/topics/img/201807/201807240125_box_img1_A.jpg', 'https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg']

New case: parse between 20180801 and 20180831

Two main changes:

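  1. . -> \. (escape the dot character so that it matches a literal dot)
  2. Replace the YYYYMM part of the expression with the literal 201808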
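As a small illustration of the first change: an unescaped dot matches any character, so a pattern written for one host name also accepts look-alike strings. The sketch below uses made-up sample strings (good and lookalike are hypothetical, not taken from the page):

```python
import re

# Hypothetical sample strings, used only to illustrate dot escaping.
good = "https://smtgvs.weathernews.jp/s/topics/img/"
lookalike = "https://smtgvsXweathernewsXjp/s/topics/img/"

unescaped = re.compile(r"https://smtgvs.weathernews.jp/")   # "." matches any character
escaped = re.compile(r"https://smtgvs\.weathernews\.jp/")   # "\." matches a literal dot only

print(bool(unescaped.match(good)), bool(unescaped.match(lookalike)))  # True True
print(bool(escaped.match(good)), bool(escaped.match(lookalike)))      # True False
```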

So, the new code:

>>> links = '''
    <img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201807/201807240125_box_img1_A.jpg?1534992203">
    <img id="top_img" alt="top" style="width: 100%;" src="//smtgvs.cdn.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg?1534994171">
    <img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg?1534992203">
    '''
>>> re.findall(r"((?:https?:)?//[^/\s]+/s/topics/img/201808/\w+\.\w+)\?[0-9]+", links)

['//smtgvs.cdn.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg', 'https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg']
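Hardcoding 201808 works here only because the requested range happens to be exactly one calendar month. For an arbitrary range, one possible approach is to capture the date from the file name and compare it in Python instead of encoding the range in the regex. The sketch below is a minimal illustration, not the answerer's method: URL_RE and urls_between are hypothetical names, and it assumes the first eight digits of each file name are a YYYYMMDD date, as the sample URLs suggest.

```python
import re
from datetime import date, datetime

# Sample markup copied from the question.
links = '''
<img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201807/201807240125_box_img1_A.jpg?1534992203">
<img id="top_img" alt="top" style="width: 100%;" src="//smtgvs.cdn.weathernews.jp/s/topics/img/201808/201808010125_top_img_A.jpg?1534994171">
<img style="width:100%" id="box_img1" alt="box1" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220125_box_img1_A.jpg?1534992203">
'''

# Group 1: the URL without its query string.
# Group 2: the leading YYYYMMDD of the file name (assumed to be the image date).
# The scheme is optional so protocol-relative URLs ("//host/...") match too.
URL_RE = re.compile(
    r'((?:https?:)?//[^/\s"]+/s/topics/img/\d{6}/(\d{8})\w*\.\w+)\?\d+'
)

def urls_between(html, start, end):
    """Return image URLs whose file-name date lies in [start, end], inclusive."""
    found = []
    for url, stamp in URL_RE.findall(html):
        try:
            day = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # the eight digits do not form a valid calendar date
        if start <= day <= end:
            found.append(url)
    return found

result = urls_between(links, date(2018, 8, 1), date(2018, 8, 31))
print(result)
```

With the sample markup this prints the two 201808 URLs. Changing the bounds to date(2018, 7, 1) and date(2018, 7, 31) returns only the 201807 image.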
