How can I use Python to get the URL from an HTML href attribute that points to an .ics file?

import urllib2
import requests
import bs4

def get_ics_url(url):
    #page = requests.get('https://meded.hms.harvard.edu/calendar').content
    page = requests.get(url).content
    soup = bs4.BeautifulSoup(page, 'lxml')
    links = soup.find_all('a')
    for link in links:
        if link.get('href')[-4:]=='.ics':
            endout = type(link.get('href'))
            print endout
        break

1 answer

User

Answer 1 · Posted 2024-06-15 06:15:51

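The break in your code stops the loop after the first iteration. You need to indent it one more level so that it sits inside the if (or use return instead). As written, it exits the for loop regardless of the result of the if.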
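
To see the difference the indentation makes, here is a minimal Python 3 sketch that uses a plain list of hypothetical href strings in place of the parsed links. The names first_ics_buggy and first_ics_fixed are made up for illustration and do not appear in the question.

```python
# Hypothetical href values standing in for the links on a page.
hrefs = ['/about', '/calendar/export.ics', '/contact']

def first_ics_buggy(hrefs):
    found = None
    for href in hrefs:
        if href[-4:] == '.ics':
            found = href
        break  # at loop level: runs on the first iteration no matter what
    return found

def first_ics_fixed(hrefs):
    for href in hrefs:
        if href[-4:] == '.ics':
            return href  # leaves the loop only once a match is found
    return None

print(first_ics_buggy(hrefs))  # None
print(first_ics_fixed(hrefs))  # /calendar/export.ics
```

The buggy version only ever inspects '/about', so it never reaches the .ics entry.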

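The second problem is that some <a> elements have no href attribute, so link.get('href') returns None and the script fails before it reaches any .ics link: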

if link.get('href')[-4:]=='.ics':
TypeError: 'NoneType' object has no attribute '__getitem__'

For example:

<a name="main-content"></a>
<a class="cal-export" title="Note: Past events are not included">Export</a>

You can fix this by checking that link.get('href') is not None before slicing it.
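
A minimal Python 3 sketch of that guard, again with a hypothetical list in place of real tags, where None models what link.get('href') returns for an <a> element without an href. The helper name is_ics is an assumption for illustration. Python 3 words the error differently from the Python 2 traceback above, but the cause is the same.

```python
# None models link.get('href') for an <a> element that has no href attribute.
hrefs = [None, None, '/calendar/export.ics']

def is_ics(href):
    # Check for None first; 'and' short-circuits, so the slice never sees None.
    return href is not None and href[-4:] == '.ics'

try:
    hrefs[0][-4:]  # unguarded slice on None
except TypeError as exc:
    print('unguarded:', exc)

print([h for h in hrefs if is_ics(h)])  # ['/calendar/export.ics']
```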

Fixed code:

import requests
import bs4

def get_ics_url(url):
    # Download the page and parse it.
    page = requests.get(url).content
    soup = bs4.BeautifulSoup(page, 'lxml')

    links = soup.find_all('a')

    for link in links:
        href = link.get('href')
        # Skip <a> elements without an href, then test the suffix.
        if href is not None and href[-4:] == '.ics':
            return href  # stop at the first .ics link

    return None  # no .ics link found

print(get_ics_url('https://meded.hms.harvard.edu/calendar'))
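
The function above returns the href exactly as it is written in the page, which may be a relative path. Whether the real page uses relative links is not stated in the question, so the following is only a sketch under that assumption. It is written for Python 3 and swaps BeautifulSoup for html.parser from the standard library so that it runs without third-party packages. IcsLinkParser, find_ics_urls, and the '/calendar/export.ics' href in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IcsLinkParser(HTMLParser):
    """Collects href values of <a> tags that end in .ics."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')  # None when the attribute is missing
        if href is not None and href.endswith('.ics'):
            self.hrefs.append(href)

def find_ics_urls(html, base_url):
    parser = IcsLinkParser()
    parser.feed(html)
    # urljoin leaves absolute hrefs alone and resolves relative ones.
    return [urljoin(base_url, href) for href in parser.hrefs]

# The first two tags come from the answer; the third is a made-up example.
sample = '''
<a name="main-content"></a>
<a class="cal-export" title="Note: Past events are not included">Export</a>
<a href="/calendar/export.ics">Download</a>
'''
print(find_ics_urls(sample, 'https://meded.hms.harvard.edu/calendar'))
```

Taking the HTML as a string rather than fetching it inside the function keeps the parsing step easy to try out without a network connection.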
