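I'm writing a program that looks up the current day's national food holiday on this website: https://foodimentary.com/today-in-national-food-holidays/may-holidays/
So far I can reliably get the tag that contains the current date, but I'm having trouble using it as a reference point to get the associated food day. Here is what I have so far: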
from datetime import date
import requests
from bs4 import BeautifulSoup
today = date.today()
month = today.strftime('%b') # Abbreviated month name, e.g. 'May'
day = today.strftime('%d') # Zero-padded day of the month, e.g. '29'
date_slug = f'{month.lower()}-{day}' # e.g. 'may-29'; named so it does not shadow the imported `date`
# Get HTML from home page
url = 'https://foodimentary.com/today-in-national-food-holidays/todayinfoodhistorycalenderfoodnjanuary/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser') # Parse HTML with Beautiful Soup
# Get the current month URL
months = soup.find('ul', id='menu-months', class_='menu') # Isolate the months table
monthUrl = months.find('a', href=True, string=month)['href'] # Get the month URL for the current month
# Get HTML from month page, parse
r = requests.get(monthUrl)
soup = BeautifulSoup(r.text, 'html.parser')
# Find the first tag whose URL contains the formatted date (attribute value quoted for the CSS selector)
holidayTag = soup.select_one(f'a[href*="{date_slug}"]')
print(holidayTag)
# TODO: Get the name of the food day based on holidayTag
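Looking at the page in my browser's developer console, the most consistent pattern linking a date to its food holiday name seems to be that the holiday is always the next piece of text after the date tag. Here is a sample of the HTML: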
<div style="text-align:center;">
<strong><a title="May 29" href="https://foodimentaryguy.wordpress.com/2011/05/29/may-29/">May 29</a></strong><br>
<span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2017/02/12/february-12th-is-national-biscotti-day/">National Biscuit Day</a></span>
<div style="text-align:center;"><strong><a title="May 28" href="https://foodimentaryguy.wordpress.com/2011/05/28/may-28/">May 28</a></strong><br>
<span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2016/05/28/may-28-is-national-brisket-day/">National Brisket Day</a></span>
</div>
</div>
My question is: how can I use Beautiful Soup to get the name of the holiday from the date tag?
This text is very unstructured (most likely written by hand rather than machine-generated). I suggest using the re module for the main parsing.
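As a rough illustration of what re-based parsing could look like, here is a minimal sketch. It assumes the month page keeps the structure of the HTML sample in the question: date links carry a title attribute such as "May 29", and the holiday links that follow do not. The function name find_food_days, the SAMPLE string, and the patterns are hypothetical, chosen only for this example.

```python
import re
from html import unescape

def find_food_days(page_html, date_label):
    """Return the link texts that follow the date anchor titled `date_label`.

    Hypothetical helper: assumes date anchors have a title attribute
    (e.g. title="May 29") and holiday anchors do not.
    """
    # Locate the date anchor, e.g. <a title="May 29" href="...">May 29</a>
    date_anchor = re.search(
        r'<a\b[^>]*\btitle="' + re.escape(date_label) + r'"[^>]*>.*?</a>',
        page_html,
        flags=re.S | re.I,
    )
    if date_anchor is None:
        return []
    rest = page_html[date_anchor.end():]
    # Stop at the next anchor that has a title attribute (assumed to be the next date)
    next_date = re.search(r'<a\b[^>]*\btitle="', rest, flags=re.I)
    if next_date:
        rest = rest[:next_date.start()]
    # Every remaining anchor in between is treated as a holiday link
    names = re.findall(r'<a\b[^>]*>(.*?)</a>', rest, flags=re.S | re.I)
    # Drop any nested tags, decode entities, trim whitespace
    return [unescape(re.sub(r'<[^>]+>', '', name)).strip() for name in names]

# Sample markup copied from the question
SAMPLE = '''<div style="text-align:center;">
<strong><a title="May 29" href="https://foodimentaryguy.wordpress.com/2011/05/29/may-29/">May 29</a></strong><br>
<span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2017/02/12/february-12th-is-national-biscotti-day/">National Biscuit Day</a></span>
<div style="text-align:center;"><strong><a title="May 28" href="https://foodimentaryguy.wordpress.com/2011/05/28/may-28/">May 28</a></strong><br>
<span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2016/05/28/may-28-is-national-brisket-day/">National Brisket Day</a></span>
</div>
</div>'''

print(find_food_days(SAMPLE, 'May 29'))  # ['National Biscuit Day']
```

With the live page you would pass r.text and a label built from today's date; whether the site writes the day with or without a leading zero is something to check against the real markup. If you would rather stay within Beautiful Soup, holidayTag.find_next('a') returns the next a element in document order, which in the sample above is the holiday link, though that also relies on the hand-written markup staying consistent.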