How do I find the next text instance in HTML using Beautiful Soup?

Posted 2024-07-05 08:07:17


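I'm writing a program that looks up the current day's national food holiday on this site: https://foodimentary.com/today-in-national-food-holidays/may-holidays/

So far I can reliably get the tag that contains the current date, but I'm struggling to use it as a reference point for getting the associated food day. Here is what I have so far: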

from datetime import date

import requests
from bs4 import BeautifulSoup

month = date.today().strftime('%b')  # Abbreviated month name, e.g. 'May'
day = date.today().strftime('%d')  # Zero-padded day of the month, e.g. '29'
date_slug = f'{month.lower()}-{day}'  # e.g. 'may-29'; renamed so it no longer shadows datetime.date

# Get HTML from home page
url = 'https://foodimentary.com/today-in-national-food-holidays/todayinfoodhistorycalenderfoodnjanuary/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')  # Parse HTML with Beautiful Soup

# Get the current month URL
months = soup.find('ul', id='menu-months', class_='menu')  # Isolate the months table
monthUrl = months.find('a', href=True, string=month)['href']  # Get the month URL for the current month

# Get HTML from month page, parse
r = requests.get(monthUrl)
soup = BeautifulSoup(r.text, 'html.parser')

# Find the first link whose URL contains the formatted date (attribute value quoted for safety)
holidayTag = soup.select_one(f'a[href*="{date_slug}"]')
print(holidayTag)

# TODO: Get the name of the food day based on holidayTag

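Judging from my browser's developer console, the most consistent pattern linking a date to its food holiday name is that the holiday is always the next text instance after the date tag. Here is a sample of the HTML: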



<div style="text-align:center;">
   <strong><a title="May&nbsp;29" href="https://foodimentaryguy.wordpress.com/2011/05/29/may-29/">May 29</a></strong><br>
   <span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2017/02/12/february-12th-is-national-biscotti-day/">National Biscuit Day</a></span>
   <div style="text-align:center;"><strong><a title="May&nbsp;28" href="https://foodimentaryguy.wordpress.com/2011/05/28/may-28/">May 28</a></strong><br>
      <span style="color:#000000;"><a style="color:#000000;" href="https://foodimentary.com/2016/05/28/may-28-is-national-brisket-day/">National Brisket Day</a></span>
   </div>
</div>

My question is: how can I use Beautiful Soup to get the holiday name starting from the date tag?


1 answer
User
Answer #1 · posted 2024-07-05 08:07:17

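This text is very unstructured (most likely written by hand rather than machine-generated). I would suggest using the re module for the main parsing: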

import re
from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = 'https://foodimentary.com/today-in-national-food-holidays/may-holidays/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
txt = soup.select_one('section[role="main"]').text  # Plain text of the main content area

# Each entry is a line that starts with a capital letter and ends with a digit (e.g. "May 29"),
# followed by one or more holiday names, up to the next blank line.
out = {}
for day, names in re.findall(r'^([A-Z][^\n]+\d\s*)$(.*?)\n\n', txt, flags=re.DOTALL|re.M):
    out[day.strip()] = [name.replace('\xa0', ' ') for name in names.strip().split('\n')]

# Pretty-print the result:
pprint(out)

Prints:

{'May 1': ['National Chocolate Parfait Day'],
 'May 10': ['National Liver and Onions Day'],
 'May 11': ['National “Eat What You Want” Day'],
 'May 12': ['National Nutty Fudge Day'],
 'May 13': ['National Apple Pie Day',
            'National Fruit Cocktail Day',
            'National Hummus Day'],
 'May 14': ['National Brioche Day', 'National Buttermilk Biscuit Day'],
 'May 15': ['National Chocolate Chip Day'],
 'May 16': ['National Barbecue Day'],
 'May 17': ['National Cherry Cobbler Day'],
 'May 18': ['National Cheese Souffle Day', 'I love Reese’s Day'],
 'May 19': ['National Devil’s Food Cake Day'],
 'May 2': ['National Chocolate Truffle Day'],
 'May 20': ['National Quiche Lorraine Day', 'National Pick Strawberries Day'],
 'May 21': ['National Strawberries and Cream Day'],
 'May 22': ['National Vanilla Pudding Day'],
 'May 23': ['National Taffy Day'],
 'May 24': ['National Escargot Day'],
 'May 25': ['National Brown-Bag-It Day', 'National Wine Day'],
 'May 26': ['National Blueberry Cheesecake Day', 'National Cherry Dessert Day'],
 'May 27': ['National Italian Beef Day', 'National Grape Popsicle Day'],
 'May 28': ['National Brisket Day'],
 'May 29': ['National Biscuit Day'],
 'May 3': ['National Raspberry Popover Day',
           'National Raspberry Tart Day',
           'National Chocolate Custard Day'],
 'May 30': ['National Mint Julep Day'],
 'May 31': ['National Macaroon Day'],
 'May 4': ['National Candied Orange Peel Day',
           'National Homebrew Day',
           'National Hoagie Day'],
 'May 5': ['National Enchilada Day – Happy Cinco de Mayo!'],
 'May 6': ['National Crepe Suzette Day'],
 'May 7': ['National Roast Leg of Lamb Day'],
 'May 8': ['National Coconut Cream Pie Day'],
 'May 9': ['National Shrimp Day', 'National Foodies Day*']}
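
If you would rather stay close to the approach in the question (start from the date tag and move forward), one possible sketch is to call find_next('a') on the date link. It is tested here only against a trimmed copy of the HTML sample from the question, not the live page, so treat the markup pattern as an assumption. The helper name holiday_after and the SAMPLE_HTML constant are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup from the question (style attributes dropped).
SAMPLE_HTML = """
<div>
   <strong><a title="May&nbsp;29" href="https://foodimentaryguy.wordpress.com/2011/05/29/may-29/">May 29</a></strong><br>
   <span><a href="https://foodimentary.com/2017/02/12/february-12th-is-national-biscotti-day/">National Biscuit Day</a></span>
   <div><strong><a title="May&nbsp;28" href="https://foodimentaryguy.wordpress.com/2011/05/28/may-28/">May 28</a></strong><br>
      <span><a href="https://foodimentary.com/2016/05/28/may-28-is-national-brisket-day/">National Brisket Day</a></span>
   </div>
</div>
"""


def holiday_after(soup, date_slug):
    """Return the text of the first link that follows the date link, or None.

    Hypothetical helper: assumes the holiday name is the next <a> element
    after the date link in document order, as in the sample above.
    """
    # First link whose href contains the slug; in the sample this is the date link.
    date_tag = soup.select_one(f'a[href*="{date_slug}"]')
    if date_tag is None:
        return None
    # find_next('a') walks forward in document order and skips the date
    # link's own text, because that text is a string and not an <a> tag.
    holiday_tag = date_tag.find_next('a')
    if holiday_tag is None:
        return None
    return holiday_tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(holiday_after(soup, 'may-29'))  # National Biscuit Day
print(holiday_after(soup, 'may-28'))  # National Brisket Day
```

This only returns the first name after the date. Days with several holidays (May 13 in the output above, for example) would need a loop that keeps collecting links until the next date entry, and not every name on the live page is necessarily a link, which is one reason the regex approach above is more forgiving.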
