Iterating over a list of varying length and appending to other lists

Posted 2024-09-25 06:35:45


I want to iterate over a BeautifulSoup result whose length changes depending on how many elements match the HTML tag.

driver.get('https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418')
page_source = driver.page_source

soup = BeautifulSoup(page_source, 'html.parser')
recall_details = soup.find('table', class_ = 'table table-bordered table-condensed')

recalled_products = recall_details.find_all('td')
recalled_products

Output:

[<td>One Ocean</td>,
 <td>Sliced Smoked  Wild Sockeye Salmon</td>,
 <td>300 g</td>,
 <td>6 25984 00005 3</td>,
 <td>11253</td>]

I want to iterate over each td element and append its text to lists like this:

brands = []
products = []
sizes = []
upcs = []
codes = []

brand = recalled_products[0].text
product = recalled_products[1].text
size = recalled_products[2].text
upc = recalled_products[3].text
code = recalled_products[4].text
brands.append(brand)
products.append(product)
sizes.append(size)
upcs.append(upc)
codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

Output:

['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']

I tried the following code, but it did not give the expected result. I think I need some kind of counter.

for i in range(len(recalled_products)):
    brand = recalled_products[i].text
    product = recalled_products[i].text
    size = recalled_products[i].text
    upc = recalled_products[i].text
    code = recalled_products[i].text
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

Output:

['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']

Here is a sample of the site's HTML code: [image]

Thanks in advance for any help.


3 Answers

This is how I would get the tags:

from bs4 import BeautifulSoup
import requests

URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"

brands = []
products = []
sizes = []
upcs = []
codes = []

page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

recall_details = soup.find("table", class_="table table-bordered table-condensed")

body = recall_details.find("tbody")

rows = body.find_all("tr")

for row in rows:
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)

Printing the five lists gives:

['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
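
To see that the row-based loop keeps working when the table has more than one row, here is a minimal offline sketch that does not hit the live site. The HTML string is a stand-in I wrote for illustration: only the first row mirrors the output above, the second row ("Example Brand") is invented, and extract_columns is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the recall page; the second row is invented.
HTML = """
<table class="table table-bordered table-condensed">
  <thead>
    <tr><th>Brand</th><th>Product</th><th>Size</th><th>UPC</th><th>Codes</th></tr>
  </thead>
  <tbody>
    <tr><td>One Ocean</td><td>Sliced Smoked Wild Sockeye Salmon</td>
        <td>300 g</td><td>6 25984 00005 3</td><td>11253</td></tr>
    <tr><td>Example Brand</td><td>Example Product</td>
        <td>250 g</td><td>0 00000 00000 0</td><td>00000</td></tr>
  </tbody>
</table>
"""


def extract_columns(html):
    """Return one list per column, with one entry per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table table-bordered table-condensed")
    brands, products, sizes, upcs, codes = [], [], [], [], []
    for row in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 5:
            continue  # skip rows that do not have the expected five cells
        brands.append(cells[0])
        products.append(cells[1])
        sizes.append(cells[2])
        upcs.append(cells[3])
        codes.append(cells[4])
    return brands, products, sizes, upcs, codes


brands, products, sizes, upcs, codes = extract_columns(HTML)
print(brands)  # ['One Ocean', 'Example Brand']
print(sizes)   # ['300 g', '250 g']
```

Because the loop runs once per tr, each list ends up with one entry per row, however many rows the page happens to contain.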

I do think a dict is a better data structure here than several separate lists, but of course that depends on your use case.

If you want to go that way, you can change the code as follows:


recalled = []

...

for row in rows:
    data = row.find_all("td")
    item = {
        "brand": data[0].text,
        "products": data[1].text,
        "sizes": data[2].text,
        "upcs": data[3].text,
        "codes": data[4].text,
    }
    recalled.append(item)

Printing recalled gives:

[{'brand': 'One Ocean', 'products': 'Sliced Smoked  Wild Sockeye Salmon', 'sizes': '300\xa0g', 'upcs': '6\xa025984\xa000005\xa03', 'codes': '11253'}]
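
The \xa0 in the output is a non-breaking space (U+00A0) coming from the page. If that is unwanted, one possible sketch is to clean each cell's text and build the dict with zip. The field names in FIELDS and the helpers clean and row_to_item are my own choices for illustration, not part of the code above.

```python
FIELDS = ("brand", "product", "size", "upc", "code")  # hypothetical key names


def clean(text):
    """Replace non-breaking spaces (U+00A0) with regular spaces and trim."""
    return text.replace("\xa0", " ").strip()


def row_to_item(cell_texts):
    """Pair each cell's cleaned text with its field name."""
    return dict(zip(FIELDS, (clean(t) for t in cell_texts)))


# Cell texts copied from the output above.
cells = [
    "One Ocean",
    "Sliced Smoked  Wild Sockeye Salmon",
    "300\xa0g",
    "6\xa025984\xa000005\xa03",
    "11253",
]

item = row_to_item(cells)
print(item["size"])  # 300 g
print(item["upc"])   # 6 25984 00005 3
```

In the scraping loop, the same helper could be called as row_to_item(td.text for td in data), assuming every row has exactly five cells.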

The question about the data is what shape you get back from

recalled_products = recall_details.find_all('td') 

A = [[<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>],
     [<td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]]

b = [<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>,
     <td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]

For a two-dimensional list like A, you want to index in two dimensions:

for i in range(len(recalled_products)):
    brand = recalled_products[i][0].text
    product = recalled_products[i][1].text

For b, you want to use a step in the iteration:

for i in range(0, len(recalled_products), 4):
    brand = recalled_products[i].text
    product = recalled_products[i+1].text
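
As a self-contained illustration of the stepped approach, the sketch below cuts a flat list into rows of a fixed width and then reads columns from the rows. Plain strings stand in for the .text of each td in b, and chunk_rows is a hypothetical helper name. The width of 4 matches the b example; the real table in the question has 5 columns.

```python
def chunk_rows(cells, width):
    """Split a flat list into consecutive rows of `width` items each."""
    if len(cells) % width != 0:
        raise ValueError("cell count is not a multiple of the row width")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Plain strings standing in for the .text of each <td> in b.
flat = [
    "beef", "250g", "6 25984 00005 3", "11253",
    "Salmon", "300 g", "6 25984 00005 3", "11253",
]

rows = chunk_rows(flat, 4)
products = [row[0] for row in rows]
sizes = [row[1] for row in rows]
print(products)  # ['beef', 'Salmon']
print(sizes)     # ['250g', '300 g']
```

The length check makes a mismatch between the assumed width and the actual table fail loudly instead of silently shifting columns.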

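To me, this looks like a case for building a spreadsheet to hold the data you need to store. You can do that with a library called openpyxl: create columns for brand, product, size, UPC and code, then store the results from the BeautifulSoup object in the spreadsheet.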
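
A minimal sketch of that idea with openpyxl might look like the following. The header labels, the sheet title, the file name, and the build_workbook helper are all assumptions for illustration; the single data row is taken from the output in the question, with the non-breaking spaces written as regular spaces.

```python
from openpyxl import Workbook

HEADERS = ["Brand", "Product", "Size", "UPC", "Code"]  # assumed column labels


def build_workbook(rows):
    """Create a workbook with a header row followed by one row per product."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Recalls"  # hypothetical sheet name
    ws.append(HEADERS)
    for row in rows:
        ws.append(list(row))
    return wb


# One row copied from the question's output; more rows can be appended the same way.
rows = [
    ["One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300 g",
     "6 25984 00005 3", "11253"],
]

wb = build_workbook(rows)
# wb.save("recalls.xlsx")  # uncomment to write the file; the name is an assumption
```

Each call to ws.append adds one spreadsheet row, so the loop over scraped rows maps directly onto rows in the sheet.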
