Iterating over a list of varying length and appending to other lists

driver.get('https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418')
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
recall_details = soup.find('table', class_='table table-bordered table-condensed')
recalled_products = recall_details.find_all('td')
recalled_products

brands = []
products = []
sizes = []
upcs = []
codes = []

brand = recalled_products[0].text
product = recalled_products[1].text
size = recalled_products[2].text
upc = recalled_products[3].text
code = recalled_products[4].text

brands.append(brand)
products.append(product)
sizes.append(size)
upcs.append(upc)
codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

for i in range(len(recalled_products)):
    brand = recalled_products[i].text
    product = recalled_products[i].text
    size = recalled_products[i].text
    upc = recalled_products[i].text
    code = recalled_products[i].text
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

Output:

['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']

3 answers

User

Answer 1 · edited 2024-09-25 06:35:45

This is how I would get the tags:

from bs4 import BeautifulSoup
import requests

URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"

brands = []
products = []
sizes = []
upcs = []
codes = []

page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

recall_details = soup.find("table", class_="table table-bordered table-condensed")

body = recall_details.find("tbody")

rows = body.find_all("tr")

for row in rows:
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)

Printing the five lists gives:

['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']

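I do think a dict is a better data structure than several parallel lists, but of course that depends on your use case.

If you want to go that way, you can change the code as follows: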


recalled = []

...

for row in rows:
    data = row.find_all("td")
    item = {
        "brand": data[0].text,
        "products": data[1].text,
        "sizes": data[2].text,
        "upcs": data[3].text,
        "codes": data[4].text,
    }
    recalled.append(item)

Printing recalled gives:

[{'brand': 'One Ocean', 'products': 'Sliced Smoked  Wild Sockeye Salmon', 'sizes': '300\xa0g', 'upcs': '6\xa025984\xa000005\xa03', 'codes': '11253'}]
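
The printed values still contain non-breaking spaces ('\xa0') and, in the product name, a doubled space. If those are unwanted, one possible cleanup pass over the records is sketched below. The helper name clean_text is made up for this example, and the sample record is copied from the output above rather than scraped.

```python
# Records in the shape produced by the dict-based loop above
# (values copied from the printed output, not fetched from the site).
recalled = [
    {
        "brand": "One Ocean",
        "products": "Sliced Smoked  Wild Sockeye Salmon",
        "sizes": "300\xa0g",
        "upcs": "6\xa025984\xa000005\xa03",
        "codes": "11253",
    }
]


def clean_text(text):
    """Collapse every run of whitespace, including '\\xa0', into one space.

    Hypothetical helper for this example only.
    """
    # str.split() with no arguments splits on all Unicode whitespace,
    # and the non-breaking space '\xa0' counts as whitespace in Python 3.
    return " ".join(text.split())


# Build new dicts so the original records stay untouched.
cleaned = [
    {key: clean_text(value) for key, value in item.items()}
    for item in recalled
]

print(cleaned)
```

The same helper could be applied inside the scraping loop, for example as clean_text(data[2].text), if the raw text is never needed.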

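User

Answer 2 · edited 2024-09-25 06:35:45

The question is what shape the data returned by the line below has: a 2D (nested) list like A, or a flat list like b?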

recalled_products = recall_details.find_all('td') 

A = [[<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>],
     [<td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]]

or

b = [<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>,
     <td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]

For the 2D list A, index with two subscripts (row first, then column):

for i in range(len(recalled_products)):
    brand = recalled_products[i][0].text
    product = recalled_products[i][1].text

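For b, use a step in the loop so that each iteration starts at the first cell of a row (4 cells per row in this example):

for i in range(0, len(recalled_products), 4):
    brand = recalled_products[i].text
    product = recalled_products[i + 1].text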
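
For what it's worth, find_all('td') in BeautifulSoup returns a flat, list-like result, so the b case is the one that applies to the question, and the table there has five cells per row rather than four. Below is a minimal sketch of the stepped loop with a width of 5. It uses plain strings as a stand-in for the td.text values so it runs without a browser or network; the second row and the names cell_texts and COLUMNS are invented for illustration.

```python
# Stand-in for [td.text for td in recalled_products]: one flat list,
# five cells per table row. The first row comes from the question's
# output; the second row is made up so the loop has more than one pass.
cell_texts = [
    "One Ocean", "Sliced Smoked Wild Sockeye Salmon", "300 g", "6 25984 00005 3", "11253",
    "Example Brand", "Example Product", "150 g", "0 00000 00000 0", "00000",
]

COLUMNS = 5  # brand, product, size, UPC, code

brands, products, sizes, upcs, codes = [], [], [], [], []

# Step through the flat list one row at a time.
for start in range(0, len(cell_texts), COLUMNS):
    row = cell_texts[start:start + COLUMNS]
    if len(row) < COLUMNS:
        break  # ignore a trailing, incomplete row
    brand, product, size, upc, code = row
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
```

The loop in the question used the same index i for all five fields, which is why every list ended up with the same contents. Slicing out one row per iteration gives each field its own offset.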

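User

Answer 3 · edited 2024-09-25 06:35:45

It seems to me that this calls for a spreadsheet to hold the data you need to store. You can use a library called openpyxl for that: create columns for brand, product, size, UPC, and code, then store the results from the BeautifulSoup object in the spreadsheet.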
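
A rough sketch of that idea, assuming openpyxl is installed and that the scraped rows are already available as a list of dicts (here a hand-written sample with the keys used in Answer 1). The sheet title, header labels, and file name are placeholders chosen for this example.

```python
from openpyxl import Workbook

# Sample record with the same keys as the dict-based answer;
# in practice this list would come from the scraping loop.
records = [
    {
        "brand": "One Ocean",
        "products": "Sliced Smoked  Wild Sockeye Salmon",
        "sizes": "300\xa0g",
        "upcs": "6\xa025984\xa000005\xa03",
        "codes": "11253",
    }
]

KEYS = ["brand", "products", "sizes", "upcs", "codes"]

wb = Workbook()
ws = wb.active
ws.title = "Recalls"  # placeholder sheet name

# Header row first, then one spreadsheet row per record.
ws.append(["Brand", "Product", "Size", "UPC", "Code"])
for item in records:
    ws.append([item[key] for key in KEYS])

# Show what ended up in the sheet.
for row in ws.iter_rows(values_only=True):
    print(row)

# Uncomment to write the file; the name is a placeholder.
# wb.save("recalls.xlsx")
```

Keeping the column order in one list (KEYS) means the header and the data rows cannot drift apart if a field is added later.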
