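<p>Assuming all of your pages share the same structure, you can parse the HTML with this code. It looks for the first div whose id is <code>product_bullets_section</code>. An id is supposed to be unique within an HTML document, but the given site uses the same id twice, so we take the first match by indexing the result list and convert the parsed div back into a string containing its HTML.</p>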
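<pre><code>import csv
import urllib.request

from bs4 import BeautifulSoup

# Read the URLs from the first column of the input file, skipping the header row.
with open("urls.csv", "r", newline="", encoding="cp1252") as f_input:
    csv_reader = csv.reader(f_input, delimiter=";", quotechar="|")
    header = next(csv_reader)
    items = [row[0] for row in csv_reader]

# Test override: this replaces the list read from urls.csv with a single URL.
# Remove this line to process every URL from the file.
items = ['https://www.kramerav.com/de/Product/VM-2N']

with open("results.csv", "w", newline="") as f_output:
    csv_writer = csv.writer(f_output, delimiter=";")
    for item in items:
        html = urllib.request.urlopen(item).read()
        # The id occurs twice on the page, so select() returns a list; keep the first match.
        soup = BeautifulSoup(html, "html.parser")
        the_div = str(soup.select('div#product_bullets_section')[0])
        # Store the URL and its extracted HTML as one row (this column layout is an assumption).
        csv_writer.writerow([item, the_div])
</code></pre>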
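<p>To see the duplicate-id handling in isolation, here is a minimal sketch that runs without network access. The sample HTML and the helper name <code>first_bullets_div</code> are made up for illustration; the empty-string fallback for pages that lack the div is also an assumption, not something the original code does (it would raise an <code>IndexError</code> instead).</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the same id appears twice, as on the site discussed above.
SAMPLE_HTML = """
<html><body>
<div id="product_bullets_section"><ul><li>first</li></ul></div>
<div id="product_bullets_section"><ul><li>second</li></ul></div>
</body></html>
"""


def first_bullets_div(html):
    """Return the first matching div as an HTML string, or "" if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns every element matching the CSS selector, in document order.
    matches = soup.select("div#product_bullets_section")
    return str(matches[0]) if matches else ""


result = first_bullets_div(SAMPLE_HTML)
missing = first_bullets_div("<html><body><p>no such div</p></body></html>")
print(result)
```

<p>If you only ever need the first match, <code>soup.select_one(...)</code> is an alternative; it returns <code>None</code> when nothing matches, so that case would still need a check before calling <code>str()</code>.</p>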