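<p>I have seen this answer in several places, but I can't get it to work in my script below. I want to parse several pages here, all the way to the last one:</p>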
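<p>My script should go inside the page loop, but whenever I put it inside, I get an indentation error. Does this mean I need to indent the whole script? Or is the loop simply the wrong fit for my script?</p>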
<pre><code>from bs4 import BeautifulSoup
import requests

page = 1
urldes = "https://www.johnpyeauctions.co.uk/lot_list.asp?saleid=4808&siteid=1&h=0&pageno={page}"
#"https://www.johnpyeauctions.co.uk/lot_list.asp?saleid=4740&siteid=1&h=0&pageno=14"
# add header
mozila_agent = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) '
                'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36')
headers = {'User-Agent': mozila_agent}

with requests.Session() as session:
    while True:
        response = session.get(urldes.format(page=page), headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")
        ########## HOW TO parse the pages and collect the results here ?
        # alternative end test: soup.find('u') is None
        if page == 3:  # use == to compare numbers; "is" checks identity
            break  # last page
        page += 1
############################################################
# Everything below runs once, after the loop, on the last page's soup only
the_whole_table = soup.find('table', width='97%')
datalist = []
for tr in the_whole_table.find_all('tr')[1:]:
    # you want to start from the 1st item not the 0th so [1:]
    # Because the first is the thead i.e. Lot no, Picture, Lot Title...
    index_num = tr.find('td', width='8%')
    picture_link = index_num.next_sibling.a['data-img']
    text_info = tr.find('td', width='41%')
    current_bid = tr.find('td', width='13%')
    time_left = tr.find('td', width='19%')
    datalist.append([index_num.text, picture_link,
                     text_info.text, current_bid.text, time_left.text])
    # for pic do ... print(picture_link) as for partial text only first 20
    # characters
index = datalist[0][0]
picture = datalist[0][1]
info = datalist[0][2]
bid = datalist[0][3]
time = datalist[0][4]
df = ['Index Number', 'Picture', 'Informational text',
      'Current BID', 'Time Left now']
theads = BeautifulSoup('<table style="width:50%; color: blue; font-family: verdana; font-size: 60%;"></table>', 'lxml')
thekeys = BeautifulSoup('<thead style="color: blue; font-family: verdana; font-size: 60%;"></thead>', 'html.parser')
#counter = 0
for i in df:
    tag = theads.new_tag('th')
    tag.append(i)
    thekeys.thead.append(tag)
theads.table.append(thekeys)
###############################################################
# The code above will initiate a table
# after that the for loop will create and populate the first row (thead)
for i in datalist:
    # thedata = BeautifulSoup('<tr style="color: blue; font-family: verdana; font-size: 50%;"></tr>', 'html.parser')
    thedata = BeautifulSoup('<tr></tr>', 'html.parser')
    # we loop through the data we collected
    # initiate a <td> </td> tag every time we finish with one collection
    for j in i:
        if j.startswith('https'):
            img_tag = theads.new_tag('img', src=j, width='300')
            td_tag = theads.new_tag('td')
            td_tag.append(img_tag)
            thedata.tr.append(td_tag)  # append inside the <tr>, not after it
            # counter += 1
        else:
            # tag = theads.new_tag('td', style="color: blue; font-family: verdana; font-size: 50%;")
            tag = theads.new_tag('td')
            tag.append(j)
            thedata.tr.append(tag)  # append inside the <tr>, not after it
            # counter += 1
    # if counter is 5:
    # counter = 0
    theads.table.append(thedata)
#print(counter)
css = "<style>{color: blue; font-family: verdana; font-size: 50%;}</style>"
#css.string = css
with open('asdf.html', 'w+', encoding='utf-8') as f:
    f.write(theads.prettify())
print(css)
# each of these, if you print them, gives you information that you can store
# to test do print(index_num.text, text_info.text)
</code></pre>
<p><a href="https://i.stack.imgur.com/ppzDD.jpg" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/ppzDD.jpg" alt="enter image description here"/></a></p>
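<p>A minimal sketch of the loop structure being asked about, assuming the per-page parsing is moved into a function so that only one indented call has to sit inside <code>while True</code>. <code>scrape_all_pages</code>, <code>fetch_page</code> and <code>parse_page</code> are hypothetical names: in the script above, <code>fetch_page</code> would wrap <code>session.get(urldes.format(page=page), headers=headers)</code> and <code>parse_page</code> would run the <code>the_whole_table</code>/<code>datalist</code> code on that page's soup. The stop condition (a page with no lots means the last page has been passed) is an assumption about this site, not something the question confirms; <code>max_pages</code> is only a safety cap.</p>

```python
from typing import Callable, List

Row = List[str]


def scrape_all_pages(
    fetch_page: Callable[[int], str],
    parse_page: Callable[[str], List[Row]],
    max_pages: int = 100,
) -> List[Row]:
    """Fetch pages 1, 2, ... and collect every row, stopping at the first empty page."""
    all_rows: List[Row] = []
    page = 1
    while page <= max_pages:
        html = fetch_page(page)      # e.g. session.get(urldes.format(page=page), ...).text
        rows = parse_page(html)      # e.g. the BeautifulSoup table-parsing code
        if not rows:                 # assumed end condition: a page with no lots
            break
        all_rows.extend(rows)        # collected inside the loop, so no page is lost
        page += 1
    return all_rows


# Usage with fake pages standing in for the real site:
fake_site = {1: "a,b", 2: "c", 3: ""}
rows = scrape_all_pages(
    fetch_page=lambda p: fake_site.get(p, ""),
    parse_page=lambda html: [[cell] for cell in html.split(",") if cell],
)
print(rows)  # [['a'], ['b'], ['c']]
```

<p>Wrapping the table-parsing code in a function means only a couple of lines need indenting under the loop, and rows from every page accumulate in one list, instead of only the last page's <code>soup</code> being parsed after the loop has finished.</p>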
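<p>Regarding template.html and the CSS: I can see that I need to fill a template with the collected data. But if, for example, I want to assign the 5 elements in the "value" of "auction", I can't find a way to assign each element separately. The current code seems to loop through every value, but because I need to give each value a different class tag, I have to tell all 5 elements apart, and I don't know how to do that.</p>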
<p>As you can see here, I can append the tags, but the same value is repeated instead of looping through each value.</p>
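<p>The repeated values are consistent with the lines <code>index = datalist[0][0]</code> through <code>time = datalist[0][4]</code> in the script above, which only ever read the first row. A minimal sketch of both fixes, using plain string building from the standard library rather than the question's BeautifulSoup calls: iterate over every row of <code>datalist</code>, and pair each of the 5 values with its own class name by position using <code>zip</code>. The class names in <code>COLUMN_CLASSES</code> and the functions <code>render_row</code>/<code>render_table</code> are hypothetical, and the sketch assumes each row keeps the order in which <code>datalist</code> is built.</p>

```python
from html import escape

# Hypothetical class names, one per value, in the same order each datalist row is built:
# [index_num.text, picture_link, text_info.text, current_bid.text, time_left.text]
COLUMN_CLASSES = ("lot-number", "lot-picture", "lot-info", "lot-bid", "lot-time")


def render_row(row):
    """Render one 5-item row as a <tr>, giving each <td> its own CSS class."""
    cells = []
    # zip pairs the 1st class with the 1st value, the 2nd with the 2nd, and so on
    for css_class, value in zip(COLUMN_CLASSES, row):
        if css_class == "lot-picture":
            content = '<img src="{}" width="300">'.format(escape(value))
        else:
            content = escape(value.strip())
        cells.append('<td class="{}">{}</td>'.format(css_class, content))
    return "<tr>" + "".join(cells) + "</tr>"


def render_table(datalist):
    """Render every row, rather than reusing datalist[0] for each one."""
    return "<table>\n" + "\n".join(render_row(row) for row in datalist) + "\n</table>"


datalist = [
    ["1", "https://example.com/1.jpg", "Chair", "£5", "1 day"],
    ["2", "https://example.com/2.jpg", "Lamp", "£8", "3 days"],
]
print(render_table(datalist))
```

<p>In the question's BeautifulSoup code, the same idea would be to loop over <code>zip(COLUMN_CLASSES, i)</code> instead of <code>for j in i:</code> and create each cell with <code>theads.new_tag('td', attrs={'class': css_class})</code>, so every value gets a class that the CSS can target individually.</p>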