Parsing multiple pages with BeautifulSoup?


<p>I have seen this answered in several places, but I can't get it to work in my script below, where I want to parse several pages, up to the last one:</p>
<p>My script should run inside the page loop, but whenever I move it in there I get an indentation error. Does that mean I need to indent the whole script? Or is the loop simply not suited to my script?</p>
<pre><code>from bs4 import BeautifulSoup
import requests

page = 1

urldes = "https://www.johnpyeauctions.co.uk/lot_list.asp?saleid=4808&siteid=1&h=0&pageno={page}"
#"https://www.johnpyeauctions.co.uk/lot_list.asp?saleid=4740&siteid=1&h=0&pageno=14"

# add header
mozila_agent = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) '
                'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36')
headers = {'User-Agent': mozila_agent}

with requests.Session() as session:
    while True:
        response = session.get(urldes.format(page=page), headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")

        ########## HOW TO parse the pages and collect the results here ?

        if page == 3:  # soup.find('u') is None:
            break  # last page

        page += 1

############################################################
the_whole_table = soup.find('table', width='97%')

datalist = []

for tr in the_whole_table.find_all('tr')[1:]:
    # you want to start from the 1st item not the 0th so [1:]
    # Because the first is the thead i.e. Lot no, Picture, Lot Title...
    index_num = tr.find('td', width='8%')
    picture_link = index_num.next_sibling.a['data-img']
    text_info = tr.find('td', width='41%')
    current_bid = tr.find('td', width='13%')
    time_left = tr.find('td', width='19%')
    datalist.append([index_num.text, picture_link, text_info.text, current_bid.text, time_left.text])

    # for pic do ... print(picture_link) as for partial text only first 20
    # characters

index = datalist[0][0]
picture = datalist[0][1]
info = datalist[0][2]
bid = datalist[0][3]
time = datalist[0][4]

df = ['Index Number', 'Picture', 'Informational text', 'Current BID', 'Time Left now']

theads = BeautifulSoup('<table style="width:50%; color: blue; font-family: verdana; font-size: 60%;"></table>', 'lxml')
thekeys = BeautifulSoup('<thead style="color: blue; font-family: verdana; font-size: 60%;"></thead>', 'html.parser')

#counter = 0
for i in df:
    tag = theads.new_tag('th')
    tag.append(i)
    thekeys.thead.append(tag)

theads.table.append(thekeys)

###############################################################
# The code above will initiate a table
# after that the for loop will create and populate the first row (thead)

for i in datalist:
    # thedata = BeautifulSoup('<tr style="color: blue; font-family: verdana; font-size: 50%;"></tr>', 'html.parser')
    thedata = BeautifulSoup('<tr></tr>', 'html.parser')

    # we loop through the data we collected
    # initiate a <td> </td> tag everytime we finish with one collection
    for j in i:
        if j.startswith('https'):
            img_tag = theads.new_tag('img', src=j, width='300')
            td_tag = theads.new_tag('td')
            td_tag.append(img_tag)
            thedata.append(td_tag)
            # counter += 1
        else:
            # tag = theads.new_tag('td', style="color: blue; font-family: verdana; font-size: 50%;")
            tag = theads.new_tag('td')
            tag.append(j)
            thedata.append(tag)
            # counter += 1
        # if counter is 5:
        #     counter = 0
    theads.table.append(thedata)
    #print(counter)

css = "<style>{color: blue; font-family: verdana; font-size: 50%;}</style>"
#css.string = css

with open('asdf.html', 'w+') as f:
    f.write(theads.prettify())
    print(css)

# each of these if you print them you'll get a information that you can store
# to test do print(index_num.text, text_info.text)
</code></pre>
<p><a href="https://i.stack.imgur.com/ppzDD.jpg" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/ppzDD.jpg" alt="enter image description here"/></a></p>
<p>Regarding the template (HTML/CSS): I can see that I need to fill a template with the collected data, but if, for example, I want to assign the 5 elements in the "value" of "auction", I can't find a way to assign each element separately. The current code seems to loop through every value, but since I need to give each value a different class, I have to tell all 5 elements apart, and I don't know how to do that.</p>
^{pr2}$
<p>As you can see here, I can append the tags, but the same value is repeated instead of the loop moving through each value.</p>
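One possible way to approach both parts of the question is sketched below: move the row parsing into a function that is called once per page inside the loop, and pair each of the 5 values with its own class name when building the output table. This is only a sketch under assumptions, not a tested answer for the live site. The helper names (`parse_lots`, `scrape_pages`, `build_table`), the class names in `COLUMN_CLASSES`, the injected `fetch` callable and the fixed `last_page` count are all hypothetical; the `width` selectors are taken from the question's code and assume the page layout is as described there.

```python
from bs4 import BeautifulSoup

# Hypothetical class names, one per column, so each value can be styled on its own.
COLUMN_CLASSES = ["lot-no", "picture", "info", "bid", "time-left"]


def parse_lots(html):
    """Return one [lot no, picture URL, title, bid, time left] list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", width="97%")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        index_num = tr.find("td", width="8%")
        if index_num is None:
            continue  # not a lot row
        # find_next_sibling skips whitespace text nodes, unlike .next_sibling
        picture_td = index_num.find_next_sibling("td")
        rows.append([
            index_num.text,
            picture_td.a["data-img"],
            tr.find("td", width="41%").text,
            tr.find("td", width="13%").text,
            tr.find("td", width="19%").text,
        ])
    return rows


def scrape_pages(fetch, last_page):
    """Call fetch(page) for pages 1..last_page and collect the rows of every page."""
    datalist = []
    for page in range(1, last_page + 1):
        # Parsing happens inside the loop, so no page's rows are lost.
        datalist.extend(parse_lots(fetch(page)))
    return datalist


def build_table(datalist):
    """Build a <table> in which every cell carries the class of its column."""
    soup = BeautifulSoup("<table></table>", "html.parser")
    for row in datalist:
        tr = soup.new_tag("tr")
        for css_class, value in zip(COLUMN_CLASSES, row):
            td = soup.new_tag("td")
            td["class"] = css_class
            if css_class == "picture":
                td.append(soup.new_tag("img", src=value, width="300"))
            else:
                td.append(value)
            tr.append(td)
        soup.table.append(tr)
    return soup
```

With the live site, `fetch` could be a small function wrapping `session.get(urldes.format(page=page), headers=headers).text` from the question's script, and the result of `build_table(...)` could be written out with `prettify()` as before. Because each row is only indented once, inside `parse_lots`, moving the parsing into the page loop no longer requires re-indenting the whole script.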
