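<p>Assuming the header is always the first row of each table, you only need to skip that row in every table except the first one. A simple way to do this is to keep the index of the first row to process in a variable initialized to 0, and set it to 1 once the processing function has been called for the first time. Possible code:</p>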
<pre><code>def cpap_spider(max_pages):
    page = 1
    start_row = 0  # index of the first row to process; 0 keeps the header
    while page <= max_pages:
        ...
        for link in soup.findAll("a", {"class": "facets-item-cell-grid-title"}):
            ...
            each_item(href, start_row)
            start_row = 1  # only the first call to each_item gets start_row=0; later calls skip the header
            print(href)
            #print(title)
        page += 1
...
def each_item(item_url, start_row):
    ...
    table_rows = table.find_all('tr')
    for row in table_rows[start_row:]:  # skip the first row if start_row == 1
        ...
</code></pre>
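<p>To see the slicing trick in isolation, here is a minimal self-contained sketch. It assumes the tables have already been parsed into plain lists of rows; <code>collect_rows</code> and the sample data are hypothetical and not part of your scraper, and <code>each_item</code> is simplified here to take the rows directly instead of a URL.</p>
<pre><code>def each_item(table_rows, start_row):
    """Return the rows of one table, starting at index start_row."""
    return table_rows[start_row:]


def collect_rows(tables):
    """Merge several tables, keeping the header row of the first one only."""
    merged = []
    start_row = 0  # the first table keeps its header
    for table_rows in tables:
        merged.extend(each_item(table_rows, start_row))
        start_row = 1  # every later table skips its header
    return merged


# Hypothetical sample data: two tables sharing the same header row.
tables = [
    [["Name", "Price"], ["Mask A", "10"]],
    [["Name", "Price"], ["Mask B", "20"], ["Mask C", "30"]],
]
for row in collect_rows(tables):
    print(row)
</code></pre>
<p>The header <code>["Name", "Price"]</code> should be printed once, followed by the three data rows. If some tables might have no header at all, this approach would drop a data row from them, so it only fits the assumption stated above.</p>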