<p>As Joseph mentioned, <code>find_all</code> returns a list of HTML elements; you loop over each element in that list and apply the <code>.text</code> method to each item.</p>
<p>Below I use a list comprehension to loop and apply the <code>.text</code> method, then call <code>strip()</code> to remove leading/trailing characters such as <code>\t</code>, <code>\n</code>, etc.</p>
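<p>As a minimal illustration of that pattern (the HTML snippet and class name here are made up, just to show the mechanics):</p>
<pre><code>from bs4 import BeautifulSoup

# a toy snippet with messy whitespace around the text
html = "&lt;ul&gt;&lt;li class='item'&gt; Dell \n&lt;/li&gt;&lt;li class='item'&gt;\tInspiron &lt;/li&gt;&lt;/ul&gt;"
soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list of Tag objects; .text extracts the text,
# and strip() removes the surrounding \t and \n
texts = [i.text.strip() for i in soup.find_all('li', attrs={'class': 'item'})]
print(texts)  # ['Dell', 'Inspiron']
</code></pre>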
<p><strong>Final code:</strong></p>
<pre><code>import requests
from bs4 import BeautifulSoup
from time import sleep

pages = ['https://www.dell.com/community/Inspiron/bd-p/Inspiron',
         'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/2',
         'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/3',
         'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/4',
         'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/5']

data = []
for page in pages:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')
    rows = soup.select('tbody tr')
    for row in rows:
        d = dict()
        # search within the current row, not the whole soup,
        # so each dict holds one topic's data
        d['title'] = [i.text.strip() for i in row.find_all('a', attrs={'class': 'page-link lia-link-navigation lia-custom-event'})]
        d['author'] = [i.text.strip() for i in row.find_all('span', attrs={'class': 'login-bold'})]
        d['time'] = [i.text.strip() for i in row.find_all('span', attrs={'class': 'local-time'})]
        d['kudos'] = [i.text.strip() for i in row.find_all('div', attrs={'class': 'lia-component-messages-column-message-kudos-count'})]
        d['messages'] = [i.text.strip() for i in row.find_all('div', attrs={'class': 'lia-component-messages-column-message-replies-count'})]
        d['views'] = [i.text.strip() for i in row.find_all('div', attrs={'class': 'lia-component-messages-column-topic-views-count'})]
        d['solved'] = [i.text.strip() for i in row.find_all('td', attrs={'aria-label': 'triangletop lia-data-cell-secondary lia-data-cell-icon'})]
        d['latest'] = [i.text.strip() for i in row.find_all('span', attrs={'cssclass': 'lia-info-area-item'})]
        data.append(d)
    sleep(10)  # pause between pages to avoid hammering the server

print(data[0])
</code></pre>
<p><strong>Edit:</strong> add this to the code to save the list of dictionaries as a CSV.</p>
<pre><code>import pandas as pd

df = pd.DataFrame(data)  # build a DataFrame from the list of dicts
print(df.head())         # confirm the data looks correct
df.to_csv('name.csv', index=False)
</code></pre>