<p><a href="https://i.stack.imgur.com/AkqiQ.jpg" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/AkqiQ.jpg" alt="enter image description here"/></a>I am trying to scrape some data from this link: <a href="http://www.airlinequality.com/airline-reviews/vietjetair/?sortby=post_date%3ADesc&pagesize=100" rel="nofollow noreferrer">http://www.airlinequality.com/airline-reviews/vietjetair/?sortby=post_date%3ADesc&pagesize=100</a>
For example, I am trying to extract each reviewer's name with BeautifulSoup, but it does not work. I have used BeautifulSoup on other websites before and it worked perfectly, so I do not know what is going on here. Could you help me? The code is below:</p>
<pre><code>from bs4 import BeautifulSoup
import os
import urllib.request
file1 = open(os.path.expanduser(r"~/Desktop/Skytrax Reviews1.csv"), "wb")
file1.write(b"Reviewer" + b"\n")
WebSites = ["http://www.airlinequality.com/airline-reviews/vietjetair/?sortby=post_date%3ADesc&pagesize=100"]
# Loop over each site. Only one URL for now; the full loop is not ready yet.
for theurl in WebSites:
    thepage = urllib.request.urlopen(theurl)
    print(thepage)
    soup = BeautifulSoup(thepage, 'lxml')
    print(soup)  # <------- This is the main problem
    # The part below may not be correct either, but the main problem is in the lines above
    # find_all() returns a list of tags, so read the text of each tag rather than of the list
    for Reviewer in soup.find_all(attrs={"class": "text_sub_header userStatusWrapper"}):
        Record1 = Reviewer.get_text(strip=True)
        print(Record1)
        file1.write(bytes(Record1, encoding="ascii", errors='ignore') + b"\n")
file1.close()
</code></pre>
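<p>To separate the parsing step from the download step, the extraction can be checked offline against a small hand-written snippet. This is only a sketch: the markup in <code>SAMPLE_HTML</code>, the <code>itemprop="name"</code> span, and the helper name <code>extract_reviewers</code> are assumptions made for illustration and are not copied from the real page, which may be structured differently.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<article>
  <h3 class="text_sub_header userStatusWrapper">
    <span itemprop="name">Alice Example</span> (Vietnam) 1st January 2017
  </h3>
</article>
<article>
  <h3 class="text_sub_header userStatusWrapper">
    <span itemprop="name">Bob Example</span> (Australia) 2nd January 2017
  </h3>
</article>
"""


def extract_reviewers(html):
    """Return the reviewer names found in the given HTML string."""
    # html.parser ships with Python, so lxml is not needed for this check.
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # class_ matches any tag whose class list contains "text_sub_header".
    for header in soup.find_all(class_="text_sub_header"):
        name_tag = header.find("span", itemprop="name")
        # Fall back to the whole header text if the assumed span is missing.
        source = name_tag if name_tag is not None else header
        names.append(source.get_text(" ", strip=True))
    return names


if __name__ == "__main__":
    for name in extract_reviewers(SAMPLE_HTML):
        print(name)
```

<p>If this works on the sample but the live page yields nothing, the difference is more likely in what <code>urlopen</code> returns than in the BeautifulSoup calls.</p>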