擅长:python、mysql、java
<p>既然您正在使用<code>lxml</code>,为什么不以更直接的方式使用它(人们认为<code>lxml</code>比<code>BeautifulSoup</code>快):</p>
<pre><code>import requests
from lxml import html
url='http://boxes.mysubscriptionaddiction.com/box/julep-maven'
source_code=requests.get(url)
tree = html.fromstring(source_code.content) #parses the html
paras = tree.xpath('//div[@class="box-information"]/p') #gets the para elements
# This loop prints the desired para elements' text.
for ele in paras[1:]:
print(ele.text_content())
</code></pre>
<p>输出:</p>
<pre><code>Listed in:
Nail Polish Subscription Boxes, Subscription Boxes for Beauty Products, Subscription Boxes for Women
Tagged in:
Makeup, Beauty, Nail polish
</code></pre>
<p>注意:该站点受captcha保护,因此您可能需要将源html作为字符串从浏览器的开发工具中复制出来,并在<code>tree = html.fromstring(copied_string)</code>中使用它以使此代码正常工作。你知道吗</p>