How do I access an element inside an lxml element?

# get the raw HTML
fruitsWebsite = lxml.html.parse( "http://pagetoscrape.com/data.html" )

# get all divs with class fruit
fruits = fruitsWebsite.xpath( '//div[@class="fruit"]' )

# Print the name of this fruit (obtained from an <em> in the fruit div)
for fruit in fruits:
    print fruit.xpath('//li[@class="fruit"]/em')[0].text

1 answer

User

#1 · Posted 2024-10-04 09:25:25

The following code works with my test file.

#test.py
import lxml.html

# get the raw HTML
fruitsWebsite = lxml.html.parse('test.html')

# get all divs with class fruit 
fruits = fruitsWebsite.xpath('//div[@class="fruit"]')

# Print the name of this fruit (obtained from an <em> in the fruit div)
for fruit in fruits:
    #Use a relative path so we don't find ALL of the li/em elements several times. Note the .//
    for item in fruit.xpath('.//li[@class="fruit"]/em'):
        print(item.text)


# Alternatively, run a single absolute query against the whole document instead of looping over each div
for item in fruitsWebsite.xpath('//div[@class="fruit"]//li[@class="fruit"]/em'):
    print(item.text)

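This is the HTML file I tested against. If the code does not work for the HTML you are testing against, you will need to post a sample file that fails, as I asked in the comments above.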

(HTML test file not preserved in this copy)

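With the code as originally posted, you will certainly get too many results: the XPath in the inner loop starts with //, so it searches the whole tree rather than only the subtree of each "fruit" div. The error you describe does not make much sense unless your input differs from what I understand it to be.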
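To make the difference between `//` and `.//` concrete, here is a minimal sketch. The `SAMPLE_HTML` string is a hypothetical stand-in, not the test file mentioned above, and it is parsed with `lxml.html.fromstring` so the example does not need a file on disk.

```python
import lxml.html

# Hypothetical sample markup (an assumption for illustration only).
SAMPLE_HTML = """
<html><body>
  <div class="fruit"><ul><li class="fruit"><em>apple</em></li></ul></div>
  <div class="fruit"><ul><li class="fruit"><em>banana</em></li></ul></div>
</body></html>
"""

root = lxml.html.fromstring(SAMPLE_HTML)
fruits = root.xpath('//div[@class="fruit"]')

# Absolute path: evaluated from the document root, even when called on a div,
# so every div reports ALL matching <em> elements in the document.
absolute_counts = [len(fruit.xpath('//li[@class="fruit"]/em')) for fruit in fruits]

# Relative path: the leading dot restricts the search to the current div's subtree.
relative_names = [
    em.text
    for fruit in fruits
    for em in fruit.xpath('.//li[@class="fruit"]/em')
]

print(absolute_counts)
print(relative_names)
```

With this sample, each div sees both `<em>` elements through the absolute path, while the relative path yields one name per div. That is why taking `[0]` of the absolute result prints the first fruit's name on every iteration.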
