Extracting text from links in Python

2 answers

User

Answer 1 · Edited 2024-10-02 08:16:28

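You need to change the XPath, because not every td element contains the child element your current expression selects. Try this XPath expression instead, which collects every text node anywhere inside each td: //td//text()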

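from urllib.request import urlopen  # Python 3 location of urlopen (urllib.urlopen was Python 2 only)
from lxml import etree

budgeturl = "http://www.the-numbers.com/movie/budgets/all"
s = urlopen(budgeturl).read()
htmlpage = etree.HTML(s)
# Every text node inside each td, whether or not it is wrapped in a link
htmltable = htmlpage.xpath("//td//text()")
print(htmltable[:20])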

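The difference between an expression that only reads link text and //td//text() can be seen on a tiny table. This sketch assumes the original expression read the text of a elements under td (something like //td/a/text()). It uses the standard library's xml.etree.ElementTree instead of lxml, and because ElementTree's limited XPath has no text() step, the hypothetical helpers link_text_only and all_cell_text only approximate the two lxml expressions. The sample table is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample: only the movie title cell holds a link
SAMPLE = """<table>
<tr><td>1</td><td>12/18/2009</td><td><a href="/movie/Avatar">Avatar</a></td><td>$425,000,000</td></tr>
</table>"""


def link_text_only(root):
    # Rough analogue of //td/a/text(): only text inside an <a> child of a <td>
    return [a.text for a in root.iterfind(".//td/a") if a.text]


def all_cell_text(root):
    # Rough analogue of //td//text(): every text node anywhere inside each <td>
    return [t for td in root.iter("td") for t in td.itertext()]


root = ET.fromstring(SAMPLE)
print(link_text_only(root))  # ['Avatar']
print(all_cell_text(root))   # ['1', '12/18/2009', 'Avatar', '$425,000,000']
```

Cells without a link (rank, date, budget) are silently skipped by the link-only version, which is why switching to //td//text() returns the full row.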

User

Answer 2 · Edited 2024-10-02 08:16:28

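from urllib.request import urlopen  # Python 3 location of urlopen

budgeturl = "http://www.the-numbers.com/movie/budgets/all"
# Decode the response bytes so that str.index() can search the page text
s = urlopen(budgeturl).read().decode("utf-8", errors="replace")

def find_between(s, first, last):
    """Return the text between the first `first` and the next `last`, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

# Keep only the markup inside the first <table>...</table>
s = find_between(s, '<table>', '</table>')

print(s[:500])
print('.............................................................')
print(s[-250:])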

See also: Find string between two substrings

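A quick check of how find_between behaves on its own, using made-up strings. Because it searches for the literal text '<table>', it returns an empty string when the table tag carries attributes (for example <table class="x">), so the delimiters may need adjusting for a real page.

```python
def find_between(s, first, last):
    # Same helper as above: slice between the first `first` and the next `last`
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


html = "<p>intro</p><table><tr><td>Avatar</td></tr></table>"
print(find_between(html, "<table>", "</table>"))  # <tr><td>Avatar</td></tr>

# The literal "<table>" never occurs when the tag has attributes, so nothing is found
print(repr(find_between('<table class="x">...</table>', "<table>", "</table>")))  # ''
```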

.........................................

I need the text, not the link.

Converting the table with http://www.convertcsv.com/html-table-to-csv.htm gives:

Release Date,Movie,Production Budget,Domestic Gross,Worldwide Gross
1,12/18/2009,Avatar,"$425,000,000","$760,507,625","$2,783,918,982"
8/5/2005,My Date With Drew,"$1,100","$181,041","$181,041"
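The same table-to-CSV conversion can be sketched offline with only the standard library: html.parser collects the text of each th/td cell (including text inside links), and csv.writer quotes fields that contain commas, as in the converter output above. The TableRows class, the table_to_csv function and the sample table are hypothetical; a real page may need extra handling for nested tables or cells that span several columns.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside a cell is kept whether or not it sits inside an <a> link
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # Default QUOTE_MINIMAL quoting wraps "$425,000,000" in double quotes
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


SAMPLE = """<table>
<tr><th>Release Date</th><th>Movie</th><th>Production Budget</th></tr>
<tr><td>12/18/2009</td><td><a href="/movie/Avatar">Avatar</a></td><td>$425,000,000</td></tr>
</table>"""

print(table_to_csv(SAMPLE))
```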

You can do the same thing with BeautifulSoup; see:

beautifulSoup html csv
