Updating an HTML file with BeautifulSoup

# making the soup
htmlDoc = open('test.html', "r+")
soup = BeautifulSoup(htmlDoc)

i = 0 #initialize counter

for tag in soup.findAll(href=re.compile("data")): #match for hrefs with keyword data
    i += 1
    print i
    print tag.get_text()
    text = tag.get_text() + "applications"
    g = pygoogle(text)
    g.pages = 1
    # print '*Found %s results*'%(g.get_result_count())
    if "http" in g.get_first_url():
        print g.get_first_url()
        new_tag = soup.new_tag("a", href=g.get_first_url())
        new_tag.string = tag.get_text()
        print new_tag
        tag.replace_with(new_tag)

print "Remaining"
print i
htmlDoc.close()

html = soup.prettify(soup.original_encoding)
with open("test.html", "wb") as file:
    file.write(html)
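The last step of the question's code writes soup.prettify(soup.original_encoding) to a file opened in "wb" mode. As far as I know, BeautifulSoup only sets original_encoding when it had to decode bytes itself; when it is handed already-decoded text (for example a file opened in text mode under Python 3), the attribute is None and prettify() returns a str, which a binary-mode file rejects. The sketch below assumes Python 3 and bs4 with the built-in "html.parser"; save_soup is a hypothetical helper that falls back to UTF-8 so prettify() always returns bytes.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def save_soup(soup, path, encoding="utf-8"):
    """Hypothetical helper: write a soup to disk as encoded bytes."""
    # original_encoding is None when the input was already a str,
    # so fall back to an explicit encoding in that case.
    enc = soup.original_encoding or encoding
    data = soup.prettify(enc)  # with an encoding, prettify() returns bytes
    with open(path, "wb") as fh:
        fh.write(data)


soup = BeautifulSoup('<a href="/data/1">Data</a>', "html.parser")
print(soup.original_encoding)  # None: the markup was passed in as a str

soup.a["href"] = "http://example.com/"
path = os.path.join(tempfile.mkdtemp(), "test.html")
save_soup(soup, path)

with open(path, encoding="utf-8") as fh:
    saved = fh.read()
print(saved)
```

Under Python 2, which pygoogle targets, reading the file yields bytes, so the original code may well work there; the fallback only matters once the markup arrives as text.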

1 answer

Anonymous user

Answer #1 · Posted 2024-09-25 04:26:48

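You have created a new tag with new_tag = soup.new_tag("a", href=g.get_first_url()), but you never actually insert new_tag into the HTML; you only assign it to the variable new_tag.

To actually place the tag in the HTML, you need to use the insert() or {} method that BeautifulSoup provides.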
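A small illustration of the point above, assuming bs4 with the built-in "html.parser": soup.new_tag() returns a detached Tag whose parent is None, and the document only changes once the tag is placed into the tree, here with replace_with() (which the question's code also uses) and with insert(). The markup and URLs are made-up examples.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><a href="/data/old">Data</a></p>', "html.parser")
old = soup.find("a")

# new_tag() only builds a detached Tag object; the document is not touched yet
new_tag = soup.new_tag("a", href="http://example.com/")
new_tag.string = old.get_text()
detached_parent = new_tag.parent  # None: the tag is not in the tree
before = str(soup)                # still the original markup

# replace_with() puts new_tag where the old link was
old.replace_with(new_tag)

# insert(position, tag) adds a tag among an existing element's children
extra = soup.new_tag("a", href="http://example.org/")
extra.string = "More"
soup.p.insert(1, extra)

after = str(soup)
print(detached_parent)
print(before)
print(after)
```

This prints None, then the unchanged markup, then '<p><a href="http://example.com/">Data</a><a href="http://example.org/">More</a></p>'.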

Alternatively, you can simply reassign the existing link's 'href' like this:

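import re
from bs4 import BeautifulSoup
from pygoogle import pygoogle

with open('test.html') as htmlDoc:
    soup = BeautifulSoup(htmlDoc, 'html.parser')

i = 0 #initialize counter

for tag in soup.find_all(href=re.compile("data")): #match for hrefs containing "data"
    i += 1
    print i
    print tag.get_text()
    text = tag.get_text() + " applications"
    g = pygoogle(text)
    g.pages = 1
    # print '*Found %s results*'%(g.get_result_count())
    first_url = g.get_first_url() #query once and reuse the result
    if "http" in first_url:
        print first_url
        # update the tag that is already in the tree; nothing new has to be inserted
        tag['href'] = first_url

# write the modified tree back to the file, as in the question
with open('test.html', 'wb') as out:
    out.write(soup.prettify('utf-8'))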
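The same in-place approach can be checked offline. This sketch assumes Python 3 and bs4 with "html.parser", works on an HTML string instead of test.html, and replaces the pygoogle search with a hypothetical lookup_url() backed by a fixed dictionary; both lookup_url and FAKE_RESULTS are illustrative stand-ins, not part of any library.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the pygoogle search: a fixed table of "results".
FAKE_RESULTS = {"Big data applications": "http://example.com/big-data"}


def lookup_url(query):
    """Hypothetical search helper; returns '' when there is no result."""
    return FAKE_RESULTS.get(query, "")


def rewrite_data_links(html):
    """Point every link whose href contains 'data' at the looked-up URL."""
    soup = BeautifulSoup(html, "html.parser")
    count = 0
    for tag in soup.find_all(href=re.compile("data")):
        count += 1
        url = lookup_url(tag.get_text() + " applications")
        if url.startswith("http"):
            # Assigning to an attribute edits the Tag that is already in the tree
            tag["href"] = url
    return str(soup), count


result, count = rewrite_data_links(
    '<a href="/data/1">Big data</a> <a href="/about">About</a>'
)
print(count)
print(result)
```

Only the first link matches the "data" pattern and has a lookup result, so count is 1 and only its href changes; to use a real search, lookup_url would be replaced by the pygoogle calls from the loop above.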
