How to read each line of a CSV and save new text for each line

Posted 2024-09-27 17:38:28


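I have an unusual problem that I thought I had solved, until I hit a wall using a while loop to control the flow of this program.

Summary:

I have a flat file (CSV or text) containing some URLs that I want to scrape. For each one I add a new tag to the HTML with BeautifulSoup (this part works), then save each scraped page under a new filename.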

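What I need:

Iterate over each line
Get the URL
Scrape the page at each URL
Append the new HTML tag
Save the file, using the name of the HTML file if possible
Restart the same procedure so that it moves on to the next line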
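
The steps above can be sketched as a single loop that handles one line completely before moving to the next. This is only a minimal sketch under stated assumptions: process_urls, fake_fetch and fake_transform are hypothetical names, the fetch and transform steps are stand-ins so the example runs without network access, and the numbered file names are a placeholder scheme rather than the names the question asks for.

```python
from pathlib import Path
import tempfile

def process_urls(lines, fetch, transform, out_dir):
    """Fetch, transform, and save one page per non-empty line; return the written paths."""
    out_dir = Path(out_dir)
    written = []
    for index, line in enumerate(lines, start=1):
        url = line.strip()          # drop the trailing newline
        if not url:                 # skip blank lines
            continue
        html = transform(fetch(url))
        # Placeholder naming scheme: a different file for every input line
        path = out_dir / 'page_{}.html'.format(index)
        path.write_text(html, encoding='utf-8')
        written.append(path)
    return written

# Stand-in for the HTTP request (assumption: no real network call is made here)
def fake_fetch(url):
    return '<html><head><title>{}</title></head></html>'.format(url)

# Stand-in for the BeautifulSoup step that inserts the meta tag after the title
def fake_transform(html):
    return html.replace(
        '</title>',
        '</title><meta name="robots" content="noindex, nofollow"/>')

with tempfile.TemporaryDirectory() as tmp:
    paths = process_urls(
        ['https://example.com/a\n', 'https://example.com/b\n'],
        fake_fetch, fake_transform, tmp)
    print([p.name for p in paths])
```

In a real run, fetch would wrap the HTTP request and transform would wrap the BeautifulSoup edit; passing them in as arguments keeps the loop itself easy to test.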

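I'm pretty sure this comes down to my not understanding something basic, and I'm still trying to work it out. My code is below.

What happens:

Using Python 3, the code does actually run. I stepped through it line by line in Jupyter, with a series of print statements, to see what get returned while the while loop was running.

The problem is that only one file is saved, and it contains only the page for the last URL in the file. The other URLs are scraped, but nothing is saved for them.

How do I iterate over each line, scrape it, and save it under a unique name before moving on to the next line? Am I using these constructs incorrectly?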
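
On the question of whether the loop constructs are being used correctly: readline() returns an empty string at end of file, never the integer 0, so a check such as line == 0 does not end a while True loop. The sketch below compares the readline() pattern with direct iteration over the file object. The function names and the example.com URLs are hypothetical, and io.StringIO stands in for the opened CSV file.

```python
import io

def read_urls_with_while(file):
    """Collect non-empty lines using the readline() pattern."""
    urls = []
    while True:
        line = file.readline()
        if line == '':      # readline() returns '' only at end of file
            break
        url = line.strip()
        if url:             # a blank line is '\n', not '', so it is skipped here
            urls.append(url)
    return urls

def read_urls_with_for(file):
    """Same result using direct iteration, which stops at end of file on its own."""
    return [line.strip() for line in file if line.strip()]

sample = "https://example.com/a\n\nhttps://example.com/b\n"
print(read_urls_with_while(io.StringIO(sample)))
print(read_urls_with_for(io.StringIO(sample)))
```

Both functions return the same list. The for loop is usually the simpler choice because no explicit end-of-file test is needed.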

URLs:

https://www.imgacademy.com/media/headline/img-academy-alumna-jacqueline-bendrick-ready-tee-against-men-golfbc-championship

https://www.imgacademy.com/media/headline/img-academy-u19-girls-win-fysa-state-cup-u19-championship

https://www.imgacademy.com/media/headline/img-academy-celebrates-largest-commencement-ceremony-date-200-ascenders-earn

Code:

import csv
import requests
from bs4 import BeautifulSoup as BS

filename = 'urls.csv'

with open(filename, 'r+') as file:


    while True:

        line = file.readline()

        user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'

        headers = {'User-Agent':user_agent}

        response = requests.get(line, headers)

        print(response)

        soup = BS(response.content, 'html.parser')

        html = soup

        title = soup.find('title')
        meta = soup.new_tag('meta')
        meta['name'] = "robots"
        meta['content'] = "noindex, nofollow"
        title.insert_after(meta)

        with open('{}.txt'.format("line"), 'w', encoding='utf-8') as f:
            f.write(str(html))

            if (line) == 0:
                break


1 Answer

User

#1 · Posted 2024-09-27 17:38:28

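import requests
from bs4 import BeautifulSoup as BS

filename = 'urls.csv'

# The request headers are the same for every URL, so build them once.
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'
headers = {'User-Agent': user_agent}

with open(filename, 'r') as file:

    # Iterating over the file yields one line per pass and stops at end of file.
    for line in file:

        # Drop the trailing newline and skip blank lines.
        url = line.strip()
        if not url:
            continue

        # Pass headers by keyword: the second positional argument of requests.get is params.
        response = requests.get(url, headers=headers)
        print(url, response.status_code)

        soup = BS(response.content, 'html.parser')

        title = soup.find('title')
        meta = soup.new_tag('meta')
        meta['name'] = "robots"
        meta['content'] = "noindex, nofollow"
        title.insert_after(meta)

        # Name each file after the last path segment of its URL so that
        # no page overwrites another.
        name = url.rstrip('/').rsplit('/', 1)[-1]
        with open('{}.html'.format(name), 'w', encoding='utf-8') as f:
            f.write(str(soup))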
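
A note on the output file name: a fixed slice such as line[41:] depends on every URL sharing the same prefix length. For the URLs in the question it also keeps the leading slash and the trailing newline. A more robust option, sketched here with the standard library only, is to parse the URL and take its last path segment. filename_for is a hypothetical helper, and falling back to 'index' for an empty path is an assumption.

```python
from urllib.parse import urlsplit
from pathlib import PurePosixPath

def filename_for(url, extension='.html'):
    """Build a file name from the last path segment of url."""
    path = urlsplit(url.strip()).path
    # PurePosixPath(...).name ignores a trailing slash; '' means there is no segment
    slug = PurePosixPath(path).name or 'index'
    return slug + extension

line = ('https://www.imgacademy.com/media/headline/'
        'img-academy-u19-girls-win-fysa-state-cup-u19-championship\n')

# The fixed slice still carries the leading '/' and the trailing '\n'
print(repr(line[41:]))

# Parsing the URL gives a clean name that is unique per page
print(filename_for(line))
```

Because each URL maps to its own name, every page is written to a separate file instead of overwriting the previous one.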
