Python: assigning the content of each index of a list to a variable in a for loop?

Posted 2024-05-17 02:52:40

Hi everyone,

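I apologize in advance for my English. I have a Python web scraper that writes the entire text of a website into a word list, and it should do the same for every subdomain of that website. I managed to read out all the subdomains and the text of the main page, but I cannot read out the text of the subdomains.

I put all the subdomains into a list called domains and then wanted to use a for loop to change url, so that each pass works on a different subdomain. But it does not work that way: every pass still produces the text of the main page.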
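
The symptom described above can be reproduced without any network access. The following is only a minimal sketch: `render`, `page` and the `/about` path are hypothetical names used for illustration, not part of the scraper. It shows that rebinding `url` inside the loop does not update a value that was computed from it earlier, so the work has to be repeated inside the loop.

```python
def render(url):
    # Hypothetical stand-in for "download and parse this URL".
    return f"<content of {url}>"


url = "https://test-domain.com/"
page = render(url)  # computed once, from the first URL only

# The second entry is an assumed example link.
domains = [url, "https://test-domain.com/about"]

stale = []
fresh = []
for x in domains:
    url = x                  # rebinding the name does not touch `page`
    stale.append(page)       # still the content of the first URL
    fresh.append(render(x))  # recomputed for the current URL

print(stale)
print(fresh)
```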

(You can ignore the lower half of the code; it is only used to format the text!)

I hope you understand my problem :)

My code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://test-domain.com/"
html = urlopen(url).read()
main_html = BeautifulSoup(html, features="html.parser")
subdomains = []
domains = [url]

for link in main_html.find_all("a"):
    subdomains.append(link.get("href"))

domains.extend(subdomains)

for x in domains:

    url = x
    print(url)

    # kill all script and style elements
    for script in main_html(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = main_html.get_text()


    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    sonderzeichen = [",","...","!","?",".","[","]","{","}","|","#","&","*","/",":",";","+","-","_","=","<",">"]

    word_list = text.split()
    for elem in list(word_list):
        for x in sonderzeichen:
            if elem == x:
                word_list.remove(elem)

    word_list = [
        word[:-1] if word[-1] in sonderzeichen else word
        for word in word_list
    ]

    with open("word_list.txt", "w") as f:
        for elem in list(word_list):
            f.write("%s\n" % elem)  

    print(word_list)
    print("\nWordlist successfully generated!")
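
In the code above, `urlopen` and `BeautifulSoup` are called only once, before the loop, so `main_html` always holds the main page; assigning `url = x` does not download anything. A possible fix, sketched here under some assumptions, is to download and parse inside the loop. The helper names (`fetch`, `collect_links`, `page_words`, `scrape`) are made up for this sketch. It also assumes that the collected `href` values may be relative paths or missing, which is why they are resolved with `urljoin` and filtered; the punctuation cleanup from the original is left out for brevity.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download the raw HTML of one page."""
    with urlopen(url) as response:
        return response.read()


def collect_links(base_url, html):
    """Return base_url followed by the absolute URL of every <a href>."""
    soup = BeautifulSoup(html, features="html.parser")
    urls = [base_url]
    for link in soup.find_all("a"):
        href = link.get("href")
        if not href:  # <a> tags without an href give None
            continue
        absolute = urljoin(base_url, href)  # resolves relative links like "/about"
        if not absolute.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript: and similar links
        if absolute not in urls:
            urls.append(absolute)
    return urls


def page_words(html):
    """Parse one page and return its visible text split into words."""
    soup = BeautifulSoup(html, features="html.parser")
    for tag in soup(["script", "style"]):
        tag.extract()  # drop script and style elements
    return soup.get_text().split()


def scrape(start_url, fetch=fetch):
    """Fetch and parse every collected URL separately."""
    words_by_url = {}
    for url in collect_links(start_url, fetch(start_url)):
        # A new download and a new soup for each URL.
        words_by_url[url] = page_words(fetch(url))
    return words_by_url
```

Calling `scrape("https://test-domain.com/")` would then return one word list per URL. Keeping the results in a dictionary also sidesteps a second issue that would appear once the loop works: opening `word_list.txt` with mode `"w"` on every pass overwrites the previous page's words. Error handling for unreachable pages is not covered in this sketch.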

0 answers

There are no answers yet.
