Python prints the wrong file name

Posted 2024-09-30 20:24:19


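I have a script that reads an HTML file and extracts the relevant lines from it, but I'm having a problem printing the file names. The files are named source1.html, source2.html and source3.html, but the script prints source2.html, source3.html and source4.html instead.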

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):
    n = n+1
    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    filename = "source"+str(n)+".html"


    soup = BeautifulSoup (file, "html.parser")

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print("done")

3 answers

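You have a logic error: you increment n and rebuild filename before you write and print the current name, so every name is one ahead of the file you actually opened. The simplest fix is to build the next file name only at the end of the loop, after the current file has been processed. The next mistake is that you never close the HTML files, so open them with a with statement, e.g. with open(filename, "r") as file:. This closes the file automatically when the block ends, which is more Pythonic.

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):
    strjpgs = "Extracted Layers: \n \n"

    # The with block closes the HTML file as soon as it has been parsed.
    with open(filename, "r") as file:
        soup = BeautifulSoup(file, "html.parser")

    # Same parsing as in the question.
    thedata = soup.find("div", class_="cplayer")
    jpgs = re.findall(r'/([^/]+)\.jpg', str(thedata))
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"

    # filename still names the file that was just read.
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

    # Only now move on to the next file.
    n += 1
    filename = "source" + str(n) + ".html"

savefile.close()
print("done")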
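To see what the `with` statement buys you, here is a small stdlib-only sketch (not part of the original answer). The helper name `read_and_report` is hypothetical, and the temporary file only stands in for one of the sourceN.html pages; the point is that the handle is already closed once the `with` block exits.

```python
import os
import tempfile


def read_and_report(path):
    """Read a file inside a with block and return (text, handle_closed)."""
    with open(path, "r") as file:
        text = file.read()
    # The with block has exited here, so Python has closed the handle for us.
    return text, file.closed


# A throwaway file standing in for source1.html.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "source1.html")
    with open(path, "w") as out:
        out.write("<div class='cplayer'></div>")
    text, closed = read_and_report(path)

print(closed)  # True
```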

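You set n to 1 and then immediately increment it to 2 inside the loop, and the very next line rebuilds filename from it. By the time you reach print(filename), n is 2 and filename has already changed to "source2.html". Move the print (and the savefile.write calls) above those two lines, or move the increment and the filename update to the end of the loop.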
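The off-by-one can be reproduced without any HTML files. In this sketch the helper names `buggy_pairs` and `fixed_pairs` are hypothetical, and a set of names stands in for `os.path.isfile`; each pair records the file that `open()` would read next to the name that would be printed.

```python
def buggy_pairs(existing):
    """Mimic the question's loop: return (file opened, name printed) pairs."""
    pairs = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:  # stands in for os.path.isfile(filename)
        n = n + 1
        opened = filename  # the file that open() reads
        filename = "source" + str(n) + ".html"  # rebuilt too early
        pairs.append((opened, filename))  # the name that gets printed
    return pairs


def fixed_pairs(existing):
    """Same loop, but the next name is built only at the end of the body."""
    pairs = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:
        pairs.append((filename, filename))
        n = n + 1
        filename = "source" + str(n) + ".html"
    return pairs


files = {"source1.html", "source2.html", "source3.html"}
print([name for _, name in buggy_pairs(files)])  # ['source2.html', 'source3.html', 'source4.html']
print([name for _, name in fixed_pairs(files)])  # ['source1.html', 'source2.html', 'source3.html']
```

The buggy version also shows that the printed name and the file actually read disagree on every iteration, not just the last one.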

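You just need to move the print statement to the start of the loop and move the increment, together with the line that builds the next file name, to the end of the loop so it is ready for the next iteration:

# Assumes the imports, n = 1, filename and savefile set up as in the question.
while os.path.isfile(filename):
    print(filename)

    strjpgs = "Extracted Layers: \n \n"
    with open(filename, "r") as file:
        soup = BeautifulSoup(file, "html.parser")

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile(r'/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(strjpgs)

    # Build the next file name only after this file has been handled.
    n = n + 1
    filename = "source" + str(n) + ".html"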
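A different way to sidestep the ordering problem (not taken from any of the answers above) is to let a generator produce the names, so the loop body only ever sees a name after it has been built and checked. `existing_sources` and its parameters are hypothetical names for this sketch; like the original loop, it stops at the first missing number.

```python
import itertools
import os
import tempfile


def existing_sources(prefix="source", suffix=".html", start=1):
    """Yield prefix + n + suffix for n = start, start + 1, ... until a file is missing."""
    for n in itertools.count(start):
        filename = prefix + str(n) + suffix
        if not os.path.isfile(filename):
            return
        # The name is yielded in the same step it was built, so it always
        # matches the file the caller is about to open.
        yield filename


# Demo: create source1.html .. source3.html in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2, 3):
        with open(os.path.join(tmp, "source" + str(i) + ".html"), "w") as out:
            out.write("<html></html>")
    found = [os.path.basename(f) for f in existing_sources(prefix=os.path.join(tmp, "source"))]

print(found)  # ['source1.html', 'source2.html', 'source3.html']
```

In the question's script the loop would then become for filename in existing_sources():, with the parsing and writing code unchanged and no manual n bookkeeping.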
