Parsing multiple HTML files in one folder into one or more CSV files with Python

Posted 2024-10-01 11:36:21


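I have some Python code that parses data from multiple HTML files, and I need to write the results to a CSV file (several CSV files would also be fine). The problem is that only the data from the last HTML file ends up in the CSV. I think the data is being overwritten, because every file is written to the same CSV. Can you help me fix this?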

import os

import pandas as pd
from bs4 import BeautifulSoup

folder = "Folder Path"
for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))

        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            info = soup.find_all('div', class_='panel-body')

            # These lists are re-created for every file, so rows from earlier files are discarded
            dataName = []
            dataPhone = []
            dataAdd = []
            dataCity = []

            for i in info:
                name = i.find('h2')
                address = i.find('p')
                city = i.find('strong')
                phone = i.find('label')

                dataName.append(name.string)
                dataAdd.append(address.string)
                dataCity.append(city.string)
                dataPhone.append(phone.string)

                print(dataName)
                print(dataPhone)
                print(dataAdd)
                print(dataCity)

                # Runs once per panel and rewrites DataText.csv from scratch each time
                df = pd.DataFrame({"Name and Surname": dataName, "Address": dataAdd, "City": dataCity, "Number": dataPhone})
                df.to_csv('DataText.csv')
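A minimal sketch of the overwrite effect suspected above: `DataFrame.to_csv` opens its target in write mode (`mode='w'`) by default, which truncates the file, so each call replaces whatever the previous call wrote. To run without extra packages, the sketch uses the standard `csv` module instead of pandas; the three batches stand in for rows parsed from three HTML files, and the names `write_rows`, `overwrite.csv`, and `append.csv` are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows, mode):
    """Write rows to path. Mode "w" truncates the file first; mode "a" appends."""
    with open(path, mode, newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical batches standing in for the rows parsed from three HTML files.
batches = [[["Alice", "Rome"]], [["Bob", "Oslo"]], [["Carol", "Lima"]]]

with tempfile.TemporaryDirectory() as tmp:
    overwritten = os.path.join(tmp, "overwrite.csv")
    appended = os.path.join(tmp, "append.csv")

    # Same pattern as the question: write mode inside the loop.
    for batch in batches:
        write_rows(overwritten, batch, "w")

    # Header written once before the loop, then every batch appended.
    write_rows(appended, [["Name", "City"]], "w")
    for batch in batches:
        write_rows(appended, batch, "a")

    with open(overwritten, newline="", encoding="utf-8") as f:
        overwrite_result = list(csv.reader(f))
    with open(appended, newline="", encoding="utf-8") as f:
        append_result = list(csv.reader(f))

print(overwrite_result)  # only the last batch survives: [['Carol', 'Lima']]
print(append_result)
```

In the question's code the lists are also reset for every file, so the final write only contains that file's rows. Either writing once after all files are read, or appending (pandas' `to_csv` accepts `mode='a'` together with `header=False`), keeps every file's data.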

1 answer

Answer #1 · anonymous user · posted 2024-10-01 11:36:21

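Completely untested, since I'm busy right now, but here is an attempt. Basically, declare the lists once before the loop so they collect the rows from every file, and write out the CSV once after the loop has finished.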

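import os

import pandas as pd
from bs4 import BeautifulSoup

# Declare the lists once, before the loop, so rows from every file accumulate
dataName = []
dataPhone = []
dataAdd = []
dataCity = []

folder = "Folder Path"
for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))

        # Assumes the HTML files are UTF-8 encoded
        with open(fname, 'r', encoding='utf-8') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            info = soup.find_all('div', class_='panel-body')

            for i in info:
                name = i.find('h2')
                address = i.find('p')
                city = i.find('strong')
                phone = i.find('label')

                # find() returns None for a missing tag (then .string raises AttributeError),
                # and .string is None when the tag has more than one child
                dataName.append(name.string)
                dataAdd.append(address.string)
                dataCity.append(city.string)
                dataPhone.append(phone.string)

# Build the DataFrame and write the CSV once, after every file has been read
df = pd.DataFrame({"Name and Surname": dataName, "Address": dataAdd, "City": dataCity, "Number": dataPhone})
df.to_csv('DataText.csv')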
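A slightly more defensive variant of the same idea, as a sketch rather than a drop-in replacement. It assumes the files are UTF-8 and keeps the question's selectors (`div.panel-body` containing `h2`, `p`, `strong`, `label`). `get_text(strip=True)` is used instead of `.string`, and a missing tag produces an empty cell instead of an `AttributeError`. The helper names (`text_or_none`, `parse_panels`, `html_folder_to_csv`), the extra `Source` column, and the `one_file_per_html` flag are hypothetical; the flag covers the question's note that several CSV files would also be acceptable.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None if find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_panels(html):
    """Return one record per <div class="panel-body"> (selectors from the question)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for panel in soup.find_all("div", class_="panel-body"):
        records.append({
            "Name and Surname": text_or_none(panel.find("h2")),
            "Address": text_or_none(panel.find("p")),
            "City": text_or_none(panel.find("strong")),
            "Number": text_or_none(panel.find("label")),
        })
    return records


def html_folder_to_csv(folder, out_csv, one_file_per_html=False):
    """Parse every *.html file in folder and write all rows to out_csv once."""
    frames = []
    # sorted() gives a stable order; os.listdir() order is arbitrary
    for path in sorted(Path(folder).glob("*.html")):
        df = pd.DataFrame(parse_panels(path.read_text(encoding="utf-8")))
        if one_file_per_html:
            # e.g. page1.html -> page1.csv, written next to the HTML file
            df.to_csv(path.with_suffix(".csv"), index=False)
        # Remember which file each row came from
        frames.append(df.assign(Source=path.name))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    combined.to_csv(out_csv, index=False)
    return combined
```

For example, `html_folder_to_csv("Folder Path", "DataText.csv")` writes one combined file; `index=False` leaves out the row-number column that `to_csv` writes by default.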
