Trying to take multiple Excel spreadsheets, extract specific data, add it all to one DataFrame, and save it as a CSV file

Published 2024-05-18 05:14:10


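I'm very new to this, so please go easy on me :)

I'm trying to take multiple Excel spreadsheets, extract specific data from specific cells, add it all to one DataFrame, and save it as a CSV file.

The CSV output only contains the data from the last Excel file. Could you please help?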

 import pandas as pd
 import os
 from pathlib import Path

 ip = "//NETWORKLOCATION/In"
 op = "//NETWORKLOCATION/Out"

 file_exist = False
 dir_list = os.listdir(ip)
 print(dir_list)

 for xlfile in dir_list:
     if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
         file_exist = True
         str_file = os.path.join(ip, xlfile)
         df1 = pd.read_excel(str_file)

         columns1 = {*VARIOUSDATA -* 
                     }

         #creates an empty dataframe for the data to all sequentially be added into
         df1a = pd.DataFrame([])

         #appends the array to the new dataframe df1a
         df1a = df1a.append(pd.DataFrame(columns1, columns = ['*VARIOUS COLUMNS*']))

         if not file_exist:
                 print('cannot find any valid excel file in the folder ' + ip)

                 print(str_file)

 df1a.to_csv('//NETWORKLOCATION/Out/Test.csv')
 print(df1a)

3 Answers

I think you should put:

#creates an empty dataframe for the data to all sequentially be added into
df1a = pd.DataFrame([])

before the for xlfile in dir_list: loop, not inside it. Otherwise df1a is recreated as an empty DataFrame on every file iteration.
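
The effect is easy to reproduce without any Excel files. Below is a minimal sketch, assuming three small in-memory DataFrames stand in for the data extracted from each workbook. The names fake_files, collect_reset_inside and collect_created_once are made up for illustration, and pd.concat is used because DataFrame.append was removed in pandas 2.0:

```python
import pandas as pd

# Hypothetical stand-ins for the per-file data extracted from each workbook.
fake_files = [pd.DataFrame({"value": [i]}) for i in range(3)]


def collect_reset_inside(frames):
    """Mirrors the question: the accumulator is recreated on every pass."""
    result = pd.DataFrame()
    for frame in frames:
        result = pd.DataFrame()  # wipes whatever was collected so far
        result = pd.concat([result, frame], ignore_index=True)
    return result


def collect_created_once(frames):
    """The fix: create the accumulator once, before the loop."""
    result = pd.DataFrame()
    for frame in frames:
        result = pd.concat([result, frame], ignore_index=True)
    return result


print(len(collect_reset_inside(fake_files)))   # only the last file survives
print(len(collect_created_once(fake_files)))   # one row per file
```

The only difference between the two functions is where the empty DataFrame is created, which is exactly the change suggested for the script in the question.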

A few things. First, you will never hit:

if not file_exist:
                 print('cannot find any valid excel file in the folder ' + ip)

                 print(str_file)

as written, because it is a nested if statement, so file_exist is always set to True before it is reached.
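
One way to make the check reachable is to move it after the loop, so it runs once when no file matched. A minimal sketch, where has_excel_file is a hypothetical helper and the file names are made up:

```python
def has_excel_file(names):
    """Return True if any name looks like an Excel workbook."""
    found = False
    for name in names:
        if name.endswith((".xlsx", ".xls")):
            found = True
            # A "not found" check placed here could never fire,
            # because found was set to True on the line above.
    return found


for listing in (["a.xlsx", "notes.txt"], ["notes.txt"]):
    # The check lives after the loop, so it can actually trigger.
    if not has_excel_file(listing):
        print("cannot find any valid excel file in", listing)
```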

  1. You are creating df1a inside the for loop, so you always reset it to empty.
  2. Why import Path and then use os.path and os.listdir? Why not just use Path(ip).glob('*.xls*')?
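
To see what the glob pattern matches, here is a small sketch that uses a temporary folder with made-up file names instead of the network share. The pattern '*.xls*' picks up both .xls and .xlsx (it would also match .xlsm), and glob yields full Path objects, so no os.path.join is needed. The helper list_workbooks is hypothetical:

```python
import tempfile
from pathlib import Path


def list_workbooks(folder):
    """Return the sorted names of files in folder matching '*.xls*'."""
    return sorted(p.name for p in Path(folder).glob("*.xls*"))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, created empty just to test the pattern.
    for name in ("jan.xlsx", "feb.xls", "notes.txt"):
        (Path(tmp) / name).touch()
    matched = list_workbooks(tmp)

print(matched)
```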

That would look like:

import pandas as pd
from pathlib import Path

ip = "//NETWORKLOCATION/In"
op = "//NETWORKLOCATION/Out"

#creates an empty dataframe once, before the loop, so the data can all be added into it
df1a = pd.DataFrame([])

for xlfile in Path(ip).glob('*.xls*'):
    df1 = pd.read_excel(xlfile)

    #placeholder from the question: the values extracted from df1 go here
    columns1 = {"VARIOUSDATA"}

    #adds the new rows to df1a; DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
    df1a = pd.concat([df1a, pd.DataFrame(columns1, columns = ['VARIOUS_COLUMNS'])], ignore_index=True)

if df1a.empty:
    #nothing was collected, so no matching file was found
    print('cannot find any valid excel file in the folder ' + ip)
else:
    df1a.to_csv(op+'/Test.csv')
    print(df1a)
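
The placeholders hide how a few cells become one row of the output. Here is a hedged sketch of that step: the cell positions, the column names and the helper extract_row are all assumptions for illustration, and small in-memory DataFrames stand in for what pd.read_excel(xlfile, header=None) might return. Collecting the rows in a list and building the DataFrame once also avoids growing a DataFrame inside the loop:

```python
import pandas as pd


def extract_row(sheet):
    """Pick specific cells by position (row, column); positions are made up."""
    return {
        "customer": sheet.iat[0, 1],  # cell B1 when read with header=None
        "total": sheet.iat[2, 1],     # cell B3
    }


# Hypothetical stand-ins for two workbooks read with header=None.
sheets = [
    pd.DataFrame([["Customer", "Acme"], ["Items", 3], ["Total", 120.5]]),
    pd.DataFrame([["Customer", "Globex"], ["Items", 1], ["Total", 40.0]]),
]

rows = [extract_row(sheet) for sheet in sheets]   # one dict per file
combined = pd.DataFrame(rows, columns=["customer", "total"])
print(combined)
```

In the real script, the list comprehension would be replaced by the loop over Path(ip).glob('*.xls*'), and the combined DataFrame would be written with to_csv once at the end.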

The csv output only contains the data from the last excel file.

You create df1a inside the for loop. Every time a new xlfile is read, a new empty DataFrame is created.

You have to put df1a = pd.DataFrame([]) before the loop, on line 9 of the script.
