Looping through multiple Excel files in Python using pandas

# create for loop
for File in FileList:
    for x in File:
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(File)
        xlsx_file
        # View the excel file's sheet names
        xlsx_file.sheet_names
        # Load the xlsx file's Data sheet as a dataframe
        df = xlsx_file.parse('Data', header=None)
        # select important rows
        df_NoHeader = df[4:]
        # then it does some more reformatting

2 answers

User

#1 · Edited 2024-05-21 09:04:16

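I solved my problem. Instead of using the glob function, I used os.listdir to list all the Excel files, looped through each file, reformatted it, and then appended the final data to the end of the table.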

import os
import pandas as pd

# First create an empty list to store each reformatted dataframe.
appended_data = []

folder = r'C:\ExcelFiles'
for WorkingFile in os.listdir(folder):
    # os.listdir returns bare file names, so build the full path first
    file_path = os.path.join(folder, WorkingFile)
    if os.path.isfile(file_path):
        # Import the Excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(file_path)
        # View the Excel file's sheet names
        xlsx_file.sheet_names
        # Load the file's sheet1 sheet as a dataframe
        df = xlsx_file.parse('sheet1', header=None)

        # ... do some reformatting, call the finished sheet reformatedDataSheet
        appended_data.append(reformatedDataSheet)

# Combine all the reformatted dataframes into one
appended_data = pd.concat(appended_data)

That's it; it does everything I wanted.
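
A note on the listing step: os.listdir returns bare file names rather than full paths, so each name has to be joined to the folder before os.path.isfile or pd.ExcelFile can find it, unless the working directory is already that folder. Here is a minimal sketch of that step on a throwaway folder. The helper name list_excel_files and the .xlsx filter are assumptions added for illustration, not part of the answer above.

```python
import os
import tempfile

def list_excel_files(folder):
    """Return sorted full paths of the .xlsx files directly inside folder."""
    paths = []
    for name in os.listdir(folder):
        # Join the bare name to the folder so the check does not depend on the cwd
        full_path = os.path.join(folder, name)
        if os.path.isfile(full_path) and name.lower().endswith(".xlsx"):
            paths.append(full_path)
    return sorted(paths)

# Demonstrate on a temporary folder holding empty placeholder files
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("a.xlsx", "b.xlsx", "notes.txt"):
        open(os.path.join(demo_folder, name), "w").close()
    # A directory whose name ends in .xlsx is skipped by the isfile check
    os.mkdir(os.path.join(demo_folder, "archive.xlsx"))
    found = [os.path.basename(p) for p in list_excel_files(demo_folder)]

print(found)
```

Each returned path could then be passed to pd.ExcelFile in place of the bare name.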

User

#2 · Edited 2024-05-21 09:04:16

You need to change

os.chdir(r'C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
    print(FileList)

to just

os.chdir(r'C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)

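Why? glob returns a single list. Because you wrote for FileList in glob.glob(...), you step through that list one item at a time and bind each item to FileList in turn. When the loop ends, FileList is a single file name: one string.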
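
The difference between the two forms can be checked directly. This is a minimal sketch, assuming a temporary folder with three empty placeholder files (a.xlsx, b.xlsx, c.xlsx are made-up names); sorted() is added only to make the order predictable, since glob does not guarantee one.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("a.xlsx", "b.xlsx", "c.xlsx"):
        open(os.path.join(demo_folder, name), "w").close()
    pattern = os.path.join(demo_folder, "*.xlsx")

    # Looping form: FileList is rebound to one file name on every pass
    for FileList in sorted(glob.glob(pattern)):
        pass
    after_loop = FileList  # still bound to the last file name

    # Assignment form: FileList is the whole list of file names
    FileList = sorted(glob.glob(pattern))
    assigned = FileList

print(type(after_loop).__name__, os.path.basename(after_loop))
print(type(assigned).__name__, len(assigned))
```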

When this code runs:

for File in FileList:
    for x in File:

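The first line iterates over the characters of that last file name, so File is bound to a single-character string, starting with the first character. The second line then binds x to the first (and only) character of File. A single character is unlikely to be a valid file name, so opening it raises an error.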
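
The same behaviour can be reproduced without any Excel files. This is a minimal sketch, assuming FileList was left holding the made-up file name "report.xlsx":

```python
FileList = "report.xlsx"  # hypothetical file name left over from the earlier loop

outer_values = []
inner_values = []
for File in FileList:        # iterating a string yields its characters
    outer_values.append(File)
    for x in File:           # File is one character, so this body runs once
        inner_values.append(x)

print(outer_values[:3])
print(outer_values == inner_values)
```

Both loops see only single characters, which is why nothing resembling a file name ever reaches pd.ExcelFile.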
