Extracting data from multiple Excel files in multiple directories with Python pandas

import os
import pandas as pd

# Find file names in the specified directory
loc = r'E:\Data Science\Macros\ZBILL_Dump\Apr17'
files = os.listdir(loc)

# Keep ONLY the Excel files
files_xlsx = [f for f in files if f.endswith('.xlsx')]

# Read Sheet1 of each file and collect the results
# (DataFrame.append was removed in pandas 2.0, so concatenate a list instead)
frames = []
for f in files_xlsx:
    new_data = pd.read_excel(os.path.join(loc, f), sheet_name='Sheet1')
    frames.append(new_data)

zbill = pd.concat(frames)
zbill.head()

2 answers

User

Answer 1 · edited 2024-10-04 05:33:40

Use glob with its recursive option to search subdirectories:

import glob
# Raw string so the backslashes in the Windows path are not treated as escapes
files = glob.glob(r'E:\Data Science\Macros\ZBILL_Dump\**\*.xlsx', recursive=True)

Documentation: https://docs.python.org/3/library/glob.html
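The recursive search above only returns file paths. As a minimal sketch of the remaining step, the function below reads each match and stacks the results into one DataFrame. The function name `load_excel_tree`, the `reader` parameter, and the `source_file` column are hypothetical names introduced for illustration; they are not part of pandas or of the original answer.

```python
import glob
import os

import pandas as pd


def load_excel_tree(root, reader=pd.read_excel):
    """Read every .xlsx file under root (at any depth) into one DataFrame."""
    # With recursive=True, '**' matches zero or more directory levels
    pattern = os.path.join(root, '**', '*.xlsx')
    frames = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        df = reader(path)
        # Hypothetical extra column: record which file each row came from
        df['source_file'] = os.path.relpath(path, root)
        frames.append(df)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Assuming the folder layout from the question, this could be called as zbill = load_excel_tree(r'E:\Data Science\Macros\ZBILL_Dump'). To read a specific sheet, one option is to pass reader=functools.partial(pd.read_excel, sheet_name='Sheet1').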

User

Answer 2 · edited 2024-10-04 05:33:40

You can use glob:

import glob
import pandas as pd

# Grab Excel files only (the raw string keeps the backslashes literal)
pattern = r'E:\Data Science\Macros\ZBILL_Dump\Apr17\*.xlsx'

# Save all file matches: xlsx_files
xlsx_files = glob.glob(pattern)

# Create an empty list: frames
frames = []

# Iterate over xlsx_files
for file in xlsx_files:

    # Read the xlsx file into a DataFrame (pandas has no read_xlsx; read_excel handles .xlsx)
    df = pd.read_excel(file)

    # Append df to frames
    frames.append(df)

# Concatenate frames into dataframe
zbill = pd.concat(frames)

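To search other subdirectories, use glob wildcards (these are shell-style patterns, not regular expressions). For example, 'filepath/*/*.xlsx' matches files one level down. More information: https://docs.python.org/3/library/glob.html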
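To make the difference between the wildcard forms concrete, here is a small sketch that builds a throwaway directory tree and runs three patterns against it. The folder and file names (Apr17, May17, old, and so on) and the helper `match` are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile


def match(root, *parts, recursive=False):
    """Return glob matches under root as sorted, '/'-separated relative paths."""
    pattern = os.path.join(root, *parts)
    hits = glob.glob(pattern, recursive=recursive)
    return sorted(os.path.relpath(h, root).replace(os.sep, '/') for h in hits)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file at the top, two one level down, one two levels down
    for rel in ('top.xlsx', 'Apr17/a.xlsx', 'May17/b.xlsx', 'May17/old/c.xlsx'):
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

    # '*.xlsx' stays in root itself
    same_level = match(root, '*.xlsx')
    # '*/*.xlsx' looks exactly one directory level down
    one_down = match(root, '*', '*.xlsx')
    # '**/*.xlsx' with recursive=True looks at every depth, including root
    any_depth = match(root, '**', '*.xlsx', recursive=True)

print(same_level)  # ['top.xlsx']
print(one_down)    # ['Apr17/a.xlsx', 'May17/b.xlsx']
print(any_depth)   # ['Apr17/a.xlsx', 'May17/b.xlsx', 'May17/old/c.xlsx', 'top.xlsx']
```

Note that '**' only spans multiple levels when recursive=True is passed; without it, '**' behaves like a single '*'.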
