<p>下面的解决方案只是对上面@DavidHagan的答案的一个小改动。</p>
<p>这一个包含一个列,用于标识读取的文件,不象<strong>F0、F1等。</strong>
每个文件的单号为<strong>S0、S1等。</strong>
这样我们就可以知道排是从哪里来的。</p>
<pre><code>import numpy as np
import pandas as pd, datetime as dt
import glob, os
import sys
runDir = r'c:\blah\blah'
if os.getcwd() != runDir:
os.chdir(runDir)
files = glob.glob(r'*.*xls*')
df = pd.DataFrame()
#fno is 0, 1, 2, ... (for each file)
for fno, each in enumerate(files):
sheets = pd.ExcelFile(each).sheet_names
# sno iss 0, 1, 2, ... (for each sheet)
for sno, sheet in enumerate(sheets):
FileNo = 'F' + str(fno) #F0, F1, F2, etc.
SheetNo = 'S' + str(sno) #S0, S1, S2, etc.
# print FileNo, SheetNo, each, sheet #debug info
#header = None if you don't want header or take this out.
#dfxl is dataframe of each xl sheet
dfxl = pd.read_excel(each, sheet, header=None)
#add column of FileNo and SheetNo to the dataframe
dfxl['FileNo'] = FileNo
dfxl['SheetNo'] = SheetNo
#now add the current xl sheet to main dataframe
df = df.append(dfxl)
</code></pre>
<p>在做了以上操作之后。。i、 e.将多个XL文件和工作表读入单个数据帧(df)。。。你可以这么做。。要从每个文件、工作表组合中获取样本行。。示例将在dataframe(dfs1)中可用。</p>
<pre><code>#get unique FileNo and SheetNo in dft2
dft2 = df.loc[0,['FileNo', 'SheetNo']]
#empty dataframe to collect sample from each of the read file/sheets
dfs1 = pd.DataFrame()
#loop through each sheet and fileno names
for row in dft2.itertuples():
#get a sample from each file to view
dfts = df[(df.FileNo == row[1]) & (df.SheetNo ==row[2])].sample(1)
#append the 1 sample to dfs1. this will have a sample row
# from each xl sheet and file
dfs1 = dfs1.append(dfts, ignore_index = True)
dfs1.to_clipboard()
</code></pre>