Trying to take multiple Excel spreadsheets, extract specific data, add it all to one DataFrame, and save it as a CSV file

import pandas as pd
import os
from pathlib import Path

ip = "//NETWORKLOCATION/In"
op = "//NETWORKLOCATION/Out"

file_exist = False
dir_list = os.listdir(ip)
print(dir_list)

for xlfile in dir_list:
    if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
        file_exist = True
        str_file = os.path.join(ip, xlfile)
        df1 = pd.read_excel(str_file)
        columns1 = {*VARIOUSDATA -* }

        #creates an empty dataframe for the data to all sequentially be added into
        df1a = pd.DataFrame([])

        #appends the array to the new dataframe df1a
        df1a = df1a.append(pd.DataFrame(columns1, columns = ['*VARIOUS COLUMNS*]))

        if not file_exist:
            print('cannot find any valid excel file in the folder ' + ip)
            print(str_file)

df1a.to_csv('//NETWORKLOCATION/Out/Test.csv')
print(df1a)

3 answers

User

Answer 1 · edited 2024-05-18 05:14:10

I think you should put:

#creates an empty dataframe for the data to all sequentially be added into
df1a = pd.DataFrame([])

before the for xlfile in dir_list: loop, not inside it. Otherwise df1a is recreated as an empty DataFrame on every file iteration.
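The effect of where the accumulator is created can be reproduced without any files. The sketch below is only an illustration: the three small in-memory frames are hypothetical stand-ins for the data extracted from each workbook, and pd.concat is used because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the data extracted from three Excel files.
per_file = [
    pd.DataFrame({"value": [1, 2]}),
    pd.DataFrame({"value": [3]}),
    pd.DataFrame({"value": [4, 5]}),
]

# Bug: the accumulator is recreated inside the loop, so earlier rows are lost.
for frame in per_file:
    inside = pd.DataFrame([])
    inside = pd.concat([inside, frame], ignore_index=True)

# Fix: create the accumulator once, before the loop.
outside = pd.DataFrame([])
for frame in per_file:
    outside = pd.concat([outside, frame], ignore_index=True)

print(len(inside))   # rows from the last file only
print(len(outside))  # rows from all three files
```

The first loop ends with only the last frame's rows, which matches the symptom of a CSV that contains a single file's data.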

User

Answer 2 · edited 2024-05-18 05:14:10

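A few things. First, you will never hit:

if not file_exist:
    print('cannot find any valid excel file in the folder ' + ip)
    print(str_file)

as written: it is nested inside the if statement that sets file_exist to True, so file_exist is always True by the time it is reached.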

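You are also creating df1a inside the for loop, so you keep resetting it to empty.
And why import Path and then use os.path and os.listdir? Why not just use Path(ip).glob('*.xls*')?

That would look like: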

import pandas as pd
import os
from pathlib import Path

ip = "//NETWORKLOCATION/In"
op = "//NETWORKLOCATION/Out"

#creates an empty dataframe for the data to all sequentially be added into
df1a = pd.DataFrame([])

for xlfile in Path(ip).glob('*.xls*'):
    df1 = pd.read_excel(xlfile)

    columns1 = {"VARIOUSDATA"}

    #appends this file's data to df1a (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
    df1a = pd.concat([df1a, pd.DataFrame(columns1, columns = ['VARIOUS_COLUMNS'])])

if df1a.empty:
    print('cannot find any valid excel file in the folder ' + ip)
else:
    df1a.to_csv(op+'/Test.csv')
    print(df1a)
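To see what the '*.xls*' pattern picks up and how the "nothing found" check behaves once it sits after the loop, here is a small standard-library sketch. The temporary folder, the file names, and the helper find_excel_files are all made up for the illustration; none of them come from the question.

```python
import tempfile
from pathlib import Path


def find_excel_files(folder):
    """Return the files in folder whose names match '*.xls*', sorted by name."""
    return sorted(Path(folder).glob("*.xls*"))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only the names matter here.
    for name in ("a.xls", "b.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()

    found = [p.name for p in find_excel_files(tmp)]

    # The check runs once, after all files have been looked at.
    if not found:
        print("cannot find any valid excel file in the folder " + tmp)
    else:
        print(found)
```

Note that the pattern also matches any other extension that starts with .xls, such as .xlsm, so a stricter suffix check may be wanted if those files should be skipped.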

User

Answer 3 · edited 2024-05-18 05:14:10

The csv output only contains the data from the last excel file.

You create df1a inside the for loop, so each time a new xlfile is read, a new empty DataFrame is created.

df1a = pd.DataFrame([]) must go before the loop, at line 9 of the script.
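Putting the advice from the answers together, one possible shape for the whole script is sketched below. It is not the asker's code: the function name combine_excel_files and its reader parameter are hypothetical, the frames are collected in a list and concatenated once instead of growing a DataFrame, and the step that extracts the specific data is left out because the question does not show it.

```python
from pathlib import Path

import pandas as pd


def combine_excel_files(in_dir, out_csv, reader=pd.read_excel):
    """Read every '*.xls*' file in in_dir, stack the rows, and write one CSV.

    Returns the combined DataFrame, or None when no matching file is found.
    """
    frames = []  # created once, before the loop
    for path in sorted(Path(in_dir).glob("*.xls*")):
        # Any per-file extraction would go here; this sketch keeps every row.
        frames.append(reader(path))

    if not frames:
        print(f"cannot find any valid excel file in the folder {in_dir}")
        return None

    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_csv, index=False)
    return combined


# Example call with the paths from the question (not executed here):
# combine_excel_files("//NETWORKLOCATION/In", "//NETWORKLOCATION/Out/Test.csv")
```

Collecting the frames in a list and calling pd.concat once avoids copying the growing DataFrame on every iteration, which matters when there are many files.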
