How do I merge multiple rows of Excel data into a single row?

import openpyxl
from openpyxl import Workbook

path = "sample.xlsx"
wb = openpyxl.load_workbook(path)
ws = wb.active
path2 = "output.xlsx"
wb2 = Workbook()
ws2 = wb2.active
listab = []
rows = ws.max_row
columns = ws.max_column
for i in range(1, rows+1):
    listab.append([])
cellValue = " "
prevCell = " "
for c in range(1, rows+1):
    for r in range(1, columns+1):
        cellValue = ws.cell(row=r, column=c).value
        if cellValue == prevCell:
            listab[r-1].append(prevCell)
        elif cellValue == "NULL":
            listab[r-1].append(prevCell)
        elif cellValue != prevCell:
            listab[r-1].append(cellValue)
        prevCell = cellValue
for r in range(1, rows+1):
    for c in range(1, columns+1):
        j = ws2.cell(row=r, column=c)
        j.value = listab[r-1][c-1]
print(listab)
wb2.save("output.xlsx")

3 Answers

User

Answer 1 · edited 2024-10-02 04:32:32

I suggest using the pandas library for this; once the data is loaded, any kind of transformation becomes easy.

import pandas as pd

exceldata = pd.read_excel('tmp.xlsx', index_col=0) 

print(exceldata)

You can easily drop null/NA values, or replace them, and then export the result back to Excel format.
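As a minimal sketch of those steps, assuming a small in-memory table in place of the real spreadsheet: the column names and values below are invented for illustration, and missing cells are assumed to hold the literal string "NULL" as in the question's code.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet contents.
df = pd.DataFrame(
    {
        "ods_service_id": [1, 2, 3],
        "CPU": ["4", "NULL", "2"],
        "RAM": ["16", "8", "NULL"],
    }
)

# Treat the literal string "NULL" as a missing value.
df = df.replace("NULL", pd.NA)

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace missing values with a placeholder instead.
filled = df.fillna("NO DATA")

# Either frame could then be written back out, for example:
# filled.to_excel("output.xlsx", index=False)  # requires openpyxl
```

Whether dropping or replacing is the better choice depends on whether rows with partial data should survive in the output.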

References:

Reading Excel

Drop Na Value

Replace NA Value

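User

Answer 2 · edited 2024-10-02 04:32:32

Honestly, I think you have been confused by the data structure and have come up with something far more complicated than you need.

One suitable approach is to use a Python dictionary for each service and update it row by row.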

from openpyxl import Workbook, load_workbook

wb = load_workbook("sample.xlsx")
ws = wb.active
objs = {}
headers = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
for row in ws.iter_rows(min_row=2, values_only=True):
    if row[0] not in objs:
        # first row for this service: store every column, keyed by the id in column 1
        objs[row[0]] = {key: value for key, value in zip(headers, row)}
    else:
        # later rows: update the stored dict with the non-empty, non-"NULL" values
        extra = {key: value for key, value in zip(headers[3:], row[3:]) if value not in (None, "NULL")}
        objs[row[0]].update(extra)

# write to new workbook
wb2 = Workbook()
ws2 = wb2.active
ws2.append(headers)
for obj in objs.values(): # do they need sorting?
    ws2.append([obj[key] for key in headers])
wb2.save("output.xlsx")

Note how all of this can be done without a single counter.
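To see the idea in isolation, here is a small sketch that applies a dictionary merge to in-memory rows rather than a workbook. The sample headers and values are invented for illustration, and `merge_rows` is a hypothetical helper name, not part of openpyxl. Unlike the code above, it checks every column, and a later real value overwrites an earlier one.

```python
def merge_rows(headers, rows, null_marker="NULL"):
    """Merge rows that share the same first-column id into one dict per id."""
    merged = {}
    for row in rows:
        record = merged.setdefault(row[0], {})
        for key, value in zip(headers, row):
            # Store real data always; store a placeholder only if nothing is there yet.
            if value not in (None, null_marker) or key not in record:
                record[key] = value
    return merged


# Hypothetical data: two partial rows for service 1, one row for service 2.
headers = ("ods_service_id", "service_name", "CPU", "RAM")
rows = [
    (1, "web", "4", "NULL"),
    (1, "web", "NULL", "16"),
    (2, "db", "8", "NULL"),
]

for record in merge_rows(headers, rows).values():
    print([record[key] for key in headers])
```

The two partial rows for service 1 collapse into one complete row, while service 2 keeps its "NULL" placeholder because no later row supplies a value.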

User

Answer 3 · edited 2024-10-02 04:32:32

Personally, I would go with pandas.

import pandas as pd

#Loading into pandas
df_data = pd.read_excel('sample.xlsx')
df_data.fillna("NO DATA",inplace=True)  ## Replaced nan values with "NO DATA"
unique_ids = df_data.ods_service_ids.unique()

#Storing pd into a list
records_list = df_data.to_dict('records') 
keys_to_check = ['service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']
processed = {}

#Go through unique ids
for key in unique_ids:
    processed[key] = {}

    #Get related records
    matching_records = [y for y in records_list if y['ods_service_ids'] == key]
    #Loop through records
    for record in matching_records:
        #For each key to check, save in dict if non null
        processed[key]['ods_service_ids'] = key
        for detail_key in keys_to_check:
            if record[detail_key] != "NO DATA" :
                processed[key][detail_key] = record[detail_key]
        ##Note : doesn't handle duplicate values for different keys so far


#Records are put back in list
output_data = [processed[x] for x in processed.keys()]
# -> to Pandas
df = pd.DataFrame(output_data)[['ods_service_ids','service_name', 'service_plan_name', 'CPU','RAM','NIC','DRIVE']]

#Export to Excel
df.to_excel("output.xlsx",sheet_name='Sheet_name_1', index=False)

The above should work, but I am not sure how you want to keep duplicate records for the same id. Do you want them stored as DRIVE_0, DRIVE_1, DRIVE_2?
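If numbered columns are what is wanted, one possible sketch follows. It assumes the records look like the output of `df.to_dict('records')` with missing values already replaced by "NO DATA"; `collect_numbered` is a hypothetical helper and the sample records are invented.

```python
from collections import defaultdict


def collect_numbered(records, id_key, detail_keys, null_marker="NO DATA"):
    """Group records by id and store each distinct value as KEY_0, KEY_1, ..."""
    values = defaultdict(lambda: defaultdict(list))
    for record in records:
        for key in detail_keys:
            value = record[key]
            seen = values[record[id_key]][key]
            # Skip placeholders and values already recorded for this id.
            if value != null_marker and value not in seen:
                seen.append(value)

    output = []
    for record_id, details in values.items():
        row = {id_key: record_id}
        for key, found in details.items():
            for index, value in enumerate(found):
                row[f"{key}_{index}"] = value
        output.append(row)
    return output


# Hypothetical records: service 1 has two different drives.
records = [
    {"ods_service_ids": 1, "CPU": "4", "DRIVE": "sda"},
    {"ods_service_ids": 1, "CPU": "NO DATA", "DRIVE": "sdb"},
    {"ods_service_ids": 2, "CPU": "8", "DRIVE": "NO DATA"},
]
numbered = collect_numbered(records, "ods_service_ids", ["CPU", "DRIVE"])
```

Passing `numbered` to `pd.DataFrame` would give one column per numbered key, with cells left empty (NaN) for ids that have fewer values.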

Edit:

The df can be exported in different ways. Replace the #Export to Excel section above with the following:

^{pr2}$

Edit 2:

Without the input data it was hard to see the data flow. I have corrected the code above using dummy data.
