Append multiple DataFrames to multiple existing Excel sheets

【Title】: Append multiple DataFrame to multiple existing excel sheets 【Posted】: 2021-11-21 17:15:18 【Question】:

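I have a list of multiple deliveries to multiple addresses.

For each region (5 regions) I built a list of pivot tables.

In Jupyter Notebook, using a "for" loop, every item of each list is displayed as a separate pivot table, one above the other, just the way I need.

But how do I save them in an Excel file with 5 sheets?

I have tried everything: either only the last pivot created from each region's list is saved, or the save overwrites the existing file and deletes everything else.

For now I create an empty sheet for each region, with only a title in column D, row 1.

expedition.xlsx (5 sheets inside: North, Northeast, Midwest, Southeast, South; named Norte, Nordeste, Centro Oeste, Sudeste, Sul in the code)

When I try to save, it ends up deleting the other sheets and keeping only "North".

I set up a rule that checks whether the cell in column D is empty: if it is filled, move down one row and try again; if it is empty, it should in theory be filled with the DataFrame from the list.

In this image it is shown as in the Jupyter Notebook display, which is how I would like it saved in Excel (one pivot below the other, with 2 blank rows in between).

Using openpyxl I managed to make the rule work and fill the first column below the spacing with an example value 'aaaaaaa', without deleting the other sheets.

How do I write one pivot table below the other, for each region and each list item?

Code: https://pastebin.com/Cx3Zvf6D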

import pandas as pd
import openpyxl
 
writer = pd.ExcelWriter("expedition.xlsx", engine='xlsxwriter')
 
# Creating the base sheet for each region, empty
pivot1 = pd.DataFrame({'Lista de Romaneio para Região Norte':  [' ']})
pivot2 = pd.DataFrame({'Lista de Romaneio para Região Nordeste':  [' ']})
pivot3 = pd.DataFrame({'Lista de Romaneio para Região Centro Oeste':  ['']})
pivot4 = pd.DataFrame({'Lista de Romaneio para Região Sudeste':  [' ']})
pivot5 = pd.DataFrame({'Lista de Romaneio para Região Sul':  [' ']})
 
# creating a sheet in the spreadsheet for each region, with the title in column D, row 1
pivot1.to_excel(writer, sheet_name='Norte', index=False, startcol=3, freeze_panes=(1,0))
pivot2.to_excel(writer, sheet_name='Nordeste', index=False, startcol=3, freeze_panes=(1,0))
pivot3.to_excel(writer, sheet_name='Centro Oeste', index=False, startcol=3, freeze_panes=(1,0))
pivot4.to_excel(writer, sheet_name='Sudeste', index=False, startcol=3, freeze_panes=(1,0))
pivot5.to_excel(writer, sheet_name='Sul', index=False, startcol=3, freeze_panes=(1,0))
 
writer.close()
 
# List with the "keys" of each pivot table for each region
norte = ['PA_BEL', 'TO_PMW', 'AC_RBR']
nordeste = ['AL_MCZ', 'PB_JPA', 'BA_SSA', 'RN_NAT', 'PE_REC', 'CE_FOR', 'MA_IMP', 'MA_THE', 'PI_THE', 'BA_FEC']
centro_oeste = ['GO_GYN', 'DF_BSB', 'GO_BSB', 'MT_CGB', 'MS_CGR']
sudeste = ['ES_SRR', 'MG_BHZ', 'SP_PNM', 'SP_JDI', 'RJ_RIO', 'MG_UDI']
sul = ['RS_POA', 'PR_CWB', 'SC_CCM', 'RS_RIA', 'SC_FLN']
 
 
# example for the north (norte) region
if len(norte) > 0:
    frete_expresso_norte = 0
    for filial in norte:
        # creating a pivot table for each filial (key)
        pivot1 = df[df.Filial_Transportador == filial].pivot_table(
                         index=['BU', 'Sold to Region', 'Filial_Transportador', 'Sold_to_Name', 'Sold to City', 'Delivery'],
                         values=['Quantidade','Volume', 'Palete', 'Net Value'], aggfunc='sum',
                         margins=True)
        # reorders columns and renames ALL of pivot table to Total
        ordem_das_colunas = ['Quantidade', 'Volume', 'Palete', 'Net Value']
        pivot1 = pivot1[ordem_das_colunas].rename(index=dict(All='Total Romaneio'))    
        # creating subtotals and finding express shipping (if any)
        total_palete_norte = pivot1.groupby('Filial_Transportador')['Palete'].sum()[1]
        total_net_value_norte = pivot1.groupby('Filial_Transportador')['Net Value'].sum()[1]
        # save the pivot table (pivot1) in excel in the north sheet where it is blank
        # Here's where I want to put the code below saving the pivot table before starting the creation of the next one.
        # code under construction
        # after saving it continues normally
        if total_palete_norte >= 29 or total_net_value_norte >= 2500000:
            frete_expresso_norte = frete_expresso_norte + 1
        else:
            pass
        display(pivot1)
        
else:
    expedicao_norte = 'Não há volume para ser expedido à região Norte'
 
 
# Code under construction to insert inside the loop:
import openpyxl
 
# opening the spreadsheet with specific name
n = 0 # 0 = Norte / 1 = Nordeste / 2 = Centro Oeste / 3 = Sudeste / 4 = Sul
planilha_cx = openpyxl.load_workbook("expedition.xlsx")  # same file name as created above
folhas = planilha_cx.sheetnames
folha = planilha_cx[folhas[n]]
 
# reading the cell
coluna = 4  # column D of the selected sheet
linha = 1  # start on the first row of the sheet
celula = folha.cell(linha, coluna).value
 
while celula is not None:  # looping while cell in column D is not blank
    celula = folha.cell(linha, coluna).value  # cell current value
    
    if celula is None:  # filling the cell if it is blank
        linha = linha + 2
        folha.cell(row=linha, column=1).value = 'aaaaaaa'  # inserts the word 'aaaaaaa' but doesn't work with pivot1
        planilha_cx.save("expedition.xlsx")
        break
                
    else:  # while the cell in column D is not blank, add +1 to row
        linha = linha + 1

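【Answer 1】:

The problem is that each time you load the workbook you create a new object with a new sheet name (you have worked around this to some extent). The trick is to let Python/pandas know that you are writing different sheets into the same workbook.

Try this code: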

import pandas as pd
from openpyxl import load_workbook

# Note: this pattern targets pandas < 2.0; assigning writer.book and calling
# writer.save() were removed in pandas 2.0.
book = load_workbook(your_destination_file)
writer = pd.ExcelWriter(your_destination_file, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # tells pandas/python what the sheet names are

Your_dataframe.to_excel(writer, sheet_name=DesiredSheetname, startcol=3, freeze_panes=(1,0))

writer.save()

The code above should solve your problem.
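For stacking one pivot below the other on an existing sheet, a minimal sketch follows. It assumes pandas 1.4 or newer (for `if_sheet_exists="overlay"`) with openpyxl installed. The helper name `append_below` and its `gap` parameter are hypothetical, introduced only for illustration. It uses the sheet's last used row rather than scanning column D, which is a simplification of the rule described in the question.

```python
import pandas as pd
from openpyxl import load_workbook


def append_below(path, sheet_name, frame, gap=2, startcol=0):
    """Write `frame` under the existing content of `sheet_name`.

    `gap` is the number of blank rows left between the old content
    and the new table. Hypothetical helper, not part of pandas.
    """
    # Find the last used row of the target sheet (openpyxl rows are 1-based).
    book = load_workbook(path)
    last_row = book[sheet_name].max_row
    book.close()

    # mode="a" keeps the other sheets of the workbook;
    # if_sheet_exists="overlay" writes into the existing sheet
    # instead of replacing it (assumes pandas >= 1.4).
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # startrow is 0-based, so `last_row` alone would be the row
        # directly below the existing data; `gap` adds the blank rows.
        frame.to_excel(writer, sheet_name=sheet_name,
                       startrow=last_row + gap, startcol=startcol)
```

Inside the loop from the question this could be called as `append_below("expedition.xlsx", "Norte", pivot1)` right after each pivot is built. Reopening the file on every call is simple but slow, so for many pivots it may be worth keeping a single writer open and tracking the next start row in a variable.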

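【Comments】:

Thanks for the help, this should solve at least one of the problems. Do you know how to save a pivot table into an existing Excel file?

Pandas has a built-in pivot function you can use. Is live updating important to you? You could consider saving your pandas output as CSV files and then loading them into Excel as tables with the "Get Data" feature. That way you do not write the frames into Excel but import them, and all your pictures, axes and so on stay in place.

Interesting, so I could create many CSV files, one per pivot, have them feed the data into the main Excel file, and delete them afterwards (it will run once a day). But first I would like to try this built-in pivot function. How does it work? I am creating the pivots with the pivot_table function, but I can't replicate

I think you are better off with the latter. That way you can dump the 5 CSV files into the same location every day, let them overwrite the old ones, and then just click "Refresh All" in Excel to update all 5 pivot tables.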
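The CSV route suggested in the comments could look roughly like the sketch below. The function name `dump_region_csvs`, the dictionary layout (region name mapped to a list of pivot tables), and the output folder are all assumptions made for illustration. The Excel side ("Get Data" and "Refresh All") is configured once by hand and is not shown.

```python
from pathlib import Path

import pandas as pd


def dump_region_csvs(pivots_by_region, out_dir):
    """Write one CSV per region, stacking that region's pivots in one file.

    Hypothetical helper: `pivots_by_region` maps a region name to a
    list of DataFrames (for example the pivot tables built in the loop).
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for region, pivots in pivots_by_region.items():
        if not pivots:
            continue  # nothing to ship to this region today
        # reset_index() turns the pivot's index levels into ordinary
        # columns, so each row stays self-describing once stacked.
        combined = pd.concat([p.reset_index() for p in pivots],
                             ignore_index=True)
        target = out_dir / f"{region}.csv"
        combined.to_csv(target, index=False)  # overwrites the previous file
        written.append(target)
    return written
```

Because the file names stay the same from day to day, the queries defined in Excel keep pointing at the right files and only need a refresh.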
