Exporting from MS Excel to MS Access with intermediate processing
Posted: 2013-04-04 12:22:37
Question: I have an application that generates reports in Excel (.XLS) format. I need to append the data from these reports to an existing table in an MS Access 2010 database. A typical record looks like this:
INC000000004154 Closed Cbeebies BBC Childrens HQ6 monitor wall dropping out. HQ6 P3 3/7/2013 7:03:01 PM 3/7/2013 7:03:01 PM 3/7/2013 7:14:15 PM The root cause of the problem was the power supply to the PC which was feeding the monitor. HQ6 Monitor wall dropping out. BBC Third Party Contractor supply this equipment.
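The complication is that I need to do some limited processing on the data. Specifically, I need to do a couple of lookups that convert names to numbers, and parse the date strings (for some reason the report puts dates into the spreadsheet as text rather than as dates).
Now, I could do this in Python using XLRD/XLWT, but I would much prefer to do it in Excel or Access. Does anyone have advice on a good way to approach this? I would very much prefer NOT to use VBA, so could I do something like record an MS Excel macro and then run that macro on each newly created XLS file?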
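As a rough illustration of the two transformations described above (a name-to-number lookup and parsing a text date), here is a minimal Python sketch. The lookup table `PRIORITY_IDS`, the helper names, and the assumption that the report writes dates month-first with a 12-hour clock are all hypothetical; check them against the real report before relying on them.

```python
from datetime import datetime

# Hypothetical lookup table; in practice the numbers would come from the database.
PRIORITY_IDS = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 5}

# Assumed layout of the report's text dates: month/day/year, 12-hour clock.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_report_date(text):
    """Turn a text cell such as '3/7/2013 7:03:01 PM' into a datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


def priority_to_number(name):
    """Map a priority label such as 'P3' to its numeric value."""
    return PRIORITY_IDS[name.strip().upper()]


begin = parse_report_date("3/7/2013 7:03:01 PM")
print(begin.isoformat(), priority_to_number("P3"))
```

An unknown priority label raises `KeyError` here, which is probably what you want during a one-off import: bad rows fail loudly instead of being silently written.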
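Comments:
Well, if you "record an MS Excel macro", you are "using VBA". Please explain why you would "very much prefer NOT to use VBA".
Because I have tried programming in it in the past and it is horrible. If a macro automatically generates working code for me, fine, but the idea of programming in something that combines FORTRAN II with Druidic runes does not appeal. Does that explain it?
Yes, thanks. I suspected this was just a case of "language snobbery", but I wanted to be sure because you said you would "much prefer to do it in Excel or Access". From your description of the processing required, VBA could handle it easily. (Really, this is not "rocket science" we are talking about here.) However, if VBA offends your sensibilities then perhaps you should keep looking for alternatives.
No. I'm not proud. I have even programmed in FORTH. But VB does not justify the learning curve, given its inconsistencies, limitations and general crustiness, combined with poor documentation and debugging tools. Thanks for asking, though.
Answer 1: You can import some Excel data directly into MS Access, but if your requirement is to do some processing, then I don't see how you will achieve that without one of the following:
An ETL application such as Pentaho, Talend or others. That would certainly be like crushing an ant with a hammer, though.
Some other external data processing pipeline, written in Python or some other programming language.
VBA (through macros or hand-coded). VBA has been very good at this kind of thing in Access for decades. Since you are using Excel and Access, staying within that realm seems to be the best solution to your problem.
Just use queries: import the data, without transformation, into a table whose only purpose is to hold the raw data from Excel; then build queries on that raw data to add the missing information and process the data before appending the result to the final destination table. This solution has the advantage of giving you simple steps in Access that you can easily record using macros.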
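The staging-table approach in the last option can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the Access database so that it runs anywhere, and the table and column names (`staging`, `incidents`, `facility_id`) are hypothetical. Access SQL has its own functions for the string and type conversions; the SQLite functions here are placeholders for them.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the Access database
con.executescript("""
    CREATE TABLE facilities (id INTEGER PRIMARY KEY, facility TEXT);
    CREATE TABLE staging (id_incident TEXT, priority TEXT, facility TEXT);
    CREATE TABLE incidents (id_incident TEXT, priority INTEGER, facility_id INTEGER);
    INSERT INTO facilities (id, facility) VALUES (1, 'HQ6'), (2, 'HQ7');
""")

# Step 1: load the spreadsheet rows, untouched, into the staging table.
raw_rows = [("INC000000004154", "P3", "HQ6")]
con.executemany("INSERT INTO staging VALUES (?, ?, ?)", raw_rows)

# Step 2: an append query does the lookup (facility name -> id) and the
# conversion ('P3' -> 3) while copying into the final table.
con.execute("""
    INSERT INTO incidents (id_incident, priority, facility_id)
    SELECT s.id_incident,
           CAST(substr(s.priority, 2) AS INTEGER),
           f.id
    FROM staging AS s
    JOIN facilities AS f ON f.facility = s.facility
""")
con.commit()

result = con.execute("SELECT * FROM incidents").fetchall()
print(result)
```

Because the join is an inner join, rows whose facility name has no match are dropped; a real import would probably want a separate query that lists those rows before the staging table is cleared.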
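Answer 2: I asked this question some time ago and decided it would be easier to do it in Python. Gord asked me to share it, so here it is (sorry for the delay, other projects took priority for a while).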
"""
Routine to migrate the S7 data from mysql to the new Access
database.
We're using the pyodbc libraries to connect to Microsoft Access
Note that there are 32- and 64-bit versions of these libraries
available but in order to work the word-length for pyodbc and by
implication Python and all its associated compiled libraries must
match that of MS Access. Which is an arse as I've just had to
delete my 64-bit installation of Python and replace it and all
the libraries with the 32-bit version.
Tim Greening-Jackson 08 May 2013 (timATgreening-jackson.com)
"""
import pyodbc
import re
import datetime
import tkFileDialog
from Tkinter import *
class S7Incident:
    """
    Class containing the records downloaded from the S7.INCIDENTS table
    """
    def __init__(self, id_incident, priority, begin, acknowledge,
                 diagnose, workaround, fix, handoff, lro, nlro,
                 facility, ctas, summary, raised, code):
        self.id_incident = unicode(id_incident)
        # Map the priority label (P1..P5) to its numeric value
        self.priority = {u'P1': 1, u'P2': 2, u'P3': 3, u'P4': 4, u'P5': 5}[unicode(priority.upper())]
        self.begin = begin
        self.acknowledge = acknowledge
        self.diagnose = diagnose
        self.workaround = workaround
        self.fix = fix
        self.handoff = True if handoff else False
        self.lro = True if lro else False
        self.nlro = True if nlro else False
        self.facility = unicode(facility)
        self.ctas = ctas
        # Apostrophes are stripped because the SQL below is built by string formatting
        self.summary = "** NONE ***" if summary is None else summary.replace("'", "")
        self.raised = raised.replace("'", "")
        self.code = 0 if code is None else code
        self.production = None
        self.dbid = None

    def __repr__(self):
        return "[{}] ID:{} P{} Prod:{} Begin:{} A:{} D:+{}s W:+{}s F:+{}s\nH/O:{} LRO:{} NLRO:{} Facility={} CTAS={}\nSummary:'{}',Raised:'{}',Code:{}".format(
            self.id_incident, self.dbid, self.priority, self.production, self.begin,
            self.acknowledge, self.diagnose, self.workaround, self.fix,
            self.handoff, self.lro, self.nlro, self.facility, self.ctas,
            self.summary, self.raised, self.code)

    def ProcessIncident(self, cursor, facilities, productions):
        """
        Produces the SQL necessary to insert the incident in to the Access
        database, executes it and then gets the autonumber ID (dbid) of the newly
        created incident (this is used so LRO, NLRO, CTAS and AD1 can refer to
        their parent incident).
        If the incident is classed as LRO, NLRO, CTAS then the appropriate
        record is created. Returns the dbid.
        """
        if self.raised.upper() in productions:
            self.production = productions[self.raised.upper()]
        else:
            self.production = 0
        sql = """INSERT INTO INCIDENTS (ID_INCIDENT, PRIORITY, FACILITY, BEGIN,
        ACKNOWLEDGE, DIAGNOSE, WORKAROUND, FIX, HANDOFF, SUMMARY, RAISED, CODE, PRODUCTION)
        VALUES ('{}', {}, {}, #{}#, {}, {}, {}, {}, {}, '{}', '{}', {}, {})
        """.format(self.id_incident, self.priority, facilities[self.facility], self.begin,
                   self.acknowledge, self.diagnose, self.workaround, self.fix,
                   self.handoff, self.summary, self.raised, self.code, self.production)
        cursor.execute(sql)
        cursor.execute("SELECT @@IDENTITY")
        self.dbid = cursor.fetchone()[0]
        if self.lro:
            self.ProcessLRO(cursor, facilities[self.facility])
        if self.nlro:
            self.ProcessNLRO(cursor, facilities[self.facility])
        if self.ctas:
            self.ProcessCTAS(cursor, facilities[self.facility], self.ctas)
        return self.dbid

    def ProcessLRO(self, cursor, facility):
        sql = "INSERT INTO LRO (PID, DURATION, FACILITY) VALUES ({}, {}, {})"\
            .format(self.dbid, self.workaround, facility)
        cursor.execute(sql)

    def ProcessNLRO(self, cursor, facility):
        sql = "INSERT INTO NLRO (PID, DURATION, FACILITY) VALUES ({}, {}, {})"\
            .format(self.dbid, self.workaround, facility)
        cursor.execute(sql)

    def ProcessCTAS(self, cursor, facility, code):
        sql = "INSERT INTO CTAS (PID, DURATION, FACILITY, CODE) VALUES ({}, {}, {}, {})"\
            .format(self.dbid, self.workaround, facility, self.ctas)
        cursor.execute(sql)

class S7AD1:
    """
    S7.AD1 records.
    """
    def __init__(self, id_ad1, date, ref, commentary, adjustment):
        self.id_ad1 = id_ad1
        self.date = date
        self.ref = unicode(ref)
        self.commentary = unicode(commentary)
        self.adjustment = float(adjustment)
        self.pid = 0
        self.production = 0

    def __repr__(self):
        return "[{}] Date:{} Parent:{} PID:{} Amount:{} Commentary: {}"\
            .format(self.id_ad1, self.date.strftime("%d/%m/%y"), self.ref, self.pid, self.adjustment, self.commentary)

    def SetPID(self, pid):
        self.pid = pid

    def SetProduction(self, p):
        self.production = p

    def Process(self, cursor):
        sql = "INSERT INTO AD1 (pid, begin, commentary, production, adjustment) VALUES ({}, #{}#, '{}', {}, {})"\
            .format(self.pid, self.date.strftime("%d/%m/%y"), self.commentary, self.production, self.adjustment)
        cursor.execute(sql)

class S7Financial:
    """
    S7 monthly financial summary of income and penalties from S7.FINANCIALS table.
    These are identical in the new database
    """
    def __init__(self, month, year, gco, cta, support, sc1, sc2, sc3, ad1):
        self.begin = datetime.date(year, month, 1)
        self.gco = float(gco)
        self.cta = float(cta)
        self.support = float(support)
        self.sc1 = float(sc1)
        self.sc2 = float(sc2)
        self.sc3 = float(sc3)
        self.ad1 = float(ad1)

    def __repr__(self):
        # The period is stored in self.begin (there is no self.start attribute)
        return "Period:{} GCO:{:.2f} CTA:{:.2f} SUP:{:.2f} SC1:{:.2f} SC2:{:.2f} SC3:{:.2f} AD1:{:.2f}"\
            .format(self.begin.strftime("%m/%y"), self.gco, self.cta, self.support, self.sc1, self.sc2, self.sc3, self.ad1)

    def Process(self, cursor):
        """
        Insert in to FINANCIALS table
        """
        sql = "INSERT INTO FINANCIALS (BEGIN, GCO, CTA, SUPPORT, SC1, SC2, SC3, AD1) VALUES (#{}#, {}, {}, {}, {}, {}, {}, {})"\
            .format(self.begin, self.gco, self.cta, self.support, self.sc1, self.sc2, self.sc3, self.ad1)
        cursor.execute(sql)

class S7SC3:
    """
    Miscellaneous S7 SC3 stuff. The new table is identical to the old one.
    """
    def __init__(self, begin, month, year, p1ot, p2ot, totchg, succchg, chgwithinc, fldchg, egychg):
        self.begin = begin
        self.p1ot = p1ot
        self.p2ot = p2ot
        self.changes = totchg
        self.successful = succchg
        self.incidents = chgwithinc
        self.failed = fldchg
        self.emergency = egychg

    def __repr__(self):
        # Uses self.begin (there is no self.period attribute) and reports p2ot for P2
        return "{} P1:{} P2:{} CHG:{} SUC:{} INC:{} FLD:{} EGY:{}"\
            .format(self.begin.strftime("%m/%y"), self.p1ot, self.p2ot, self.changes, self.successful, self.incidents, self.failed, self.emergency)

    def Process(self, cursor):
        """
        Inserts a record in to the Access database
        """
        sql = ("INSERT INTO SC3 (BEGIN, P1OT, P2OT, CHANGES, SUCCESSFUL, INCIDENTS, FAILED, EMERGENCY) VALUES "
               "(#{}#, {}, {}, {}, {}, {}, {}, {})")\
            .format(self.begin, self.p1ot, self.p2ot, self.changes, self.successful, self.incidents, self.failed, self.emergency)
        cursor.execute(sql)

def ConnectToAccessFile():
    """
    Prompts the user for an Access database file, connects, creates a cursor,
    cleans out the tables which are to be replaced, gets a hash of the facilities
    table keyed on facility name returning facility id
    """
    # Prompts the user to select which Access DB file he wants to use and then attempts to connect
    root = Tk()
    dbname = tkFileDialog.askopenfilename(parent=root, title="Select output database", filetypes=[('Access 2010', '*.accdb')])
    root.destroy()
    # Connect to the Access (new) database and clean its existing incidents etc. tables out as
    # these will be replaced with the new data
    dbcxn = pyodbc.connect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=" + dbname + ";")
    dbcursor = dbcxn.cursor()
    print("Connected to {}".format(dbname))
    for table in ["INCIDENTS", "AD1", "LRO", "NLRO", "CTAS", "SC3", "PRODUCTIONS", "FINANCIALS"]:
        print("Clearing table {}...".format(table))
        dbcursor.execute("DELETE * FROM {}".format(table))
    # Get the list of facilities from the Access database...
    dbcursor.execute("SELECT id, facility FROM facilities")
    rows = dbcursor.fetchall()
    dbfacilities = {unicode(row[1]): row[0] for row in rows}
    return dbcxn, dbcursor, dbfacilities

# Entry point
incre = re.compile(r"INC\d{12}[A-Z]?")  # Regex that matches incident references
try:
    dbcxn, dbcursor, dbfacilities = ConnectToAccessFile()
    # Connect to the MySQL S7 (old) database and read the incidents and ad1 tables
    s7cxn = pyodbc.connect("DRIVER={MySQL ODBC 3.51 Driver}; SERVER=localhost;DATABASE=s7; UID=root; PASSWORD=********; OPTION=3")
    print("Connected to MySQL S7 database")
    s7cursor = s7cxn.cursor()
    s7cursor.execute("""
    SELECT id_incident, priority, begin, acknowledge,
    diagnose, workaround, fix, handoff, lro, nlro,
    facility, ctas, summary, raised, code FROM INCIDENTS""")
    rows = s7cursor.fetchall()
    # Discard any incidents which don't have a reference of the form INC... as they are ancient
    print("Fetching incidents")
    s7incidents = {unicode(row[0]): S7Incident(*row) for row in rows if incre.match(row[0])}
    # Get the list of productions from the S7 database to replace the one we've just deleted ...
    print("Fetching productions")
    s7cursor.execute("SELECT DISTINCT RAISED FROM INCIDENTS")
    rows = s7cursor.fetchall()
    s7productions = [r[0] for r in rows]
    # ... now get the AD1s ...
    print("Fetching AD1s")
    s7cursor.execute("SELECT id_ad1, date, ref, commentary, adjustment from AD1")
    rows = s7cursor.fetchall()
    s7ad1s = [S7AD1(*row) for row in rows]
    # ... and the financial records ...
    print("Fetching Financials")
    s7cursor.execute("SELECT month, year, gco, cta, support, sc1, sc2, sc3, ad1 FROM Financials")
    rows = s7cursor.fetchall()
    s7financials = [S7Financial(*row) for row in rows]
    print("Writing financials ({})".format(len(s7financials)))
    for p in s7financials:
        p.Process(dbcursor)
    # ... and the SC3s.
    print("Fetching SC3s")
    s7cursor.execute("SELECT begin, month, year, p1ot, p2ot, totchg, succhg, chgwithinc, fldchg, egcychg from SC3")
    rows = s7cursor.fetchall()
    s7sc3s = [S7SC3(*row) for row in rows]
    print("Writing SC3s ({})".format(len(s7sc3s)))
    for p in s7sc3s:
        p.Process(dbcursor)
    # Re-create the productions table in the new database. Note we refer to production
    # by number in the incidents table so need to do the SELECT @@IDENTITY to give us the
    # autonumber index. To make sure everything is case-insensitive convert the
    # hash keys to UPPERCASE.
    dbproductions = {}
    print("Writing productions ({})".format(len(s7productions)))
    for p in sorted(s7productions):
        dbcursor.execute("INSERT INTO PRODUCTIONS (PRODUCTION) VALUES ('{}')".format(p))
        dbcursor.execute("SELECT @@IDENTITY")
        dbproductions[p.upper()] = dbcursor.fetchone()[0]
    # Now process the incidents etc. that we have retrieved from the S7 database
    print("Writing incidents ({})".format(len(s7incidents)))
    for k in sorted(s7incidents):
        s7incidents[k].ProcessIncident(dbcursor, dbfacilities, dbproductions)
    # Match the new parent incident IDs in the AD1s and then write to the new table. Some
    # really old AD1s don't have the parent incident reference in the REF field, it is just
    # mentioned SOMEWHERE in the commentary. So if the REF field doesn't match then do a
    # re.search (not re.match!) for it. It isn't essential to match these older AD1s with
    # their parent incident, but it is quite useful (and tidy).
    print("Matching and writing AD1s ({})".format(len(s7ad1s)))
    for a in s7ad1s:
        if a.ref in s7incidents:
            a.SetPID(s7incidents[a.ref].dbid)
            a.SetProduction(s7incidents[a.ref].production)
        else:
            z = incre.search(a.commentary)
            if z and z.group() in s7incidents:
                a.SetPID(s7incidents[z.group()].dbid)
                a.SetProduction(s7incidents[z.group()].production)
        a.Process(dbcursor)
    print("Committing changes")
    dbcursor.commit()
finally:
    print("Closing databases")
    dbcxn.close()
    s7cxn.close()
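The script above builds its SQL with string formatting and strips apostrophes from text fields so the statements stay valid. A possible alternative, not part of the original answer, is to pass values as query parameters. pyodbc uses `?` as its parameter marker, and so does Python's built-in `sqlite3`, which is used below purely as a stand-in so the sketch runs without an Access driver. The table layout and the helper `insert_incident` are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the pyodbc connection to Access
cur = con.cursor()
cur.execute(
    "CREATE TABLE incidents ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "id_incident TEXT, priority INTEGER, summary TEXT)"
)


def insert_incident(cursor, id_incident, priority, summary):
    """Insert one incident using '?' placeholders and return its new row id."""
    cursor.execute(
        "INSERT INTO incidents (id_incident, priority, summary) VALUES (?, ?, ?)",
        (id_incident, priority, summary),
    )
    # sqlite3 exposes the new key as lastrowid; the Access script above
    # gets the same information with SELECT @@IDENTITY.
    return cursor.lastrowid


# The apostrophe survives intact because the driver handles the quoting.
new_id = insert_incident(cur, "INC000000004154", 3, "Monitor wall won't stay up")
con.commit()
stored = cur.execute(
    "SELECT summary FROM incidents WHERE id = ?", (new_id,)
).fetchone()[0]
print(new_id, stored)
```

With parameters, dates can usually be passed as `datetime` objects as well, which would avoid formatting them into `#...#` literals by hand; whether that works cleanly with the Access ODBC driver in this setup is something to verify rather than assume.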
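Answer 3: It turns out that the file has an additional complication in the form of corrupt data, which requires a degree of processing that would be a pain to do in Excel but is trivially simple in Python. So I am going to re-use some Python 2.x scripts that use the XLWT/XLRD libraries to process the spreadsheets.
Comments:
I consider myself quite proficient in VBA and am working on improving my Python skills. It would be a good addition to Stack Overflow (and I would appreciate it) if you edited your answer to include an example (or two) of the kind of transformation you need to do that is "trivially simple" in Python but would be "a pain to do in Excel". Thanks!
I'm sure that with appropriate knowledge of VBA it would be simple. But as I mentioned, I just don't have the time (or the inclination) to climb that particular learning curve. I will post the code once it is finished, hopefully later today UK time.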