INSERT commands not in order using sqlalchemy and mysql for large rows
Asked: 2018-03-13 23:29:37
Question: I am using sqlalchemy to write to a mysql database in which I index some files and store their contents. I need to write each file and then write index entries that carry a foreign key to the files table. However, sqlalchemy seems to issue the INSERT statements out of order.
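Here is a minimal working example that demonstrates the problem using mock random data (minus the config file, which contains server-specific information):
Index/ORM.py: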
#!/bin/env python2.7
from __future__ import print_function
import os
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.dialects.mysql import LONGBLOB, INTEGER
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy import create_engine
from Index import load_cfg

class Base(object):
    """
    Basic MySQL table settings
    """
    __table_args__ = {
        'mysql_engine': 'InnoDB',
        'mysql_collate': 'latin1_general_cs'
    }

Base = declarative_base(cls=Base)

class CoverageIndex(Base):
    """
    Class for coverage_index table objects
    """
    __tablename__ = 'coverage_index'
    filename = Column(String(45), primary_key=True)
    #filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
    sequence_id = Column(String(45), primary_key=True, index=True)

    def __init__(self, filename, sequence_id):
        self.filename = filename
        self.sequence_id = sequence_id

class FileRow(Base):
    """
    Class for files stored in db
    """
    __tablename__ = 'files'
    filename = Column(String(45), primary_key=True)
    contents = Column(LONGBLOB)

    def __init__(self, filename, contents):
        self.filename = filename
        self.contents = contents

cfg = load_cfg()
db_string = 'mysql://%(user)s:%(passwd)s@%(host)s/%(db)s' % cfg['db_config']
engine = create_engine(db_string, echo=True)
Base.metadata.create_all(engine)

if __name__ == '__main__':
    pass
index.py:
#!/usr/bin/env python2.7
from __future__ import print_function
import os
import sys
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.exc import IntegrityError
from Index.ORM import Base, CoverageIndex, FileRow, engine as db_engine

if __name__ == '__main__':
    import string, random
    data = {}
    for i in range(0,10):
        file = 'file' + str(i)
        data[file] = {
            'seqs': ['seqa' + str(i), 'seqb' + str(i)],
            'contents': '\n'.join([''.join([random.choice(string.letters) for x in range (0, 80)]) for y in range (0, 2500)])
        }
    #print (data)
    Base.metadata.bind = db_engine
    DBSession = sessionmaker(bind=db_engine)
    session = DBSession()
    for file, datum in data.iteritems():
        file_query = session.query(FileRow).filter(FileRow.filename == file)
        if file_query.count() > 0:
            session.query(CoverageIndex).filter(CoverageIndex.filename == file).delete(synchronize_session='fetch')
            file_query.delete(synchronize_session='fetch')
        for i in datum['seqs']:
            # Write to DB
            fqc = file_query.count()
            print ("No. of files: " + str(fqc))
            if fqc == 0:
                print ("Adding: ")
                fr = FileRow(
                    filename = file,
                    contents = datum['contents']
                )
                session.add(fr)
            cov = CoverageIndex(
                filename = file,
                sequence_id = i)
            session.add(cov)
        try:
            session.commit()
        except:
            #print ("SQL Commit Failed: %s" % file)
            session.rollback()
            session.close()
            raise
    session.close()
Here is part of the output from one run. I'd like to draw your attention to the lines stamped 2018-03-13 16:05:40,291 and ...,292:
...
2018-03-13 16:05:40,287 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine ('file1',)
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine ('file1',)
No. of files: 0
Adding:
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine ('file1', 'seqa1')
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine INSERT INTO files (filename, contents) VALUES (%s, %s)
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine ('file1', 'BkTsRJTcNEigPFjofFxDmwVZDXRAsPECawRUjiFZTDGWWoLZzLnGlCwQQeAFyXhLqKjPAJmme
mFNfVzF\nJlZSvwGAdoImTnBAmcrSdMRDvxNYnnMfbQXdfuXulqufiIYpqjFUgfElZSrVkvBvPTg ... (204700 characters truncated) ... trwtYOycEOuDTVxsXeGoNYKAqHlE
LGPqcimwzwAFAEsCZGBBnGzYMHgabgnGZaGmQsn\nSNjYvBwSVdXVKbmJpKdSHSXCDKKvDlkyLxOxsEfOtmlCRruqzaiPhYRocKZQEJSVrtSHncFMBMTEpWUX')
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine ('file1',)
No. of files: 1
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine ('file1', 'seqb1')
2018-03-13 16:05:40,312 INFO sqlalchemy.engine.base.Engine COMMIT
...
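Here you can see that sqlalchemy inserts the coverage_index object before the files object. I think this is because the files object is larger and takes some time to prepare, so the engine decides to run the later INSERT first, asynchronously.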
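However, the files entry needs to be inserted first, because filename in coverage_index should be a foreign key into files. (If I run this with the foreign key constraint defined, an exception is raised.)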
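The ordering requirement can be reproduced without MySQL. The following is a minimal sketch, assuming SQLite (via the standard-library sqlite3 module) as a stand-in for MySQL/InnoDB; the helper name insert_in_order is hypothetical, and the table layout only mirrors the one in the question.

```python
import sqlite3


def insert_in_order(parent_first):
    """Insert one files row and one coverage_index row in a single transaction.

    Returns True if the transaction committed, False if the foreign key
    check rejected it and the transaction was rolled back.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY, contents BLOB)")
    conn.execute(
        "CREATE TABLE coverage_index ("
        " filename TEXT REFERENCES files(filename),"
        " sequence_id TEXT,"
        " PRIMARY KEY (filename, sequence_id))"
    )
    parent = ("INSERT INTO files VALUES (?, ?)", ("file1", b"contents"))
    child = ("INSERT INTO coverage_index VALUES (?, ?)", ("file1", "seqa1"))
    statements = [parent, child] if parent_first else [child, parent]
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(insert_in_order(parent_first=True))
print(insert_in_order(parent_first=False))
```

With the constraint enforced, only the parent-first order commits, which is the behaviour the question needs sqlalchemy to guarantee inside one transaction.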
I know I could commit after adding to files, but I want the files and coverage_index INSERTs to be in the same transaction so that they stay in sync.
So the question is: is there a way to force sqlalchemy to execute synchronously within a transaction?
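Answer 1: Not sure if this is the best way, but calling Session.flush() seems to accomplish my goal. From the SQLAlchemy documentation: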
flush(objects=None)
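Flush all the object changes to the database.
Writes out all pending object creations, deletions and modifications to the database as INSERTs, DELETEs, UPDATEs, etc. Operations are automatically ordered by the Session's unit of work dependency solver.
Database operations will be issued in the current transactional context and do not affect the state of the transaction, unless an error occurs, in which case the entire transaction is rolled back. You may flush() as often as you like within a transaction to move changes from Python to the database's transaction buffer.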
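Applied to the question's models, the flush() approach might look like the sketch below. It is not the answerer's original code. It assumes Python 3, an in-memory SQLite engine in place of the MySQL URL, and LargeBinary in place of LONGBLOB; the inserted_tables list and the record_inserts listener are hypothetical helpers added only to make the statement order visible.

```python
from sqlalchemy import Column, LargeBinary, String, create_engine, event
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class FileRow(Base):
    __tablename__ = "files"
    filename = Column(String(45), primary_key=True)
    contents = Column(LargeBinary)


class CoverageIndex(Base):
    __tablename__ = "coverage_index"
    filename = Column(String(45), primary_key=True)
    sequence_id = Column(String(45), primary_key=True)


engine = create_engine("sqlite://")  # assumption: in-memory stand-in for MySQL
Base.metadata.create_all(engine)

inserted_tables = []  # order in which INSERT statements reach the database


@event.listens_for(engine, "before_cursor_execute")
def record_inserts(conn, cursor, statement, parameters, context, executemany):
    if statement.startswith("INSERT INTO"):
        inserted_tables.append(statement.split()[2])


session = sessionmaker(bind=engine)()
try:
    session.add(FileRow(filename="file1", contents=b"x" * 1000))
    session.flush()  # emit the files INSERT now, inside the open transaction
    session.add(CoverageIndex(filename="file1", sequence_id="seqa1"))
    session.add(CoverageIndex(filename="file1", sequence_id="seqb1"))
    session.commit()  # a single COMMIT covers the files row and both index rows
except Exception:
    session.rollback()
    raise
finally:
    session.close()

print(inserted_tables)
```

Because flush() does not commit, a failure in any later statement still rolls back the files row, so the two tables stay in sync as the question requires.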
Credit:
Is SQLAlchemy saves order in adding objects to session?
http://www.aosabook.org/en/sqlalchemy.html - Section 20.9, Unit of Work
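The quoted documentation says operations are ordered by the unit of work dependency solver. As far as I understand, that solver learns about dependencies between mapped classes from relationship() declarations, and the models in the question declare none, which may be why it had no ordering to apply. The sketch below is an alternative under that assumption, not something the answer itself proposes: it restores the commented-out ForeignKey, adds a hypothetical index_entries relationship, and uses in-memory SQLite with Python 3.

```python
from sqlalchemy import Column, ForeignKey, String, create_engine, event
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class FileRow(Base):
    __tablename__ = "files"
    filename = Column(String(45), primary_key=True)
    # Hypothetical attribute: declares the parent/child dependency to the ORM.
    index_entries = relationship("CoverageIndex")


class CoverageIndex(Base):
    __tablename__ = "coverage_index"
    filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
    sequence_id = Column(String(45), primary_key=True)


engine = create_engine("sqlite://")  # assumption: in-memory stand-in for MySQL
Base.metadata.create_all(engine)

insert_order = []  # order in which INSERT statements reach the database


@event.listens_for(engine, "before_cursor_execute")
def record_inserts(conn, cursor, statement, parameters, context, executemany):
    if statement.startswith("INSERT INTO"):
        insert_order.append(statement.split()[2])


session = sessionmaker(bind=engine)()
try:
    file_row = FileRow(filename="file1")
    cov_a = CoverageIndex(sequence_id="seqa1")
    cov_b = CoverageIndex(sequence_id="seqb1")
    # Linking through the relationship lets the ORM fill in coverage_index.filename.
    file_row.index_entries = [cov_a, cov_b]
    # The children are deliberately added before the parent, with no explicit flush().
    session.add_all([cov_a, cov_b, file_row])
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()

print(insert_order)
```

The trade-off is that this changes the schema and the model definitions, whereas the explicit flush() leaves both untouched.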