INSERT commands not in order using sqlalchemy and mysql for large rows

Posted: 2018-03-13 23:29:37

Question:

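I am using sqlalchemy to write to a mysql database, in which I index some files and store their contents. I need to write each file first, and then write index entries that have a foreign key into the files table. However, sqlalchemy seems to issue the INSERT statements out of order.

Here is a minimal working example that illustrates the problem using mock random data (minus the config file, which contains server-specific information):

Index/ORM.py: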

#!/bin/env python2.7

from __future__ import print_function

import os

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.dialects.mysql import LONGBLOB, INTEGER
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy import create_engine

from Index import load_cfg

class Base(object):
    """
    Basic MySQL table settings
    """
    __table_args__ = {
            'mysql_engine': 'InnoDB',
            'mysql_collate': 'latin1_general_cs'
            }

Base = declarative_base(cls=Base)

class CoverageIndex(Base):
    """
    Class for coverage_index table objects
    """
    __tablename__ = 'coverage_index'

    filename = Column(String(45), primary_key=True)
    #filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
    sequence_id = Column(String(45), primary_key=True, index=True)

    def __init__(self, filename, sequence_id):
        self.filename = filename
        self.sequence_id = sequence_id

class FileRow(Base):
    """
    Class for files stored in db
    """
    __tablename__ = 'files'

    filename = Column(String(45), primary_key=True)
    contents = Column(LONGBLOB)

    def __init__(self, filename, contents):
        self.filename = filename
        self.contents = contents

cfg = load_cfg()
db_string = 'mysql://%(user)s:%(passwd)s@%(host)s/%(db)s' % cfg['db_config']
engine = create_engine(db_string, echo=True)
Base.metadata.create_all(engine)

if __name__ == '__main__':
    pass

index.py:

#!/usr/bin/env python2.7

from __future__ import print_function

import os
import sys

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.exc import IntegrityError

from Index.ORM import Base, CoverageIndex, FileRow, engine as db_engine

if __name__ == '__main__':
    import string, random

    data = {}
    for i in range(0,10):
        file = 'file' + str(i)
        data[file] = {
                'seqs': ['seqa' + str(i), 'seqb' + str(i)],
                'contents': '\n'.join([''.join([random.choice(string.letters) for x in range (0, 80)]) for y in range (0, 2500)])
                }
    #print (data)

    Base.metadata.bind = db_engine

    DBSession = sessionmaker(bind=db_engine)
    session = DBSession()

    for file, datum in data.iteritems():
        file_query = session.query(FileRow).filter(FileRow.filename == file)
        if file_query.count() > 0:
            session.query(CoverageIndex).filter(CoverageIndex.filename == file).delete(synchronize_session='fetch')
            file_query.delete(synchronize_session='fetch')
        for i in datum['seqs']: 
            # Write to DB
            fqc = file_query.count() 
            print ("No. of files: " + str(fqc))
            if fqc == 0:
                print ("Adding: ")
                fr = FileRow(
                        filename = file,
                        contents = datum['contents']
                        )
                session.add(fr)
            cov = CoverageIndex(
                    filename = file, 
                    sequence_id = i) 
            session.add(cov)
        try:
            session.commit()
        except:
            #print ("SQL Commit Failed: %s" % file)
            session.rollback()
            session.close()
            raise
    session.close()

Here is part of the output from one run. I would like to draw your attention to the lines stamped 2018-03-13 16:05:40,291 through 16:05:40,292:

...
2018-03-13 16:05:40,287 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)                                                                    
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
No. of files: 0                                                                                                                                
Adding:                                                                                                                                        
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)                  
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine ('file1', 'seqa1')                                                                  
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine INSERT INTO files (filename, contents) VALUES (%s, %s)                              
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine ('file1', 'BkTsRJTcNEigPFjofFxDmwVZDXRAsPECawRUjiFZTDGWWoLZzLnGlCwQQeAFyXhLqKjPAJmme
mFNfVzF\nJlZSvwGAdoImTnBAmcrSdMRDvxNYnnMfbQXdfuXulqufiIYpqjFUgfElZSrVkvBvPTg ... (204700 characters truncated) ... trwtYOycEOuDTVxsXeGoNYKAqHlE
LGPqcimwzwAFAEsCZGBBnGzYMHgabgnGZaGmQsn\nSNjYvBwSVdXVKbmJpKdSHSXCDKKvDlkyLxOxsEfOtmlCRruqzaiPhYRocKZQEJSVrtSHncFMBMTEpWUX')                    
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1                                                          
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents                                                                
FROM files                                                                                                                                     
WHERE files.filename = %s) AS anon_1                                                                                                           
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine ('file1',)                                                                          
No. of files: 1                                                                                                                                
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)                  
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine ('file1', 'seqb1')                                                                  
2018-03-13 16:05:40,312 INFO sqlalchemy.engine.base.Engine COMMIT       
...

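Here you can see that sqlalchemy issues the coverage_index INSERT before the files INSERT. I assume this is because the files object is larger and takes some time to prepare, so the engine decides to run the later INSERT first, asynchronously.

However, the files entry needs to be inserted first, because filename in coverage_index is supposed to be a foreign key to files. (If I run this with the foreign key constraint defined, an exception is raised.)

I know I could commit after adding to files, but I want the files and coverage_index INSERTs to be in the same transaction so that they stay in sync.

So the question is: is there a way to force sqlalchemy to execute synchronously within a transaction?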


Answer 1:

Not sure if this is the best way, but it seems to accomplish my goal:

flush(objects=None)

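Flush all the object changes to the database.

Writes out all pending object creations, deletions and modifications to the database as INSERTs, DELETEs, UPDATEs, etc. Operations are automatically ordered by the Session's unit of work dependency solver.

Database operations will be issued in the current transactional context and do not affect the state of the transaction, unless an error occurs, in which case the entire transaction is rolled back. You may flush() as often as you like within a transaction to move changes from Python to the database's transaction buffer.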
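As a rough illustration of the flush() description above, here is a minimal sketch of how the loop in index.py might call it. It assumes SQLAlchemy 1.4 or later on Python 3 and uses an in-memory SQLite database as a stand-in for MySQL; the `insert_order` list and the `record_inserts` listener are hypothetical helpers added only to make the statement order visible.

```python
from sqlalchemy import Column, String, Text, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class FileRow(Base):
    __tablename__ = "files"
    filename = Column(String(45), primary_key=True)
    contents = Column(Text)  # Text stands in for the MySQL LONGBLOB column


class CoverageIndex(Base):
    __tablename__ = "coverage_index"
    filename = Column(String(45), primary_key=True)
    sequence_id = Column(String(45), primary_key=True)


# Assumption: in-memory SQLite instead of the MySQL server from the question.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

insert_order = []  # hypothetical helper: table names in the order INSERTs are emitted


@event.listens_for(engine, "before_cursor_execute")
def record_inserts(conn, cursor, statement, parameters, context, executemany):
    # "INSERT INTO <table> ..." -> the third token is the table name
    if statement.startswith("INSERT"):
        insert_order.append(statement.split()[2])


session = sessionmaker(bind=engine)()
try:
    session.add(FileRow(filename="file1", contents="x" * 1000))
    # Emit the pending files INSERT now; the transaction stays open.
    session.flush()
    session.add(CoverageIndex(filename="file1", sequence_id="seqa1"))
    session.add(CoverageIndex(filename="file1", sequence_id="seqb1"))
    # Both tables are committed (or rolled back) together.
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()

print(insert_order)
```

Because flush() only sends the pending statements and does not commit, a failure on a later coverage_index INSERT should still roll back the files row as well.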

Credit to:

Is SQLAlchemy saves order in adding objects to session?

http://www.aosabook.org/en/sqlalchemy.html - Section 20.9, Unit of Work
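Since the unit of work orders operations by dependencies between mapped objects, another option may be to declare the dependency and let the Session sort the INSERTs itself. The sketch below is an assumption-laden variant of the question's models, not the original poster's code: it restores the commented-out ForeignKey, adds hypothetical `coverage` and `file` relationship attributes, links the objects through them, and runs against in-memory SQLite with SQLAlchemy 1.4 or later. The `insert_order` list and `record_inserts` listener are again hypothetical helpers.

```python
from sqlalchemy import Column, ForeignKey, String, Text, create_engine, event
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class FileRow(Base):
    __tablename__ = "files"
    filename = Column(String(45), primary_key=True)
    contents = Column(Text)  # Text stands in for the MySQL LONGBLOB column
    # Hypothetical attribute name; not present in the question's models.
    coverage = relationship("CoverageIndex", back_populates="file")


class CoverageIndex(Base):
    __tablename__ = "coverage_index"
    filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
    sequence_id = Column(String(45), primary_key=True)
    # Hypothetical attribute name; not present in the question's models.
    file = relationship("FileRow", back_populates="coverage")


# Assumption: in-memory SQLite instead of the MySQL server from the question.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

insert_order = []  # hypothetical helper: table names in the order INSERTs are emitted


@event.listens_for(engine, "before_cursor_execute")
def record_inserts(conn, cursor, statement, parameters, context, executemany):
    if statement.startswith("INSERT"):
        insert_order.append(statement.split()[2])


session = sessionmaker(bind=engine)()
try:
    fr = FileRow(filename="file1", contents="x" * 1000)
    # Link each index row to its file object instead of copying the filename.
    rows = [CoverageIndex(sequence_id=s, file=fr) for s in ("seqa1", "seqb1")]
    # Only the children are added explicitly; the file row is pulled in
    # through the relationship's default save-update cascade.
    session.add_all(rows)
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()

print(insert_order)
```

Note that the objects are connected through the relationship attribute rather than by assigning the same filename string to both; as far as I understand, it is that link which gives the dependency solver something to sort on, so no explicit flush() is needed here.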
