mysql table is full error running query with pymysql

Posted: 2021-11-10 17:21:40

I'm using pymysql to query a table in a MySQL database. The database is on a remote Ubuntu server. The code had been running fine, but recently I started getting this error:

(1114, "The table '/tmp/#sql1422a_19_46' is full")

I checked the disk space on the Ubuntu server (output below). There is plenty of free storage on my three data drives, but it looks like some other filesystem may have filled up. I have cron jobs that run every day and update other tables in the same MySQL database, and those are still updating just fine. Does anyone see what the problem might be and how to fix it? The table shouldn't be that large.

Command: df -h

Output:

Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  1.8M  3.2G   1% /run
/dev/nvme0n1p2  228G  215G  1.2G 100% /
tmpfs            16G   12K   16G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/nvme0n1p1  511M  6.7M  505M   2% /boot/efi
/dev/sda1       458G   73M  435G   1% /mnt/data/storage3_500gb
/dev/sdb1       916G   77M  870G   1% /mnt/data/storage1_1tb
/dev/nvme1n1    916G   77M  870G   1% /mnt/data/nvme0n1
/dev/sdc1       916G   77M  870G   1% /mnt/data/storage2_1tb
/dev/sdd1       916G  109G  761G  13% /mnt/data/sda
tmpfs           3.2G     0  3.2G   0% /run/user/1000
/dev/loop1      100M  100M     0 100% /snap/core/11420
/dev/loop0      100M  100M     0 100% /snap/core/11606

Code:

import pandas as pd
import numpy as np

import os
import re
import glob
import json
import time
import datetime
import decimal as dc
import dateutil.parser
import logging
import urllib.parse

import requests
import pymysql
# BeautifulSoup provides a model for the source html
from bs4 import BeautifulSoup as bs
from sqlalchemy import create_engine


# function to query the mysql db and return a dataframe of results
def mysql_query(user, password, database, host, query):

    connection = pymysql.connect(user=user, password=password, database=database, host=host)

    try:
        df = pd.read_sql(query, connection)
        logging.info('query succeeded: ' + query)
        return df

    except Exception as err:
        # use the logging module directly; no `logger` object is defined
        logging.error('query failed: ' + query + ' got error: ' + str(err))

    finally:
        connection.close()
        logging.info('closed mysql connection')
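As an aside, pandas documents SQLAlchemy connectables (rather than raw DBAPI connections) as the supported input to read_sql, and create_engine is already imported above. A minimal alternative sketch, with placeholder credentials:

```python
import pandas as pd
from sqlalchemy import create_engine


def mysql_query_engine(user, password, database, host, query):
    # mysql+pymysql:// tells SQLAlchemy to use the pymysql driver;
    # the engine manages opening and closing connections for us
    engine = create_engine(f"mysql+pymysql://{user}:{password}@{host}/{database}")
    with engine.connect() as conn:
        return pd.read_sql(query, conn)
```

Usage mirrors the original function, e.g. `mysql_query_engine(user='username', password='xxx', database='realestate', host='xxxxxxx', query=zillow_latest_query)`.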



# creating zillow_latest

zillow_latest_query="""with zillow_latest as
(
select
distinct 
zpid,
last_updated,
first_value(providerListingId) over (partition by zpid order by last_updated desc) as providerListingId,
first_value(imgSrc) over (partition by zpid order by last_updated desc) as imgSrc,
first_value(hasImage) over (partition by zpid order by last_updated desc) as hasImage,
first_value(detailUrl) over (partition by zpid order by last_updated desc) as detailUrl,
first_value(statusType) over (partition by zpid order by last_updated desc) as statusType,
first_value(statusText) over (partition by zpid order by last_updated desc) as statusText,
first_value(countryCurrency) over (partition by zpid order by last_updated desc) as countryCurrency,
first_value(price) over (partition by zpid order by last_updated desc) as price,
first_value(unformattedPrice) over (partition by zpid order by last_updated desc) as unformattedPrice,
first_value(address) over (partition by zpid order by last_updated desc) as address,
first_value(addressStreet) over (partition by zpid order by last_updated desc) as addressStreet,
first_value(addressCity) over (partition by zpid order by last_updated desc) as addressCity,
first_value(addressState) over (partition by zpid order by last_updated desc) as addressState,
first_value(addressZipcode) over (partition by zpid order by last_updated desc) as addressZipcode,
first_value(isUndisclosedAddress) over (partition by zpid order by last_updated desc) as isUndisclosedAddress,
first_value(beds) over (partition by zpid order by last_updated desc) as beds,
first_value(baths) over (partition by zpid order by last_updated desc) as baths,
first_value(area) over (partition by zpid order by last_updated desc) as area,
first_value(latLong) over (partition by zpid order by last_updated desc) as latLong,
first_value(isZillowOwned) over (partition by zpid order by last_updated desc) as isZillowOwned,
first_value(variableData) over (partition by zpid order by last_updated desc) as variableData,
first_value(badgeInfo) over (partition by zpid order by last_updated desc) as badgeInfo,
first_value(hdpData) over (partition by zpid order by last_updated desc) as hdpData,
first_value(isSaved) over (partition by zpid order by last_updated desc) as isSaved,
first_value(isUserClaimingOwner) over (partition by zpid order by last_updated desc) as isUserClaimingOwner,
first_value(isUserConfirmedClaim) over (partition by zpid order by last_updated desc) as isUserConfirmedClaim,
first_value(pgapt) over (partition by zpid order by last_updated desc) as pgapt,
first_value(sgapt) over (partition by zpid order by last_updated desc) as sgapt,
first_value(zestimate) over (partition by zpid order by last_updated desc) as zestimate,
first_value(shouldShowZestimateAsPrice) over (partition by zpid order by last_updated desc) as shouldShowZestimateAsPrice,
first_value(has3DModel) over (partition by zpid order by last_updated desc) as has3DModel,
first_value(hasVideo) over (partition by zpid order by last_updated desc) as hasVideo,
first_value(isHomeRec) over (partition by zpid order by last_updated desc) as isHomeRec,
first_value(hasAdditionalAttributions) over (partition by zpid order by last_updated desc) as hasAdditionalAttributions,
first_value(isFeaturedListing) over (partition by zpid order by last_updated desc) as isFeaturedListing,
first_value(list) over (partition by zpid order by last_updated desc) as list,
first_value(relaxed) over (partition by zpid order by last_updated desc) as relaxed,
first_value(hasOpenHouse) over (partition by zpid order by last_updated desc) as hasOpenHouse,
first_value(openHouseStartDate) over (partition by zpid order by last_updated desc) as openHouseStartDate,
first_value(openHouseEndDate) over (partition by zpid order by last_updated desc) as openHouseEndDate,
first_value(openHouseDescription) over (partition by zpid order by last_updated desc) as openHouseDescription,
first_value(builderName) over (partition by zpid order by last_updated desc) as builderName,
first_value(info3String) over (partition by zpid order by last_updated desc) as info3String,
first_value(brokerName) over (partition by zpid order by last_updated desc) as brokerName,
first_value(lotAreaString) over (partition by zpid order by last_updated desc) as lotAreaString,
first_value(streetViewMetadataURL) over (partition by zpid order by last_updated desc) as streetViewMetadataURL,
first_value(streetViewURL) over (partition by zpid order by last_updated desc) as streetViewURL,
first_value(info2String) over (partition by zpid order by last_updated desc) as info2String,
first_value(info6String) over (partition by zpid order by last_updated desc) as info6String
from realestate.zillow
),
distinct_values as(
select
distinct
zpid,
providerListingId,
imgSrc,
hasImage,
detailUrl,
statusType,
statusText,
countryCurrency,
price,
unformattedPrice,
address,
addressStreet,
addressCity,
addressState,
addressZipcode,
isUndisclosedAddress,
beds,
baths,
area,
latLong,
isZillowOwned,
variableData,
badgeInfo,
hdpData,
isSaved,
isUserClaimingOwner,
isUserConfirmedClaim,
pgapt,
sgapt,
zestimate,
shouldShowZestimateAsPrice,
has3DModel,
hasVideo,
isHomeRec,
hasAdditionalAttributions,
isFeaturedListing,
list,
relaxed,
hasOpenHouse,
openHouseStartDate,
openHouseEndDate,
openHouseDescription,
builderName,
info3String,
brokerName,
lotAreaString,
streetViewMetadataURL,
streetViewURL,
info2String,
info6String
from zillow_latest
)
select * from distinct_values"""

zillow_latest_df=mysql_query(user='username',
                            password='xxx',
                            database='realestate',
                            host='xxxxxxx',
                            query=zillow_latest_query)

Error:

(1114, "The table '/tmp/#sql1422a_19_46' is full")

Comments:

- I saw something indicating temporary tables have a 16MB limit. It wasn't an official MySQL page, so I'm not linking it. You can run the query in this answer to determine the limit definitively: ***.com/a/36832492/42346
- @mechanical_meat In-RAM temporary tables are limited by tmp_table_size and max_heap_table_size, whichever is smaller. The default for both in MySQL 8.0 is 16MB. But temporary tables can get much larger than that; they just have to be persisted to disk.
- @BillKarwin: Thanks very much for the clarification and for the answer.

Answer 1:

Well, files in /tmp will use your root filesystem, because you haven't mounted a separate filesystem on /tmp.

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  228G  215G  1.2G 100% /

That looks nearly full to me, so a large temporary table created by MySQL could fill it up.

A query may create temporary tables even if you don't explicitly request them. For example, certain queries using GROUP BY or UNION, subqueries that produce derived tables, common table expressions, and some types of views can create temporary tables. See https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html for more details.

In your case, you're using a common table expression, so it's bound to create a temporary table.
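One way to shrink that temporary table is to rewrite the latest-row-per-zpid logic with a single ROW_NUMBER() instead of DISTINCT over fifty first_value() windows, so the CTE carries one window function rather than fifty. This is only a sketch, assuming MySQL 8.0+ and showing just a few of the columns:

```python
# Hypothetical leaner rewrite of zillow_latest_query: rank rows per zpid
# by last_updated and keep only the newest one. Extend the outer select
# list with the remaining columns as needed.
ZILLOW_LATEST_ROWNUM = """
with ranked as (
    select z.*,
           row_number() over (partition by zpid order by last_updated desc) as rn
    from realestate.zillow z
)
select zpid, last_updated, price, address, beds, baths
from ranked
where rn = 1
"""
```

The existing mysql_query() call can then be pointed at this string instead of the original query.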

Temporary tables can certainly be larger than 16MB (despite the comment above). The size of an on-disk temporary table is determined by the number of rows and the size of each row.
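As a rough sanity check on how a "small" table can still blow past 1.2G of free space, multiply the row count by the average row width. The numbers below are made-up placeholders for illustration, not measurements from this question:

```python
def estimate_temp_table_bytes(row_count, avg_row_bytes):
    """Approximate on-disk size of a materialized temp table."""
    return row_count * avg_row_bytes

# e.g. 5 million listings with wide rows (~1 KiB of text columns each)
size_gib = estimate_temp_table_bytes(5_000_000, 1024) / 1024**3
print(round(size_gib, 2))  # about 4.77 GiB -- far more than the 1.2G free on /
```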

On a consulting contract a few years ago, a client said they kept getting warnings that their filesystem was running out of space, but when they went to look, it had 6GB free. I helped them analyze their MySQL query logs and found that they occasionally had at least four concurrent queries, each using 1.5GB of temp-table space. But as soon as a query finishes, its temporary tables disappear, so by the time they checked the server, those queries were most likely no longer running.

How can you fix this?

There are several remedies. You may need more than one:

- Reduce the number of rows examined by optimizing the search with indexes.
- Reduce the total number of rows in the table, so the query examines fewer rows even where it can't be optimized with an index.
- Configure MySQL's tmpdir option to point at another filesystem with more space.
- Clean up some unneeded data on the filesystem that tmpdir references.
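For the tmpdir option, the change goes in the MySQL server configuration. A sketch, where the path is a placeholder for whichever filesystem you choose:

```ini
# /etc/mysql/my.cnf (or a drop-in file under /etc/mysql/mysql.conf.d/)
[mysqld]
# point internal temporary tables at a filesystem with free space
tmpdir = /mnt/data/storage1_1tb/mysql_tmp
```

The directory must exist and be writable by the mysql user, and on Ubuntu the mysqld AppArmor profile typically has to be updated to allow a tmpdir outside its default paths, or the server may fail to restart.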

Comments:

- Thanks for getting back to me on this. I noticed that if I filter realestate.zillow in the inner query to just the last 7 days, it completes in MySQL, but when I try to run the same query as part of my Python pipeline, it only completes if I filter to 1 day. Does that mean I have two problems, one of MySQL running out of space and another of Python running out of space? If so, can you suggest how to fix both? I also tried editing my /etc/mysql/my.cnf to add tmpdir = /mnt/data/sda/user_storage/username/mysql_tmp/ but it threw an error on restart.
- I can't guess what error you got. Python doesn't store temporary tables on disk the way MySQL does; if Python receives a result set, it stores it in RAM.
