AWS Redshift: FATAL: connection limit "500" exceeded for non-bootstrap users

Posted 2023-03-30


Question (posted 2020-04-17 18:10:14):

Hope you are all doing well.

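We hit this limit frequently. We know that the limit of 500 concurrent user connections cannot be raised in Redshift. We also know that some views (such as pg_user_info) show information about the actual limits that apply to each user.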

We are looking for answers that we could not find in this forum, as well as any guidance based on your experience.

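Questions:

1. Would recreating the cluster with larger EC2 instances give us a higher limit?
2. Would adding nodes to an existing cluster give us a higher limit?
3. From an application-development point of view, what specific strategies or actions do you recommend to detect or predict situations in which we will hit this limit?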

Thanks, Jimmy

Comments on the question:

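Does your Amazon Redshift cluster really have more than 500 concurrent users? Are these real people, or applications that connect to it? Do you really need that many connections? What are they all doing?

Answer 1: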

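As you said, this is a hard limit in Redshift and there is no way to raise it. Redshift is not designed as a high-concurrency, high-connection-count database.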

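If you need Redshift's large-scale analytics capabilities, I would expect you to work around this limit with connection pooling. Pgpool is a commonly used tool for this.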
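Pgpool runs as separate middleware between the applications and Redshift, but the same idea can also live inside a single application process: open at most a fixed number of connections, reuse them, and make extra callers wait for a free one instead of opening more. The following is a minimal sketch of that idea, not Pgpool's implementation; the `BoundedPool` class and its parameters are hypothetical, and `factory` stands in for whatever opens a real connection (for example a call to `psycopg2.connect`).

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Opens at most max_size connections and reuses them; extra callers wait."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle: "queue.LifoQueue[T]" = queue.LifoQueue()
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self, timeout: Optional[float]) -> T:
        try:
            return self._idle.get_nowait()  # reuse an idle connection when possible
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                conn = self._factory()  # only count connections that opened successfully
                self._created += 1
                return conn
        # Pool exhausted: wait for another caller to hand a connection back
        # (raises queue.Empty if the timeout expires first).
        return self._idle.get(timeout=timeout)

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        conn = self._acquire(timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it

    @property
    def created(self) -> int:
        return self._created
```

Because every worker in the process draws from the same pool, that process never holds more than `max_size` Redshift connections, so the pool sizes of all application instances can be budgeted against the 500-connection limit. psycopg2 also ships `psycopg2.pool.ThreadedConnectionPool`, which may be preferable to a hand-rolled pool.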


Answer 2:

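OK, folks. Thanks to everyone who answered. I opened a support ticket with AWS, and this is their recommendation, pasted here in full. It's long, but I hope it helps the many people who run into this problem. The idea is to catch the situation before it happens: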

To monitor the number of connections made to the database, you can create a CloudWatch alarm on the DatabaseConnections metric that triggers a Lambda function when a certain threshold is reached. The Lambda function can then terminate idle connections, for example by calling a procedure that terminates them.
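The alarm described above can be expressed as the keyword arguments of CloudWatch's `put_metric_alarm` API. The sketch below only builds that dictionary, so it runs without AWS credentials. The 80% threshold, the one-minute period and the alarm name format are assumptions, and `action_arn` is a hypothetical action target (for example an SNS topic that the cleanup Lambda subscribes to).

```python
from typing import Any, Dict

REDSHIFT_CONNECTION_LIMIT = 500  # the limit discussed in this thread


def build_connection_alarm(cluster_id: str, action_arn: str, percent: int = 80) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch put_metric_alarm on AWS/Redshift DatabaseConnections."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    threshold = REDSHIFT_CONNECTION_LIMIT * percent // 100  # integer math: 80% of 500 -> 400
    return {
        "AlarmName": f"{cluster_id}-database-connections-high",  # hypothetical naming scheme
        "Namespace": "AWS/Redshift",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,  # alarm on the first breaching datapoint
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [action_arn],
    }
```

With boto3, the result would be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Alarming well below 500 leaves headroom for the cleanup Lambda itself to connect.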

Below are the queries that create a procedure to log and terminate long-running inactive sessions:

1. Add a view that lists all currently inactive sessions in the cluster:

CREATE OR REPLACE VIEW inactive_sessions as (
    select a.process, 
    trim(a.user_name) as user_name,
    trim(c.remotehost) as remotehost,
    a.usesysid, 
    a.starttime, 
    datediff(s,a.starttime,sysdate) as session_dur, 
    b.last_end, 
    datediff(s,case when b.last_end is not null then b.last_end else a.starttime end,sysdate) idle_dur
        FROM
        (
            select starttime,process,u.usesysid,user_name 
            from stv_sessions s, pg_user u 
            where 
            s.user_name = u.usename 
            and u.usesysid>1
            and process NOT IN (select pid from stv_inflight where userid>1 
            union select pid from stv_recents where status != 'Done' and userid>1)
        ) a 
        LEFT OUTER JOIN (
            select 
            userid,pid,max(endtime) as last_end from svl_statementtext 
            where userid>1 and sequence=0 group by 1,2) b ON a.usesysid = b.userid AND a.process = b.pid

        LEFT OUTER JOIN (
            select username, pid, remotehost from stl_connection_log
            where event = 'initiating session' and username <> 'rsdb') c on a.user_name = c.username AND a.process = c.pid
        WHERE (b.last_end > a.starttime OR b.last_end is null)
        ORDER BY idle_dur
);

2. Add a table that logs information about the inactive sessions that were terminated:

CREATE TABLE IF NOT EXISTS terminated_inactive_sessions (
    process int,
    user_name varchar(50),
    remotehost varchar(50),
    starttime timestamp,
    session_dur int,
    idle_dur int,
    terminated_on timestamp DEFAULT GETDATE()   
);

3. Add a procedure that logs and then terminates every session that has been idle for at least n seconds:

CREATE OR REPLACE PROCEDURE terminate_and_log_inactive_sessions (n INTEGER) 
AS $$ 
DECLARE
  expired RECORD ; 
BEGIN
  FOR expired IN SELECT process, user_name, remotehost, starttime, session_dur, idle_dur FROM inactive_sessions WHERE idle_dur >= n
  LOOP
    -- Static INSERT: the record fields are bound as values, so a NULL remotehost
    -- (possible because of the LEFT OUTER JOIN in the view) or a quote in a value
    -- no longer breaks the statement; concatenating NULL would make the EXECUTE string NULL.
    INSERT INTO terminated_inactive_sessions (process, user_name, remotehost, starttime, session_dur, idle_dur)
    VALUES (expired.process, expired.user_name, expired.remotehost, expired.starttime, expired.session_dur, expired.idle_dur);
    EXECUTE 'SELECT PG_TERMINATE_BACKEND(' || expired.process || ')';
  END LOOP;
END;
$$ LANGUAGE plpgsql;

4. Run the procedure, passing the idle threshold in seconds (100 in this example):

  call terminate_and_log_inactive_sessions(100);
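To make the logic of these four steps easier to follow, here is a plain-Python model of it that does not touch Redshift. It mirrors the view's rules: sessions of usesysid 1 and sessions with in-flight or unfinished queries are skipped, sessions whose last statement ended before they started are dropped, and idle time is measured from the end of the last statement, or from login if none ran. Like the procedure, it logs a row before terminating each session idle for at least `n` seconds. The `Session` class and both function names are hypothetical, and `terminate` stands in for `PG_TERMINATE_BACKEND`; Redshift's DATEDIFF counts second boundaries, which this sketch approximates with whole elapsed seconds.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Session:
    process: int  # pid, as in stv_sessions
    user_name: str
    usesysid: int
    starttime: datetime


def idle_seconds(starttime: datetime, last_end: Optional[datetime], now: datetime) -> int:
    """Like the view's idle_dur: seconds since the last statement ended, or since login."""
    reference = last_end if last_end is not None else starttime
    return int((now - reference).total_seconds())


def terminate_and_log(
    sessions: List[Session],
    active_pids: Set[int],
    last_end: Dict[int, datetime],
    now: datetime,
    n: int,
    terminate: Callable[[int], None],
) -> List[dict]:
    log: List[dict] = []
    for s in sessions:
        if s.usesysid <= 1 or s.process in active_pids:
            continue  # system user, or a query is still in flight / not Done
        end = last_end.get(s.process)
        if end is not None and end <= s.starttime:
            continue  # the view keeps only last_end > starttime (or no statement at all)
        idle = idle_seconds(s.starttime, end, now)
        if idle >= n:
            # Log first, then terminate, in the same order as the procedure.
            log.append({
                "process": s.process,
                "user_name": s.user_name,
                "session_dur": int((now - s.starttime).total_seconds()),
                "idle_dur": idle,
            })
            terminate(s.process)
    return log
```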


Here is a sample Lambda function that closes idle connections by querying the inactive_sessions view created above. You can use it as a reference.

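import datetime
import logging
import os

import psycopg2

# Assumption: connection settings and the idle threshold come from Lambda
# environment variables; these variable names are hypothetical.
db_host = os.environ["DB_HOST"]
db_port = os.environ.get("DB_PORT", "5439")
db_database = os.environ["DB_DATABASE"]
db_user = os.environ["DB_USER"]
db_password = os.environ["DB_PASSWORD"]
session_idle_limit = int(os.environ.get("SESSION_IDLE_LIMIT", "100"))

query = "SELECT process, user_name, session_dur, idle_dur FROM inactive_sessions WHERE idle_dur >= %s"

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    try:
        conn = psycopg2.connect(dbname=db_database, user=db_user, password=db_password,
                                port=db_port, host=db_host)
        conn.autocommit = True
    except psycopg2.Error:
        logger.exception("ERROR: Could not connect to Redshift cluster.")
        raise  # let Lambda record the failure instead of calling sys.exit()

    logger.info("SUCCESS: Connection to Redshift cluster succeeded")

    try:
        with conn.cursor() as cur:
            # The driver binds the threshold; no string formatting of SQL.
            cur.execute(query, (session_idle_limit,))
            rows = cur.fetchall()
            # Current time, taken per invocation (module-level values survive warm starts).
            now = datetime.datetime.now()
            for pid, user_name, _session_dur, idle_dur in rows:
                logger.info("terminating session with pid %s (user %s) that has been idle for %s seconds at %s",
                            pid, user_name, idle_dur, now)
                cur.execute("SELECT PG_TERMINATE_BACKEND(%s);", (pid,))
    finally:
        conn.close()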
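The support answer also suggests having the Lambda function call the stored procedure, so that every termination is recorded in `terminated_inactive_sessions`. A minimal sketch of that variant follows; `make_handler` is a hypothetical helper, and `connect` is injected so the handler can be exercised without a cluster (in a real deployment it would wrap `psycopg2.connect` with the same connection settings the Lambda function uses).

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def make_handler(connect: Callable[[], Any], idle_limit: int) -> Callable[[Any, Any], Dict[str, int]]:
    """Build a Lambda handler that delegates to terminate_and_log_inactive_sessions."""

    def handler(event: Any, context: Any) -> Dict[str, int]:
        conn = connect()
        try:
            conn.autocommit = True
            with conn.cursor() as cur:
                # The procedure logs each idle session, then terminates it.
                cur.execute("CALL terminate_and_log_inactive_sessions(%s);", (idle_limit,))
            logger.info("called terminate_and_log_inactive_sessions(%s)", idle_limit)
        finally:
            conn.close()
        return {"idle_limit": idle_limit}

    return handler
```

Keeping the termination logic in the procedure means the audit table is filled whether the cleanup is triggered by the alarm or run by hand.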

