一次tomcat使用postgresql达到最大连接数案例

原创刘韬云和恩墨 2021-09-03

1904

客户在使用postgresql时，访问应用出现断断续续不稳定现象，客户环境为tomcat 8 + postgresql13.3。

应用报错

通过查看应用端tomcat的错误日志发现有如下错误：

org.postgresql.util.PSQLException: FATAL: sorry, too many clients already

at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:613)

at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:161)

at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213)

at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)

at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)

at org.postgresql.Driver.makeConnection(Driver.java:465)

at org.postgresql.Driver.connect(Driver.java:264)

at java.sql.DriverManager.getConnection(DriverManager.java:582)

at java.sql.DriverManager.getConnection(DriverManager.java:185)

说明db那边已经不能分配jdbc连接了

分析过程

登录postgresql检查运行日志，同样发现错误提示：

17:45:37.065 CST [8570] FATAL: sorry, too many clients already

出现上面错误是已经达到了pg设置的最大连接数了

检查pg的配置：

postgres=# show config_file;

config_file

------------------------------

/home/pgdata/postgresql.conf

postgres=# show max_connections;

max_connections

-----------------

500

检查当前的连接消耗

postgres=# select count(*) from pg_stat_activity where application_name='PostgreSQL JDBC Driver' ;

count

-------

495

(1 row)

最大500个连接，已经基本全部被耗光

开启日志

当前为默认值-1，不记录慢sql

postgres=# show log_min_duration_statement;

log_min_duration_statement

----------------------------

-1

(1 row)

调整为5秒

vi postgresql.conf

调整为： log_min_duration_statement = 5000

在线生效：

postgres=# SELECT pg_reload_conf();

pg_reload_conf

----------------

(1 row)

postgres=# show log_directory;

log_directory

---------------

log

(1 row)

postgres=# show log_filename ;

log_filename

-------------------

postgresql-%a.log

(1 row)

发现pg 日志中有大量的如下日志：

22:21:07.201 CST [21146] LOG: duration: 147753.171 ms execute <unnamed>: select a.id from minervdb_bench a,minervdb_bench1 b where a.id = b.id……

22:23:17.275 CST [21535] LOG: duration: 16739.452 ms execute <unnamed>: select a.id from minervdb_bench a,minervdb_bench1 b where a.id = b.id……

22:24:20.080 CST [21598] LOG: duration: 16695.955 ms execute <unnamed>: select a.id from minervdb_bench a,minervdb_bench1 b where a.id = b.id……

22:24:38.846 CST [21618] LOG: duration: 16660.069 ms execute <unnamed>: select a.id from minervdb_bench a,minervdb_bench1 b where a.id = b.id……

……

大量10多秒的慢查询耗光了连接资源

至此大概故障的原因就明显了，由于sql语句的执行效率明显低于正常的预期，当访问量达到一定程度后，由于大量的连接没有释放，当连接积累到一定程度后达到了最大值，新的访问请求刚好在达到最大连接时出现访问异常，当访问量降低后，只要没有达到连接的最大值，业务处理还是能够正常返回结果，只不过需要等待较长的加载时间；优化慢sql后应用恢复正常。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

杜伟

1年前

坚持一件能让自己变好的事，的确不容易，打磨自己的过程，也总是充满艰难和迷茫，但只要你朝前走，即使失败，也是经历

1年前

一次tomcat使用postgresql达到最大连接数案例

应用报错

分析过程

开启日志

评论

相关阅读