
Adjust database connection pool and timeout configurations#34941

Open
spbolton wants to merge 5 commits into main from claude/fix-issue-34921-6OQFO

Conversation

spbolton (Contributor) commented on Mar 11, 2026

Proposed Changes

  • Increased DB_MAX_WAIT from 900000ms (15m) to 1800000ms (30m) to allow longer connection lifetime
  • Decreased DB_MIN_IDLE from 3 to 1 to reduce minimum idle connections in the pool
  • Increased DB_CONNECTION_TIMEOUT from 5000ms (5s) to 30000ms (30s) for new connection attempts
  • Fixed environment variable name inconsistency: DB_MAXWAIT → DB_MAX_WAIT in DataSourceStrategyProvider to match the actual environment variable exported in setenv.sh
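
The naming fix matters because an env-var lookup silently falls back to the code default when the variable is absent. A minimal sketch of the pattern (hypothetical helper, not the actual DataSourceStrategyProvider code):

```java
import java.util.Map;

public class EnvDefaults {
    // Reads a long from the environment, falling back to a default when
    // the variable is not set -- the same shape of lookup the PR fixes.
    static long longFromEnv(Map<String, String> env, String name, long defaultValue) {
        String raw = env.get(name);
        return raw == null ? defaultValue : Long.parseLong(raw);
    }

    public static void main(String[] args) {
        // Simulated environment: setenv.sh exports DB_MAX_WAIT (with underscore).
        Map<String, String> env = Map.of("DB_MAX_WAIT", "1800000");
        // Misspelled constant: the exported value is never seen, so the
        // code default wins (HikariCP then churns connections every 60s).
        System.out.println(longFromEnv(env, "DB_MAXWAIT", 60000));
        // Corrected constant: the exported value is applied.
        System.out.println(longFromEnv(env, "DB_MAX_WAIT", 60000));
    }
}
```

With the wrong name the first lookup prints the 60000 fallback; the corrected name prints 1800000.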

Checklist

  • Tests
  • Translations
  • Security Implications Contemplated (add notes if applicable)

Additional Info

The configuration changes optimize database connection pool behavior by allowing longer connection lifetimes and more lenient timeout windows, while reducing the minimum idle connection overhead. The variable name fix ensures the code correctly references the environment variable defined in the shell configuration.

https://claude.ai/code/session_01QWBxEhHLYeQKZNQAqawGuM

This PR fixes: #34921

…aults (#34921)

The CONNECTION_DB_MAX_WAIT constant in DataSourceStrategyProvider.java
used "DB_MAXWAIT" while setenv.sh exports "DB_MAX_WAIT", causing the
maxLifetime setting to never be applied. This resulted in HikariCP
falling back to a 60s default, causing excessive connection churn under
load.
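
The scale of that churn is easy to estimate: a saturated pool must reopen roughly poolSize / maxLifetime physical connections per unit time just to replace retired ones. A back-of-envelope sketch (pool size of 15 taken from the DB_MAX_TOTAL value used in the CI tests mentioned below):

```java
public class ChurnEstimate {
    // Steady-state replacement rate: every connection is retired once per
    // maxLifetime, so the pool reopens poolSize connections per maxLifetime.
    static double reconnectsPerMinute(int poolSize, long maxLifetimeMs) {
        return poolSize / (maxLifetimeMs / 60_000.0);
    }

    public static void main(String[] args) {
        // 60s fallback (the bug): ~15 new physical connections per minute.
        System.out.println(reconnectsPerMinute(15, 60_000));
        // 30 min maxLifetime (the fix): one new connection every 2 minutes.
        System.out.println(reconnectsPerMinute(15, 1_800_000));
    }
}
```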

Changes:
- Fix constant from "DB_MAXWAIT" to "DB_MAX_WAIT" to match setenv.sh
- Increase DB_MAX_WAIT default from 900000ms (15m) to 1800000ms (30m)
- Reduce DB_MIN_IDLE default from 3 to 1 connection
- Increase DB_CONNECTION_TIMEOUT default from 5000ms to 30000ms (30s)

https://claude.ai/code/session_01QWBxEhHLYeQKZNQAqawGuM
claude added 2 commits March 11, 2026 14:52
The env var name fix (DB_MAXWAIT → DB_MAX_WAIT) changes the effective
maxLifetime from 60s (code default when var wasn't found) to 1800s
(30min from setenv.sh). Combined with the CI test pool limit of only
15 connections, this causes pool exhaustion in AI embeddings tests.

Set DB_MAX_WAIT=120000 (2min) in both pom.xml and docker-compose.yml
to keep test pool behavior stable with the constrained connection limit.

https://claude.ai/code/session_013tzbmrxwwC4FozZ5P4wP7B
Apply the same DB_MAX_WAIT=120000 (2min) override to e2e-java,
e2e-node, dotcms-ui-e2e, and karate test modules. All use
DB_MAX_TOTAL=15 and would be affected by the effective maxLifetime
change from 60s to 30min.

https://claude.ai/code/session_013tzbmrxwwC4FozZ5P4wP7B
@github-actions github-actions bot added the Area : Frontend PR changes Angular/TypeScript frontend code label Mar 11, 2026
runSQL() used getPGVectorConnection() which calls PGvector.addVectorType()
for DDL operations like CREATE EXTENSION and CREATE TABLE. When called
before the pgvector extension exists, addVectorType queries pg_type and
caches OID=0 for the "vector" type. This stale cached value persists for
the connection's lifetime in HikariCP, causing "Unknown type vector"
errors when that connection is later used for queries with PGvector
parameters (e.g. countEmbeddings).

The bug became more likely to manifest after the DB_MAX_WAIT fix because:
- DB_MIN_IDLE=1 (fewer pool connections = higher reuse of poisoned conn)
- Longer maxLifetime = poisoned connection stays in pool longer

Fix: use a plain connection (without addVectorType) for DDL operations,
which don't need the vector type registered.
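
The failure mode can be illustrated with a small stand-alone simulation (this is not the real PGvector or HikariCP code; the cache and OID values are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class StaleOidDemo {
    // Stand-in for a JDBC connection with a per-connection type-OID cache,
    // mimicking what PGvector.addVectorType() populates.
    static class FakeConnection {
        final Map<String, Integer> typeOidCache = new HashMap<>();
    }

    static boolean extensionInstalled = false;

    // Analogue of addVectorType(): caches whatever pg_type reports *now*.
    // Before CREATE EXTENSION, the "vector" type does not exist, so a
    // sentinel OID of 0 gets cached on the connection.
    static void addVectorType(FakeConnection c) {
        c.typeOidCache.put("vector", extensionInstalled ? 16385 : 0);
    }

    public static void main(String[] args) {
        FakeConnection pooled = new FakeConnection();
        addVectorType(pooled);       // registered before the DDL ran
        extensionInstalled = true;   // CREATE EXTENSION happens afterwards
        // The stale OID persists for the pooled connection's lifetime,
        // so later vector queries on it fail ("Unknown type vector").
        System.out.println(pooled.typeOidCache.get("vector"));
    }
}
```

The fix in the PR sidesteps this by running DDL on a plain connection, so the type is only ever registered after the extension exists.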

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous value of 120000ms (2 min) was less than the default
idleTimeout of 300000ms (5 min), causing HikariCP to log:
"idleTimeout is close to or more than maxLifetime, disabling it."

Bumping to 600000ms (10 min) ensures maxLifetime > idleTimeout.
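
The invariant being restored is simply idleTimeout < maxLifetime. A sketch of the sanity check (HikariCP's real warning threshold is approximate, but this captures the ordering that must hold):

```java
public class PoolTimeoutCheck {
    // HikariCP disables idleTimeout when it is not safely below maxLifetime;
    // this models the ordering constraint, not Hikari's exact margin.
    static boolean idleTimeoutEffective(long maxLifetimeMs, long idleTimeoutMs) {
        return idleTimeoutMs < maxLifetimeMs;
    }

    public static void main(String[] args) {
        long defaultIdleTimeoutMs = 300_000; // HikariCP default, 5 min
        // Old test override (2 min): below idleTimeout, so Hikari disabled it.
        System.out.println(idleTimeoutEffective(120_000, defaultIdleTimeoutMs));
        // Bumped value (10 min): idleTimeout stays effective.
        System.out.println(idleTimeoutEffective(600_000, defaultIdleTimeoutMs));
    }
}
```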

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
setenv.sh, before:
# Max Connection Lifetime 15m
export DB_MAX_WAIT=${DB_MAX_WAIT:-"900000"}

setenv.sh, after:
# Max Connection Lifetime 30m
export DB_MAX_WAIT=${DB_MAX_WAIT:-"1800000"}
A reviewer (Contributor) commented:
I do not understand this - if dotCMS is waiting 30s for a db connection, dotCMS is literally dead in the water. It might almost be better to fail fast, e.g. lower it to 5 seconds, instead of queueing up a bunch of waiting connections, no?

spbolton (Author) replied:
There is a broader discussion to be had here. The 30s value was not chosen at random; it ties into the changes to max wait and max lifetime. A new connection in our infrastructure can take "a few seconds" even when things are running well, and Hikari chose 30s as its default based on real-world usage in cloud environments. There is a broader question, though, of how this relates to request timeouts and how we handle resilience in general. I have a full discussion here: https://docs.google.com/document/d/1zvUhexNfryQ8GTMX-PrAGqH8_9JWqFFt3hgD7NowvTY/edit?usp=sharing

I think that 5s may be too small from what I have seen. To balance the existing architecture, and in combination with a properly sized pool, I would be fine starting with 10 or 15s. The key is really to monitor properly so we know. We should be moving away from the current situation where a request from the app triggers a new physical connection almost every time; that in itself should reduce delays and load on the backend DB.


Labels

Area : Backend PR changes Java/Maven backend code Area : Frontend PR changes Angular/TypeScript frontend code


Development

Successfully merging this pull request may close these issues.

fix(db): DB_MAXWAIT naming mismatch causes maxLifetime=60s on all deployments + pool defaults tune

4 participants