...
 
Commits (1)
    Adds an alert when consul fails to find a postgres master · f7bb0974
    John T Skarbek authored
    * This is a bit tricky: each consul node reports that it knows about
    the primary node
    * Because of this, the alert sums those reports; if the sum is less than 3,
    there is SOME problem:
      1. We are short on consul servers, which is still bad overall
      1. No master is detected, which is problematic because, if there's a
      failover, pgbouncer won't be able to recreate its configuration properly
      1. We are in the middle of a failover
    * I linked to the main postgresql document because troubleshooting this
    is quite tricky
    * I'm unsure of the best way to handle this....
    * Closes: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5358
    f7bb0974
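The threshold reasoning in the commit message can be exercised with a promtool rule unit test. The following is only a sketch under stated assumptions, not part of this commit: the file names `postgresql.yml` and `postgresql_test.yml`, the `node` label, and the `consul-0N` node names are hypothetical, and it assumes each consul node exposes one `status="passing"` series with value 1 when the check passes and 0 otherwise.

```yaml
# Hypothetical test file (postgresql_test.yml); run with:
#   promtool test rules postgresql_test.yml
rule_files:
  - postgresql.yml  # assumed name of the rules file changed in this diff

evaluation_interval: 1m

tests:
  # Case 1: all three consul nodes see a passing postgresql check.
  # The sum is 3, so the alert should stay silent.
  - interval: 1m
    input_series:
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-01"}'
        values: '1 1 1'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-02"}'
        values: '1 1 1'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-03"}'
        values: '1 1 1'
    alert_rule_test:
      - eval_time: 1m
        alertname: NoPostgresMasterDetectedByConsul
        # exp_alerts is left out on purpose: no alert is expected to fire.

  # Case 2: one node no longer reports a passing check.
  # The sum is 2, which is below the threshold of 3, so the alert should fire.
  - interval: 1m
    input_series:
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-01"}'
        values: '1 1 1'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-02"}'
        values: '1 1 1'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="consul-03"}'
        values: '0 0 0'
    alert_rule_test:
      - eval_time: 1m
        alertname: NoPostgresMasterDetectedByConsul
        exp_alerts:
          - exp_labels:
              pager: pagerduty
              severity: critical
              channel: database
            exp_annotations:
              description: |
                No postgresql master is passing the consul check. If there were a
                failover, no server is available to populate the pgbouncer
                configuration. Check: https://dashboards.gitlab.net/d/a988f2tmz/consul?panelId=23&fullscreen&orgId=1
              runbook: troubleshooting/postgres.md
              title: No Postgresql Master detected by Consul
```

Because `sum()` without a `by` clause drops all series labels, the expected alert carries only the labels set in the rule itself. The annotations must match the rule text exactly, so they would need updating whenever the description changes.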
groups:
- name: postgresql.rules
  rules:
  - alert: NoPostgresMasterDetectedByConsul
    expr: |
      sum(consul_health_service_status{check="service:postgresql", tier="inf", status="passing"}) < 3
    labels:
      pager: pagerduty
      severity: critical
      channel: database
    annotations:
      description: |
        No postgresql master is passing the consul check. If there were a
        failover, no server is available to populate the pgbouncer
        configuration. Check: https://dashboards.gitlab.net/d/a988f2tmz/consul?panelId=23&fullscreen&orgId=1
      runbook: troubleshooting/postgres.md
      title: No Postgresql Master detected by Consul
  - alert: PostgresSQL_XIDConsumptionTooLow
    expr: rate(pg_txid_current[1m]) < 5
    for: 1m
......