Adds an alert when consul fails to find a postgres master

* This is a bit tricky: each consul node will report whether it knows about
the primary node
* So this alert assumes that there will be quorum
* We check whether 0 nodes are reporting `passing`
* I linked to the main postgresql document because troubleshooting this
is quite tricky
* Closes: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5358
parent d85aa3b6
groups:
- name: postgresql.rules
  rules:
  - alert: NoPostgresMasterDetectedByConsul
    expr: |
      sum(consul_health_service_status{check="service:postgresql", tier="inf", status="passing"}) == 0
    labels:
      pager: pagerduty
      severity: critical
      channel: database
    annotations:
      description: |
        No postgresql master is passing the consul check. If there were a
        failover, no server is available to populate the pgbouncer
        configuration. Check: https://dashboards.gitlab.net/d/a988f2tmz/consul?panelId=23&fullscreen&orgId=1
      runbook: troubleshooting/postgres.md
      title: No Postgresql Master detected by Consul
  - alert: PostgresSQL_XIDConsumptionTooLow
    expr: rate(pg_txid_current[1m]) < 5
    for: 1m
......
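
The reasoning in the commit message (every consul node reports a status, and the alert fires only when none of them report `passing`) can be checked with a promtool unit test. The following is a minimal sketch, not part of the original change: the `node` label and its values `node-a` / `node-b` are assumptions made for illustration, and the test exercises only the alert expression copied from the rule above, so it does not need to load the rule file.

```yaml
# Sketch of a promtool unit test; run with: promtool test rules <this-file>
# The `node` label and the values node-a / node-b are hypothetical.
evaluation_interval: 1m

tests:
  # Case 1: no node reports the postgresql check as passing,
  # so the sum is 0 and the expression returns one sample.
  - interval: 1m
    input_series:
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="node-a"}'
        values: '0 0 0'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="node-b"}'
        values: '0 0 0'
    promql_expr_test:
      - expr: sum(consul_health_service_status{check="service:postgresql", tier="inf", status="passing"}) == 0
        eval_time: 1m
        exp_samples:
          - labels: '{}'
            value: 0

  # Case 2: one node reports passing, so the sum is 1 and the
  # expression returns no samples (exp_samples is left out on purpose).
  - interval: 1m
    input_series:
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="node-a"}'
        values: '1 1 1'
      - series: 'consul_health_service_status{check="service:postgresql", tier="inf", status="passing", node="node-b"}'
        values: '0 0 0'
    promql_expr_test:
      - expr: sum(consul_health_service_status{check="service:postgresql", tier="inf", status="passing"}) == 0
        eval_time: 1m
```

Note that `sum(...)` over an empty set of series returns no result at all, so the `== 0` comparison matches only while at least one `status="passing"` series is still being scraped.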