Commit d0e27b56 authored by John Jarvis, committed by Nick Thomas

Jarv/db scripts

parent b140aa2f
@@ -344,9 +344,7 @@ state of the secondary to converge.
     * On staging, verification may not complete
 1. [ ] 🐺 {+ Coordinator +}: In "Sync Information", wait for "Last event ID seen from primary" to equal "Last event ID processed by cursor"
 1. [ ] 🐘 {+ Database-Wrangler +}: Ensure the prospective failover target in GCP is up to date
-    * Staging: `postgres-01.db.gstg.gitlab.com`
-    * Production: `postgres-01-db-gprd.c.gitlab-production.internal`
-    * `sudo gitlab-psql -d gitlabhq_production -c "SELECT now() - pg_last_xact_replay_timestamp();"`
+    * `/opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p03/check-wal-secondary-sync.sh`
     * Assuming the clocks are in sync, this value should be close to 0
     * If this is a large number, GCP may not have some data that is in Azure
 1. [ ] 🐺 {+ Coordinator +}: Now disable all sidekiq-cron jobs on the secondary
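The step above asks for a replay delay "close to 0" without naming a limit. The following is a minimal sketch of turning that into a pass/fail check. It assumes the interval arrives in the `HH:MM:SS.ffffff` form that `psql -t` normally prints for delays under a day; the function name `lag_within` and the 5-second limit are hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when a replay delay such as " 00:00:01.234567"
# is at most MAX_SECONDS. Fractional seconds are truncated. Intervals that
# include days, or an empty value, are treated as failures.
lag_within() {
  local interval max_seconds h m s
  interval="$(echo "$1" | tr -d '[:space:]')"
  max_seconds="$2"
  [[ $interval =~ ^-?([0-9]+):([0-9]{2}):([0-9]{2})(\.[0-9]+)?$ ]] || return 1
  # 10# forces base 10 so values like "08" are not read as octal.
  h=$((10#${BASH_REMATCH[1]}))
  m=$((10#${BASH_REMATCH[2]}))
  s=$((10#${BASH_REMATCH[3]}))
  (( h * 3600 + m * 60 + s <= max_seconds ))
}
```

A caller could then write something like `lag_within "$last_event" 5 || echo "secondary is behind"`, where `last_event` is the value captured by `check-wal-secondary-sync.sh`.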
@@ -401,38 +399,30 @@ of errors while it is being promoted.
 1. [ ] 🐘 {+ Database-Wrangler +}: Update the priority of GCP nodes in the repmgr database. Run the following on the current primary:
     ```shell
-    # gitlab-psql -d gitlab_repmgr -c "update repmgr_gitlab_cluster.repl_nodes set priority=100 where name like '%gstg%'"
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/update-priority.sh
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/check-priority.sh
     ```
 1. [ ] 🐘 {+ Database-Wrangler +}: **Gracefully** turn off the **Azure** postgresql standby instances.
-    * Keep everything, just ensure it’s turned off
+    * Keep everything, just ensure it’s turned off on the secondaries. The following script will prompt before shutting down postgresql.
     ```shell
-    $ knife ssh "role:staging-base-db-postgres AND NOT fqdn:CURRENT_PRIMARY" "gitlab-ctl stop postgresql"
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/shutdown-azure-secondaries.sh
     ```
 1. [ ] 🐘 {+ Database-Wrangler +}: **Gracefully** turn off the **Azure** postgresql primary instance.
-    * Keep everything, just ensure it’s turned off
+    * Keep everything, just ensure it’s turned off. The following script will prompt before shutting down postgresql.
     ```shell
-    $ knife ssh "fqdn:CURRENT_PRIMARY" "gitlab-ctl stop postgresql"
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/shutdown-azure-primary.sh
     ```
 1. [ ] 🐘 {+ Database-Wrangler +}: After timeout of 30 seconds, repmgr should failover primary to the chosen node in GCP, and other nodes should automatically follow.
     - [ ] Confirm `gitlab-ctl repmgr cluster show` reflects the desired state
-    - [ ] Confirm pgbouncer node in GCP (Password is in 1password)
-        * Staging: `pgbouncer-01-db-gstg`
-        * Production: `pgbouncer-01-db-gprd`
-    ```shell
-    $ gitlab-ctl pgb-console
-    ...
-    pgbouncer# SHOW DATABASES;
-    # You want to see lines like
-    gitlabhq_production | PRIMARY_IP_HERE | 5432 | gitlabhq_production | | 100 | 5 | | 0 | 0
-    gitlabhq_production_sidekiq | PRIMARY_IP_HERE | 5432 | gitlabhq_production | | 150 | 5 | | 0 | 0
-    ...
-    pgbouncer# SHOW SERVERS;
-    # You want to see lines like
-    S | gitlab | gitlabhq_production | idle | PRIMARY_IP | 5432 | PGBOUNCER_IP | 54714 | 2018-05-11 20:59:11 | 2018-05-11 20:59:12 | 0x718ff0 | | 19430 |
-    ```
+    ```shell
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/confirm-repmgr.sh
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/connect-pgbouncers.sh
+    ```
 1. [ ] 🐘 {+ Database-Wrangler +}: In case automated failover does not occur, perform a manual failover
     - [ ] Promote the desired primary
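The removed runbook text described the `SHOW DATABASES` rows an operator should look for, and the replacement script leaves that comparison to the operator. As a rough sketch, the same comparison could be done on captured console output. The function name `pgbouncer_points_at` and the IP addresses in the usage line are hypothetical, and the row layout is assumed to match the pipe-separated sample from the old runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pipe-separated SHOW DATABASES rows on stdin and
# succeed only if at least one gitlabhq_production* entry exists and every
# such entry points at the expected host (first argument).
pgbouncer_points_at() {
  awk -F'|' -v want="$1" '
    {
      name = $1; host = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
      gsub(/^[ \t]+|[ \t]+$/, "", host)   # trim padding around the host
    }
    name ~ /^gitlabhq_production/ { seen++; if (host != want) bad++ }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}
```

For example, with the console output saved to a file: `pgbouncer_points_at 10.0.0.5 < show-databases.txt`, where both the address and the file name are placeholders.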
@@ -446,9 +436,9 @@ of errors while it is being promoted.
     ```
     *Note*: This will fail on the WAL-E node
 1. [ ] 🐘 {+ Database-Wrangler +}: Check the database is now read-write
-    * Connect to the newly promoted primary in GCP
-    * `sudo gitlab-psql -d gitlabhq_production -c "select * from pg_is_in_recovery();"`
-    * The result should be `F`
+    ```bash
+    /opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p04/check-gcp-recovery.sh
+    ```
 1. [ ] 🔪 {+ Chef-Runner +}: Update the chef configuration according to
     * Staging: https://dev.gitlab.org/cookbooks/chef-repo/merge_requests/1989
     * Production: https://dev.gitlab.org/cookbooks/chef-repo/merge_requests/2218
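The old instructions said the `pg_is_in_recovery()` result "should be `F`", while the new script prints the raw value for each host. A small sketch of interpreting that value follows. The helper name `recovery_to_role` is hypothetical, and it assumes the value is the single letter that `psql -t` prints, possibly padded with whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the output of `select pg_is_in_recovery();`
# (for example " f" from psql -t) to a human-readable role.
recovery_to_role() {
  local value
  value="$(echo "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')"
  case "$value" in
    f) echo "primary (read-write)" ;;
    t) echo "standby (in recovery)" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

After a successful promotion exactly one GCP node would be expected to report `primary (read-write)`.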
......
@@ -11,7 +11,7 @@ function find_scripts() {
 find_scripts | while IFS='' read -r file; do
   # Ensures all the ../.. references are correct....
-  if [[ -x ${file} ]]; then
+  if [[ -x ${file} && $(basename "$file") != "connect-pgbouncers.sh" ]]; then
     echo "${file}"
     SANITY_CHECK_ONLY=1 "${file}"
   fi
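The hunk above hard-codes one basename into the sanity-check loop, presumably because `connect-pgbouncers.sh` opens interactive SSH sessions. If more interactive scripts appear, the test could be kept in one place. This is only a sketch: the array `SANITY_CHECK_EXCLUDES` and the function `should_sanity_check` are hypothetical names, not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical generalisation of the basename test: interactive scripts are
# listed once, so a further exclusion is a one-line change.
SANITY_CHECK_EXCLUDES=("connect-pgbouncers.sh")

should_sanity_check() {
  local file="$1" name excluded
  name="$(basename "$file")"
  # Non-executable files are skipped, as in the original loop.
  [[ -x $file ]] || return 1
  for excluded in "${SANITY_CHECK_EXCLUDES[@]}"; do
    [[ $name == "$excluded" ]] && return 1
  done
  return 0
}
```

The loop body would then read `if should_sanity_check "${file}"; then ... fi`.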
......
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_GCP_WAL_SECONDARY ]]; then
  echo "You must set POSTGRESQL_GCP_WAL_SECONDARY in source_vars"
fi
last_event=$(ssh "$POSTGRESQL_GCP_WAL_SECONDARY" 'sudo gitlab-psql -t -d gitlabhq_production -c "SELECT now() - pg_last_xact_replay_timestamp();"')
echo "Last event seen for $POSTGRESQL_GCP_WAL_SECONDARY was${last_event} ago."
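The guard in this script, and in the scripts that follow, prints a message when the variable is empty but does not exit, so execution continues to the SSH call. One possible way to make the guard stop the script is sketched below. The helper name `require_var` is hypothetical, and the message text is copied from the scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail when the named variable is unset or empty.
# ${!name-} is an indirect expansion with an empty default, so it is safe
# under `set -u`. For an array it inspects element 0 only.
require_var() {
  local name="$1"
  if [[ -z ${!name-} ]]; then
    echo "You must set ${name} in source_vars" >&2
    return 1
  fi
}
```

A script would call it as `require_var POSTGRESQL_GCP_WAL_SECONDARY || exit 1` before using the variable.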

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_GCP_SECONDARIES ]]; then
  echo "You must set POSTGRESQL_GCP_SECONDARIES in source_vars"
fi
for secondary in ${POSTGRESQL_GCP_SECONDARIES[*]}
do
  recovery=$(ssh_host "$secondary" "sudo gitlab-psql -t -d gitlab_repmgr -c 'select pg_is_in_recovery();' 2>/dev/null")
  echo "$secondary : pg_is_in_recovery=$recovery"
done

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_GCP_WAL_SECONDARY ]]; then
  echo "You must set POSTGRESQL_GCP_WAL_SECONDARY in source_vars"
fi
ssh_host "$POSTGRESQL_GCP_WAL_SECONDARY" "sudo gitlab-psql -d gitlab_repmgr -c 'select name,priority from repmgr_gitlab_cluster.repl_nodes'"

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_GCP_SECONDARIES ]]; then
  echo "You must set POSTGRESQL_GCP_SECONDARIES in source_vars"
fi
ssh_host "${POSTGRESQL_GCP_SECONDARIES[0]}" "sudo gitlab-ctl repmgr cluster show"

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../source_vars"
if [[ -z $PGBOUNCERS_GCP ]]; then
  echo "You must set PGBOUNCERS_GCP in source_vars"
fi
for pgbouncer in ${PGBOUNCERS_GCP[*]}
do
  echo "Connecting to $pgbouncer"
  echo "After logging in run:"
  echo " sudo gitlab-ctl pgb-console"
  echo " show databases;"
  echo " show servers;"
  echo "Confirm the new primary"
  ssh "$pgbouncer"
done

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_AZURE_PRIMARY ]]; then
  echo "You must set POSTGRESQL_AZURE_PRIMARY in source_vars"
fi
recovery=$(ssh_host "$POSTGRESQL_AZURE_PRIMARY" "sudo gitlab-psql -t -d gitlab_repmgr -c 'select pg_is_in_recovery();' 2>/dev/null")
echo "$POSTGRESQL_AZURE_PRIMARY: pg_is_in_recovery=$recovery"
echo "postgresql will be shutdown on the above host, press enter to continue"
read -r
ssh_host "$POSTGRESQL_AZURE_PRIMARY" "sudo gitlab-ctl stop postgresql"
echo "Getting status:"
p_status=$(ssh_host "$POSTGRESQL_AZURE_PRIMARY" "sudo gitlab-ctl status postgresql")
echo "$POSTGRESQL_AZURE_PRIMARY: $p_status"
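This script runs under `set -euo pipefail`, and a plain assignment such as `p_status=$(...)` returns the exit status of the command inside it. If the status command were to exit non-zero for a stopped service (an assumption about `gitlab-ctl status` that this commit does not confirm), the script would end before printing the status. A minimal sketch of capturing both output and exit code without tripping `errexit` follows; the helper `capture_status` and its two global variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, keep its output and exit code, and never
# trip errexit. Results land in the globals STATUS_OUTPUT and STATUS_CODE.
capture_status() {
  STATUS_CODE=0
  # The `||` branch records the failure instead of letting `set -e` abort.
  STATUS_OUTPUT="$("$@" 2>&1)" || STATUS_CODE=$?
}
```

It might be used as `capture_status ssh_host "$POSTGRESQL_AZURE_PRIMARY" "sudo gitlab-ctl status postgresql"`, followed by `echo "$POSTGRESQL_AZURE_PRIMARY: $STATUS_OUTPUT (exit $STATUS_CODE)"`.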

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_AZURE_SECONDARIES ]]; then
  echo "You must set POSTGRESQL_AZURE_SECONDARIES in source_vars"
fi
for secondary in ${POSTGRESQL_AZURE_SECONDARIES[*]}
do
  recovery=$(ssh_host "$secondary" "sudo gitlab-psql -t -d gitlab_repmgr -c 'select pg_is_in_recovery();' 2>/dev/null")
  echo "$secondary : pg_is_in_recovery=$recovery"
done
echo "postgresql will be shutdown on the above hosts, press enter to continue"
read -r
echo "Shutting down:"
for secondary in ${POSTGRESQL_AZURE_SECONDARIES[*]}
do
  ssh_host "$secondary" "sudo gitlab-ctl stop postgresql"
done
echo "Getting status:"
for secondary in ${POSTGRESQL_AZURE_SECONDARIES[*]}
do
  p_status=$(ssh_host "$secondary" "sudo gitlab-ctl status postgresql")
  echo "$secondary: $p_status"
done
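Both shutdown scripts pause on a bare `read -r`, so any keypress followed by Enter, or Enter alone, proceeds with the shutdown. A stricter prompt could require the operator to type a specific word. This is a sketch only: the function name `confirm` and the choice of word are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the operator types the expected word.
# The prompt goes to stderr so that stdout stays clean for callers.
confirm() {
  local expected="$1" reply
  printf 'Type "%s" to continue: ' "$expected" >&2
  # A closed or empty stdin counts as a refusal.
  read -r reply || return 1
  [[ $reply == "$expected" ]]
}
```

In the scripts this could replace the pause, for example `confirm stop || exit 1` before the loop that stops postgresql.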

#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
UNSYMLINKED_SCRIPT_DIR="$(readlink -f "${SCRIPT_DIR}" || readlink "${SCRIPT_DIR}" || echo "${SCRIPT_DIR}")"
# shellcheck disable=SC1091,SC1090
source "${UNSYMLINKED_SCRIPT_DIR}/../../../../workflow-script-commons.sh"
if [[ -z $POSTGRESQL_AZURE_PRIMARY ]]; then
  echo "You must set POSTGRESQL_AZURE_PRIMARY in source_vars"
fi
ssh_host "$POSTGRESQL_AZURE_PRIMARY" "sudo gitlab-psql -t -d gitlab_repmgr -c \"update repmgr_gitlab_cluster.repl_nodes set priority=100 where name like '%g${FAILOVER_ENVIRONMENT}%'\" 2>/dev/null"
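The update statement interpolates `FAILOVER_ENVIRONMENT` straight into a `LIKE` pattern. If that variable were set but empty, the pattern would become `'%g%'` and match every node whose name contains a "g". A sketch of validating the value before building the SQL follows. The function name `build_priority_sql` is hypothetical, `prd` appears in `PRODUCTION_ONLY`, and `stg` is inferred from the `'%gstg%'` pattern in the old runbook command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the priority update for a validated environment.
# "stg" is an inferred value; only "prd" is confirmed elsewhere in the commit.
build_priority_sql() {
  local env="$1"
  case "$env" in
    stg|prd) ;;
    *) echo "unexpected FAILOVER_ENVIRONMENT: '${env}'" >&2; return 1 ;;
  esac
  echo "update repmgr_gitlab_cluster.repl_nodes set priority=100 where name like '%g${env}%'"
}
```

The script could then run `sql="$(build_priority_sql "$FAILOVER_ENVIRONMENT")" || exit 1` and pass `$sql` to `gitlab-psql`.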
@@ -24,3 +24,9 @@ export TEAM_CLOUD_CONDUCTOR="@ahmadsherif" # Cloud Conductor (backup:
 export TEAM_QUALITY="@meks" # Quality (backup: @remy)
 export TEAM_FAILBACK_HANDLER="@ahmadsherif" # Failback Handler (staging only) (backup: @alejandro)
 export TEAM_HEAD_HONCHO="@edjdev" # Head Honcho (production only) (backup: @sytses)
+export POSTGRESQL_GCP_SECONDARIES=()
+export POSTGRESQL_GCP_WAL_SECONDARY=()
+export POSTGRESQL_AZURE_SECONDARIES=()
+export POSTGRESQL_AZURE_PRIMARY=()
+export PGBOUNCERS_GCP=()
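All five new variables are declared as empty arrays, although the scripts read `POSTGRESQL_GCP_WAL_SECONDARY` and `POSTGRESQL_AZURE_PRIMARY` as single values and loop over the others with an unquoted `[*]` expansion. The sketch below shows how populated values might look and how a quoted `[@]` expansion iterates them. The hostnames are invented placeholders and `list_hosts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Placeholder hostnames only; real values are not part of this commit.
POSTGRESQL_GCP_SECONDARIES=("pg-02.example.internal" "pg-03.example.internal")
POSTGRESQL_GCP_WAL_SECONDARY="pg-01.example.internal"   # single host: plain string

# Hypothetical helper: print one line per host passed as an argument.
list_hosts() {
  local host
  for host in "$@"; do
    echo "host: ${host}"
  done
}

# Quoted [@] expansion yields exactly one word per array element.
list_hosts "${POSTGRESQL_GCP_SECONDARIES[@]}"
```

Because bash does not pass array variables to child processes, the `export` keyword has no effect on the array values; the scripts see them only because they `source` this file.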
@@ -18,6 +18,11 @@ function die() {
   exit 1
 }
+function ssh_host() {
+  # shellcheck disable=SC2068
+  ssh -o StrictHostKeyChecking=no $@ 2>/dev/null
+}
 function PRODUCTION_ONLY() {
   if [[ $FAILOVER_ENVIRONMENT != "prd" ]]; then
     die "This step is production only"
......
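The new `ssh_host` helper expands `$@` unquoted (hence the SC2068 suppression) and discards stderr, so SSH connection errors are hidden from the operator. A variant with quoted arguments is sketched below. The name `ssh_host_quoted` and the `SSH_BIN` override are hypothetical; the override exists only so the function can be exercised without a real SSH connection.

```shell
#!/usr/bin/env bash
# Hypothetical variant of ssh_host: "$@" keeps each argument as one word on the
# local side, and SSH_BIN (an invented knob) lets a dry run swap in another
# command. stderr is still discarded, as in the original helper.
ssh_host_quoted() {
  "${SSH_BIN:-ssh}" -o StrictHostKeyChecking=no "$@" 2>/dev/null
}
```

Dropping the `2>/dev/null` would be a further option if connection failures should be visible during the failover.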