diff --git a/.gitlab/issue_templates/failover.md b/.gitlab/issue_templates/failover.md
index 4471c51febad05cb5185cbff8b5b5762abf2cfa3..80a580f673d91f5f02ea6a50c4e47db718b1d56d 100644
--- a/.gitlab/issue_templates/failover.md
+++ b/.gitlab/issue_templates/failover.md
@@ -286,7 +286,7 @@ Running CI jobs will no longer be able to push updates. Jobs that complete now m
 1. [ ] 🐺 {+ Coordinator +}: Sidekiq monitor: start purge of non-mandatory jobs, disable Sidekiq crons and allow Sidekiq to wind-down:
     * In a separate terminal on the deploy host: `/opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p02/030-await-sidekiq-drain.sh`
     * The `geo_sidekiq_cron_config` job or an RSS kill may re-enable the crons, which is why we run it in a loop
-    * The loop may be stopped once sidekiq is shut down
+    * The loop should be stopped once sidekiq is shut down
 1. [ ] 🐺 {+ Coordinator +}: Wait for repository verification on the **primary** to complete
     * Staging: https://staging.gitlab.com/admin/geo_nodes - `staging.gitlab.com` node
     * Production: https://gitlab.com/admin/geo_nodes - `gitlab.com` node
@@ -312,6 +312,7 @@ Running CI jobs will no longer be able to push updates. Jobs that complete now m
     * Staging: `knife ssh roles:staging-base-be-sidekiq "sudo gitlab-ctl stop sidekiq-cluster"`
     * Production: `knife ssh roles:gitlab-base-be-sidekiq "sudo gitlab-ctl stop sidekiq-cluster"`
     * Check that no sidekiq processes show in the GitLab admin panel
+1. [ ] 🐺 {+ Coordinator +}: Stop the Sidekiq queue disabling loop from above

 At this point, the primary can no longer receive any updates. This allows the
 state of the secondary to converge.
@@ -349,7 +350,7 @@ state of the secondary to converge.
 1. [ ] 🐺 {+ Coordinator +}: Now disable all sidekiq-cron jobs on the secondary
     * In a dedicated rails console on the **secondary**:
     * `loop { Sidekiq::Cron::Job.all.map(&:disable!); sleep 1 }`
-    * The loop may be stopped once sidekiq is shut down
+    * The loop should be stopped once sidekiq is shut down
     * The `geo_sidekiq_cron_config` job or an RSS kill may re-enable the crons, which is why we run it in a loop
 1. [ ] 🐺 {+ Coordinator +}: Wait for all Sidekiq jobs to complete on the secondary
     * Review status of the running Sidekiq monitor script started in [phase 2, above](#phase-2-commence-shutdown-in-azure-), wait for `--> Status: PROCEED`
@@ -361,6 +362,7 @@ state of the secondary to converge.
     * Staging: `knife ssh roles:gstg-base-be-sidekiq "sudo gitlab-ctl stop sidekiq-cluster"`
     * Production: `knife ssh roles:gprd-base-be-sidekiq "sudo gitlab-ctl stop sidekiq-cluster"`
     * Check that no sidekiq processes show in the GitLab admin panel
+1. [ ] 🐺 {+ Coordinator +}: Stop the Sidekiq queue disabling loop from above

 At this point all data on the primary should be present in exactly the same
 form on the secondary. There is no outstanding work in sidekiq on the primary or
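
Not part of the patch above: a minimal sketch of what the new "Stop the Sidekiq queue disabling loop from above" steps refer to, assuming a Rails console with sidekiq-cron loaded (the same context as the template's existing one-liner). The follow-up status check is an illustrative assumption, not a command taken from the runbook.

```ruby
# Minimal sketch, not part of the diff above. Assumes a Rails console on the
# relevant node with sidekiq-cron loaded, as in the template's existing one-liner.

# Keep re-disabling the crons every second; geo_sidekiq_cron_config or an RSS
# kill may otherwise re-enable them while Sidekiq is still winding down.
loop { Sidekiq::Cron::Job.all.map(&:disable!); sleep 1 }

# Once sidekiq-cluster is stopped, interrupt the loop (Ctrl-C) and, as an
# optional sanity check (an assumption, not a runbook step), confirm every
# cron job reports a "disabled" status:
Sidekiq::Cron::Job.all.map { |job| [job.name, job.status] }
```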