# **PRODUCTION ONLY** T minus 3 weeks (Date TBD) [📁](bin/scripts/02_failover/010_t-3w)
1. [x] Notify content team of upcoming announcements to give them time to prepare blog post, email content. https://gitlab.com/gitlab-com/blog-posts/issues/523
1. [ ] Ensure this issue has been created on `dev.gitlab.org`, since `gitlab.com` will be unavailable during the real failover!!!
# **PRODUCTION ONLY** T minus 1 week (Date TBD)
# **PRODUCTION ONLY** T minus 1 week (Date TBD) [📁](bin/scripts/02_failover/020_t-1w)
1. [x] 🔪 {+ Chef-Runner +}: Scale up the `gprd` fleet to production capacity: https://gitlab.com/gitlab-com/migration/issues/286
1. [ ] ☎ {+ Comms-Handler +}: communicate date to Google
...
...
@@ -111,15 +111,15 @@ These dashboards might be useful during the failover:
1. [ ] 🔪 {+ Chef-Runner +}: Ensure the GCP environment is inaccessible to the outside world
# T minus 1 day (Date TBD)
# T minus 1 day (Date TBD) [📁](bin/scripts/02_failover/030_t-1d)
- `As part of upcoming GitLab.com maintenance work, CI runners will not be accepting new jobs until __MAINTENANCE_END_TIME__ UTC. GitLab.com will undergo maintenance in 1 hour. Working doc: __GOOGLE_DOC_URL__`
1. [ ] ☎ {+ Comms-Handler +}: Post to #announcements on Slack:
1. [ ] **PRODUCTION ONLY** ☁ {+ Cloud-conductor +}: Create a maintenance window in PagerDuty for [GitLab Production service](https://gitlab.pagerduty.com/services/PATDFCE) for 2 hours starting in an hour from now.
1. [ ] **PRODUCTION ONLY** ☁ {+ Cloud-conductor +}: [Create an alert silence](https://alerts.gitlab.com/#/silences/new) for 2 hours starting in an hour from now with the following matcher(s):
- `environment`: `prd`
...
...
@@ -188,7 +188,7 @@ an hour before the scheduled maintenance window.
```
# T minus zero (failover day) (Date TBD)
# T minus zero (failover day) (__FAILOVER_DATE__) [📁](bin/scripts/02_failover/060_go/)
We expect the maintenance window to last for up to 2 hours, starting from now.
...
...
@@ -245,7 +245,7 @@ you see something happening that shouldn't be public, mention it.
### Prevent updates to the primary
#### Phase 1: Block non-essential network access to the primary
#### Phase 1: Block non-essential network access to the primary [📁](bin/scripts/02_failover/060_go/p01)
1. [ ] 🔪 {+ Chef-Runner +}: Update HAProxy config to allow Geo and VPN traffic over HTTPS and drop everything else
* Staging
...
...
@@ -266,7 +266,7 @@ you see something happening that shouldn't be public, mention it.
Running CI jobs will no longer be able to push updates. Jobs that complete now may be lost.
#### Phase 2: Commence Shutdown in Azure
#### Phase 2: Commence Shutdown in Azure [📁](bin/scripts/02_failover/060_go/p02)
1. [ ] 🔪 {+ Chef-Runner +}: Stop mailroom on all the nodes
1. [ ] 🐺 {+ Coordinator +}: Ensure any data not replicated by Geo is replicated manually. We know about [these](https://docs.gitlab.com/ee/administration/geo/replication/index.html#examples-of-unreplicated-data):
* [ ] CI traces in Redis
...
...
@@ -365,7 +365,7 @@ of errors while it is being promoted.
## Promote the secondary
#### Phase 4: Reconfiguration, Part 1
#### Phase 4: Reconfiguration, Part 1 [📁](bin/scripts/02_failover/060_go/p04)
1. [ ] ☁ {+ Cloud-conductor +}: Incremental snapshot of database disks in case of failback in Azure and GCP
* Staging: `bin/snapshot-dbs staging`
...
...
@@ -476,8 +476,7 @@ of errors while it is being promoted.
## During-Blackout QA
#### Phase 5: Verification, Part 1
#### Phase 5: Verification, Part 1 [📁](bin/scripts/02_failover/060_go/p05)
The details of the QA tasks are listed in the test plan document.
...
...
@@ -488,7 +487,7 @@ The details of the QA tasks are listed in the test plan document.
- `GitLab.com's migration to @GCPcloud is almost complete. Site is back up, although we're continuing to verify that all systems are functioning correctly. We're live on YouTube`
#### Phase 10: Verification, Part 2
#### Phase 10: Verification, Part 2 [📁](bin/scripts/02_failover/060_go/p10)
1. **Start After-Blackout QA** This is the second half of the test plan.
1. [ ] 🏆 {+ Quality +}: Ensure all "after the blackout" QA automated tests have succeeded
1. **Configure `bin/source_vars`**: The variables are explained in the file. Since this contains secrets, this file should not be checked in. (it's `.gitignore`'d)
1. **Set up the workflow issues**: Run `bin/start-failover-procedure.sh`. This will set up several issues in the issue tracker for performing the checks, failover, tests, etc.
* Any variables in the template in the format `__VARIABLE__` will be substituted with their values from the `bin/source_vars` file, saving manual effort.
Before a failover, the coordinator needs to log in to the deploy host:
* `deploy-01-sv-gprd.c.gitlab-production.internal` for production
* `deploy-01-sv-gstg.c.gitlab-staging-1.internal` for staging
1. **Configure `vi /opt/gitlab-migration/bin/source_vars`**: The variables are explained in the file. Since this contains secrets, this file should not be checked in. (it's `.gitignore`'d)
1. **Verify `/opt/gitlab-migration/bin/verify-failover-config`**: You should receive a message indicating success
1. **Set up the workflow issues**: Run `/opt/gitlab-migration/bin/start-failover-procedure.sh`. This will set up several issues in the issue tracker for performing the checks, failover, tests, etc.
* Any variables in the template in the format `__VARIABLE__` will be substituted with their values from the `bin/source_vars` file, saving manual effort.
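
The `__VARIABLE__` substitution described above can be sketched roughly as follows. This is an illustration only, not the actual logic of `start-failover-procedure.sh`: it assumes `bin/source_vars` contains shell-style `KEY=value` lines (optionally prefixed with `export`), and the helper names `parse_source_vars` and `render_template`, as well as the sample values, are hypothetical.

```python
import re

# Matches placeholders such as __FAILOVER_DATE__ or __GOOGLE_DOC_URL__.
PLACEHOLDER = re.compile(r"__([A-Z0-9]+(?:_[A-Z0-9]+)*)__")


def parse_source_vars(text):
    """Parse KEY=value lines (assumed format); skip blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            # Drop surrounding quotes, if any.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def render_template(template, values):
    """Replace each __NAME__ with its value; leave unknown names untouched."""
    def replace(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(replace, template)


# Hypothetical sample values, for illustration only.
sample_vars = 'export FAILOVER_DATE="2018-01-01"\nMAINTENANCE_END_TIME=14:00\n'
values = parse_source_vars(sample_vars)

heading = "# T minus zero (failover day) (__FAILOVER_DATE__)"
rendered = render_template(heading, values)
untouched = render_template("Working doc: __GOOGLE_DOC_URL__", values)
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a missing variable easy to spot in the generated issue.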