Commit d3fe20fc authored by Brett Walker

Merge branch 'start-failover-procedure' into 'master'

Start failover procedure

See merge request gitlab-com/migration!164
parents 8d7a315b 85dc6104
Pipeline #88282 passed with stage in 21 seconds
......@@ -10,7 +10,7 @@ shellcheck:
- wget https://storage.googleapis.com/shellcheck/shellcheck-stable.linux.x86_64.tar.xz -O - | xzcat | tar -xv
script:
- find ./bin/azure ./bin/gcp -name '*.sh' | xargs ./shellcheck-stable/shellcheck -x
- ./shellcheck-stable/shellcheck -x ./bin/check-script-references ./bin/workflow-script-commons.sh ./bin/source_vars_template.sh
- ./shellcheck-stable/shellcheck -x ./bin/check-script-references ./bin/workflow-script-commons.sh ./bin/source_vars_template.sh ./bin/start-failover-procedure.sh
references:
stage: test
......
# Failover Team
| Role | Assigned To |
| -----------------------------------------------------------------------|-------------|
| 🐺 Coordinator | |
| 🔪 Chef-Runner | |
| ☎ Comms-Handler | |
| 🐘 Database-Wrangler | |
| ☁ Cloud-conductor | |
| 🏆 Quality | |
| ↩ Fail-back Handler (_Staging Only_) | |
| 🎩 Head Honcho (_Production Only_) | |
| Role | Assigned To |
| -----------------------------------------------------------------------|----------------------------|
| 🐺 Coordinator | __TEAM_COORDINATOR__ |
| 🔪 Chef-Runner | __TEAM_CHEF_RUNNER__ |
| ☎ Comms-Handler | __TEAM_COMMS_HANDLER__ |
| 🐘 Database-Wrangler | __TEAM_DATABASE_WRANGLER__ |
| ☁ Cloud-conductor | __TEAM_CLOUD_CONDUCTOR__ |
| 🏆 Quality | __TEAM_QUALITY__ |
| ↩ Fail-back Handler (_Staging Only_) | __TEAM_FAILBACK_HANDLER__ |
| 🎩 Head Honcho (_Production Only_) | __TEAM_HEAD_HONCHO__ |
(try to ensure that 🔪, ☁ and ↩ are always the same person for any given run)
......@@ -20,10 +20,10 @@ Perform these steps when the issue is created.
- [ ] 🐺 {+ Coordinator +}: Fill out the names of the failover team in the table above.
- [ ] 🐺 {+ Coordinator +}: Fill out dates/times and links in this issue:
- `START_TIME` & `END_TIME`
- `GOOGLE_DOC_LINK` (for PRODUCTION, create a new doc and make it writable for GitLabbers, and readable for the world)
- **PRODUCTION ONLY** `LINK_TO_BLOG_POST`
- **PRODUCTION ONLY** `END_TIME`
- Start Time: `__MAINTENANCE_START_TIME__` & End Time: `__MAINTENANCE_END_TIME__`
- Google Working Doc: __GOOGLE_DOC_URL__ (for PRODUCTION, create a new doc and make it writable for GitLabbers, and readable for the world)
- **PRODUCTION ONLY** Blog Post: __BLOG_POST_URL__
- **PRODUCTION ONLY** End Time: __MAINTENANCE_END_TIME__
# Support Options
......@@ -107,7 +107,7 @@ These dashboards might be useful during the failover:
- Details of specific situations with very long-running CI jobs which may lose their artifacts and logs if they don't complete before the maintenance window
1. [ ] ☎ {+ Comms-Handler +}: Ensure that YouTube stream will be available for Zoom call
1. [ ] ☎ {+ Comms-Handler +}: Tweet blog post from `@gitlab` and `@gitlabstatus`
- `Reminder: GitLab.com will be undergoing 2 hours maintenance on Saturday XX June 2018, from START_TIME - END_TIME UTC. Follow @gitlabstatus for more details. LINK_TO_BLOG_POST`
- `Reminder: GitLab.com will be undergoing 2 hours maintenance on Saturday XX June 2018, from __MAINTENANCE_START_TIME__ - __MAINTENANCE_END_TIME__ UTC. Follow @gitlabstatus for more details. __BLOG_POST_URL__`
1. [ ] 🔪 {+ Chef-Runner +}: Ensure the GCP environment is inaccessible to the outside world
......@@ -138,7 +138,7 @@ much as possible, we'll stop any new runner jobs from being picked up, starting
an hour before the scheduled maintenance window.
1. [ ] **PRODUCTION ONLY** ☎ {+ Comms-Handler +}: Tweet from `@gitlabstatus`
- `As part of upcoming GitLab.com maintenance work, CI runners will not be accepting new jobs until END_TIME UTC. GitLab.com will undergo maintenance in 1 hour. Working doc: GOOGLE_DOC_LINK`
- `As part of upcoming GitLab.com maintenance work, CI runners will not be accepting new jobs until __MAINTENANCE_END_TIME__ UTC. GitLab.com will undergo maintenance in 1 hour. Working doc: __GOOGLE_DOC_URL__`
1. [ ] ☎ {+ Comms-Handler +}: Post to #announcements on Slack:
- `./bin/azure/02_failover/t-1h/020_slack_announcement.sh`
1. [ ] **PRODUCTION ONLY** ☁ {+ Cloud-conductor +}: Create a maintenance window in PagerDuty for [GitLab Production service](https://gitlab.pagerduty.com/services/PATDFCE) for 2 hours starting in an hour from now.
......
......@@ -174,3 +174,11 @@ Each [team](https://about.gitlab.com/team/chart/) involved in the effort has a l
1. **Automate the lifecycle of environments for GitLab.com**: https://gitlab.com/gitlab-com/environments
1. **GitLab.com Infrastructure**: https://gitlab.com/gitlab-com/infrastructure
1. **GitLab CE**: https://gitlab.com/gitlab-org/gitlab-ce
## Preparing for a Failover Run
1. **Set up `bin/source_vars`**: `cp ./bin/source_vars_template.sh ./bin/source_vars`
1. **Configure `bin/source_vars`**: The variables are explained in the file. Because it contains secrets, this file should not be checked in (it is `.gitignore`'d).
1. **Set up the workflow issues**: Run `bin/start-failover-procedure.sh`. This will set up several issues in the issue tracker for performing the checks, failover, tests, etc.
* Any variables in the template in the format `__VARIABLE__` will be substituted with their values from the `bin/source_vars` file, saving manual effort.
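
The placeholder substitution described above can be sketched in plain bash. This is a minimal illustration under assumptions, not the script from this commit: `render_template` is a hypothetical helper, and the variable values are made up for the demo. It uses bash's `${var//pattern/replacement}` expansion, which replaces every occurrence of a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): replace every __NAME__
# placeholder in a template string with the value of the shell variable NAME.
render_template() {
  local body=$1
  shift
  local name
  for name in "$@"; do
    # ${!name} is indirect expansion: the value of the variable named by $name.
    # The // form replaces all occurrences; quoting the replacement keeps
    # characters such as & literal on newer bash versions.
    body=${body//"__${name}__"/"${!name}"}
  done
  printf '%s\n' "$body"
}

# Illustrative values only.
MAINTENANCE_START_TIME="13h00"
MAINTENANCE_END_TIME="15h00"

template='Window: __MAINTENANCE_START_TIME__ - __MAINTENANCE_END_TIME__ UTC (ends __MAINTENANCE_END_TIME__)'
rendered=$(render_template "$template" MAINTENANCE_START_TIME MAINTENANCE_END_TIME)
echo "$rendered"
```

Placeholders whose names are not passed to the helper are left untouched, so an unfilled `__VARIABLE__` stays visible in the rendered issue.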
......@@ -4,12 +4,23 @@
# NB. Note: do not edit this file. Please copy this file to `source_vars` and override the defaults there
# --------------------------------------------------------------------------------------------------------
export FAILOVER_DATE="__REQUIRED__" # The date of the failover in YYYY-MM-DD format
export FAILOVER_ENVIRONMENT="__REQUIRED__" # Environment: This should be prd or std
export MAINTENANCE_START_TIME="__REQUIRED__" # Maintenance window start time in UTC. eg 13h00
export MAINTENANCE_END_TIME="__REQUIRED__" # Maintenance window end time in UTC. eg 15h00
export FAILOVER_DATE="__REQUIRED__" # The date of the failover in YYYY-MM-DD format
export FAILOVER_ENVIRONMENT="__REQUIRED__" # Environment: This should be prd or stg
export MAINTENANCE_START_TIME="__REQUIRED__" # Maintenance window start time in UTC. eg 13h00
export MAINTENANCE_END_TIME="__REQUIRED__" # Maintenance window end time in UTC. eg 15h00
export BLOG_POST_URL="https://about.gitlab.com/2018/07/19/gcp-move-update/" # Blog post URL
export GOOGLE_DOC_URL="https://docs.google.com/document/d/18vGk6dQs7L0oGQOb_bNiFa5JhwLq5WBS7oNxQy09ml8/edit" # Google Working Doc for the event
export SLACK_WEBHOOK_URL="__REQUIRED__" # This is available at https://gitlab.slack.com/services/BBZP9QKLZ
export SLACK_WEBHOOK_URL="__REQUIRED__" # This is available at https://gitlab.slack.com/services/BBZP9QKLZ
export ZOOM_LINK="https://gitlab.zoom.us/j/859814316" # Find this in the calendar
export GITLAB_INSTANCE="https://dev.gitlab.org/" # Issue tracker in which to create the failover issues
export GITLAB_MIGRATION_PROJECT_PATH="gitlab-com/migration" # Migration Project
export GITLAB_TOKEN="__REQUIRED__" # Available at https://gitlab.1password.com/vaults/ljcrnm55wwhnh5ynr7edbxgowm/allitems/dx2qetzqbbcsrlrj3vtunexagy
export TEAM_COORDINATOR="@nick" # Coordinator (backup: @digitalmoksha)
export TEAM_CHEF_RUNNER="@ahmadsherif" # Chef Runner (backup: @alejandro)
export TEAM_COMMS_HANDLER="@dawsmith" # Comms Handler
export TEAM_DATABASE_WRANGLER="@ibaum" # Database Wrangler (backup: @jarv)
export TEAM_CLOUD_CONDUCTOR="@ahmadsherif" # Cloud Conductor (backup: @alejandro)
export TEAM_QUALITY="@meks" # Quality (backup: @remy)
export TEAM_FAILBACK_HANDLER="@ahmadsherif" # Failback Handler (staging only) (backup: @alejandro)
export TEAM_HEAD_HONCHO="@edjdev" # Head Honcho (production only) (backup: @sytses)
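
Since several variables default to `__REQUIRED__`, a quick check before a run can catch values that were never filled in. The sketch below is an assumption about how such a check might look; `find_unset_required` is a hypothetical helper that does not exist in the repository, and the demo file contents are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the names of exported variables in a vars file
# whose value is still the "__REQUIRED__" placeholder.
find_unset_required() {
  local vars_file=$1
  # Matches lines such as: export NAME="__REQUIRED__"  # comment
  sed -n 's/^export \([A-Za-z_][A-Za-z0-9_]*\)="__REQUIRED__".*/\1/p' "$vars_file"
}

# Demo against a throwaway file with illustrative values.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
export FAILOVER_DATE="2018-07-21"
export FAILOVER_ENVIRONMENT="stg"
export SLACK_WEBHOOK_URL="__REQUIRED__"
export GITLAB_TOKEN="__REQUIRED__"
EOF

missing=$(find_unset_required "$demo_file")
rm -f "$demo_file"

if [[ -n "$missing" ]]; then
  # Join the names with ", " for a one-line report.
  echo "Still unset: ${missing//$'\n'/, }"
fi
```

Pointing the helper at `./bin/source_vars` instead of the demo file would report any required value that was left at its default.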
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
export SKIP_HOST_CHECK=true
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# shellcheck disable=SC1091,SC1090
source "${SCRIPT_DIR}/workflow-script-commons.sh"
# --------------------------------------------------------------
function createIssue() {
  local title
  local template
  local body
  title=$1
  template=$2
  body=$(cat "${SCRIPT_DIR}/../.gitlab/issue_templates/${template}.md")
  echo "Creating ${title} with template ${template}"
  while read -r line; do
    # shellcheck disable=SC2001
    body=$(echo "$body"| sed "s#__${line}__#${!line}#")
  done < <(grep -Eho 'export \w+' ./bin/source_vars_template.sh|cut -d" " -f2)
  PROJECT_ID=${GITLAB_MIGRATION_PROJECT_PATH/\//%2f}
  curl --silent --fail --request POST -H "Private-Token: ${GITLAB_TOKEN}" "${GITLAB_INSTANCE}/api/v4/projects/${PROJECT_ID}/issues" --data-urlencode "title=${title}" --data-urlencode "description=${body}" > /dev/null
}
if [[ ${FAILOVER_ENVIRONMENT} == "stg" ]]; then
  name="${FAILOVER_DATE} STAGING failover attempt:"
else
  name="${FAILOVER_DATE} PRODUCTION failover attempt:"
fi
createIssue "${name} preflight checks" "preflight_checks"
createIssue "${name} main procedure" "failover"
createIssue "${name} test plan" "test_plan"
createIssue "${name} failback" "failback"
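
The script builds the issues endpoint by percent-encoding the slash in the project path. Its single-slash expansion (`${VAR/\//%2f}`) is enough for `gitlab-com/migration`, but a path with nested groups would need every slash encoded. The following is a hedged sketch of that variation, not a change made in this commit; `encode_project_path` and `issues_endpoint` are hypothetical names, and `group/subgroup/project` is an illustrative path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: percent-encode every "/" in a project path.
encode_project_path() {
  local path=$1
  # The // form replaces all slashes, not just the first one.
  printf '%s\n' "${path//\//%2F}"
}

# Hypothetical helper: build the issues endpoint, trimming one trailing slash
# from the instance URL so the result does not contain "//api".
issues_endpoint() {
  local instance=${1%/}
  local project_id
  project_id=$(encode_project_path "$2")
  printf '%s/api/v4/projects/%s/issues\n' "$instance" "$project_id"
}

single=$(encode_project_path "gitlab-com/migration")
nested=$(encode_project_path "group/subgroup/project")
endpoint=$(issues_endpoint "https://dev.gitlab.org/" "gitlab-com/migration")
printf '%s\n' "$single" "$nested" "$endpoint"
```

Trimming the trailing slash matters only cosmetically here: the template's default `GITLAB_INSTANCE` ends in `/`, so the script as committed requests a URL containing `//api`.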