Commit ff93d389 authored by Andrew Newdigate

Script directory structure

parent 9068ebe7
source_vars
image: alpine:latest

stages:
  - test

# Ensure scripts are well-written
shellcheck:
  stage: test
  before_script:
    - wget https://storage.googleapis.com/shellcheck/shellcheck-stable.linux.x86_64.tar.xz -O - | xzcat | tar -xv
  script:
    - find ./bin/azure ./bin/gcp -name '*.sh' | xargs ./shellcheck-stable/shellcheck -x
    - ./shellcheck-stable/shellcheck -x ./bin/check-script-references ./bin/workflow-script-commons.sh ./bin/source_vars_template.sh

references:
  stage: test
  before_script:
    - apk add --no-cache bash
  script:
    - bash -x ./bin/check-script-references
@@ -114,11 +114,10 @@ These dashboards might be useful during the failover:
 # T minus 1 day (Date TBD)
 1. [ ] 🐺 {+ Coordinator +}: Perform (or coordinate) Preflight Checklist
-1. [ ] **PRODUCTION ONLY** ☎ {+ Comms-Handler +}: Tweet from `@gitlab`
-- `Reminder: GitLab.com will be undergoing 2 hours maintenance tomorrow, from START_TIME - END_TIME UTC. Follow @gitlabstatus for more details. LINK_TO_BLOG_POST`
+1. [ ] **PRODUCTION ONLY** ☎ {+ Comms-Handler +}: Tweet from `@gitlab`.
+- Tweet content from `./bin/azure/02_failover/t-1d/010_gitlab_twitter_announcement.sh`
 1. [ ] **PRODUCTION ONLY** ☎ {+ Comms-Handler +}: Retweet `@gitlab` tweet from `@gitlabstatus` with further details
-- `Reminder: GitLab.com will be undergoing 2 hours maintenance tomorrow. We'll be live on YouTube. Working doc: GOOGLE_DOC_LINK, Blog: LINK_TO_BLOG_POST`
+- Tweet content from `./bin/azure/02_failover/t-1d/020_gitlabstatus_twitter_announcement.sh`
 # T minus 3 hours (Date TBD)
@@ -141,8 +140,7 @@ an hour before the scheduled maintenance window.
 1. [ ] **PRODUCTION ONLY** ☎ {+ Comms-Handler +}: Tweet from `@gitlabstatus`
 - `As part of upcoming GitLab.com maintenance work, CI runners will not be accepting new jobs until END_TIME UTC. GitLab.com will undergo maintenance in 1 hour. Working doc: GOOGLE_DOC_LINK`
 1. [ ] ☎ {+ Comms-Handler +}: Post to #announcements on Slack:
-* Staging: `We're rehearsing the failover of GitLab.com in *1 hour* by migrating staging.gitlab.com to GCP. Come watch us at ZOOM_LINK! Notes in GOOGLE_DOC_LINK!`
-* Production: `GitLab.com is being migrated to GCP in *1 hour*. There is a 2-hour downtime window. We'll be live on YouTube. Notes in GOOGLE_DOC_LINK!`
+- `./bin/azure/02_failover/t-1h/020_slack_announcement.sh`
 1. [ ] **PRODUCTION ONLY** ☁ {+ Cloud-conductor +}: Create a maintenance window in PagerDuty for [GitLab Production service](https://gitlab.pagerduty.com/services/PATDFCE) for 2 hours starting in an hour from now.
 1. [ ] **PRODUCTION ONLY** ☁ {+ Cloud-conductor +}: [Create an alert silence](https://alerts.gitlab.com/#/silences/new) for 2 hours starting in an hour from now with the following matcher(s):
 - `environment`: `prd`
@@ -177,16 +175,16 @@ an hour before the scheduled maintenance window.
 * Before you run the commands below, ensure that the ssh key used to ssh to the pages VMs are in your ssh-agent:
 ```
 ssh-add -l # to list keys
 ssh-add path/to/ssh/key # if you do not have the key loaded
 ```
 * Staging:
 ```
 ssh -A 10.124.2.8 # nfs5.staging.gitlab.com
 tmux
 sudo ls -1 /var/opt/gitlab/gitlab-rails/shared/pages | xargs -I {} -P 15 -n 1 sudo SSH_AUTH_SOCK=$SSH_AUTH_SOCK rsync -avh -e "ssh -oCompression=no" --rsync-path="sudo rsync" /var/opt/gitlab/gitlab-rails/shared/pages/{} $USER@pages.stor.gstg.gitlab.net:/var/opt/gitlab/gitlab-rails/shared/pages
 ```
 * Production:
 ```
 ssh -A 10.70.2.161 # nfs-pages-01.stor.gitlab.com
@@ -455,7 +453,7 @@ of errors while it is being promoted.
 * Production: `knife ssh roles:gprd-base 'sudo gitlab-ctl status 2>/dev/null' | sort -k 3`
 * [ ] Unicorn
 * [ ] Sidekiq
 * [ ] Gitlab Pages
 1. [ ] 🔪 {+ Chef-Runner +}: Fix the Geo node hostname for the old secondary
 * This ensures we continue to generate Geo event logs for a time, maybe useful for last-gasp failback
 * In a Rails console in GCP:
@@ -547,7 +545,7 @@ unexpected ways.
 1. [ ] 🐺 {+Coordinator+}: **PRODUCTION ONLY** Ensure the secondary can send emails
 1. [ ] Run the following in a Rails console (changing `you` to yourself): `Notify.test_email("you+test@gitlab.com", "Test email", "test").deliver_now`
 1. [ ] Ensure you receive the email
 #### Phase 8: Reconfiguration, Part 2
......
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# shellcheck disable=SC1091,SC1090
source "${SCRIPT_DIR}/../../../workflow-script-commons.sh"
# --------------------------------------------------------------
PRODUCTION_ONLY
cat <<EOD
Open https://tweetdeck.twitter.com/ and tweet from @gitlab:
-------------------------------->8-------------------
Reminder: GitLab.com will be undergoing 2 hours maintenance tomorrow, from ${MAINTENANCE_START_TIME} - ${MAINTENANCE_END_TIME} UTC. Follow @gitlabstatus for more details. ${BLOG_POST_URL}
----------------------------------------------------
EOD
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# shellcheck disable=SC1091,SC1090
source "${SCRIPT_DIR}/../../../workflow-script-commons.sh"
# --------------------------------------------------------------
cat <<EOD
Open https://tweetdeck.twitter.com/ and tweet from @gitlabstatus:
-------------------------------->8-------------------
Reminder: GitLab.com will be undergoing 2 hours maintenance tomorrow. We'll be live on YouTube. Working doc: ${GOOGLE_DOC_URL}, Blog: ${BLOG_POST_URL}
-------------------------------->8-------------------
EOD
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# shellcheck disable=SC1091,SC1090
source "${SCRIPT_DIR}/../../../workflow-script-commons.sh"
# --------------------------------------------------------------
function send_slack() {
curl --fail --silent -X POST --data-urlencode 'payload={"text": "'"${1}"'"}' "$SLACK_WEBHOOK_URL"
}
case "${FAILOVER_ENVIRONMENT}" in
"prd")
send_slack "GitLab.com is being migrated to GCP at *${MAINTENANCE_START_TIME}* UTC. There is a 2-hour downtime window. We'll be live on YouTube. Notes in ${GOOGLE_DOC_URL}!"
;;
"stg")
send_slack "We're rehearsing the failover of GitLab.com at *${MAINTENANCE_START_TIME}* UTC by migrating staging.gitlab.com to GCP. Come watch us at ${ZOOM_LINK}! Notes in ${GOOGLE_DOC_URL}!"
;;
*)
die "Unknown environment"
;;
esac
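
To preview an announcement without posting to Slack, the JSON body can be built and printed on its own. This is a minimal sketch and not part of the commit: `build_payload` and the sample values are hypothetical, and like `send_slack` above it assumes the message contains no double quotes or backslashes.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build the JSON body that would be sent as the payload.
# Assumes the message needs no JSON escaping.
build_payload() {
  printf '{"text": "%s"}' "${1}"
}

MAINTENANCE_START_TIME="13h00" # example value only

payload=$(build_payload "Rehearsal at *${MAINTENANCE_START_TIME}* UTC")
echo "${payload}"
```

Printing the payload first makes it easy to check the interpolated times and links before the webhook is called.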
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
ROOT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
ISSUE_TEMPLATES_DIR=${ROOT_DIR}/.gitlab/issue_templates
function find_script_ref() {
grep -Eho "\`\./bin[^\`]*\`" "${ISSUE_TEMPLATES_DIR}"/*.md | cut -d\` -f2 | cut -d" " -f1 | sort -u
}
find_script_ref | while IFS='' read -r file; do
if ! [[ -f ${ROOT_DIR}/$file ]] || ! [[ -x ${ROOT_DIR}/$file ]]; then
>&2 echo "$file is missing or not executable"
grep -Fn -- "${file}" "${ISSUE_TEMPLATES_DIR}"/*.md
exit 1
fi
done
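
The extraction done by `find_script_ref` can be tried on a single template line. This is a minimal sketch, assuming a `grep` that supports `-o`; the sample line is copied from the issue template change in this commit, and the pattern here uses a negated bracket expression so that the match stops at the closing backtick.

```bash
#!/usr/bin/env bash
set -eu

# Sample line, taken from the issue template change in this commit.
sample='- Tweet content from `./bin/azure/02_failover/t-1d/010_gitlab_twitter_announcement.sh`'

# Keep only the backtick-quoted ./bin reference, then strip the backticks
# and any arguments that follow the script path.
ref=$(printf '%s\n' "${sample}" | grep -Eo '`\./bin[^`]*`' | cut -d'`' -f2 | cut -d' ' -f1)

echo "${ref}"
```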
#!/usr/bin/env bash
# --------------------------------------------------------------------------------------------------------
# NB: do not edit this file. Copy it to `source_vars` and override the defaults there
# --------------------------------------------------------------------------------------------------------
export FAILOVER_DATE="__REQUIRED__" # The date of the failover in YYYY-MM-DD format
export FAILOVER_ENVIRONMENT="__REQUIRED__" # Environment: this should be prd or stg
export MAINTENANCE_START_TIME="__REQUIRED__" # Maintenance window start time in UTC, e.g. 13h00
export MAINTENANCE_END_TIME="__REQUIRED__" # Maintenance window end time in UTC, e.g. 15h00
export BLOG_POST_URL="https://about.gitlab.com/2018/07/19/gcp-move-update/" # Blog post URL
export GOOGLE_DOC_URL="https://docs.google.com/document/d/18vGk6dQs7L0oGQOb_bNiFa5JhwLq5WBS7oNxQy09ml8/edit" # Google Working Doc for the event
export SLACK_WEBHOOK_URL="__REQUIRED__" # This is available at https://gitlab.slack.com/services/BBZP9QKLZ
export ZOOM_LINK="https://gitlab.zoom.us/j/859814316" # Find this in the calendar
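
Because `source_vars` is loaded after the template, it only needs to set the values that differ from the defaults. A hypothetical `source_vars` for a staging rehearsal might look like the sketch below; every value is an example only, and the webhook URL is a placeholder rather than a real endpoint.

```bash
#!/usr/bin/env bash
# Hypothetical source_vars for a staging rehearsal; all values are examples.
export FAILOVER_DATE="2030-01-15"        # YYYY-MM-DD
export FAILOVER_ENVIRONMENT="stg"        # prd or stg
export MAINTENANCE_START_TIME="13h00"    # UTC
export MAINTENANCE_END_TIME="15h00"      # UTC
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/EXAMPLE" # placeholder, not a real webhook
```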
#!/usr/bin/env bash
# Everything is logged!
if [[ -z ${LOGGING_CONFIGURED:=} ]]; then
export LOGGING_CONFIGURED=1
# TODO(andrewn): once we have a destination, we can also tap these logs to another place
"$0" "$@" 2>&1 | ruby -pe 'print Time.now.strftime("%Y-%m-%d %H:%M:%S.%L: ")'
exit
fi
function die() {
>&2 echo "Fatal:" "$@"
exit 1
}
function PRODUCTION_ONLY() {
if [[ $FAILOVER_ENVIRONMENT != "prd" ]]; then
die "This step is production only"
fi
}
# Basic support for OSX, helpful for testing
function gnu_date() {
if command -v gdate >/dev/null; then
gdate "$@"
else
date "$@"
fi
}
# Basic support for OSX, helpful for testing
function gnu_readlink() {
if command -v greadlink >/dev/null; then
greadlink "$@"
else
readlink "$@"
fi
}
function ensure_valid() {
grep -Eho '(\w+)="__REQUIRED__"' "${SOURCE_VARS_DIR}/source_vars_template.sh" | cut -d= -f1 | while read -r i; do
if [[ ${!i:-__REQUIRED__} = "__REQUIRED__" ]]; then
die "Variable ${i} has not been configured. You may need to update your 'source_vars'"
fi
done
FAILOVER_DATE=$(gnu_date --date="$FAILOVER_DATE" "+%Y-%m-%d")
TODAY=$(gnu_date "+%Y-%m-%d")
if [[ "${FAILOVER_DATE}" < "${TODAY}" ]]; then
die "Failover date is in the past ${FAILOVER_DATE}. Have you updated 'source_vars'?"
fi
case $(hostname -f) in
"deploy.gitlab.com")
if [[ ${FAILOVER_ENVIRONMENT} != "prd" ]]; then
die "FAILOVER_ENVIRONMENT is ${FAILOVER_ENVIRONMENT}, but environment is detected as production. Have you updated 'source_vars'?"
fi
;;
"deploy.stg.gitlab.com")
if [[ ${FAILOVER_ENVIRONMENT} != "stg" ]]; then
die "FAILOVER_ENVIRONMENT is ${FAILOVER_ENVIRONMENT}, but environment is detected as staging. Have you updated 'source_vars'?"
fi
;;
*)
if [[ ${SKIP_HOST_CHECK:=} != "true" ]]; then
die "Unknown host: please run this from a deploy host, or set SKIP_HOST_CHECK=true"
fi
;;
esac
}
function header() {
local full_path
full_path=$(gnu_readlink -f "${BASH_SOURCE[2]}")
cat <<EOD
========================================================
Script: $full_path
User: ${SUDO_USER:=$USER}
Rev: $(git rev-parse --short HEAD)
========================================================
EOD
}
function footer() {
cat <<EOD
--------------------------------------------------------
Exit Status: $?
--------------------------------------------------------
EOD
}
header
trap "footer" EXIT
SOURCE_VARS_DIR=$(dirname "${BASH_SOURCE[0]}")
if ! [[ -f "${SOURCE_VARS_DIR}/source_vars" ]]; then
target=$(gnu_readlink -f "${SOURCE_VARS_DIR}/source_vars")
source=$(gnu_readlink -f "${SOURCE_VARS_DIR}/source_vars_template.sh")
die "${target} not found. Please initialise with 'cp ${source} ${target}'"
fi
# Load the defaults
# shellcheck disable=SC1091,SC1090
source "${SOURCE_VARS_DIR}/source_vars_template.sh"
# Load the specific values
# shellcheck disable=SC1091,SC1090
source "${SOURCE_VARS_DIR}/source_vars"
ensure_valid
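
The `__REQUIRED__` check in `ensure_valid` relies on bash indirect expansion, which reads the variable whose name is held in another variable. The standalone sketch below shows that idea in isolation; the helper `unconfigured_vars` and the sample values are hypothetical, and it needs bash rather than a plain POSIX shell.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for values that would normally come from source_vars_template.sh.
export FAILOVER_DATE="__REQUIRED__"  # still the placeholder
export FAILOVER_ENVIRONMENT="stg"    # configured
unset SLACK_WEBHOOK_URL              # deliberately left unset

# Hypothetical helper: print the names that are unset or still hold the placeholder.
function unconfigured_vars() {
  local name
  for name in "$@"; do
    # ${!name:-...} expands the variable named by $name, with a default if unset.
    if [[ ${!name:-__REQUIRED__} == "__REQUIRED__" ]]; then
      echo "${name}"
    fi
  done
}

missing=$(unconfigured_vars FAILOVER_DATE FAILOVER_ENVIRONMENT SLACK_WEBHOOK_URL)
echo "${missing}"
```

Using the `:-` form supplies a default without assigning to the variable, so the check stays safe under `set -u`.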