gitlab-com / migration
Commit 61e9ad84 (Verified), authored Aug 07, 2018 by Nick Thomas
Feedback from the 2018-08-07 failover attempt
parent 255c9186
Pipeline #88817 passed with stage in 28 seconds
Showing 2 changed files with 10 additions and 9 deletions
.gitlab/issue_templates/failover.md +9 -8
bin/scripts/02_failover/060_go/p02/030-await-sidekiq-drain.rb +1 -1
.gitlab/issue_templates/failover.md @ 61e9ad84
...
...
@@ -177,7 +177,7 @@ an hour before the scheduled maintenance window.
1. [ ] 🔪 {+ Chef-Runner +}: Stop automatic incremental GitLab Pages sync
    * Disable the cronjob on the **Azure** pages NFS server
    * This cronjob is found on the Pages Azure NFS server. The IPs are shown in the next step
-    * `sudo crontab -e` to get an editor window, comment out the line involving rsync
+    * `sudo crontab -e` to get an editor window, comment out the line involving a pages-sync script
1. [ ] 🔪 {+ Chef-Runner +}: Start parallelized, incremental GitLab Pages sync
    * Expected to take ~30 minutes, run in screen/tmux! On the **Azure** pages NFS server!
    * Updates to pages after the transfer starts will be lost.
...
...
@@ -289,6 +289,7 @@ Running CI jobs will no longer be able to push updates. Jobs that complete now m
    * In a separate terminal on the deploy host: `/opt/gitlab-migration/migration/bin/scripts/02_failover/060_go/p02/030-await-sidekiq-drain.sh`
    * The `geo_sidekiq_cron_config` job or an RSS kill may re-enable the crons, which is why we run it in a loop
    * The loop should be stopped once sidekiq is shut down
    * Wait for `--> Status: PROCEED`
1. [ ] 🐺 {+ Coordinator +}: Wait for repository verification on the **primary** to complete
    * Staging: https://staging.gitlab.com/admin/geo_nodes - `staging.gitlab.com` node
    * Production: https://gitlab.com/admin/geo_nodes - `gitlab.com` node
...
...
@@ -325,9 +326,9 @@ state of the secondary to converge.
#### Phase 3: Draining [📁](bin/scripts/02_failover/060_go/p03)
-1. [ ] 🐺 {+ Coordinator +}: Ensure any data not replicated by Geo is replicated manually. We know about [these](https://docs.gitlab.com/ee/administration/geo/replication/index.html#examples-of-unreplicated-data):
-    * [ ] CI traces in Redis
-    * Run `::Ci::BuildTraceChunk.redis.find_each(batch_size: 10, &:use_database!)`
+1. [ ] 🐺 {+ Coordinator +}: Flush CI traces in Redis to the database
+    * In a Rails console in Azure:
+    * `::Ci::BuildTraceChunk.redis.find_each(batch_size: 10, &:use_database!)`
1. [ ] 🐺 {+ Coordinator +}: Wait for all repositories and wikis to become synchronized
    * Staging: https://gstg.gitlab.com/admin/geo_nodes
    * Production: https://gprd.gitlab.com/admin/geo_nodes
...
...
@@ -355,10 +356,10 @@ state of the secondary to converge.
    * The loop should be stopped once sidekiq is shut down
    * The `geo_sidekiq_cron_config` job or an RSS kill may re-enable the crons, which is why we run it in a loop
1. [ ] 🐺 {+ Coordinator +}: Wait for all Sidekiq jobs to complete on the secondary
    * Review status of the running Sidekiq monitor script started in [phase 2, above](#phase-2-commence-shutdown-in-azure-), wait for `--> Status: PROCEED`
    * Need more details?
    * Staging: Navigate to [https://gstg.gitlab.com/admin/background_jobs](https://gstg.gitlab.com/admin/background_jobs)
    * Production: Navigate to [https://gprd.gitlab.com/admin/background_jobs](https://gprd.gitlab.com/admin/background_jobs)
    * Staging: Navigate to [https://gstg.gitlab.com/admin/background_jobs](https://gstg.gitlab.com/admin/background_jobs)
    * Production: Navigate to [https://gprd.gitlab.com/admin/background_jobs](https://gprd.gitlab.com/admin/background_jobs)
    * `Busy`, `Enqueued`, `Scheduled`, and `Retry` should all be 0
    * If a `geo_metrics_update` job is running, that can be ignored
1. [ ] 🔪 {+ Chef-Runner +}: Stop sidekiq in GCP
    * This ensures the postgresql promotion can happen and gives a better guarantee of sidekiq consistency
    * Staging: `knife ssh roles:gstg-base-be-sidekiq "sudo gitlab-ctl stop sidekiq-cluster"`
...
...
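The "wait for `--> Status: PROCEED`" rule in the checklist above can be sketched as a small pure function. This is only an illustration of the stated rule (`Busy`, `Enqueued`, `Scheduled`, and `Retry` all 0, with a running `geo_metrics_update` job ignored), not the logic of the actual monitor script. The names `drain_status` and `IGNORABLE_JOBS`, the shape of the inputs, and the `--> Status: WAIT` string are all assumptions made for this example.

```ruby
# Hypothetical helper, not taken from the migration repository.
# Jobs the checklist says may be ignored while they are still running.
IGNORABLE_JOBS = ["geo_metrics_update"].freeze

# counts:    hash with :enqueued, :scheduled and :retry totals
#            (the figures shown on /admin/background_jobs)
# busy_jobs: names of the jobs currently running (the "Busy" column)
def drain_status(counts, busy_jobs = [])
  # A busy job only blocks the drain if it is not on the ignore list.
  blocking_busy = busy_jobs.reject { |name| IGNORABLE_JOBS.include?(name) }
  pending = counts.fetch(:enqueued) + counts.fetch(:scheduled) + counts.fetch(:retry)

  if blocking_busy.empty? && pending.zero?
    "--> Status: PROCEED"
  else
    "--> Status: WAIT" # assumed wording; the source only shows PROCEED
  end
end

puts drain_status({ enqueued: 0, scheduled: 0, retry: 0 }, ["geo_metrics_update"])
```

Because a cron job may be re-enabled between checks, a caller would presumably evaluate this repeatedly rather than once, which matches the checklist's note about running the monitor in a loop.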
bin/scripts/02_failover/060_go/p02/030-await-sidekiq-drain.rb @ 61e9ad84
...
...
@@ -13,7 +13,7 @@ if ENV["FAILOVER_ENVIRONMENT"] == "stg" || `hostname -f` == "deploy.stg.gitlab.c
  $purge_allowed << "background_migration"
end

-$dry_run = true
+$dry_run = false

def queue_can_be_purged(queue_name)
  # Make sure that the geo crons are not included in this list...
...
...
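The body of `queue_can_be_purged` is not shown in this diff, so the following is only a guess at how an allow-list and a dry-run flag might combine to guard a destructive purge. `purge_queue`, its parameters, and its return symbols are hypothetical and are not taken from the script. The real script uses the globals `$purge_allowed` and `$dry_run`, which the sketch replaces with keyword arguments so it is easy to test.

```ruby
# Hypothetical sketch, not the repository's implementation.
# name:    queue name, e.g. "background_migration"
# jobs:    array standing in for the queued jobs
# allowed: names of queues that may be purged
# dry_run: when true, only report what would happen
def purge_queue(name, jobs, allowed:, dry_run:)
  # Never touch a queue that is not explicitly allow-listed.
  return :skipped unless allowed.include?(name)

  if dry_run
    puts "DRY RUN: would purge #{jobs.size} jobs from #{name}"
    return :dry_run
  end

  puts "Purging #{jobs.size} jobs from #{name}"
  jobs.clear
  :purged
end

queue = %w[job_a job_b]
purge_queue("background_migration", queue, allowed: ["background_migration"], dry_run: true)
```

With `dry_run: true` the queue is left untouched. Switching the flag to `false`, as this commit does for `$dry_run`, is what lets a purge actually take effect.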