Geo - Re-run repository verification for both primary and secondary
Summary
During the Saturday (2018-07-21) maintenance window, we should use the 1 hour of downtime to get the number of repository mismatches as close to zero as possible. To accomplish this, we should mark all repositories as unverified and re-run the verification for both primary and secondary on Friday (2018-07-20).
Steps
1. Turn the feature flag off:
   - Disable the feature flag on the primary node:
     Feature.disable('geo_repository_verification')
   - Disable the feature flag on the secondary node:
     Feature.disable('geo_repository_verification')
   - Check if the feature flag is disabled on both primary and secondary nodes:
     Gitlab::Geo.repository_verification_enabled?
2. Reset the repository checksum on the primary node:
   ProjectRepositoryState.all.in_batches(of: 10_000) { |relation| relation.update_all(repository_verification_checksum: nil, wiki_verification_checksum: nil, last_repository_verification_failure: nil, last_wiki_verification_failure: nil); sleep(5) }
3. Wait for step 2 to finish.
4. Reset the repository verification on the secondary node:
   Geo::ProjectRegistry.all.in_batches(of: 10_000) { |relation| relation.update_all(repository_verification_checksum_sha: nil, wiki_verification_checksum_sha: nil, last_repository_verification_failure: nil, last_wiki_verification_failure: nil, repository_checksum_mismatch: false, wiki_checksum_mismatch: false); sleep(5) }
5. Turn the feature flag on:
   - Enable the feature flag on the primary node:
     Feature.enable('geo_repository_verification')
   - Enable the feature flag on the secondary node:
     Feature.enable('geo_repository_verification')
   - Check if the feature flag is enabled on both primary and secondary nodes:
     Gitlab::Geo.repository_verification_enabled?
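Both reset commands pause for 5 seconds after every batch of 10,000 rows, so the wait between the primary reset and the secondary reset grows with the table size. A rough way to estimate the time spent sleeping is sketched below in plain Ruby. The helper name `estimated_pause_seconds` and the row count in the example are assumptions for illustration, not values from this issue, and the estimate ignores the time each `update_all` takes, so treat it as a lower bound.

```ruby
# Rough estimate of how long a batched reset spends sleeping.
# Assumes one pause after every batch, including the last one, which is
# how `in_batches(of: 10_000) { |relation| ...; sleep(5) }` behaves.
def estimated_pause_seconds(row_count, batch_size: 10_000, pause_seconds: 5)
  raise ArgumentError, 'row_count must be >= 0' if row_count.negative?
  raise ArgumentError, 'batch_size must be > 0' unless batch_size.positive?

  batches = (row_count + batch_size - 1) / batch_size # integer ceiling division
  batches * pause_seconds
end

# Hypothetical row count, for illustration only:
# 1,000,000 rows -> 100 batches -> 500 seconds of sleeping.
puts estimated_pause_seconds(1_000_000)
```

On a Rails console, the real row count could be read with `ProjectRepositoryState.count` or `Geo::ProjectRegistry.count` before starting the reset.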
Monitoring
- Sign in as admin on GitLab.com, then visit /admin/geo_nodes. Expand the Verification information section for both primary and secondary.
- Dashboards:
  - Azure: https://performance.gitlab.net/d/000000286/gcp-failover-azure?orgId=1&var-environment=prd&from=now-6h&to=now&refresh=5s
  - GPRD: https://dashboards.gitlab.net/d/YoKVGxSmk/gcp-failover-gcp?orgId=1&from=now-6h&to=now&refresh=5s&var-environment=gprd&var-prometheus=prometheus-01-inf-gprd&var-app_prometheus=prometheus-app-01-inf-gprd&var-azure_environment=prd&panelId=15
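To judge how close to zero the mismatches are once verification has re-run, the counts could be summarized along the following lines. This is a minimal sketch in plain Ruby, not a query against the real database: `RegistryRow` and `mismatch_summary` are hypothetical names, the struct models only the two mismatch columns that the secondary reset sets back to false, and the sample rows are made up.

```ruby
# Hypothetical stand-in for a Geo::ProjectRegistry row; only the two
# mismatch flags that the secondary reset clears are modelled here.
RegistryRow = Struct.new(:repository_checksum_mismatch, :wiki_checksum_mismatch)

# Count rows with a repository mismatch, a wiki mismatch, or either one.
def mismatch_summary(rows)
  {
    repository: rows.count(&:repository_checksum_mismatch),
    wiki: rows.count(&:wiki_checksum_mismatch),
    any: rows.count { |r| r.repository_checksum_mismatch || r.wiki_checksum_mismatch }
  }
end

# Made-up sample data, for illustration only.
rows = [
  RegistryRow.new(false, false),
  RegistryRow.new(true, false),
  RegistryRow.new(true, true)
]
p mismatch_summary(rows)
```

On the secondary's Rails console, the same figures could presumably be read with ordinary ActiveRecord queries such as `Geo::ProjectRegistry.where(repository_checksum_mismatch: true).count`, assuming the columns are queryable under the names used in the reset command.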