Make "bin/repository thaw" workflow more clear when devices are disabled
Summary:
Ref T13216. See PHI943. If autoscale lightning strikes all your servers at once and destroys them, the path to recovery can be unclear. You're "supposed" to:
- demote all the devices;
- disable the bindings;
- bind the new servers;
- put whatever working copies you can scrape up back on disk;
- promote one of the new servers.
However, the documentation is a bit misleading (it was sort of written with "you lost one or two devices" in mind, not "you lost every device") and demote-before-disable is unnecessary and slightly risky if servers come back online. There's also a missing guardrail before the promote step which lets you accidentally skip the demotion step and end up in a confusing state. Instead:
- Add a guard rail: when you try to promote a new server, warn if inactive devices still have versions and tell the user to demote them.
- Allow demotion of inactive devices: the order "disable, demote" is safer and more intuitive than "demote, disable" and there's no reason to require the unintuitive order.
- Make the "cluster already has leaders" message more clear.
- Make the documentation more clear.
Test Plan:
- Bound a repository to two devices.
- Wrote to A to make it a leader, then disabled it (simulating a lightning strike).
- Tried to promote B. Got a new, useful error ("demote A first").
- Demoted A (before: error about demoting inactive devices; now: works fine).
- Promoted B. This worked.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19793
This commit is contained in:
@@ -414,28 +414,24 @@ present on the leaders but not present on the followers by examining the
|
||||
push logs.
|
||||
|
||||
If you are comfortable discarding these changes, you can instruct Phabricator
|
||||
that it can forget about the leaders in two ways: disable the service bindings
|
||||
to all of the leader devices so they are no longer part of the cluster, or use
|
||||
`bin/repository thaw` to `--demote` the leaders explicitly.
|
||||
that it can forget about the leaders by doing this:
|
||||
|
||||
If you do this, **you will lose data**. Either action will discard any changes
|
||||
on the affected leaders which have not replicated to other devices in the
|
||||
cluster.
|
||||
- Disable the service bindings to all of the leader devices so they are no
|
||||
longer part of the cluster.
|
||||
- Then, use `bin/repository thaw` to `--demote` the leaders explicitly.
|
||||
|
||||
To remove a device from the cluster, disable all of the bindings to it
|
||||
in Almanac, using the web UI.
|
||||
|
||||
{icon exclamation-triangle, color="red"} Any data which is only present on
|
||||
the disabled device will be lost.
|
||||
|
||||
To demote a device without removing it from the cluster, run this command:
|
||||
To demote a device, run this command:
|
||||
|
||||
```
|
||||
phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
|
||||
```
|
||||
|
||||
{icon exclamation-triangle, color="red"} Any data which is only present on
|
||||
**this** device will be lost.
|
||||
the demoted device will be lost.
|
||||
|
||||
If you do this, **you will lose unreplicated data**. You will discard any
|
||||
changes on the affected leaders which have not replicated to other devices
|
||||
in the cluster.
|
||||
|
||||
|
||||
Ambiguous Leaders
|
||||
|
||||
Reference in New Issue
Block a user