Make "bin/repository thaw" workflow more clear when devices are disabled
Summary:
Ref T13216. See PHI943. If autoscale lightning strikes all your servers at once and destroys them, the path to recovery can be unclear. You're "supposed" to:
  - demote all the devices;
  - disable the bindings;
  - bind the new servers;
  - put whatever working copies you can scrape up back on disk;
  - promote one of the new servers.
However, the documentation is a bit misleading (it was sort of written with "you lost one or two devices" in mind, not "you lost every device") and demote-before-disable is unnecessary and slightly risky if servers come back online. There's also a missing guardrail before the promote step which lets you accidentally skip the demotion step and end up in a confusing state. Instead:
  - Add a guard rail: when you try to promote a new server, warn if inactive devices still have versions and tell the user to demote them.
  - Allow demotion of inactive devices: the order "disable, demote" is safer and more intuitive than "demote, disable" and there's no reason to require the unintuitive order.
  - Make the "cluster already has leaders" message more clear.
  - Make the documentation more clear.
Test Plan:
  - Bound a repository to two devices.
  - Wrote to A to make it a leader, then disabled it (simulating a lightning strike).
  - Tried to promote B. Got a new, useful error ("demote A first").
  - Demoted A (before: error about demoting inactive devices; now: works fine).
  - Promoted B. This worked.
Reviewers: amckinley
Reviewed By: amckinley
Maniphest Tasks: T13216
Differential Revision: https://secure.phabricator.com/D19793
			
			
This commit is contained in:
		| @@ -120,33 +120,71 @@ final class PhabricatorRepositoryManagementThawWorkflow | |||||||
|               $repository->getDisplayName())); |               $repository->getDisplayName())); | ||||||
|         } |         } | ||||||
|  |  | ||||||
|  |         if ($promote) { | ||||||
|  |           // You can only promote active devices. (You may demote active or | ||||||
|  |           // inactive devices.) | ||||||
|           $bindings = $service->getActiveBindings(); |           $bindings = $service->getActiveBindings(); | ||||||
|           $bindings = mpull($bindings, null, 'getDevicePHID'); |           $bindings = mpull($bindings, null, 'getDevicePHID'); | ||||||
|           if (empty($bindings[$device->getPHID()])) { |           if (empty($bindings[$device->getPHID()])) { | ||||||
|             throw new PhutilArgumentUsageException( |             throw new PhutilArgumentUsageException( | ||||||
|               pht( |               pht( | ||||||
|                 'Repository "%s" has no active binding to device "%s". Only '. |                 'Repository "%s" has no active binding to device "%s". Only '. | ||||||
|               'actively bound devices can be promoted or demoted.', |                 'actively bound devices can be promoted.', | ||||||
|                 $repository->getDisplayName(), |                 $repository->getDisplayName(), | ||||||
|                 $device->getName())); |                 $device->getName())); | ||||||
|           } |           } | ||||||
|  |  | ||||||
|           $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions( |           $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions( | ||||||
|             $repository->getPHID()); |             $repository->getPHID()); | ||||||
|  |  | ||||||
|           $versions = mpull($versions, null, 'getDevicePHID'); |           $versions = mpull($versions, null, 'getDevicePHID'); | ||||||
|         $versions = array_select_keys($versions, array_keys($bindings)); |  | ||||||
|  |  | ||||||
|         if ($versions && $promote) { |           // Before we promote, make sure there are no outstanding versions on | ||||||
|           throw new PhutilArgumentUsageException( |           // devices with inactive bindings. If there are, you need to demote | ||||||
|             pht( |           // these first. | ||||||
|               'Unable to promote "%s" for repository "%s": the leaders for '. |           $inactive = array(); | ||||||
|               'this cluster are not ambiguous.', |           foreach ($versions as $device_phid => $version) { | ||||||
|               $device->getName(), |             if (isset($bindings[$device_phid])) { | ||||||
|               $repository->getDisplayName())); |               continue; | ||||||
|  |             } | ||||||
|  |             $inactive[$device_phid] = $version; | ||||||
|  |           } | ||||||
|  |  | ||||||
|  |           if ($inactive) { | ||||||
|  |             $handles = $viewer->loadHandles(array_keys($inactive)); | ||||||
|  |  | ||||||
|  |             $handle_list = iterator_to_array($handles); | ||||||
|  |             $handle_list = mpull($handle_list, 'getName'); | ||||||
|  |             $handle_list = implode(', ', $handle_list); | ||||||
|  |  | ||||||
|  |             throw new PhutilArgumentUsageException( | ||||||
|  |               pht( | ||||||
|  |                 'Repository "%s" has versions on inactive devices. Demote '. | ||||||
|  |                 '(or reactivate) these devices before promoting a new '. | ||||||
|  |                 'leader: %s.', | ||||||
|  |                 $repository->getDisplayName(), | ||||||
|  |                 $handle_list)); | ||||||
|  |           } | ||||||
|  |  | ||||||
|  |           // Now, make sure there are no outstanding versions on devices with | ||||||
|  |           // active bindings. These also need to be demoted (or promoting is a | ||||||
|  |           // mistake or already happened). | ||||||
|  |           $active = array_select_keys($versions, array_keys($bindings)); | ||||||
|  |           if ($active) { | ||||||
|  |             $handles = $viewer->loadHandles(array_keys($active)); | ||||||
|  |  | ||||||
|  |             $handle_list = iterator_to_array($handles); | ||||||
|  |             $handle_list = mpull($handle_list, 'getName'); | ||||||
|  |             $handle_list = implode(', ', $handle_list); | ||||||
|  |  | ||||||
|  |             throw new PhutilArgumentUsageException( | ||||||
|  |               pht( | ||||||
|  |                 'Unable to promote "%s" for repository "%s" because this '. | ||||||
|  |                 'cluster already has one or more unambiguous leaders: %s.', | ||||||
|  |                 $device->getName(), | ||||||
|  |                 $repository->getDisplayName(), | ||||||
|  |                 $handle_list)); | ||||||
|           } |           } | ||||||
|  |  | ||||||
|         if ($promote) { |  | ||||||
|           PhabricatorRepositoryWorkingCopyVersion::updateVersion( |           PhabricatorRepositoryWorkingCopyVersion::updateVersion( | ||||||
|             $repository->getPHID(), |             $repository->getPHID(), | ||||||
|             $device->getPHID(), |             $device->getPHID(), | ||||||
|   | |||||||
| @@ -414,28 +414,24 @@ present on the leaders but not present on the followers by examining the | |||||||
| push logs. | push logs. | ||||||
|  |  | ||||||
| If you are comfortable discarding these changes, you can instruct Phabricator | If you are comfortable discarding these changes, you can instruct Phabricator | ||||||
| that it can forget about the leaders in two ways: disable the service bindings | that it can forget about the leaders by doing this: | ||||||
| to all of the leader devices so they are no longer part of the cluster, or use |  | ||||||
| `bin/repository thaw` to `--demote` the leaders explicitly. |  | ||||||
|  |  | ||||||
| If you do this, **you will lose data**. Either action will discard any changes |   - Disable the service bindings to all of the leader devices so they are no | ||||||
| on the affected leaders which have not replicated to other devices in the |     longer part of the cluster. | ||||||
| cluster. |   - Then, use `bin/repository thaw` to `--demote` the leaders explicitly. | ||||||
|  |  | ||||||
| To remove a device from the cluster, disable all of the bindings to it | To demote a device, run this command: | ||||||
| in Almanac, using the web UI. |  | ||||||
|  |  | ||||||
| {icon exclamation-triangle, color="red"} Any data which is only present on |  | ||||||
| the disabled device will be lost. |  | ||||||
|  |  | ||||||
| To demote a device without removing it from the cluster, run this command: |  | ||||||
|  |  | ||||||
| ``` | ``` | ||||||
| phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net | phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| {icon exclamation-triangle, color="red"} Any data which is only present on | {icon exclamation-triangle, color="red"} Any data which is only present on | ||||||
| **this** device will be lost. | the demoted device will be lost. | ||||||
|  |  | ||||||
|  | If you do this, **you will lose unreplicated data**. You will discard any | ||||||
|  | changes on the affected leaders which have not replicated to other devices | ||||||
|  | in the cluster. | ||||||
|  |  | ||||||
|  |  | ||||||
| Ambiguous Leaders | Ambiguous Leaders | ||||||
|   | |||||||
		Reference in New Issue
	
	Block a user
	 epriestley
					epriestley