<?php

/**
 * Manages repository synchronization for cluster repositories.
 *
 * @task config Configuring Synchronization
 * @task sync Cluster Synchronization
 * @task internal Internals
 */
final class DiffusionRepositoryClusterEngine extends Phobject {

  private $repository;
  private $viewer;
  private $logger;

  private $clusterWriteLock;
  private $clusterWriteVersion;
  private $clusterWriteOwner;


/* -( Configuring Synchronization )---------------------------------------- */


  public function setRepository(PhabricatorRepository $repository) {
    $this->repository = $repository;
    return $this;
  }

  public function getRepository() {
    return $this->repository;
  }

  public function setViewer(PhabricatorUser $viewer) {
    $this->viewer = $viewer;
    return $this;
  }

  public function getViewer() {
    return $this->viewer;
  }

  public function setLog(DiffusionRepositoryClusterEngineLogInterface $log) {
    $this->logger = $log;
    return $this;
  }
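
  // A minimal usage sketch (hypothetical caller; the real call sites live
  // elsewhere in Diffusion):
  //
  //   $engine = id(new DiffusionRepositoryClusterEngine())
  //     ->setViewer($viewer)
  //     ->setRepository($repository)
  //     ->setLog($log);
  //   $engine->synchronizeWorkingCopyBeforeRead();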


/* -( Cluster Synchronization )-------------------------------------------- */


  /**
   * Synchronize repository version information after creating a repository.
   *
   * This initializes working copy versions for all currently bound devices to
   * 0, so that we don't get stuck making an ambiguous choice about which
   * devices are leaders when we later synchronize before a read.
   *
   * @task sync
   */
  public function synchronizeWorkingCopyAfterCreation() {
    if (!$this->shouldEnableSynchronization(false)) {
      return;
    }

    $repository = $this->getRepository();
    $repository_phid = $repository->getPHID();

    $service = $repository->loadAlmanacService();
    if (!$service) {
      throw new Exception(pht('Failed to load repository cluster service.'));
    }
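
    // Give every currently bound device version 0 so that, on the first
    // synchronization, no device looks ahead of any other and we never have
    // to guess which copy is authoritative.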
    $bindings = $service->getActiveBindings();
    foreach ($bindings as $binding) {
      PhabricatorRepositoryWorkingCopyVersion::updateVersion(
        $repository_phid,
        $binding->getDevicePHID(),
        0);
    }

    return $this;
  }


  /**
   * @task sync
   */
  public function synchronizeWorkingCopyAfterHostingChange() {
    if (!$this->shouldEnableSynchronization(false)) {
      return;
    }

    $repository = $this->getRepository();
    $repository_phid = $repository->getPHID();

    $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
      $repository_phid);
    $versions = mpull($versions, null, 'getDevicePHID');

    // After converting a hosted repository to observed, or vice versa, we
    // need to reset version numbers because the clocks for observed and hosted
    // repositories run on different units.

    // We identify all the cluster leaders and reset their version to 0.
    // We identify all the cluster followers and demote them.

    // This allows the cluster to start over again at version 0 but keep the
    // same leaders.
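    //
    // For example (hypothetical numbers): if devices are at versions
    // {A: 97, B: 97, C: 12}, devices A and B are the leaders and are reset
    // to version 0, while device C is demoted and no longer counts as
    // having a current copy.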

    if ($versions) {
      $max_version = (int)max(mpull($versions, 'getRepositoryVersion'));
      foreach ($versions as $version) {
        $device_phid = $version->getDevicePHID();

        if ($version->getRepositoryVersion() == $max_version) {
          PhabricatorRepositoryWorkingCopyVersion::updateVersion(
            $repository_phid,
            $device_phid,
            0);
        } else {
          PhabricatorRepositoryWorkingCopyVersion::demoteDevice(
            $repository_phid,
            $device_phid);
        }
      }
    }

    return $this;
  }


  /**
   * @task sync
   */
  public function synchronizeWorkingCopyBeforeRead() {
    if (!$this->shouldEnableSynchronization(true)) {
      return;
    }

    $repository = $this->getRepository();
    $repository_phid = $repository->getPHID();

    $device = AlmanacKeys::getLiveDevice();
    $device_phid = $device->getPHID();

    $read_lock = PhabricatorRepositoryWorkingCopyVersion::getReadLock(
      $repository_phid,
      $device_phid);

    $lock_wait = phutil_units('2 minutes in seconds');

    $this->logLine(
      pht(
        'Acquiring read lock for repository "%s" on device "%s"...',
        $repository->getDisplayName(),
        $device->getName()));

    try {
      $start = PhabricatorTime::getNow();
      $read_lock->lock($lock_wait);
      $waited = (PhabricatorTime::getNow() - $start);

      if ($waited) {
        $this->logLine(
          pht(
            'Acquired read lock after %s second(s).',
            new PhutilNumber($waited)));
      } else {
        $this->logLine(
          pht(
            'Acquired read lock immediately.'));
      }
    } catch (Exception $ex) {
      throw new PhutilProxyException(
        pht(
          'Failed to acquire read lock after waiting %s second(s). You '.
          'may be able to retry later.',
          new PhutilNumber($lock_wait)),
        $ex);
    }

    $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
      $repository_phid);
    $versions = mpull($versions, null, 'getDevicePHID');
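
    // If this device has no version record yet, treat it as version -1 so
    // that any device with a recorded version counts as more up to date.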
    $this_version = idx($versions, $device_phid);
    if ($this_version) {
      $this_version = (int)$this_version->getRepositoryVersion();
    } else {
      $this_version = -1;
    }

    if ($versions) {
      // This is the normal case, where we have some version information and
      // can identify which nodes are leaders. If the current node is not a
      // leader, we want to fetch from a leader and then update our version.
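      //
      // For example (hypothetical numbers): if the highest recorded version
      // is 7 and this device is at version 5, we fetch from a device that is
      // at version 7 and then record version 7 for this device.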

      $max_version = (int)max(mpull($versions, 'getRepositoryVersion'));
      if ($max_version > $this_version) {
        if ($repository->isHosted()) {
          $fetchable = array();
          foreach ($versions as $version) {
            if ($version->getRepositoryVersion() == $max_version) {
              $fetchable[] = $version->getDevicePHID();
            }
          }

          $this->synchronizeWorkingCopyFromDevices($fetchable);
        } else {
          $this->synchornizeWorkingCopyFromRemote();
        }

        PhabricatorRepositoryWorkingCopyVersion::updateVersion(
          $repository_phid,
          $device_phid,
          $max_version);
      } else {
        $this->logLine(
          pht(
            'Device "%s" is already a cluster leader and does not need '.
            'to be synchronized.',
            $device->getName()));
      }

      $result_version = $max_version;
    } else {
      // If no version records exist yet, we need to be careful, because we
      // can not tell which nodes are leaders.

      // There might be several nodes with arbitrary existing data, and we
      // have no way to tell which one has the "right" data. If we pick
      // wrong, we might erase some or all of the data in the repository.

      // Since this is dangerous, we refuse to guess unless there is only one
      // device. If we're the only device in the group, we obviously must be
      // a leader.

      $service = $repository->loadAlmanacService();
      if (!$service) {
        throw new Exception(pht('Failed to load repository cluster service.'));
      }

      $bindings = $service->getActiveBindings();
      $device_map = array();
      foreach ($bindings as $binding) {
        $device_map[$binding->getDevicePHID()] = true;
      }

      if (count($device_map) > 1) {
        throw new Exception(
          pht(
            'Repository "%s" exists on more than one device, but no device '.
            'has any repository version information. Phabricator can not '.
            'guess which copy of the existing data is authoritative. Promote '.
            'a device or see "Ambiguous Leaders" in the documentation.',
            $repository->getDisplayName()));
      }

      if (empty($device_map[$device->getPHID()])) {
        throw new Exception(
          pht(
            'Repository "%s" is being synchronized on device "%s", but '.
            'this device is not bound to the corresponding cluster '.
            'service ("%s").',
            $repository->getDisplayName(),
            $device->getName(),
            $service->getName()));
      }

      // The current device is the only device in service, so it must be a
      // leader. We can safely have any future nodes which come online read
      // from it.
      PhabricatorRepositoryWorkingCopyVersion::updateVersion(
        $repository_phid,
        $device_phid,
        0);

      $result_version = 0;
    }

    $read_lock->unlock();

    return $result_version;
  }


  /**
   * @task sync
   */
  public function synchronizeWorkingCopyBeforeWrite() {
    if (!$this->shouldEnableSynchronization(true)) {
      return;
    }

    $repository = $this->getRepository();
    $viewer = $this->getViewer();

    $repository_phid = $repository->getPHID();

    $device = AlmanacKeys::getLiveDevice();
    $device_phid = $device->getPHID();
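
    // Acquire the write lock on an explicit connection so that version reads
    // can later run on the same connection that holds the lock; if that
    // connection is lost, the loss is detectable rather than silently
    // reading stale state.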
    $table = new PhabricatorRepositoryWorkingCopyVersion();
    $locked_connection = $table->establishConnection('w');

    $write_lock = PhabricatorRepositoryWorkingCopyVersion::getWriteLock(
      $repository_phid);

    $write_lock->useSpecificConnection($locked_connection);

    $this->logLine(
      pht(
        'Acquiring write lock for repository "%s"...',
        $repository->getDisplayName()));

    $lock_wait = phutil_units('2 minutes in seconds');
    try {
      $start = PhabricatorTime::getNow();
      $step_wait = 1;
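
      // Attempt the lock in short steps (starting at 1 second, capped at 3
      // seconds) so we can keep reporting the active writer while waiting;
      // once the total wait exceeds $lock_wait, give up and rethrow.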
      while (true) {
        try {
          $write_lock->lock((int)floor($step_wait));
          break;
        } catch (PhutilLockException $ex) {
          $waited = (PhabricatorTime::getNow() - $start);
          if ($waited > $lock_wait) {
            throw $ex;
          }
          $this->logActiveWriter($viewer, $repository);
        }

        // Wait a little longer before the next message we print.
        $step_wait = $step_wait + 0.5;
        $step_wait = min($step_wait, 3);
      }

      $waited = (PhabricatorTime::getNow() - $start);
      if ($waited) {
        $this->logLine(
          pht(
            'Acquired write lock after %s second(s).',
            new PhutilNumber($waited)));
      } else {
        $this->logLine(
          pht(
            'Acquired write lock immediately.'));
      }
    } catch (Exception $ex) {
      throw new PhutilProxyException(
        pht(
          'Failed to acquire write lock after waiting %s second(s). You '.
          'may be able to retry later.',
          new PhutilNumber($lock_wait)),
        $ex);
    }

    $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
      $repository_phid);
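
    // If any version record still has its "isWriting" flag set, a previous
    // write was interrupted before it could be marked complete, so the
    // repository stays frozen until an operator resolves it.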
foreach ($versions as $version) {
|
|
|
|
|
if (!$version->getIsWriting()) {
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
throw new Exception(
|
|
|
|
|
pht(
|
|
|
|
|
'An previous write to this repository was interrupted; refusing '.
|
2016-04-24 10:07:35 -07:00
|
|
|
'new writes. This issue requires operator intervention to resolve, '.
|
2016-04-24 09:04:27 -07:00
|
|
|
'see "Write Interruptions" in the "Cluster: Repositories" in the '.
|
|
|
|
|
'documentation for instructions.'));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
try {
|
|
|
|
|
$max_version = $this->synchronizeWorkingCopyBeforeRead();
|
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
|
$write_lock->unlock();
|
|
|
|
|
throw $ex;
|
|
|
|
|
}
|
|
|
|
|
|
2016-04-24 10:07:35 -07:00
|
|
|
$pid = getmypid();
|
|
|
|
|
$hash = Filesystem::readRandomCharacters(12);
|
|
|
|
|
$this->clusterWriteOwner = "{$pid}.{$hash}";
|
|
|
|
|
|
2016-04-24 09:04:27 -07:00
|
|
|
PhabricatorRepositoryWorkingCopyVersion::willWrite(
|
2016-04-24 10:07:35 -07:00
|
|
|
$locked_connection,
|
2016-04-24 09:04:27 -07:00
|
|
|
$repository_phid,
|
|
|
|
|
$device_phid,
|
|
|
|
|
array(
|
|
|
|
|
'userPHID' => $viewer->getPHID(),
|
|
|
|
|
'epoch' => PhabricatorTime::getNow(),
|
|
|
|
|
'devicePHID' => $device_phid,
|
2016-04-24 10:07:35 -07:00
|
|
|
),
|
|
|
|
|
$this->clusterWriteOwner);
|
2016-04-24 09:04:27 -07:00
|
|
|
|
|
|
|
|
$this->clusterWriteVersion = $max_version;
|
|
|
|
|
$this->clusterWriteLock = $write_lock;
|
|
|
|
|
}
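Before this point the engine has recorded a unique owner token (`pid.hash`) next to the durable `isWriting` flag, which is what later lets it clear only its own write marker even if the global lock was lost. A minimal sketch of that pattern in plain PHP; the helper name and the storage call in the comment are illustrative, not the real Phabricator API:
```
// Illustrative sketch only: build an owner token like the "{$pid}.{$hash}"
// value above, then release the write marker only if it still carries it.
function newWriteOwnerToken() {
  return getmypid().'.'.bin2hex(random_bytes(6));
}

// Hypothetical release step (the real storage layer differs): clear
// isWriting only for rows recording *our* token, so a stale process can
// never clobber another writer's marker.
//
//   UPDATE working_copy_version
//     SET isWriting = 0
//     WHERE repositoryPHID = :repo AND writeOwner = :token
```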
|
|
|
|
|
|
|
|
|
|
|
Version clustered, observed repositories in a reasonable way (by largest discovered HEAD)
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.
We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.
For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").
This implements a simple way to version an observed repository:
- Take the head of every branch/tag.
- Look them up.
- Pick the biggest internal ID number.
This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.
Test Plan:
- Created an observed repository.
- Ran `bin/repository update` and observed a sensible version number appear in the version table.
- Pushed to the remote, did another update, saw a sensible update.
- Did an update with no push, saw no effect on version number.
- Toggled repository to hosted, saw the version reset.
- Simulated read traffic to out-of-sync node, saw it do a remote fetch.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15986
2016-05-27 06:21:19 -07:00
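The versioning rule described above (take every branch and tag head, resolve it, and use the largest internal commit ID) is easy to state as code. A minimal sketch, assuming the heads have already been resolved to integer IDs; the function name and input shape are illustrative, not the actual API:
```
// Hypothetical helper: "observed version" = largest internal ID among the
// commits at the heads of all branches and tags.
function computeObservedRepositoryVersion(array $head_ids) {
  if (!$head_ids) {
    // Nothing discovered yet; treat the working copy as version 0.
    return 0;
  }
  return max($head_ids);
}
```
As the commit message notes, deleting the branch that holds the largest ID can make this number step backward; that is tolerated because the next real push self-heals the version.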
|
|
|
public function synchronizeWorkingCopyAfterDiscovery($new_version) {
|
Don't require a device be registered in Almanac to do cluster init/resync steps
Summary:
Fixes T12893. See also PHI15. This is complicated but:
- In the documentation, we say "register your web devices with Almanac". We do this ourselves on `secure` and in the production Phacility cluster.
- We don't actually require you to do this, don't detect that you didn't, and there's no actual reason you need to.
- If you don't register your "web" devices, the only bad thing that really happens is that creating repositories skips version initialization, creating the bug in T12893. This process does not actually require the devices be registered, but the code currently just kind of fails silently if they aren't.
Instead, just move forward on these init/resync phases even if the device isn't registered. These steps are safe to run from unregistered hosts since they just wipe the whole table and don't affect specific devices.
If this sticks, I'll probably update the docs to not tell you to register `web` devices, or at least add "Optionally, ...". I don't think there's any future reason we'd need them to be registered.
Test Plan:
This is a bit tough to test without multiple hosts, but I added this piece of code to `AlmanacKeys` so we'd pretend to be a nameless "web" device when creating a repository:
```
if ($_REQUEST['__path__'] == '/diffusion/edit/form/default/') {
return null;
}
```
Then I created some Git repositories. Before the patch, they came up with `-` versions (no version information). After the patch, they came up with `0` versions (correctly initialized).
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12893
Differential Revision: https://secure.phabricator.com/D18273
2017-07-24 11:37:04 -07:00
|
|
|
if (!$this->shouldEnableSynchronization(true)) {
|
2016-05-27 06:21:19 -07:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$repository = $this->getRepository();
|
|
|
|
|
$repository_phid = $repository->getPHID();
|
|
|
|
|
if ($repository->isHosted()) {
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$viewer = $this->getViewer();
|
|
|
|
|
|
|
|
|
|
$device = AlmanacKeys::getLiveDevice();
|
|
|
|
|
$device_phid = $device->getPHID();
|
|
|
|
|
|
|
|
|
|
// NOTE: We are not holding a lock here because this method is only called
|
|
|
|
|
// from PhabricatorRepositoryDiscoveryEngine, which already holds a device
|
|
|
|
|
// lock. Even if we do race here and record an older version, the
|
|
|
|
|
// consequences are mild: we only do extra work to correct it later.
|
|
|
|
|
|
|
|
|
|
$versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
|
|
|
|
|
$repository_phid);
|
|
|
|
|
$versions = mpull($versions, null, 'getDevicePHID');
|
|
|
|
|
|
|
|
|
|
$this_version = idx($versions, $device_phid);
|
|
|
|
|
if ($this_version) {
|
|
|
|
|
$this_version = (int)$this_version->getRepositoryVersion();
|
|
|
|
|
} else {
|
|
|
|
|
$this_version = -1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if ($new_version > $this_version) {
|
|
|
|
|
PhabricatorRepositoryWorkingCopyVersion::updateVersion(
|
|
|
|
|
$repository_phid,
|
|
|
|
|
$device_phid,
|
|
|
|
|
$new_version);
|
|
|
|
|
}
|
|
|
|
|
}
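A short usage sketch of how discovery is expected to feed this method; the caller and variable names are assumptions for illustration, not the actual discovery engine code:
```
// Hypothetical caller: after a fetch/discovery pass, record the newly
// observed version for this device. The method above ignores values that
// are not larger than the version already on record.
$new_version = computeObservedRepositoryVersion($head_ids); // see sketch above
$cluster_engine->synchronizeWorkingCopyAfterDiscovery($new_version);
```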
|
|
|
|
|
|
|
|
|
|
|
2016-04-24 09:04:27 -07:00
|
|
|
/**
|
|
|
|
|
* @task sync
|
|
|
|
|
*/
|
|
|
|
|
public function synchronizeWorkingCopyAfterWrite() {
|
2017-07-24 11:37:04 -07:00
|
|
|
if (!$this->shouldEnableSynchronization(true)) {
|
2016-04-24 09:04:27 -07:00
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (!$this->clusterWriteLock) {
|
|
|
|
|
throw new Exception(
|
|
|
|
|
pht(
|
|
|
|
|
'Trying to synchronize after write, but not holding a write '.
|
|
|
|
|
'lock!'));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$repository = $this->getRepository();
|
|
|
|
|
$repository_phid = $repository->getPHID();
|
|
|
|
|
|
|
|
|
|
$device = AlmanacKeys::getLiveDevice();
|
|
|
|
|
$device_phid = $device->getPHID();
|
|
|
|
|
|
2016-04-24 10:07:35 -07:00
|
|
|
// It is possible that we've lost the global lock while receiving the push.
|
|
|
|
|
// For example, the master database may have been restarted between the
|
|
|
|
|
// time we acquired the global lock and now, when the push has finished.
|
|
|
|
|
|
|
|
|
|
// We wrote a durable lock while we were holding the global lock,
|
|
|
|
|
// essentially upgrading our lock. We can still safely release this upgraded
|
|
|
|
|
// lock even if we're no longer holding the global lock.
|
|
|
|
|
|
|
|
|
|
// If we fail to release the lock, the repository will be frozen until
|
|
|
|
|
// an operator can figure out what happened, so we try pretty hard to
|
|
|
|
|
// reconnect to the database and release the lock.
|
|
|
|
|
|
|
|
|
|
$now = PhabricatorTime::getNow();
|
|
|
|
|
$duration = phutil_units('5 minutes in seconds');
|
|
|
|
|
$try_until = $now + $duration;
|
|
|
|
|
|
|
|
|
|
$did_release = false;
|
|
|
|
|
$already_failed = false;
|
|
|
|
|
while (PhabricatorTime::getNow() <= $try_until) {
|
|
|
|
|
try {
|
|
|
|
|
// NOTE: This means we're still bumping the version when pushes fail. We
|
|
|
|
|
// could select only un-rejected events instead to bump a little less
|
|
|
|
|
// often.
|
|
|
|
|
|
|
|
|
|
$new_log = id(new PhabricatorRepositoryPushEventQuery())
|
|
|
|
|
->setViewer(PhabricatorUser::getOmnipotentUser())
|
|
|
|
|
->withRepositoryPHIDs(array($repository_phid))
|
|
|
|
|
->setLimit(1)
|
|
|
|
|
->executeOne();
|
|
|
|
|
|
|
|
|
|
$old_version = $this->clusterWriteVersion;
|
|
|
|
|
if ($new_log) {
|
|
|
|
|
$new_version = $new_log->getID();
|
|
|
|
|
} else {
|
|
|
|
|
$new_version = $old_version;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
PhabricatorRepositoryWorkingCopyVersion::didWrite(
|
|
|
|
|
$repository_phid,
|
|
|
|
|
$device_phid,
|
|
|
|
|
$this->clusterWriteVersion,
|
2016-05-23 17:13:12 -07:00
|
|
|
$new_version,
|
2016-04-24 10:07:35 -07:00
|
|
|
$this->clusterWriteOwner);
|
|
|
|
|
$did_release = true;
|
|
|
|
|
break;
|
|
|
|
|
} catch (AphrontConnectionQueryException $ex) {
|
|
|
|
|
$connection_exception = $ex;
|
|
|
|
|
} catch (AphrontConnectionLostQueryException $ex) {
|
|
|
|
|
$connection_exception = $ex;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (!$already_failed) {
|
|
|
|
|
$already_failed = true;
|
|
|
|
|
$this->logLine(
|
|
|
|
|
pht('CRITICAL. Failed to release cluster write lock!'));
|
|
|
|
|
|
|
|
|
|
$this->logLine(
|
|
|
|
|
pht(
|
|
|
|
|
'The connection to the master database was lost while receiving '.
|
|
|
|
|
'the write.'));
|
|
|
|
|
|
|
|
|
|
$this->logLine(
|
|
|
|
|
pht(
|
|
|
|
|
'This process will spend %s more second(s) attempting to '.
|
|
|
|
|
'recover, then give up.',
|
|
|
|
|
new PhutilNumber($duration)));
|
|
|
|
|
}
|
2016-04-24 09:04:27 -07:00
|
|
|
|
2016-04-24 10:07:35 -07:00
|
|
|
sleep(1);
|
|
|
|
|
}
|
2016-04-24 09:04:27 -07:00
|
|
|
|
2016-04-24 10:07:35 -07:00
|
|
|
if ($did_release) {
|
|
|
|
|
if ($already_failed) {
|
|
|
|
|
$this->logLine(
|
|
|
|
|
pht('RECOVERED. Link to master database was restored.'));
|
|
|
|
|
}
|
|
|
|
|
$this->logLine(pht('Released cluster write lock.'));
|
2016-04-24 09:04:27 -07:00
|
|
|
} else {
|
2016-04-24 10:07:35 -07:00
|
|
|
throw new Exception(
|
|
|
|
|
pht(
|
|
|
|
|
'Failed to reconnect to master database and release held write '.
|
|
|
|
|
'lock ("%s") on device "%s" for repository "%s" after trying '.
|
|
|
|
|
'for %s second(s). This repository will be frozen.',
|
|
|
|
|
$this->clusterWriteOwner,
|
|
|
|
|
$device->getName(),
|
|
|
|
|
$this->getDisplayName(),
|
|
|
|
|
new PhutilNumber($duration)));
|
2016-04-24 09:04:27 -07:00
|
|
|
}
|
|
|
|
|
|
2016-04-24 10:07:35 -07:00
|
|
|
// We can continue even if we've lost this lock, everything is still
|
|
|
|
|
// consistent.
|
|
|
|
|
try {
|
|
|
|
|
$this->clusterWriteLock->unlock();
|
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
|
// Ignore.
|
|
|
|
|
}
|
2016-04-24 09:04:27 -07:00
|
|
|
|
|
|
|
|
$this->clusterWriteLock = null;
|
2016-04-24 10:07:35 -07:00
|
|
|
$this->clusterWriteOwner = null;
|
2016-04-24 09:04:27 -07:00
|
|
|
}
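Taken together with the lock acquisition earlier in the class, the hosted write path follows a fixed order: take the write lock and set the durable `isWriting` marker, let the VCS receive the push, then bump the version and clear the marker, retrying for a few minutes if the database connection was lost. An illustrative outline of that call order; the "before write" method name is inferred from this class, and the receive step is a placeholder:
```
// Outline only; not the actual SSH workflow code.
$engine->synchronizeWorkingCopyBeforeWrite();  // lock + durable isWriting marker
try {
  receivePushFromClient();                     // hypothetical, e.g. git receive-pack
} finally {
  // Runs even for rejected pushes; see the NOTE above about version bumps.
  $engine->synchronizeWorkingCopyAfterWrite();
}
```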
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/* -( Internals )---------------------------------------------------------- */
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* @task internal
|
|
|
|
|
*/
|
2017-07-24 11:37:04 -07:00
|
|
|
private function shouldEnableSynchronization($require_device) {
|
2016-04-24 09:04:27 -07:00
|
|
|
$repository = $this->getRepository();
|
|
|
|
|
|
|
|
|
|
$service_phid = $repository->getAlmanacServicePHID();
|
|
|
|
|
if (!$service_phid) {
|
|
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2017-04-23 06:13:53 -07:00
|
|
|
if (!$repository->supportsSynchronization()) {
|
2016-04-24 09:04:27 -07:00
|
|
|
return false;
|
|
|
|
|
}
|
|
|
|
|
|
2017-07-24 11:37:04 -07:00
|
|
|
if ($require_device) {
|
|
|
|
|
$device = AlmanacKeys::getLiveDevice();
|
|
|
|
|
if (!$device) {
|
|
|
|
|
return false;
|
|
|
|
|
}
|
2016-04-24 09:04:27 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return true;
|
|
|
|
|
}
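`shouldEnableSynchronization()` is the single gate for every synchronization path: it requires an Almanac service, a repository type that supports synchronization, and, only when `$require_device` is true, a registered live device. A hedged sketch of the intended calling pattern (the surrounding callers here are illustrative):
```
// Device-specific reads and writes must know which device they run on, so
// they require a registered device:
if (!$this->shouldEnableSynchronization(true)) {
  return;
}

// Init/resync steps that rebuild the whole version table are safe to run
// from an unregistered host, so they can pass false instead:
if (!$this->shouldEnableSynchronization(false)) {
  return;
}
```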
|
|
|
|
|
|
|
|
|
|
|
2016-05-27 06:21:19 -07:00
|
|
|
/**
|
|
|
|
|
* @task internal
|
|
|
|
|
*/
|
|
|
|
|
private function synchronizeWorkingCopyFromRemote() {
|
|
|
|
|
$repository = $this->getRepository();
|
|
|
|
|
$device = AlmanacKeys::getLiveDevice();
|
|
|
|
|
|
|
|
|
|
$local_path = $repository->getLocalPath();
|
|
|
|
|
$fetch_uri = $repository->getRemoteURIEnvelope();
|
|
|
|
|
|
|
|
|
|
if ($repository->isGit()) {
|
|
|
|
|
$this->requireWorkingCopy();
|
|
|
|
|
|
|
|
|
|
$argv = array(
|
|
|
|
|
'fetch --prune -- %P %s',
|
|
|
|
|
$fetch_uri,
|
|
|
|
|
'+refs/*:refs/*',
|
|
|
|
|
);
|
|
|
|
|
} else {
|
|
|
|
|
throw new Exception(pht('Remote sync only supported for git!'));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$future = DiffusionCommandEngine::newCommandEngine($repository)
|
|
|
|
|
->setArgv($argv)
|
|
|
|
|
->setSudoAsDaemon(true)
|
|
|
|
|
->setCredentialPHID($repository->getCredentialPHID())
|
Fix a bad DiffusionCommandEngine parameter from HTTPEngine conversion
Summary:
I converted this call incorrectly in D16092. We should pass the `PhutilURI` object, not the string version of it.
Specifically, this resulted in hitting an error like this if a replica needed synchronization:
```
[2016-08-11 21:22:37] EXCEPTION: (InvalidArgumentException) Argument 1 passed to DiffusionCommandEngine::setURI() must be an instance of PhutilURI, string given, called in...
#0 PhutilErrorHandler::handleError(integer, string, string, integer, array) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionCommandEngine.php:52]
#1 DiffusionCommandEngine::setURI(string) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:601]
...
```
Test Plan: Clusterized an observed repository, demoted a node, ran `bin/repository update Rxxx` to update, saw no typehint fatal.
Reviewers: chad
Reviewed By: chad
Differential Revision: https://secure.phabricator.com/D16390
2016-08-11 15:07:17 -07:00
|
|
|
->setURI($repository->getRemoteURIObject())
|
2016-05-27 06:21:19 -07:00
|
|
|
->newFuture();
|
|
|
|
|
|
|
|
|
|
$future->setCWD($local_path);
|
|
|
|
|
|
|
|
|
|
try {
|
|
|
|
|
$future->resolvex();
|
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
|
$this->logLine(
|
|
|
|
|
pht(
|
|
|
|
|
'Synchronization of "%s" from remote failed: %s',
|
|
|
|
|
$device->getName(),
|
|
|
|
|
$ex->getMessage()));
|
|
|
|
|
throw $ex;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
2016-04-24 09:04:27 -07:00
|
|
|
/**
|
|
|
|
|
* @task internal
|
|
|
|
|
*/
|
|
|
|
|
private function synchronizeWorkingCopyFromDevices(array $device_phids) {
|
|
|
|
|
$repository = $this->getRepository();
|
|
|
|
|
|
|
|
|
|
$service = $repository->loadAlmanacService();
|
|
|
|
|
if (!$service) {
|
|
|
|
|
throw new Exception(pht('Failed to load repository cluster service.'));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$device_map = array_fuse($device_phids);
|
|
|
|
|
$bindings = $service->getActiveBindings();
|
|
|
|
|
|
|
|
|
|
$fetchable = array();
|
|
|
|
|
foreach ($bindings as $binding) {
|
|
|
|
|
// We can't fetch from nodes which don't have the newest version.
|
|
|
|
|
$device_phid = $binding->getDevicePHID();
|
|
|
|
|
if (empty($device_map[$device_phid])) {
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// TODO: For now, only fetch over SSH. We could support fetching over
|
|
|
|
|
// HTTP eventually.
|
|
|
|
|
if ($binding->getAlmanacPropertyValue('protocol') != 'ssh') {
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$fetchable[] = $binding;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (!$fetchable) {
|
|
|
|
|
throw new Exception(
|
|
|
|
|
pht(
|
|
|
|
|
'Leader lost: no up-to-date nodes in repository cluster are '.
|
|
|
|
|
'fetchable.'));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
$caught = null;
|
|
|
|
|
foreach ($fetchable as $binding) {
|
|
|
|
|
try {
|
|
|
|
|
$this->synchronizeWorkingCopyFromBinding($binding);
|
|
|
|
|
$caught = null;
|
|
|
|
|
break;
|
|
|
|
|
} catch (Exception $ex) {
|
|
|
|
|
$caught = $ex;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if ($caught) {
|
|
|
|
|
throw $caught;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* @task internal
|
|
|
|
|
*/
|
|
|
|
|
private function synchronizeWorkingCopyFromBinding($binding) {
|
|
|
|
|
$repository = $this->getRepository();
|
2016-04-24 09:49:18 -07:00

    $device = AlmanacKeys::getLiveDevice();

    $this->logLine(
      pht(
        'Synchronizing this device ("%s") from cluster leader ("%s") before '.
        'read.',
        $device->getName(),
        $binding->getDevice()->getName()));

    $fetch_uri = $repository->getClusterRepositoryURIFromBinding($binding);
    $local_path = $repository->getLocalPath();

    if ($repository->isGit()) {
Version clustered, observed repositories in a reasonable way (by largest discovered HEAD)
Summary:
Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1.
We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got.
For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98").
This implements a simple way to version an observed repository:
- Take the head of every branch/tag.
- Look them up.
- Pick the biggest internal ID number.
This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push.
Test Plan:
- Created an observed repository.
- Ran `bin/repository update` and observed a sensible version number appear in the version table.
- Pushed to the remote, did another update, saw a sensible update.
- Did an update with no push, saw no effect on version number.
- Toggled repository to hosted, saw the version reset.
- Simulated read traffic to out-of-sync node, saw it do a remote fetch.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T4292
Differential Revision: https://secure.phabricator.com/D15986
2016-05-27 06:21:19 -07:00
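The versioning scheme described above is small enough to sketch directly. The helper name and input shape below are hypothetical, for illustration only:
```
<?php

// Hypothetical helper: given a map of "ref name => internal commit ID" for
// every branch and tag head, the observed repository's version is the
// largest internal ID discovered so far.
function computeObservedVersion(array $head_commit_ids) {
  if (!$head_commit_ids) {
    return 0;
  }
  return max($head_commit_ids);
}

// Heads resolving to IDs 95, 96 and 97 give version 97. If the branch at 97
// is later deleted, the version may step back to 96; this is acceptable
// because the next real push moves it forward again.
$version = computeObservedVersion(
  array(
    'refs/heads/master' => 97,
    'refs/heads/stable' => 95,
    'refs/tags/v1.0' => 96,
  ));
```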

      $this->requireWorkingCopy();

      // Mirror every ref from the leader and prune refs which no longer
      // exist there, so this device converges on the leader's exact state.
      $argv = array(
        'fetch --prune -- %s %s',
        $fetch_uri,
        '+refs/*:refs/*',
      );
    } else {
      throw new Exception(pht('Binding sync only supported for git!'));
    }

    $future = DiffusionCommandEngine::newCommandEngine($repository)
      ->setArgv($argv)
      ->setConnectAsDevice(true)
      ->setSudoAsDaemon(true)
      ->setURI($fetch_uri)
      ->newFuture();

    $future->setCWD($local_path);

    try {
      $future->resolvex();
    } catch (Exception $ex) {
      $this->logLine(
        pht(
          'Synchronization of "%s" from leader "%s" failed: %s',
          $device->getName(),
          $binding->getDevice()->getName(),
          $ex->getMessage()));
      throw $ex;
    }
  }


  /**
   * @task internal
   */
  private function logLine($message) {
    return $this->logText("# {$message}\n");
  }


  /**
   * @task internal
   */
  private function logText($message) {
    $log = $this->logger;
    if ($log) {
      $log->writeClusterEngineLogMessage($message);
    }
    return $this;
  }


  private function requireWorkingCopy() {
    $repository = $this->getRepository();
    $local_path = $repository->getLocalPath();

    if (!Filesystem::pathExists($local_path)) {
      $device = AlmanacKeys::getLiveDevice();

      throw new Exception(
        pht(
          'Repository "%s" does not have a working copy on this device '.
          'yet, so it cannot be synchronized. Wait for the daemons to '.
          'construct one, or run `bin/repository update %s` on this host '.
          '("%s") to build it explicitly.',
          $repository->getDisplayName(),
          $repository->getMonogram(),
          $device->getName()));
    }
  }


  private function logActiveWriter(
    PhabricatorUser $viewer,
    PhabricatorRepository $repository) {

    $writer = PhabricatorRepositoryWorkingCopyVersion::loadWriter(
      $repository->getPHID());
    if (!$writer) {
      // We couldn't load details about the active writer, so fall back to a
      // generic message.
      $this->logLine(pht('Waiting on another user to finish writing...'));
      return;
    }

    $user_phid = $writer->getWriteProperty('userPHID');
    $device_phid = $writer->getWriteProperty('devicePHID');
    $epoch = $writer->getWriteProperty('epoch');

    $phids = array($user_phid, $device_phid);
    $handles = $viewer->loadHandles($phids);

    $duration = (PhabricatorTime::getNow() - $epoch) + 1;

    $this->logLine(
      pht(
        'Waiting for %s to finish writing (on device "%s" for %ss)...',
        $handles[$user_phid]->getName(),
        $handles[$device_phid]->getName(),
        new PhutilNumber($duration)));
  }

}