archive/phabricator

Author	SHA1	Message	Date
epriestley	859b274970	Provide more information to users during `git push` while waiting for write locks Summary: Ref T13109. Make it slightly more clear what the scope of the write and read locks are, and slightly more clear that we're actively acquiring locks, not just sitting around waiting. While waiting on another writer, show who we're waiting on so you can walk over to their desk and glare at them. Test Plan: Added `sleep(15)` after `willWrite()`. Pushed in two windows. Saw new, more informative messages. In the second window, saw the new guidance: > # Waiting for hector to finish writing (on device "repo1.local.phacility.net" for 11s)... Reviewers: asherkin Reviewed By: asherkin Subscribers: asherkin Maniphest Tasks: T13109 Differential Revision: https://secure.phabricator.com/D19247	2018-03-22 13:42:18 -07:00
Dmitri Iouchtchenko	9bd6a37055	Fix spelling Summary: Noticed a couple of typos in the docs, and then things got out of hand. Test Plan: - Stared at the words until my eyes watered and the letters began to swim on the screen. - Consulted a dictionary. Reviewers: #blessed_reviewers, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley, yelirekim, PHID-OPKG-gm6ozazyms6q6i22gyam Differential Revision: https://secure.phabricator.com/D18693	2017-10-09 10:48:04 -07:00
epriestley	8034b9d819	Don't require a device be registered in Almanac to do cluster init/resync steps Summary: Fixes T12893. See also PHI15. This is complicated but: - In the documentation, we say "register your web devices with Almanac". We do this ourselves on `secure` and in the production Phacility cluster. - We don't actually require you to do this, don't detect that you didn't, and there's no actual reason you need to. - If you don't register your "web" devices, the only bad thing that really happens is that creating repositories skips version initialization, creating the bug in T12893. This process does not actually require the devices be registered, but the code currently just kind of fails silently if they aren't. Instead, just move forward on these init/resync phases even if the device isn't registered. These steps are safe to run from unregistered hosts since they just wipe the whole table and don't affect specific devices. If this sticks, I'll probably update the docs to not tell you to register `web` devices, or at least add "Optionally, ...". I don't think there's any future reason we'd need them to be registered. Test Plan: This is a bit tough to test without multiple hosts, but I added this piece of code to `AlmanacKeys` so we'd pretend to be a nameless "web" device when creating a repository: ``` if ($_REQUEST['__path__'] == '/diffusion/edit/form/default/') { return null; } ``` Then I created some Git repositories. Before the patch, they came up with `-` versions (no version information). After the patch, they came up with `0` versions (correctly initialized). Reviewers: chad Reviewed By: chad Maniphest Tasks: T12893 Differential Revision: https://secure.phabricator.com/D18273	2017-07-25 05:12:10 -07:00
epriestley	0d5538672c	Detect unsynchronizable repositories on multiple cluster hosts Summary: Ref T12613. Currently, the SVNTEST and HGTEST repositories are improperly configured on `secure`. These repositories use VCS systems which do not support synchronization, so they can not be served from cluster services with multiple hosts. However, I've incorrectly configured them the same way as all the Git repositories, which support synchronization. This causes about 50% of requests to randomly fail (when they reach the wrong host). Detect this issue and warn the user that the configuration is not valid. It should be exceptionally difficult for normal installs to run into this. Test Plan: - Mostly faked these conditions locally, verified that `secure` really has this configuration. - I'll push this, verify that the issue is detected correctly in production, then fix the config which should resolve the intermittent issues with SVNTEST. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12613 Differential Revision: https://secure.phabricator.com/D17774	2017-04-24 10:43:05 -07:00
Josh Cox	e0675b28d8	Pass exception to PhutilProxyException Summary: Fixes T12243. That error occurred due to network flakiness with some mounted filesystems so I'm not sure how best to simulate it. But you can look and see that the PhutilProxyException does indeed expect an exception as its second arg. Test Plan: Look at method signature... look at callsite... now back at the method. Smile and nod. Reviewers: #blessed_reviewers, yelirekim, epriestley Reviewed By: #blessed_reviewers, epriestley Subscribers: epriestley Maniphest Tasks: T12243 Differential Revision: https://secure.phabricator.com/D17335	2017-02-08 13:24:44 -05:00
epriestley	ccff47682f	Provide more useful guidance if a repository is clusterized into an existing multi-device cluster Summary: Fixes T12087. When transitioning into a clustered configuration for the first time, the documentation recommends using a one-device cluster as a transitional step. However, installs may not do this for whatever reason, and we aren't as clear as we could be in warning about clusterizing directly into a multi-device cluster. Roughly, when you do this, we end up believing that working copies exist on several different devices, but have no information about which copy or copies are up to date. //Usually// they all were already synchronized and are all up to date, but we can't make this assumption safely without risking data. Instead, we err on the side of caution, and require a human to tell us which copy we should consider to be up-to-date, using `bin/repository thaw --promote`. Test Plan: ``` $ ./bin/repository clusterize rLOCKS --service repos001.phacility.net Service "repos001.phacility.net" is actively bound to more than one device (local002.local, local001.phacility.net). If you clusterize a repository onto this service it will be unclear which devices have up-to-date copies of the repository. This leader/follower ambiguity will freeze the repository. You may need to manually promote a device to unfreeze it. See "Ambiguous Leaders" in the documentation for discussion. Continue anyway? [y/N] ``` Read other changes. Reviewers: chad Reviewed By: chad Maniphest Tasks: T12087 Differential Revision: https://secure.phabricator.com/D17169	2017-01-10 12:45:55 -08:00
epriestley	0ed767b967	Fix a couple of partition migration bugs Summary: Ref T11044. Few issues here: - The `PhutilProxyException` is missing an argument (hit this while in read-only mode). - The `$ref_key` is unused. - When you add a new master to an existing cluster, we can incorrectly apply `.php` patches which we should not reapply. Instead, mark them as already-applied. Test Plan: - Poked this locally, but will initialize `secure004` as an empty master to be sure. Reviewers: chad, avivey Reviewed By: avivey Maniphest Tasks: T11044 Differential Revision: https://secure.phabricator.com/D16916	2016-11-22 10:57:24 -08:00
epriestley	4dc37bcee0	Ignore repository versions on inactive devices in "Repository Servers" panel in Config Summary: Fixes T11590. Currently, we incorrectly consider cluster repository versions that are (or were) on devices which are no longer part of the active cluster service when building this status screen. Instead, ignore them. This is just a display bug; the actual `ClusterEngine` already had similar logic. Test Plan: - Added a bad leader record to `repository_workingcopyversion`. - Before patch, got a bad "Partial (1w)" sync: {F1802292} - After patch, got a good "Synchronized": {F1802293} Reviewers: chad Reviewed By: chad Maniphest Tasks: T11590 Differential Revision: https://secure.phabricator.com/D16492	2016-09-05 11:10:16 -07:00
epriestley	39d4e21eec	Fix a bad DiffusionCommandEngine parameter from HTTPEngine conversion Summary: I converted this call incorrectly in D16092. We should pass the `PhutilURI` object, not the string version of it. Specifically, this resulted in hitting an error like this if a replica needed synchronization: ``` [2016-08-11 21:22:37] EXCEPTION: (InvalidArgumentException) Argument 1 passed to DiffusionCommandEngine::setURI() must be an instance of PhutilURI, string given, called in... #0 PhutilErrorHandler::handleError(integer, string, string, integer, array) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionCommandEngine.php:52] #1 DiffusionCommandEngine::setURI(string) called at [<phabricator>/src/applications/diffusion/protocol/DiffusionRepositoryClusterEngine.php:601] ... ``` Test Plan: Clusterized an observed repository, demoted a node, ran `bin/repository update Rxxx` to update, saw no typehint fatal. Reviewers: chad Reviewed By: chad Differential Revision: https://secure.phabricator.com/D16390	2016-08-11 16:41:09 -07:00
epriestley	55a698a28a	Use HTTPEngineExtension proxy for `git` HTTP operations Summary: Ref T10227. When we perform `git` http operations (fetch, mirror) check if we should use a proxy; if we should, set `http_proxy` or `https_proxy` in the environment to make `git` have `curl` use it. Test Plan: - Configured a proxy extension to run stuff through a local instance of Charles. - Ran `repository pull` and `repository mirror`. - Saw `git` HTTP requests route through the proxy. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10227 Differential Revision: https://secure.phabricator.com/D16092	2016-06-09 12:17:10 -07:00
epriestley	f5f784f4c1	Version clustered, observed repositories in a reasonable way (by largest discovered HEAD) Summary: Ref T4292. For hosted, clustered repositories we have a good way to increment the internal version of the repository: every time a user pushes something, we increment the version by 1. We don't have a great way to do this for observed/remote repositories because when we `git fetch` we might get nothing, or we might get some changes, and we can't easily tell //what// changes we got. For example, if we see that another node is at "version 97", and we do a fetch and see some changes, we don't know if we're in sync with them (i.e., also at "version 97") or ahead of them (at "version 98"). This implements a simple way to version an observed repository: - Take the head of every branch/tag. - Look them up. - Pick the biggest internal ID number. This will work //except// when branches are deleted, which could cause the version to go backward if the "biggest commit" is the one that was deleted. This should be OK, since it's rare and the effects are minor and the repository will "self-heal" on the next actual push. Test Plan: - Created an observed repository. - Ran `bin/repository update` and observed a sensible version number appear in the version table. - Pushed to the remote, did another update, saw a sensible update. - Did an update with no push, saw no effect on version number. - Toggled repository to hosted, saw the version reset. - Simulated read traffic to out-of-sync node, saw it do a remote fetch. Reviewers: chad Reviewed By: chad Maniphest Tasks: T4292 Differential Revision: https://secure.phabricator.com/D15986	2016-05-30 09:53:01 -07:00
epriestley	bb16a1b0e2	Fix a possible fatal on the first push to a cluster repository Summary: Fixes T11020. I think this resolves things -- `$new_version` (set above) should be used, not `$new_log` directly. Specifically, we would get into trouble if the initial push failed for some reason (working copy not initialized yet, commit hook rejected, etc). Test Plan: Made a bad push to a new repository. Saw it freeze before the patch and succeed afterwards. Reviewers: chad Reviewed By: chad Maniphest Tasks: T11020 Differential Revision: https://secure.phabricator.com/D15969	2016-05-23 17:54:54 -07:00
epriestley	892a9a1f07	Make cluster repositories more resistant to freezing Summary: Ref T10860. This allows us to recover if the connection to the database is lost during a push. If we lose the connection to the master database during a push, we would previously freeze the repository. This is very safe, but not very operator-friendly since you have to go manually unfreeze it. We don't need to be quite this aggressive about freezing things. The repository state is still consistent after we've "upgraded" the lock by setting `isWriting = 1`, so we're actually fine even if we lost the global lock. Instead of just freezing the repository immediately, sit there in a loop waiting for the master to come back up for a few minutes. If it recovers, we can release the lock and everything will be OK again. Basically, the changes are: - If we can't release the lock at first, sit in a loop trying really hard to release it for a while. - Add a unique lock identifier so we can be certain we're only releasing //our// lock no matter what else is going on. - Do the version reads on the same connection holding the lock, so we can be sure we haven't lost the lock before we do that read. Test Plan: - Added a `sleep(10)` after accepting the write but before releasing the lock so I could run `mysqld stop` and force this issue to occur. - Pushed like this: ``` $ echo D >> record && git commit -am D && git push [master 707ecc3] D 1 file changed, 1 insertion(+) # Push received by "local001.phacility.net", forwarding to cluster host. # Waiting up to 120 second(s) for a cluster write lock... # Acquired write lock immediately. # Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"... # Acquired read lock immediately. # Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized. # Ready to receive on cluster host "local001.phacility.net". Counting objects: 3, done. Delta compression using up to 8 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 254 bytes | 0 bytes/s, done. Total 3 (delta 1), reused 0 (delta 0) BEGIN SLEEP ``` - Here, I stopped `mysqld` from the CLI in another terminal window. ``` END SLEEP # CRITICAL. Failed to release cluster write lock! # The connection to the master database was lost while receiving the write. # This process will spend 300 more second(s) attempting to recover, then give up. ``` - Here, I started `mysqld` again. ``` # RECOVERED. Link to master database was restored. # Released cluster write lock. To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git 2cbf87c..707ecc3 master -> master ``` Reviewers: chad Reviewed By: chad Maniphest Tasks: T10860 Differential Revision: https://secure.phabricator.com/D15792	2016-04-25 11:37:31 -07:00
epriestley	d0b5dac36b	Make cluster repositories more chatty Summary: Ref T10860. At least in Git over SSH, we can freely echo a bunch of stuff to stderr and Git will print it to the console, so we can tell users what's going on. This should make debugging, etc., easier. We could tone this down a little bit once things are more stable if it's a little too chatty. Test Plan: ``` $ echo D >> record && git commit -am D && git push [master ca5efff] D 1 file changed, 1 insertion(+) # Push received by "local001.phacility.net", forwarding to cluster host. # Waiting up to 120 second(s) for a cluster write lock... # Acquired write lock immediately. # Waiting up to 120 second(s) for a cluster read lock on "local001.phacility.net"... # Acquired read lock immediately. # Device "local001.phacility.net" is already a cluster leader and does not need to be synchronized. # Ready to receive on cluster host "local001.phacility.net". Counting objects: 3, done. Delta compression using up to 8 threads. Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 256 bytes | 0 bytes/s, done. Total 3 (delta 1), reused 0 (delta 0) To ssh://local@localvault.phacility.com/diffusion/26/locktopia.git 8616189..ca5efff master -> master ``` Reviewers: chad Reviewed By: chad Maniphest Tasks: T10860 Differential Revision: https://secure.phabricator.com/D15790	2016-04-25 11:20:57 -07:00
epriestley	dc75b4bd06	Move all cluster locking logic to a separate class Summary: Ref T10860. This doesn't change anything, it just separates all this stuff out of `PhabricatorRepository` since I'm planning to add a bit more state to it and it's already pretty big and fairly separable. Test Plan: Pulled, pushed, browsed Diffusion. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10860 Differential Revision: https://secure.phabricator.com/D15790	2016-04-25 11:20:29 -07:00
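
Commit f5f784f4c1 above describes versioning an observed repository by taking the head of every branch and tag, looking each one up, and picking the biggest internal ID. The following is a minimal Python sketch of that idea, not the PHP that Phabricator actually ships. The function name `observed_repository_version` and the two dictionary shapes (ref name to commit hash, commit hash to internal ID) are assumptions made purely for illustration.

```python
from typing import Mapping, Optional


def observed_repository_version(
    ref_heads: Mapping[str, str],
    commit_ids: Mapping[str, int],
) -> Optional[int]:
    """Return the largest internal ID among the commits that refs point at.

    ref_heads:  hypothetical map of branch/tag name -> commit hash at its head.
    commit_ids: hypothetical map of commit hash -> internal (database) ID.
    Heads that have not been discovered yet (no internal ID) are skipped.
    """
    ids = [commit_ids[h] for h in ref_heads.values() if h in commit_ids]
    return max(ids) if ids else None


if __name__ == "__main__":
    commit_ids = {"aaa111": 95, "bbb222": 97, "ccc333": 98}
    heads = {"master": "bbb222", "stable": "aaa111"}

    # An update that fetches nothing new leaves the version unchanged.
    print(observed_repository_version(heads, commit_ids))  # 97

    # A push to the remote moves a head to a newer commit.
    heads["master"] = "ccc333"
    print(observed_repository_version(heads, commit_ids))  # 98

    # Deleting the branch that held the "biggest commit" makes the
    # version go backward, which is the caveat the commit message notes.
    del heads["master"]
    print(observed_repository_version(heads, commit_ids))  # 95
```

The last case reproduces the limitation called out in the commit message: the computed version is only monotonic as long as the ref holding the largest ID is not deleted.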