Add slightly more cluster repository documentation
Summary: Ref T10751. There are still some missing support tools here, but explain some of this a little better. Test Plan: Read documentation. Reviewers: chad Reviewed By: chad Maniphest Tasks: T10751 Differential Revision: https://secure.phabricator.com/D15764
This commit is contained in:
@@ -19,19 +19,19 @@ advantages of doing this are:
|
|||||||
|
|
||||||
This configuration is complex, and many installs do not need to pursue it.
|
This configuration is complex, and many installs do not need to pursue it.
|
||||||
|
|
||||||
This configuration is not currently supported with Subversion.
|
This configuration is not currently supported with Subversion or Mercurial.
|
||||||
|
|
||||||
|
|
||||||
Repository Hosts
|
Repository Hosts
|
||||||
================
|
================
|
||||||
|
|
||||||
Repository hosts must run a complete, fully configured copy of Phabricator,
|
Repository hosts must run a complete, fully configured copy of Phabricator,
|
||||||
including a webserver. If you make repositories available over SSH, they must
|
including a webserver. They must also run a properly configured `sshd`.
|
||||||
also run a properly configured `sshd`.
|
|
||||||
|
|
||||||
Generally, these hosts will run the same set of services and configuration that
|
Generally, these hosts will run the same set of services and configuration that
|
||||||
web hosts run. If you prefer, you can overlay these services and put web and
|
web hosts run. If you prefer, you can overlay these services and put web and
|
||||||
repository services on the same hosts.
|
repository services on the same hosts. See @{article:Clustering Introduction}
|
||||||
|
for some guidance on overlaying services.
|
||||||
|
|
||||||
When a user requests information about a repository that can only be satisfied
|
When a user requests information about a repository that can only be satisfied
|
||||||
by examining a repository working copy, the webserver receiving the request
|
by examining a repository working copy, the webserver receiving the request
|
||||||
@@ -57,6 +57,17 @@ If it isn't, they block the read until they can complete a fetch.
|
|||||||
Before responding to a write, replicas obtain a global lock, perform the same
|
Before responding to a write, replicas obtain a global lock, perform the same
|
||||||
version check and fetch if necessary, then allow the write to continue.
|
version check and fetch if necessary, then allow the write to continue.
|
||||||
|
|
||||||
|
Additionally, repositories passively check other nodes for updates and
|
||||||
|
replicate changes in the background. After you push a change to a repositroy,
|
||||||
|
it will usually spread passively to all other repository nodes within a few
|
||||||
|
minutes.
|
||||||
|
|
||||||
|
Even if passive replication is slow, the active replication makes acknowledged
|
||||||
|
changes sequential to all observers: after a write is acknowledged, all
|
||||||
|
subsequent reads are guaranteed to see it. The system does not permit stale
|
||||||
|
reads, and you do not need to wait for a replication delay to see a consistent
|
||||||
|
view of the repository no matter which node you ask.
|
||||||
|
|
||||||
|
|
||||||
HTTP vs HTTPS
|
HTTP vs HTTPS
|
||||||
=============
|
=============
|
||||||
@@ -84,6 +95,81 @@ Other mitigations are possible, but securing a network against the NSA and
|
|||||||
similar agents of other rogue nations is beyond the scope of this document.
|
similar agents of other rogue nations is beyond the scope of this document.
|
||||||
|
|
||||||
|
|
||||||
|
Monitoring Replication
|
||||||
|
======================
|
||||||
|
|
||||||
|
You can review the current status of a repository on cluster nodes in
|
||||||
|
{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.
|
||||||
|
|
||||||
|
This screen shows all the configured devices which are hosting the repository
|
||||||
|
and the available version.
|
||||||
|
|
||||||
|
**Version**: When a repository is mutated by a push, Phabricator increases
|
||||||
|
an internal version number for the repository. This column shows which version
|
||||||
|
is on disk on the corresponding node.
|
||||||
|
|
||||||
|
After a change is pushed, the node which received the change will have a larger
|
||||||
|
version number than the other nodes. The change should be passively replicated
|
||||||
|
to the remaining nodes after a brief period of time, although this can take
|
||||||
|
a while if the change was large or the network connection between nodes is
|
||||||
|
slow or unreliable.
|
||||||
|
|
||||||
|
You can click the version number to see the corresponding push logs for that
|
||||||
|
change. The logs contain details about what was changed, and can help you
|
||||||
|
identify if replication is slow because a change is large or for some other
|
||||||
|
reason.
|
||||||
|
|
||||||
|
**Writing**: This shows that the node is currently holding a write lock. This
|
||||||
|
normally means that it is actively receiving a push, but can also mean that
|
||||||
|
there was a write interruption. See "Write Interruptions" below for details.
|
||||||
|
|
||||||
|
|
||||||
|
Write Interruptions
|
||||||
|
===================
|
||||||
|
|
||||||
|
A repository cluster can be put into an inconsistent state by an interruption
|
||||||
|
in a brief window immediately after a write.
|
||||||
|
|
||||||
|
Phabricator can not commit changes to a working copy (stored on disk) and to
|
||||||
|
the global state (stored in a database) atomically, so there is a narrow window
|
||||||
|
between committing these two different states when some tragedy (like a
|
||||||
|
lightning strike) can befall a server, leaving the global and local views of
|
||||||
|
the repository state divergent.
|
||||||
|
|
||||||
|
In these cases, Phabricator fails into a "frozen" state where further writes
|
||||||
|
are not permitted until the failure is investigated and resolved.
|
||||||
|
|
||||||
|
TODO: Complete the support tooling and provide recovery instructions.
|
||||||
|
|
||||||
|
|
||||||
|
Loss of Leaders
|
||||||
|
===============
|
||||||
|
|
||||||
|
A more straightforward failure condition is the loss of all servers in a
|
||||||
|
cluster which have the most up-to-date copy of a repository. This looks like
|
||||||
|
this:
|
||||||
|
|
||||||
|
- There is a cluster setup with two nodes, X and Y.
|
||||||
|
- A new change is pushed to server X.
|
||||||
|
- Before the change can propagate to server Y, lightning strikes server X
|
||||||
|
and destroys it.
|
||||||
|
|
||||||
|
Here, all of the "leader" nodes with the most up-to-date copy of the repository
|
||||||
|
have been lost. Phabricator will refuse to serve this repository because it
|
||||||
|
can not serve it consistently, and can not accept writes without data loss.
|
||||||
|
|
||||||
|
The most straightforward way to resolve this issue is to restore any leader to
|
||||||
|
service. The change will be able to replicate to other nodes once a leader
|
||||||
|
comes back online.
|
||||||
|
|
||||||
|
If you are unable to restore a leader or unsure that you can restore one
|
||||||
|
quickly, you can use the monitoring console to review which changes are
|
||||||
|
present on the leaders but not present on the followers by examining the
|
||||||
|
push logs.
|
||||||
|
|
||||||
|
TODO: Complete the support tooling and provide recovery instructions.
|
||||||
|
|
||||||
|
|
||||||
Backups
|
Backups
|
||||||
======
|
======
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user