Support multiple fulltext search clusters with 'cluster.search' config
Summary:
The goal is to make fulltext search back-ends more extensible, configurable and robust.
When this is finished it will be possible to have multiple search storage back-ends and
potentially multiple instances of each.
Individual instances can be configured with roles such as 'read' and 'write', which control
which hosts will receive writes to the index and which hosts will respond to queries.
These two roles make it possible to have any combination of:
* read-only
* write-only
* read-write
* disabled
This 'roles' mechanism is extensible to add new roles should that be needed in the future.
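As a rough sketch of how those four combinations might look in a `hosts` list (the host names are hypothetical, and expressing 'disabled' as both roles set to `false` is an assumption rather than something verified here):
```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        { "host": "read-only.example.com", "roles": { "read": true, "write": false } },
        { "host": "write-only.example.com", "roles": { "read": false, "write": true } },
        { "host": "read-write.example.com", "roles": { "read": true, "write": true } },
        { "host": "disabled.example.com", "roles": { "read": false, "write": false } }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```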
In addition to supporting multiple Elasticsearch and MySQL search instances, this refactors
the connection health monitoring infrastructure from PhabricatorDatabaseHealthRecord and
uses the same system to monitor the health of Elasticsearch nodes. This will
allow Wikimedia's Phabricator to be redundant across data centers (MySQL already is,
Elasticsearch should be as well).
The real-world use case I have in mind here is writing to two indexes (two Elasticsearch clusters
in different data centers) but reading from only one, then toggling the 'read' property when
we want to migrate to the other data center (and when we migrate from Elasticsearch 2.x to 5.x).
Hopefully this is useful upstream as well.
Remaining TODO:
* test cases
* documentation
Test Plan:
(WARNING) This will most likely require the elasticsearch index to be deleted and re-created due to schema changes.
Tested with elasticsearch versions 2.4 and 5.2 using the following config:
```lang=json
"cluster.search": [
  {
    "type": "elasticsearch",
    "hosts": [
      {
        "host": "localhost",
        "roles": { "read": true, "write": true }
      }
    ],
    "port": 9200,
    "protocol": "http",
    "path": "/phabricator",
    "version": 5
  },
  {
    "type": "mysql",
    "roles": { "write": true }
  }
]
```
Also deployed the same changes to Wikimedia's production Phabricator instance without any issues whatsoever.
Reviewers: epriestley, #blessed_reviewers
Reviewed By: epriestley, #blessed_reviewers
Subscribers: Korvin, epriestley
Tags: #elasticsearch, #clusters, #wikimedia
Differential Revision: https://secure.phabricator.com/D17384
src/docs/user/cluster/cluster_search.diviner
Normal file
@@ -0,0 +1,76 @@
@title Cluster: Search
@group cluster

Overview
========

You can configure Phabricator to connect to one or more fulltext search clusters
running either Elasticsearch or MySQL. By default and without further
configuration, Phabricator will use MySQL for fulltext search. This will be
adequate for the vast majority of users. Installs with a very large number of
objects or specialized search needs can consider enabling Elasticsearch for
better scalability and potentially better search results.

Configuring Search Services
===========================

To configure an Elasticsearch service, use the `cluster.search` configuration
option. A typical Elasticsearch configuration will probably look similar to
the following example:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "127.0.0.1",
          "roles": { "write": true, "read": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```

Supported Options
-----------------
| Key | Type | Comments |
| `type` | String | Engine type. Currently, 'elasticsearch' or 'mysql'. |
| `protocol` | String | Either 'http' or 'https'. |
| `port` | Int | The TCP port that Elasticsearch is bound to. |
| `path` | String | The path portion of the URL for Phabricator's index. |
| `version` | Int | The version of the Elasticsearch server. Supports either 2 or 5. |
| `hosts` | List | A list of one or more Elasticsearch hosts (see Host Configuration). |

Host Configuration
------------------
Each search service must have one or more hosts associated with it. Each host
entry consists of a `host` key and a dictionary of roles, and can optionally
override any of the options that are valid at the service level (see above).

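As an illustration of a host-level override, the following sketch has a second host listening on a different port than the service default. The host names and the alternate port are hypothetical, chosen only to show the shape of the configuration:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "search001.example.com",
          "roles": { "read": true, "write": true }
        },
        {
          "host": "search002.example.com",
          "port": 9201,
          "roles": { "read": true, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```

Here the first host would use the service-level port (9200), while the second host's own `port` value would take precedence for that host only.
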
Currently supported roles are `read` and `write`. These can be individually
enabled or disabled on a per-host basis. A typical setup might include two
Elasticsearch clusters in two separate datacenters. You can configure one
cluster for reads and both for writes. When one cluster needs to go down for
maintenance, you can swap the read role over to the other cluster and then
proceed with maintenance without any service interruption.

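A two-datacenter setup along these lines might be sketched as follows. The host names are hypothetical, and modelling each cluster as its own service entry is one possible layout rather than the only one:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "search.dc1.example.com",
          "roles": { "read": true, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    },
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "search.dc2.example.com",
          "roles": { "read": false, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```

Both clusters receive writes, so their indexes stay in step. To move reads to the second datacenter, you would set `read` to `false` on the first host and to `true` on the second.
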
Monitoring Search Services
==========================

You can monitor fulltext search in {nav Config > Search Servers}. This interface
shows you a quick overview of services and their health.

The table on this page shows some basic stats for each configured service,
followed by the configuration and current status of each host.

NOTE: This page runs its diagnostics //from the web server that is serving the
request//. If you are recovering from a disaster, the view this page shows
may be partial or misleading, and two requests served by different servers may
see different views of the cluster.