Support multiple fulltext search clusters with 'cluster.search' config

Summary:
The goal is to make fulltext search back-ends more extensible, configurable and robust.

When this is finished it will be possible to have multiple search storage back-ends and
potentially multiple instances of each.

Individual instances can be configured with roles such as 'read', 'write' which control
which hosts will receive writes to the index and which hosts will respond to queries.

These two roles make it possible to have any combination of:

* read-only
* write-only
* read-write
* disabled

This 'roles' mechanism is extensible to add new roles should that be needed in the future.

In addition to supporting multiple elasticsearch and mysql search instances, this refactors
the connection health monitoring infrastructure from PhabricatorDatabaseHealthRecord and
utilizes the same system for monitoring the health of elasticsearch nodes. This will
allow Wikimedia's phabricator to be redundant across data centers (mysql already is,
elasticsearch should be as well).

The real-world use case I have in mind here is writing to two indexes (two elasticsearch clusters
in different data centers) but reading from only one, then toggling the 'read' property when
we want to migrate to the other data center (and when we migrate from elasticsearch 2.x to 5.x).

Hopefully this is useful in the upstream as well.

Remaining TODO:

* test cases
* documentation

Test Plan:
(WARNING) This will most likely require the elasticsearch index to be deleted and re-created due to schema changes.

Tested with elasticsearch versions 2.4 and 5.2 using the following config:

```lang=json
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "localhost",
          "roles": { "read": true, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    },
    {
      "type": "mysql",
      "roles": { "write": true }
     }
  ]
```

Also deployed the same changes to Wikimedia's production Phabricator instance without any issues whatsoever.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin, epriestley

Tags: #elasticsearch, #clusters, #wikimedia

Differential Revision: https://secure.phabricator.com/D17384
Mukunda Modell
2017-03-26 08:16:47 +00:00
committed by 20after4
parent a41d158490
commit e41c25de50
36 changed files with 1411 additions and 378 deletions

@@ -0,0 +1,76 @@
@title Cluster: Search
@group cluster

Overview
========

You can configure Phabricator to connect to one or more fulltext search clusters
running either Elasticsearch or MySQL. By default and without further
configuration, Phabricator will use MySQL for fulltext search. This will be
adequate for the vast majority of users. Installs with a very large number of
objects or specialized search needs can consider enabling Elasticsearch for
better scalability and potentially better search results.

Configuring Search Services
===========================

To configure an Elasticsearch service, use the `cluster.search` configuration
option. A typical Elasticsearch configuration looks similar to the following
example:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "127.0.0.1",
          "roles": { "write": true, "read": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```

Supported Options
-----------------

| Key | Type | Comments |
| --- | --- | --- |
| `type` | String | Engine type. Currently, 'elasticsearch' or 'mysql'. |
| `protocol` | String | Either 'http' or 'https'. |
| `port` | Int | The TCP port that Elasticsearch is bound to. |
| `path` | String | The path portion of the URL for Phabricator's index. |
| `version` | Int | The version of the Elasticsearch server. Supports either 2 or 5. |
| `hosts` | List | A list of one or more host entries, each naming an Elasticsearch host name or address. |

Host Configuration
------------------

Each search service must have one or more hosts associated with it. Each host
entry consists of a `host` key and a dictionary of roles, and can optionally
override any of the options that are valid at the service level (see above).
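
As an illustration of a per-host override, the following sketch assumes two
hypothetical hosts, `es1.example.com` and `es2.example.com`. The host names and
the port `9201` are assumptions made for this example only. The second host
sets its own `port`, while the first uses the service-level value:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "es1.example.com",
          "roles": { "read": true, "write": true }
        },
        {
          "host": "es2.example.com",
          "port": 9201,
          "roles": { "read": true, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```
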
Currently supported roles are `read` and `write`. These can be individually
enabled or disabled on a per-host basis. A typical setup might include two
Elasticsearch clusters in two separate datacenters. You can configure one
cluster for reads and both for writes. When the cluster serving reads needs
maintenance, you can swap the read role over to the other cluster and then
proceed with maintenance without any service interruption.
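
A two-datacenter setup of this kind might be sketched as follows. The host
names `es.dc1.example.com` and `es.dc2.example.com` are hypothetical, and the
remaining values are copied from the example earlier in this document. Both
clusters receive writes, but only the first one answers queries:

```lang=json
{
  "cluster.search": [
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "es.dc1.example.com",
          "roles": { "read": true, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    },
    {
      "type": "elasticsearch",
      "hosts": [
        {
          "host": "es.dc2.example.com",
          "roles": { "read": false, "write": true }
        }
      ],
      "port": 9200,
      "protocol": "http",
      "path": "/phabricator",
      "version": 5
    }
  ]
}
```

To move reads to the second datacenter under this sketch, set `read` to `false`
on the first host and to `true` on the second.
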
Monitoring Search Services
==========================

You can monitor fulltext search in {nav Config > Search Servers}. This interface
shows you a quick overview of services and their health.

The table on this page shows some basic stats for each configured service,
followed by the configuration and current status of each host.

NOTE: This page runs its diagnostics //from the web server that is serving the
request//. If you are recovering from a disaster, the view this page shows
may be partial or misleading, and two requests served by different servers may
see different views of the cluster.