Write "Why does Phabricator need so many databases?"
Summary: We will sell you as many new databases as you want, cheap! Just $1 per database!

Test Plan: (O).(O)

Reviewers: chad

Reviewed By: chad

Differential Revision: https://secure.phabricator.com/D15249
		| @@ -28,11 +28,10 @@ Databases | ||||
| ========= | ||||
|  | ||||
| Each Phabricator application has its own database. The names are prefixed by | ||||
| `phabricator_` (this is configurable). This design has two advantages: | ||||
| `phabricator_` (this is configurable). | ||||
|  | ||||
|   - Each database is easier to comprehend and to maintain. | ||||
|   - We don't do cross-database joins so each database can live on its own | ||||
|     machine. This gives us flexibility in sharding data later. | ||||
| Phabricator uses a separate database for each application. To understand why, | ||||
| see @{article:Why does Phabricator need so many databases?}. | ||||
|  | ||||
| Connections | ||||
| =========== | ||||
|   | ||||
src/docs/flavor/so_many_databases.diviner (normal file, 131 lines)
							| @@ -0,0 +1,131 @@ | ||||
| @title Why does Phabricator need so many databases? | ||||
| @group lore | ||||
|  | ||||
| Phabricator uses about 60 databases (and we may have added more by the time you | ||||
| read this document). This sometimes comes as a surprise, since you might assume | ||||
| it would only use one database. | ||||
|  | ||||
| The approach we use is designed to work at scale for huge installs with many | ||||
| thousands of users. We care a lot about working well for large installs, and | ||||
| about scaling up gracefully to meet the needs of growing organizations. We want | ||||
| small startups to be able to install Phabricator and have it grow with them as | ||||
| they expand to many thousands of employees. | ||||
|  | ||||
| A cost of this approach is that it makes Phabricator more difficult to install | ||||
| on shared hosts which require a lot of work to create or authorize access to | ||||
| each database. However, Phabricator does a lot of advanced or complex things | ||||
| which are difficult to configure or manage on shared hosts, and we don't | ||||
| recommend installing it on a shared host. The install documentation explicitly | ||||
| discourages installing on shared hosts. | ||||
|  | ||||
| Broadly, in cases where we must choose between operating well at scale for | ||||
| growing organizations and installing easily on shared hosts, we prioritize | ||||
| operating at scale. | ||||
|  | ||||
|  | ||||
| Listing Databases | ||||
| ================= | ||||
|  | ||||
| You can get a full list of the databases Phabricator needs with `bin/storage | ||||
| databases`. It will look something like this: | ||||
|  | ||||
| ``` | ||||
| $ /core/lib/phabricator/bin/storage databases | ||||
| secure_audit | ||||
| secure_calendar | ||||
| secure_chatlog | ||||
| secure_conduit | ||||
| secure_countdown | ||||
| secure_daemon | ||||
| secure_differential | ||||
| secure_draft | ||||
| secure_drydock | ||||
| secure_feed | ||||
| ...<dozens more databases>... | ||||
| ``` | ||||
|  | ||||
| Roughly, each application has its own database, and then there are some | ||||
| databases which support internal systems or shared infrastructure. | ||||
|  | ||||
|  | ||||
| Operating at Scale | ||||
| ================== | ||||
|  | ||||
| This storage design is aimed at large installs that may need more than one | ||||
| physical database server to handle the load the install generates. | ||||
|  | ||||
| The primary reason we use a database per application is to allow large installs to | ||||
| scale up by spreading database load across more hardware. A large organization | ||||
| with many thousands of active users may find themselves limited by the capacity | ||||
| of a single database backend. | ||||
|  | ||||
| If so, they can launch a second backend, move some applications over to it, and | ||||
| continue piling on more users. | ||||
|  | ||||
| This can't continue forever, but provides a substantial amount of headroom for | ||||
| large installs to spread the workload across more hardware and continue scaling | ||||
| up. | ||||
|  | ||||
| To make this possible, we put each application in its own database and use | ||||
| database boundaries to enforce the logical constraints that the application | ||||
| must have in order for this to work. For example, we cannot perform joins | ||||
| between separable tables, because they may not be on the same hardware. | ||||
|  | ||||
| Establishing boundaries with application databases is a simple, straightforward | ||||
| way to partition storage and make administrative operations like spreading load | ||||
| realistic. | ||||
|  | ||||
|  | ||||
| Ease of Development | ||||
| =================== | ||||
|  | ||||
| This design is also easier for us to work with, and easier to understand for | ||||
| users who want to work with the raw database data. | ||||
|  | ||||
| We have a large number of tables (more than 400) and we cannot reasonably | ||||
| reduce the number of tables very much (each table generally represents some | ||||
| meaningful type of object in some application). It's easier to develop with | ||||
| tables which are organized into separate application databases, just like it's | ||||
| easier to work with a large project if you organize source files into | ||||
| directories. | ||||
|  | ||||
| If you aren't developing Phabricator and never look at the data in the | ||||
| database, you probably don't benefit from this organization. However, if you | ||||
| are a developer or want to extend Phabricator or look under the hood, it's | ||||
| easier to find what you're looking for and work with the tables and data when | ||||
| they're organized by application. | ||||
|  | ||||
|  | ||||
| Databases Have No Cost | ||||
| ====================== | ||||
|  | ||||
| In almost all cases, creating databases has zero cost, just like organizing | ||||
| source code into directories has zero cost. | ||||
|  | ||||
| Even if we didn't derive enormous benefits from this approach at scale, there | ||||
| is little reason //not// to organize storage like this. | ||||
|  | ||||
| There are a handful of administrative tasks which are very slightly more | ||||
| complex to perform on multiple databases, but these are all either automated | ||||
| with `bin/storage` or easy to build on top of the list of databases emitted by | ||||
| `bin/storage databases`. | ||||
|  | ||||
| For example, you can dump all the databases with `bin/storage dump`, and you | ||||
| can destroy all the databases with `bin/storage destroy`. | ||||
|  | ||||
| As mentioned above, an exception to this is that if you're installing on a | ||||
| shared host and need to jump through hoops to individually authorize access to | ||||
| each database, databases do cost something. | ||||
|  | ||||
| However, this cost is an artificial cost imposed by the selected environment, | ||||
| and this is only the first of many issues you'll run into trying to install and | ||||
| run Phabricator on a shared host. These issues are why we strongly discourage | ||||
| using shared hosts, and recommend against them in the install guide. | ||||
|  | ||||
|  | ||||
| Next Steps | ||||
| ========== | ||||
|  | ||||
| Continue by: | ||||
|  | ||||
|   - learning more about databases in @{article:Database Schema}. | ||||
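
A note on the "easy to build on top of the list of databases" claim in the new document: the sketch below shows one way such a task might look. It assumes the output of `bin/storage databases` is one database name per line, as in the sample listing, which the document only describes as "something like this". The helper names `parse_database_list` and `size_query` are hypothetical and are not part of Phabricator. The query uses MySQL's `information_schema.TABLES` to report approximate size per database.

```python
import re

# Sample text in the shape shown in the document's listing. A real run would
# capture the stdout of `bin/storage databases` instead (assumption: one
# database name per line, no header).
SAMPLE_OUTPUT = """\
secure_audit
secure_calendar
secure_chatlog
"""

# Only accept plain identifier-like names, so they are safe to quote below.
_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def parse_database_list(text):
    """Return database names, one per non-empty line (hypothetical helper)."""
    names = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if not _NAME.match(name):
            raise ValueError(f"unexpected database name: {name!r}")
        names.append(name)
    return names


def size_query(names):
    """Build one MySQL query reporting approximate bytes per listed database."""
    quoted = ", ".join(f"'{n}'" for n in names)
    return (
        "SELECT table_schema, SUM(data_length + index_length) AS bytes "
        "FROM information_schema.TABLES "
        f"WHERE table_schema IN ({quoted}) "
        "GROUP BY table_schema"
    )


if __name__ == "__main__":
    print(size_query(parse_database_list(SAMPLE_OUTPUT)))
```

The point is only that the per-application layout adds a single loop or `IN (...)` clause to this kind of task. Dumping and destroying are already covered by `bin/storage dump` and `bin/storage destroy`, as the document says.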
	 epriestley