# Architecture
To understand how the services are tied together, it is important to first know
where the data is stored, what it looks like, which service owns it, and
whether the data is considered public. The ties between the services can then
be read as the flow of data.
## Data model
All data is stored in PostgreSQL, chosen for simplicity and a low maintenance
burden. Simplicity, because all data lives in one place where it can be
cross-referenced easily and transformed using SQL (which nowadays is also very
capable of dealing with semi-structured data). A low maintenance burden,
because the PostgreSQL instance is shared with other Blender services, so we
pay that cost only once.
All models are owned by `/website`, meaning that all models are defined by this
service. This implies that we consider PostgreSQL to be part of this service.
### Main models
There are currently two main models around which the whole system is built.
These models are called `RawBenchmarks` and `Benchmarks`.
#### RawBenchmarks
`RawBenchmarks` is a semi-structured model which consists of system information
and benchmark times. It is only semi-structured because we want the data
samples to be immutable while still allowing the schema to change over time.
Immutability is important for transparency and reproducibility. It is
[defined](website/opendata_main/models/benchmarks.py) using Django. This model
is considered **public**, and all users can download daily snapshots of the
whole table.
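For illustration, a minimal sketch of what such a model could look like in
Django; the field names here are assumptions, the actual definition lives in
the file linked above.
```python
from django.db import models


class RawBenchmark(models.Model):
    # Semi-structured payload as submitted by the launcher: system
    # information plus benchmark times. JSONField keeps the schema flexible,
    # while stored samples stay immutable by convention (rows are only ever
    # inserted, never updated).
    data = models.JSONField()
    # Lets the payload format evolve without touching existing rows.
    schema_version = models.PositiveIntegerField(default=1)
    created_at = models.DateTimeField(auto_now_add=True)
```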
#### Benchmarks
`Benchmarks` is the structured and indexed model derived from `RawBenchmarks`,
used for display and querying purposes. It contains the subset of information
we need to visualize the results, as well as the verified and anonymity status
of the corresponding user. You might ask yourself why we do not just use
`RawBenchmarks` directly. Working with semi-structured data becomes complex
very quickly, so we isolate this complexity by centralizing the data
normalization. This model is
[defined](website/opendata_main/migrations/0002_benchmarks.sql) using SQL. Since
this model is derived directly from `RawBenchmarks` it is also considered
**public**.
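As a hedged sketch of this approach, the derived table could be created from a
raw-SQL Django migration along these lines; the column names and the index are
assumptions, not the actual schema.
```python
from django.db import migrations

CREATE_BENCHMARKS = """
CREATE TABLE benchmarks (
    id                bigserial PRIMARY KEY,
    raw_benchmark_id  bigint NOT NULL,
    -- Normalized columns extracted from the semi-structured payload.
    blender_version   text NOT NULL,
    scene             text NOT NULL,
    device_name       text NOT NULL,
    render_time       double precision NOT NULL,
    -- Mirrored from UserSettings, decoupled from the user itself.
    is_anonymous      boolean NOT NULL DEFAULT true,
    is_verified       boolean NOT NULL DEFAULT false
);
CREATE INDEX benchmarks_scene_idx ON benchmarks (scene, blender_version);
"""


class Migration(migrations.Migration):
    dependencies = [("opendata_main", "0001_initial")]

    operations = [migrations.RunSQL(CREATE_BENCHMARKS)]
```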
### User/authentication models
Because benchmarks are run and owned by users, we inevitably need to model
them. All models pertaining to users are considered **private** unless the
user explicitly wants their data to be public.
#### Users
`Users` are provided by Django together with Blender ID.
#### RawBenchmarkOwnership
`RawBenchmarkOwnership` is used to track ownership of benchmarks. It contains
nothing more than a pointer to a `User` and a `RawBenchmark`. A natural question
to ask is why we do not just make a `RawBenchmark` point to a `User` directly.
This is because this information is **private**. We want to be able to accommodate
anonymous submissions and in order to do so we keep private and public
information separate on a data level. It is
[defined](website/opendata_main/models/benchmarks.py) using Django.
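A sketch of how thin this link can be, assuming Django's standard user model
setting; the actual model is in the file linked above.
```python
from django.conf import settings
from django.db import models


class RawBenchmarkOwnership(models.Model):
    # Private link table: the only place a benchmark touches a user.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    raw_benchmark = models.OneToOneField("RawBenchmark", on_delete=models.CASCADE)
```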
#### UserSettings
`UserSettings` is the model which contains the _Open Data specific_ settings for
a given `User`. It contains a flag signalling whether a user wants their data
to be anonymous and a flag signalling whether we (Blender) trust the data of
that specific user. If we trust their data, the user is called a **verified
user**. It is [defined](website/opendata_main/models/user_settings.py) using
Django. The anonymity and verified flags are considered **public** when
provided decoupled from the user.
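A minimal sketch of these two flags; the names and defaults are assumptions.
```python
from django.conf import settings
from django.db import models


class UserSettings(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # True if the user wants their submissions published anonymously.
    is_anonymous = models.BooleanField(default=True)
    # Set by Blender once the user's data is considered trustworthy.
    is_verified = models.BooleanField(default=False)
```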
#### LauncherAuthenticationTokens
`LauncherAuthenticationTokens` are tokens that are used to authenticate the
`/launcher` for a specific user. They contain a pointer to a `User` and a
secret value, knowledge of which implies valid authentication for that user.
It is [defined](website/opendata_main/models/tokens.py) using Django.
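A sketch of such a token, assuming a random URL-safe secret; the real model is
in the file linked above. The `user` field starts out empty because, as the
authentication flow below shows, a token is created before anyone has verified
it.
```python
import secrets

from django.conf import settings
from django.db import models


def generate_secret():
    # 32 random bytes, URL-safe base64 encoded (43 characters).
    return secrets.token_urlsafe(32)


class LauncherAuthenticationToken(models.Model):
    # Null until a user claims the token via /website.
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, on_delete=models.CASCADE
    )
    # Knowledge of this value implies valid authentication for `user`.
    secret = models.CharField(max_length=64, unique=True, default=generate_secret)
    is_verified = models.BooleanField(default=False)
```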
### Metadata models
We are dealing with multiple Blender versions, scenes, benchmark scripts, and
launchers. To simplify dealing with that, we store some centralized metadata.
Each metadata entry consists of at least a URL where the artifact can be
downloaded and its checksum. All metadata is considered to be **public**. A
sketch of this shared shape follows the `Launchers` section below.
#### Launchers
`Launchers` contains metadata for a specific version of `/launcher`. In
addition to a URL and a checksum it also contains a flag signalling if this
launcher is still supported. This allows us to enforce a minimum version and to
point the user to where to get the latest version. It is
[defined](website/opendata_main/models/metadata.py) using Django.
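Since all metadata models share the URL/checksum shape, one way to express
this in Django is an abstract base class, sketched below with the `Launchers`
flag as an example; the field names are assumptions.
```python
from django.db import models


class MetadataBase(models.Model):
    version = models.CharField(max_length=32)
    # Where to download the artifact, plus its checksum (e.g. SHA-256 hex).
    url = models.URLField()
    checksum = models.CharField(max_length=64)

    class Meta:
        abstract = True


class Launcher(MetadataBase):
    # Turning this off lets us enforce a minimum launcher version.
    supported = models.BooleanField(default=True)
```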
#### BenchmarkScripts
`BenchmarkScripts` contains metadata for a specific version of
`/benchmark_script`. It is [defined](website/opendata_main/models/metadata.py)
using Django.
#### Scenes
`Scenes` contains metadata for a specific Blender scene for which a
benchmark can be run. It is [defined](website/opendata_main/models/metadata.py)
using Django.
#### BlenderVersion
`BlenderVersion` contains metadata about a specific version of Blender.
Besides information about the Blender version itself, it also points to a
required `BenchmarkScript` and one or more available `Scenes`.
It is [defined](website/opendata_main/models/metadata.py) using Django.
## Data flow
Now that we know what the data looks like, we can talk about how and where it
is exchanged between the services.
### User flow
`Users` are created and authenticated by deferring to Blender ID. Once a `User`
is created, a corresponding `UserSettings` instance is
[created](website/opendata_main/signals.py) using a Django signal.
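A minimal sketch of that signal, reusing the models sketched earlier; the
actual receiver lives in the file linked above, and the import path is an
assumption.
```python
from django.conf import settings
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import UserSettings  # assumed import path


@receiver(post_save, sender=settings.AUTH_USER_MODEL)
def create_user_settings(sender, instance, created, **kwargs):
    # Every freshly created User gets a UserSettings row with the defaults.
    if created:
        UserSettings.objects.create(user=instance)
```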
### Launcher authentication flow
When authenticating, a `/launcher` connects to `/launcher_authenticator` using a
WebSocket and asks for a new token. In response the launcher receives a new
(unverified) token and a URL pointing to `/website` at which the user can verify
the token. After sending the response the `/launcher_authenticator` waits for
the token to be verified using `LISTEN` in PostgreSQL. The `/launcher` directs
the user to the verification URL and starts waiting on a verification signal
from the `/launcher_authenticator`. After the user verifies the token,
`/website` marks the token as belonging to the user and verified, and notifies
the `/launcher_authenticator` of this fact using `NOTIFY` in PostgreSQL. As
soon as the `/launcher_authenticator` receives this signal it forwards it to
the `/launcher`, together with the name and email of the user. Next, to protect
against the possibility of the verification URL being leaked, the `/launcher`
asks for confirmation of the name and email. If the user confirms, the token
is saved locally and can be used for authenticating the `/launcher` when
submitting benchmarks.
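To make the `LISTEN`/`NOTIFY` step concrete, here is a hedged sketch of the
waiting side using psycopg2; the channel name `token_verified`, the connection
string, and the payload format are assumptions.
```python
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=opendata")  # connection string assumed
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

with conn.cursor() as cur:
    # /website runs: NOTIFY token_verified, '<token id>' after verification.
    cur.execute("LISTEN token_verified;")

while True:
    # Block until PostgreSQL has a notification for us (5 second timeout).
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # Forward the verification (plus the user's name and email) over
        # the WebSocket to the waiting /launcher.
        print("token verified:", notify.payload)
```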
### Benchmark flow
When the user starts the `/launcher` all metadata is fetched from `/website`.
After the user chooses the Blender versions and scenes, all required assets are
downloaded by the `/launcher` according to this metadata. Once all assets are
in place the `/benchmark_script` is invoked within the requested Blender
version. The `/benchmark_script` gathers the required system information and
starts the benchmark while reporting progress to the `/launcher`. After the
benchmark is complete the `/benchmark_script` sends all gathered information to
the `/launcher`. The `/launcher` then submits the resulting `RawBenchmark` to
`/website` using the token obtained in the
[launcher authentication flow](#launcher-authentication-flow). As soon as the
`/website` inserts the `RawBenchmark` into PostgreSQL a
[trigger](website/opendata_main/migrations/0002_benchmarks.sql) fires which
creates the corresponding indexed `Benchmark`.
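The submission itself could look something like the following sketch; the
endpoint URL and the `Token` authorization scheme are assumptions, not the
documented API.
```python
import requests


def submit_raw_benchmark(payload: dict, token: str) -> None:
    # POST the gathered system information and benchmark times, authenticated
    # with the token obtained during launcher authentication.
    response = requests.post(
        "https://opendata.blender.org/api/raw-benchmarks/",  # assumed URL
        json=payload,
        headers={"Authorization": f"Token {token}"},
        timeout=30,
    )
    response.raise_for_status()
```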
### UserSettings anonymity/verified flow
If the anonymity or verified flag is toggled on a `UserSettings` instance, a
[trigger](website/opendata_main/migrations/0002_benchmarks.sql) fires in
PostgreSQL which updates all corresponding `Benchmarks`.
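A hedged sketch of what such a trigger could look like (PostgreSQL 11+
syntax), wired up through a raw-SQL migration as before; all table and column
names follow the earlier sketches and are assumptions.
```python
from django.db import migrations

SYNC_FLAGS = """
CREATE FUNCTION sync_benchmark_flags() RETURNS trigger AS $$
BEGIN
    -- Propagate the toggled flags to every benchmark owned by this user.
    UPDATE benchmarks b
       SET is_anonymous = NEW.is_anonymous,
           is_verified  = NEW.is_verified
      FROM opendata_main_rawbenchmarkownership o
     WHERE o.raw_benchmark_id = b.raw_benchmark_id
       AND o.user_id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_settings_sync
AFTER UPDATE ON opendata_main_usersettings
FOR EACH ROW EXECUTE FUNCTION sync_benchmark_flags();
"""


class Migration(migrations.Migration):
    dependencies = [("opendata_main", "0002_benchmarks")]

    operations = [migrations.RunSQL(SYNC_FLAGS)]
```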