# Architecture

To illustrate how the services are tied together, it is important to first know where the data is stored, what it looks like, which service owns it, and whether it is considered public. This is because the ties between services can be interpreted as the flow of data.

## Data model

All data is stored in PostgreSQL. The reasons for this are simplicity and a low maintenance burden. We gain simplicity because all data is in one place, where it can be cross-referenced easily and transformed using SQL (which nowadays is also very capable of dealing with semi-structured data). The maintenance burden is low because the PostgreSQL instance is shared with other Blender services, so we only pay that cost once.

All models are owned by /website, meaning that all models are defined by this service. This implies that we consider PostgreSQL to be part of this service.

### Main models

There are currently two main models around which the whole system is built. These models are called RawBenchmarks and Benchmarks.

#### RawBenchmarks

RawBenchmarks is a semi-structured model consisting of system information and benchmark times. The reason it is only semi-structured is that we want the data samples to be immutable while still being able to change the schema over time. Immutability is important for transparency and reproducibility. It is defined using Django. This model is considered public, and all users can download daily snapshots of the whole table.
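
As a rough illustration, a semi-structured model like this can be expressed in Django with a JSON column; the field names below are assumptions, not the actual schema:

```python
# Minimal sketch of a semi-structured benchmark model, assuming Django >= 3.1
# (models.JSONField). Field names are illustrative.
from django.db import models

class RawBenchmark(models.Model):
    # The schema-less payload: system information plus benchmark times.
    # A JSON column keeps old samples immutable while the schema evolves.
    data = models.JSONField()
    submitted_at = models.DateTimeField(auto_now_add=True)
```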

#### Benchmarks

Benchmarks is the structured and indexed model derived from RawBenchmarks, used for display and querying purposes. It contains the subset of information we need to visualize the results, as well as the verified and anonymity status of the corresponding user. You might ask yourself why we do not just use RawBenchmarks directly. This is because working with semi-structured data becomes complex very quickly; we isolate this complexity by centralizing the data normalization. This model is defined using SQL. Since it is derived directly from RawBenchmarks, it is also considered public.
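
Since this model lives in SQL rather than in Django's ORM, one plausible way to manage it is a raw SQL migration. A sketch of what the derived table could look like, with illustrative table and column names:

```python
# Hypothetical Django migration defining the derived, indexed Benchmarks
# table in raw SQL; all names here are illustrative, not the real schema.
from django.db import migrations

CREATE_BENCHMARKS = """
CREATE TABLE benchmarks (
    id               bigserial PRIMARY KEY,
    raw_benchmark_id bigint NOT NULL,  -- back-reference to the immutable sample
    device_name      text,
    blender_version  text,
    scene            text,
    render_time      numeric,
    is_anonymous     boolean NOT NULL DEFAULT false,
    is_verified      boolean NOT NULL DEFAULT false
);
CREATE INDEX benchmarks_device_name_idx ON benchmarks (device_name);
"""

class Migration(migrations.Migration):
    dependencies = []
    operations = [migrations.RunSQL(CREATE_BENCHMARKS)]
```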

### User/authentication models

Because benchmarks are run and owned by users, we inevitably need to model them. All models pertaining to users are considered private unless the user explicitly wants their data to be public.

#### Users

Users are provided by Django's built-in user model, together with Blender ID.

#### RawBenchmarkOwnership

RawBenchmarkOwnership is used to track the ownership of benchmarks. It contains nothing more than a pointer to a User and a pointer to a RawBenchmark. A natural question to ask is: why not just make a RawBenchmark point to a User directly? The answer is that this information is private. We want to be able to accommodate anonymous submissions, and in order to do so we keep private and public information separate at the data level. It is defined using Django.
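
A sketch of what this link table could look like in Django; note that only this private model references the User:

```python
# Sketch of the private ownership link table; field names are illustrative.
from django.conf import settings
from django.db import models

class RawBenchmarkOwnership(models.Model):
    # The only place a benchmark is tied to a user, keeping the public
    # RawBenchmark rows free of private information.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    raw_benchmark = models.ForeignKey("RawBenchmark", on_delete=models.CASCADE)
```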

#### UserSettings

UserSettings is the model which contains the Open Data specific settings for a given User. It contains a flag signalling whether a user wants their data to be anonymous, and a flag signalling whether we (Blender) trust the data of that specific user. If we trust their data, the user is called a verified user. It is defined using Django. The anonymity and verified flags, when provided decoupled from the user, are considered public.
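
In Django terms this can be as small as two boolean flags; a sketch, with assumed field names:

```python
# Sketch of per-user Open Data settings; field names are illustrative.
from django.conf import settings
from django.db import models

class UserSettings(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # True if the user wants their submissions shown without attribution.
    is_anonymous = models.BooleanField(default=False)
    # True if we (Blender) trust this user's data: a "verified" user.
    is_verified = models.BooleanField(default=False)
```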

#### LauncherAuthenticationTokens

LauncherAuthenticationTokens are tokens used to authenticate the /launcher for a specific user. They contain a pointer to a User and a secret value, knowledge of which implies valid authentication for that user. The model is defined using Django.
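
A sketch of such a token model; the way the secret is generated here is an assumption, not necessarily what the real service does:

```python
# Sketch of a launcher authentication token. secrets.token_hex as the default
# is an assumed way to generate the secret value; names are illustrative.
import secrets

from django.conf import settings
from django.db import models

class LauncherAuthenticationToken(models.Model):
    # The user is only attached once the token has been verified.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, null=True, blank=True,
                             on_delete=models.CASCADE)
    # Knowledge of this value implies valid authentication for the user.
    secret = models.CharField(max_length=64, unique=True,
                              default=secrets.token_hex)
    is_verified = models.BooleanField(default=False)
```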

### Metadata models

We are dealing with multiple Blender versions, scenes, benchmark scripts, and launchers; to simplify dealing with all of these, we store some centralized metadata. Each metadata entry consists of at least a URL where the artifact can be downloaded and its checksum. All metadata is considered to be public.
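
Because every entry shares the URL and checksum fields, an abstract Django base model is a natural fit for the models described below; whether the real code factors it this way is an assumption:

```python
# Sketch of a shared abstract base for the metadata models; the factoring
# and field names are assumptions.
from django.db import models

class MetadataBase(models.Model):
    url = models.URLField()                      # where to download the artifact
    checksum = models.CharField(max_length=128)  # e.g. a SHA-256 digest

    class Meta:
        abstract = True
```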

#### Launchers

Launchers contains metadata for a specific version of /launcher. In addition to a URL and a checksum, it also contains a flag signalling whether that launcher version is still supported. This allows us to enforce a minimum version and to point the user to where to get the latest version. It is defined using Django.
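
Continuing the sketch above, this model then only needs to add the support flag:

```python
# Sketch of the launcher metadata model, extending the MetadataBase sketch
# above; field names are illustrative.
class Launcher(MetadataBase):
    version = models.CharField(max_length=32)
    # Unsupported launchers allow /website to enforce a minimum version.
    is_supported = models.BooleanField(default=True)
```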

#### BenchmarkScripts

BenchmarkScripts contains metadata for a specific version of /benchmark_script. It is defined using Django.

#### Scenes

Scenes contains metadata for a specific Blender scene for which a benchmark can be run. It is defined using Django.

#### BlenderVersion

BlenderVersion consists of metadata about a specific version of Blender. Besides information about the Blender version itself, it also points to a required BenchmarkScript and one or more available Scenes. It is defined using Django.
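
Continuing the same sketch, the relations could look like this; BenchmarkScript and Scene are assumed to be MetadataBase subclasses as well:

```python
# Sketch of BlenderVersion and its relations, extending the MetadataBase
# sketch above; names are illustrative.
class BlenderVersion(MetadataBase):
    version = models.CharField(max_length=32)
    # The script required to benchmark this Blender version...
    benchmark_script = models.ForeignKey("BenchmarkScript",
                                         on_delete=models.PROTECT)
    # ...and the scenes available for it.
    scenes = models.ManyToManyField("Scene")
```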

## Data flow

Now that we know what the data looks like, we can talk about how and where it is exchanged between services.

### User flow

Users are created and authenticated by deferring to Blender ID. Once a User is created, a corresponding UserSettings instance is created using a Django signal.
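
A minimal sketch of such a signal handler, assuming the UserSettings model sketched earlier:

```python
# Sketch of the post_save signal that creates UserSettings alongside each
# new User; the import path is hypothetical.
from django.conf import settings
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import UserSettings  # hypothetical import path

@receiver(post_save, sender=settings.AUTH_USER_MODEL)
def create_user_settings(sender, instance, created, **kwargs):
    # Only act on creation, not on subsequent saves of the same user.
    if created:
        UserSettings.objects.get_or_create(user=instance)
```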

### Launcher authentication flow

When authenticating, the flow is as follows:

1. The /launcher connects to /launcher_authenticator using a WebSocket and asks for a new token.
2. In response, the launcher receives a new (unverified) token and a URL pointing to /website at which the user can verify the token.
3. After sending the response, the /launcher_authenticator waits for the token to be verified using LISTEN in PostgreSQL.
4. The /launcher directs the user to the verification URL and starts waiting for a verification signal from the /launcher_authenticator.
5. After the user verifies the token, /website marks the token as belonging to the user and as verified, and notifies the /launcher_authenticator of this fact using NOTIFY in PostgreSQL.
6. As soon as the /launcher_authenticator receives this signal, it forwards it to the /launcher together with the name and email of the user.
7. To protect against the possibility of the verification URL being leaked, the /launcher then asks the user to confirm the name and email. If the user confirms, the token is saved locally and can be used to authenticate the /launcher when submitting benchmarks.
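
To make the LISTEN half of this concrete, here is a minimal sketch of the waiting side using psycopg2; the channel name token_verified and the connection details are assumptions:

```python
# Sketch of the /launcher_authenticator LISTEN loop; channel name and
# connection string are illustrative.
import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=opendata")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN token_verified;")  # /website fires NOTIFY on this channel

while True:
    # Block until PostgreSQL has something for us (5 s timeout so the loop
    # can periodically check for shutdown).
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        # note.payload would identify the verified token; at this point the
        # authenticator forwards the verification (plus the user's name and
        # email) to the waiting /launcher over its WebSocket.
        print("token verified:", note.payload)
```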

### Benchmark flow

The benchmark flow proceeds as follows:

1. When the user starts the /launcher, all metadata is fetched from /website.
2. After the user chooses the Blender versions and scenes, the /launcher downloads all required assets according to this metadata.
3. Once all assets are in place, the /benchmark_script is invoked within the requested Blender version.
4. The /benchmark_script gathers the required system information and starts the benchmark, reporting progress to the /launcher.
5. After the benchmark is complete, the /benchmark_script sends all gathered information to the /launcher.
6. The /launcher then submits the resulting RawBenchmark to /website using the token obtained in the launcher authentication flow.
7. As soon as /website inserts the RawBenchmark into PostgreSQL, a trigger fires which creates the corresponding indexed Benchmark.
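
The trigger itself could be installed from a raw SQL migration; a sketch, reusing the illustrative table, column, and function names from the earlier sketches:

```python
# Hypothetical migration installing the trigger that derives a Benchmark row
# from every inserted RawBenchmark; all names are illustrative.
from django.db import migrations

CREATE_TRIGGER = """
CREATE OR REPLACE FUNCTION derive_benchmark() RETURNS trigger AS $$
BEGIN
    -- Pull the fields needed for display out of the semi-structured payload.
    INSERT INTO benchmarks (raw_benchmark_id, device_name, render_time)
    VALUES (
        NEW.id,
        NEW.data->'system_info'->>'device_name',
        (NEW.data->>'render_time')::numeric
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- PostgreSQL 11+ syntax; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER raw_benchmark_inserted
    AFTER INSERT ON raw_benchmarks
    FOR EACH ROW EXECUTE FUNCTION derive_benchmark();
"""

class Migration(migrations.Migration):
    dependencies = []
    operations = [migrations.RunSQL(CREATE_TRIGGER)]
```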

### UserSettings anonymity/verified flow

If the anonymity or verified flag is toggled on a UserSettings instance, a trigger fires in PostgreSQL which updates all corresponding Benchmarks.
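
This trigger follows the same pattern as the insert trigger above; a sketch, again with illustrative names:

```python
# Sketch of the settings-propagation trigger, installed the same way as the
# insert trigger above; table and column names are illustrative.
PROPAGATE_SETTINGS = """
CREATE OR REPLACE FUNCTION propagate_user_settings() RETURNS trigger AS $$
BEGIN
    -- Keep the public, denormalized flags on Benchmarks in sync.
    UPDATE benchmarks b
       SET is_anonymous = NEW.is_anonymous,
           is_verified  = NEW.is_verified
      FROM raw_benchmark_ownership o
     WHERE o.raw_benchmark_id = b.raw_benchmark_id
       AND o.user_id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_settings_updated
    AFTER UPDATE ON user_settings
    FOR EACH ROW EXECUTE FUNCTION propagate_user_settings();
"""
```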