Add a few more unit tests for the persistence layer. The goal is to have
100% coverage of the happy flow, to aid in conversion from GORM to sqlc.
No functional changes.
Add a function `shellSplit(string)` to the global namespace of job
compiler scripts. It splits a string into an array of strings using
shell/CLI semantics.
For example: `shellSplit("--python-expr 'print(1 + 1)'")` will return
`["--python-expr", "print(1 + 1)"]`.
This gives job type authors more control over how settings are presented
in Blender's job submission GUI. If a job setting does not define a
label, its `key` is used to generate one (like Flamenco 3.5 and older).
Note that this isn't used in the web interface yet.
Reword so that the section starts with the suggestion that each problem
has a solution. And make it an enumerated list to clarify the structure
of the answer.
Update the troubleshooting section of the FAQ with guidance on checking the
firewall and third-party anti-virus software when the Worker cannot connect
to the Manager.
Remove some Python 3.10 features to make the add-on compatible with py39.
This is the Python version that's bundled with Blender 2.93 LTS, for which
I got a request to see if it could be supported.
The Blender version still isn't officially supported, but this should make
things at least not immediately fail.
Reduce the log level from 'info' to 'debug' on some internal components
of Flamenco Worker. This makes the console output slightly less noisy,
and it's unlikely that these particular messages are commonly needed.
Add a Worker configuration option to configure the Linux out-of-memory
behaviour. Add `oom_score_adjust=500` to `flamenco-worker.yaml` to increase
the chance that Blender gets killed when the machine runs out of memory,
instead of Flamenco Worker itself.
As a safety measure, refuse to delete Workers from the Manager's database
when foreign key constraints are disabled.
In the long term, the underlying problem should be solved. This is a
stop-gap measure to ensure database consistency.
Before deleting a Worker Tag, check that foreign key constraints are
active for the current database connection.
Sometimes GORM decides to create a new database connection by itself,
without telling us, and then foreign key constraints are not active on
it. This commit is a workaround to avoid database corruption.
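As a rough sketch of such a check (function and package names are
illustrative, not the actual Flamenco code), the pragma can be queried before
performing the delete, ideally on the same connection or transaction that
will do the deleting:

```go
package persistence

import (
	"context"
	"database/sql"
	"fmt"
)

// foreignKeysEnabled reports whether SQLite foreign key enforcement is active.
// Callers can refuse to delete when it returns false. Note that with a
// connection pool this should run on the same connection (or inside the same
// transaction) that performs the delete, since the setting is per-connection.
func foreignKeysEnabled(ctx context.Context, db *sql.DB) (bool, error) {
	var enabled int
	if err := db.QueryRowContext(ctx, "PRAGMA foreign_keys;").Scan(&enabled); err != nil {
		return false, fmt.Errorf("querying foreign_keys pragma: %w", err)
	}
	return enabled != 0, nil
}
```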
Move some of the Worker Tags test code into a function of its own, to have
a clearer separation between 'the test' and 'what needs to happen to do
this part of the test'.
Also it'll make an upcoming change easier to implement.
No functional changes.
Explicitly use the `--mode` flag for the webapp development server
(`vite`) to make the web frontend choose the appropriate HTTP and
WebSocket port to communicate with the backend. This also makes sure
that when accessing the frontend via `https://`, the websocket
connection uses `wss://`.
As a side effect, this also makes port `:8081` usable in production
environments; previously the frontend would assume it was served by the
development server and try to access the backend on port `:8080`.
Reviewed-on: #104296
Reviewed-by: Sybren A. Stüvel <sybren@blender.org>
There's still some confusion that this is a problem to solve, whereas it
can usually be safely ignored. Reduced the log level from Warn to Info to
make the message look more innocent.
Pass `-failfast` to the `go test` command, so that it immediately stops
on test failure. This prevents the need to scroll back to see the actual
error, at the expense of only seeing one failure at a time.
Back in the days when I wrote the code, I didn't know about the
`require` package yet. Using `require.NoError()` makes the test code
more straightforward.
No functional changes, except that when tests fail, they now fail
without panicking.
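For illustration (this is a stand-alone example with a made-up file path, not
Flamenco test code), the pattern looks like this:

```go
package example_test

import (
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestReadConfig(t *testing.T) {
	contents, err := os.ReadFile("testdata/config.yaml")
	// require.NoError marks the test as failed and stops it immediately when
	// err is non-nil, so the code below never operates on invalid data (and
	// thus never panics on a nil value).
	require.NoError(t, err)
	require.NotEmpty(t, contents)
}
```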
Avoid these warnings on the console:
```
WARN (bpy.rna): source/blender/python/intern/bpy_rna.cc:1339
pyrna_enum_to_py: current value '0' matches no enum in 'Scene', 'Scene',
'flamenco_job_type'
```
The solution was two-fold:
- Use a non-empty string as the identifier for the 'Select a Job Type'
choice.
- Give the property a default value.
Set the default MQTT topic prefix to 'flamenco'. It can still be overridden
by the config in the YAML file, but it's nice to have a sensible default
when people don't configure this.
Increase the 'database open' timeout from 5 seconds to 1 minute. This
timeout also covers database migrations, and the recently added one that
adds a bunch of `NOT NULL` clauses could time out with the old 5 sec
limit.
The reason this takes so long is that SQLite doesn't directly support
adding `NOT NULL` clauses to columns. The only way to do this is to
create a new table with the desired schema, copy all data over, then
drop the old table. And with a big enough database, this takes time.
Instead of storing the cached manager info in the Blender preferences,
store the info in a JSON file. The file is located in the user prefs
folder (`~/.config/blender/{version}/config` on Linux).
This also reduces the number of 'refresh' operators to a single one, which
then fetches all necessary info from the Manager.
This fixes an issue (reported via chat) where worker tags were sometimes
not retained across file saves.
Fix the database migration that adds `NOT NULL` clauses. It used
`INSERT INTO temp_x SELECT * from x;`, and the `*` returns the fields in
the order they are defined on the table. Since this might be different from
the order that the `INSERT INTO temp_x` expects, strange problems can
happen where columns get swapped (or constraints can fail on columns that
they should not fail for, because they got fed data from a different
column).
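For illustration, the corrected pattern names the columns explicitly on both
sides of the `INSERT INTO … SELECT`; the table and columns below are made up
and not the actual Flamenco schema:

```go
package persistence

// Naming the columns on both sides keeps the values aligned, regardless of
// the order in which the columns happen to be defined on the old table.
const migrateWorkersSQL = `
CREATE TABLE temp_workers (
	id   INTEGER PRIMARY KEY,
	uuid VARCHAR(36) NOT NULL,
	name VARCHAR(64) NOT NULL
);
INSERT INTO temp_workers (id, uuid, name)
	SELECT id, uuid, name FROM workers;
DROP TABLE workers;
ALTER TABLE temp_workers RENAME TO workers;
`
```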
This makes it easier to later also create `query_workers.sql`,
`query_meta.sql` etc. so that the sqlc-generated code can follow the
same subdivision as the persistence service code itself.
No functional changes.
GORM has certain downsides:
- Code-first approach, where queries have to be translated to the Go code
required to execute them.
- GORM comes with its own SQLite implementation, which doesn't provide an
on-connect callback. This means that new connections cannot correctly
enable foreign key constraints, causing database consistency issues.
[sqlc](https://sqlc.dev/) solves these issues for us.
This commit doesn't fully replace GORM with sqlc, but introduces it for
a few queries. Once all queries have been converted, GORM can be removed
completely.
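As a sketch of the sqlc workflow (the query, package, and function names
below are hypothetical, not the actual Flamenco queries): sqlc reads
annotated SQL and generates typed Go functions for it, so calling code no
longer hand-translates queries into Go.

```go
package persistence

import (
	"context"
	"database/sql"
	"fmt"

	// Hypothetical import path; the real location depends on the sqlc config.
	"example.org/flamenco/internal/manager/persistence/sqlcgen"
)

// The generated FetchWorker() corresponds to an annotated query in a .sql
// file, for example:
//
//	-- name: FetchWorker :one
//	SELECT * FROM workers WHERE uuid = ? LIMIT 1;
func workerName(ctx context.Context, db *sql.DB, uuid string) (string, error) {
	queries := sqlcgen.New(db)
	worker, err := queries.FetchWorker(ctx, uuid)
	if err != nil {
		return "", fmt.Errorf("fetching worker %s: %w", uuid, err)
	}
	return worker.Name, nil
}
```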
Split the header into two or three parts, depending on the number of
columns shown. The farm status indicator will be above the middle column
(in 3 col mode) or at the right edge of the left column (in 2 col mode).
Also I reverted the hiding of the farm status when SocketIO has
disconnected, as that disconnect happens when navigating between tabs.
That made the interface too 'blinky', so now it just shows the last-known
farm status.
The database is polled every 30 seconds to determine the farm status; at
startup the first poll is done after 1 second to get a faster status.
Note that when jobs and workers change their status, the farm status is
always updated.
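A minimal sketch of that polling pattern (names are illustrative, not the
actual Manager code): a short initial delay for the first poll, then the
regular interval.

```go
package farmstatus

import (
	"context"
	"time"
)

func pollLoop(ctx context.Context, poll func()) {
	const (
		initialDelay = 1 * time.Second  // quick first poll shortly after startup
		pollPeriod   = 30 * time.Second // steady-state polling interval
	)

	timer := time.NewTimer(initialDelay)
	defer timer.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
			poll()
			timer.Reset(pollPeriod)
		}
	}
}
```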
There are still issues with foreign keys getting disabled, so enable them
in the periodic database consistency check.
A more permanent solution is likely to drop GORM and switch to something
else that gives us an on-connect-callback, which can then be used to
turn on foreign key constraints for every connection made.
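As an example of what such an on-connect callback could look like with the
`mattn/go-sqlite3` driver (whether Flamenco switches to this particular
driver is a separate decision):

```go
package persistence

import (
	"database/sql"

	sqlite3 "github.com/mattn/go-sqlite3"
)

func init() {
	// Register a driver variant whose ConnectHook runs for every new
	// connection in the pool, so foreign key enforcement can no longer be
	// silently skipped on connections created behind our back.
	sql.Register("sqlite3_fk", &sqlite3.SQLiteDriver{
		ConnectHook: func(conn *sqlite3.SQLiteConn) error {
			_, err := conn.Exec("PRAGMA foreign_keys = ON;", nil)
			return err
		},
	})
}
```

Opening the database with `sql.Open("sqlite3_fk", …)` would then guarantee
the pragma on every connection.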
This introduces the concept of 'event listener', which is now used by
the farm status service to respond to events on the event bus.
This makes it possible to reduce the regular poll period from 5 to 30
seconds. Polling is now only necessary as a backup, in case events are
missed or things otherwise change without the event bus logic noticing.
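Sketch of the idea (interface and method names are illustrative, not the
actual Flamenco API): services register themselves as listeners and get
called for each relevant event, with polling only as a safety net.

```go
package eventbus

// Listener is implemented by services, such as the farm status service, that
// want to react to events instead of relying purely on periodic polling.
type Listener interface {
	OnJobStatusChange(jobUUID, newStatus string)
	OnWorkerStatusChange(workerUUID, newStatus string)
}

// Broker fans incoming events out to all registered listeners.
type Broker struct {
	listeners []Listener
}

func (b *Broker) AddListener(l Listener) {
	b.listeners = append(b.listeners, l)
}

func (b *Broker) broadcastJobStatusChange(jobUUID, newStatus string) {
	for _, l := range b.listeners {
		l.OnJobStatusChange(jobUUID, newStatus)
	}
}
```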
SocketIO has 'rooms' and 'event types'. The 'event type' is set via
reflection of the OpenAPI type of the event payload. This has to be set
up in a mapping, though, and if that mapping is incomplete, an error will
now be logged.
Send an event to the event bus whenever the farm status changes. The event
contains a farm status report (like `{status: "active"}`), and is sent to
the `/status` topic.
Note that at this moment the status is only polled every X seconds, and
thus may lag behind other events.
Add a new API operation to get the overall farm status. This is based on
the jobs and workers, and their status.
The statuses are:
- `active`: Actively working on jobs.
- `idle`: Farm could be active, but has no work to do.
- `waiting`: Work has been queued, but all workers are asleep.
- `asleep`: Farm is idle, and all workers are asleep.
- `inoperative`: Cannot work: no workers, or all are offline/error.
- `starting`: Farm is starting up.
- `unknown`: Unexpected configuration of worker and job statuses.
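A hedged sketch of how such a status could be derived from job and worker
counts (the real Manager logic may differ, and the `starting` and `unknown`
cases are omitted for brevity):

```go
package farmstatus

type Status string

const (
	StatusActive      Status = "active"
	StatusIdle        Status = "idle"
	StatusWaiting     Status = "waiting"
	StatusAsleep      Status = "asleep"
	StatusInoperative Status = "inoperative"
)

type Counts struct {
	ActiveJobs, QueuedJobs      int
	AwakeWorkers, AsleepWorkers int
}

func FromCounts(c Counts) Status {
	anyWork := c.ActiveJobs > 0 || c.QueuedJobs > 0
	anyWorkers := c.AwakeWorkers > 0 || c.AsleepWorkers > 0

	switch {
	case !anyWorkers:
		return StatusInoperative // no workers at all, or all offline / in error
	case c.ActiveJobs > 0:
		return StatusActive // actively working on jobs
	case anyWork && c.AwakeWorkers == 0:
		return StatusWaiting // work queued, but every worker is asleep
	case anyWork:
		return StatusActive // queued work with awake workers: treat as active
	case c.AwakeWorkers == 0:
		return StatusAsleep // nothing to do, and all workers are asleep
	default:
		return StatusIdle // workers awake, but no work queued
	}
}
```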