flamenco-manager

Archived

Author	SHA1	Message	Date
Sybren A. Stüvel	4fe11d99f6	Configurable name in dashboard. Now the title & version are also dynamically updated with Vue.	2019-02-22 16:12:41 +01:00
Sybren A. Stüvel	84b9eb2b09	Bumped version to 2.4-dev5	2019-02-21 17:42:58 +01:00
Sybren A. Stüvel	141addc371	build-via-docker.sh: made bundle creation conditional on $TARGET. This makes it possible to uncomment the if-target-specified-then-don't-bundle condition and bundle for the given target.	2019-02-21 17:41:10 +01:00
Sybren A. Stüvel	72761a60fa	Handle task timing metrics from the Worker. They are simply stored & forwarded to the Server; no processing is done.	2019-02-21 17:19:20 +01:00
Sybren A. Stüvel	1143a5b957	Bumped version to 2.4-dev4	2019-02-21 13:45:39 +01:00
Sybren A. Stüvel	eb59c020dd	Worker cleanup: Requeue active tasks before deleting worker	2019-02-21 13:45:22 +01:00
Sybren A. Stüvel	c52c65d2b4	Worker cleanup: configurable set of statuses to auto-remove. This allows us to configure the Manager to also auto-delete timed-out workers.	2019-02-21 12:11:53 +01:00
Sybren A. Stüvel	7bbbe3c0c5	Automatically delete offline workers. After an initial delay (to allow workers to come back online after the Manager was down), the flamenco_workers collection is scanned for workers that have `status="offline"` and haven't been seen in longer than `worker_cleanup_max_age`. If that setting is zero, auto-removal is disabled.	2019-02-21 12:11:53 +01:00
Sybren A. Stüvel	7db66cb69d	Log server: human-readable sizes in 'Skipped ... bytes' message	2019-02-20 09:47:50 +01:00
Sybren A. Stüvel	ea917be44c	When serving log file, conditionally only show head + tail of the log. When the task log file is uncompressed, we only show the first X and last Y kilobytes of logging. When the log file is compressed this isn't (easily) possible, so the file is sent as a gzipped attachment (forcing the browser to download it to disk instead of loading it all into memory). When the user agent is Wget or curl, the entire log is always served.	2019-02-19 18:44:19 +01:00
Sybren A. Stüvel	c0de578817	Bumped version to 2.4-dev3	2019-02-19 17:03:42 +01:00
Sybren A. Stüvel	3478225239	Updated changelog	2019-02-19 16:34:15 +01:00
Sybren A. Stüvel	5f9bc2e2d4	Limit number of workers that can retry a task after it failed. This defaults to 3 workers, i.e. after three different workers have run the task and failed, it will not be soft-failed, but really stay failed.	2019-02-19 16:14:53 +01:00
Sybren A. Stüvel	e47780a633	Allow soft-failed tasks to be run by other workers. For this we keep track of which worker failed which task (in `Task.FailedByWorkers`). The scheduler will not assign a worker any task it failed before. When there are no more workers left to run a task (either because of blacklisting or because all workers have tried & failed this particular task), the status will be 'failed', otherwise 'soft-failed'.	2019-02-19 15:38:42 +01:00
Sybren A. Stüvel	8f827a2eb2	TaskUpdateQueue::QueueTaskUpdateWithExtra now expects outer update dict. The `extraUpdates` parameter should now be the "outer" update dict, so instead of passing `M{"field": "value-to-set"}` pass `M{"$set": M{"field": "value-to-set"}}`. This allows future code to pass things like `$unset` or `$addToSet`.	2019-02-19 15:38:36 +01:00
Sybren A. Stüvel	9639e1cb47	Typo fix	2019-02-19 15:38:36 +01:00
Sybren A. Stüvel	2dc054b23f	When a worker times out, its active task is now re-queued. This makes it possible for a worker to disappear from the planet and still have the task finished by another worker. For this to work, the `active_task_timeout_interval` setting must be bigger than the `active_worker_timeout_interval` setting.	2019-02-15 17:26:10 +01:00
Sybren A. Stüvel	5a1b95f097	Bumped version to 2.4-dev2	2019-02-14 15:13:11 +01:00
Sybren A. Stüvel	338218f02a	Updated CHANGELOG.md	2019-02-14 15:13:06 +01:00
Sybren A. Stüvel	e9c67553a3	Soft-fail tasks when there are workers left to retry them. When a Worker sends a task update with `status='failed'`, the Manager overrides that status to `status='soft-failed'` if there is a worker that is not blacklisted for that specific task type/job. This happens until the soft-failing worker is actually blacklisted, in which case the failures are assumed to be an issue with the worker. All the previously soft-failed tasks are set to `'claimed-by-manager'` so that they can be picked up by another worker.	2019-02-14 14:32:00 +01:00
Sybren A. Stüvel	d62e23d6f8	More detailed testing of task updates when blacklisting. We now test the actually queued statuses, rather than just the queue size. This didn't uncover any errors, but is a good preparation for introducing new functionality in the future.	2019-02-14 11:52:52 +01:00
Sybren A. Stüvel	46dd7659d4	Bumped version to 2.4-dev1	2019-02-12 15:06:25 +01:00
Sybren A. Stüvel	72c46706ea	Fix T59491: Manager should detect starvation due to blacklisting. After blacklisting, the tasks failed by the blacklisted worker are now only requeued if there is still a worker left who can execute them (based on the workers' supported task types + blacklist).	2019-02-12 15:05:23 +01:00
Sybren A. Stüvel	0f9fb203b4	Send "this log file does not exist" as log file when it doesn't exist. When the Server asks for a log file that does not exist, just create a log file that states it does not exist, and send that. This makes the Server stop asking us for that file over and over again.	2019-01-11 18:20:24 +01:00
Sybren A. Stüvel	cfb5cc825d	Added missing return statement	2019-01-11 17:48:45 +01:00
Sybren A. Stüvel	84f1718a4b	Bumped version to 2.4-dev0	2019-01-11 11:06:18 +01:00
Sybren A. Stüvel	fa2e914245	Updated example config with more concrete variables. In particular, the Blender location on macOS is now more realistic.	2019-01-11 10:42:44 +01:00
Sybren A. Stüvel	916333dc25	Bumped version to 2.3 (tag v2.3)	2019-01-10 11:54:19 +01:00
Sybren A. Stüvel	5735bbee2e	Upload task log files when requested by the Flamenco Server. The Server can pass us (job ID, task ID) tuples in the response of the 'task-update-batch' endpoint. These tuples are then used to find the task's log file, compress it, and send it to the Flamenco Server. The queue of log files to send is maintained by the Server, so we'll repeatedly get the same (job ID, task ID) until we've actually uploaded the log file to the Server's satisfaction. As a result, we don't persist the requested IDs, but rely on the Server to pass us the list again if need be.	2019-01-09 17:00:00 +01:00
Sybren A. Stüvel	65c74bc303	Less strict timeout checks. This makes the unit test less likely to fail while the computer is busy with other work.	2019-01-09 17:00:00 +01:00
Sybren A. Stüvel	135f195d9c	Don't pass pointer to array. Slices already refer to their underlying array, so there is no need to use pointers.	2019-01-09 14:20:01 +01:00
Sybren A. Stüvel	ea368d5c9b	Dashboard: added checkbox to (de)select all workers	2018-12-18 15:53:24 +01:00
Sybren A. Stüvel	e85a902fb7	Include ffmpeg variable in default settings	2018-12-18 14:35:16 +01:00
Sybren A. Stüvel	9a84e8cb7d	Dashboard: Fixed tiling issue on latest-image viewer	2018-12-18 14:17:39 +01:00
Sybren A. Stüvel	a2377990be	Dashboard: Hide blacklist header when the blacklist is empty	2018-12-18 12:20:48 +01:00
Sybren A. Stüvel	9feccdfe4a	Dashboard: reduced number of columns in worker table. The 'blacklist' toggle now toggles 'details' instead, which consists of the blacklist and worker details (currently ID and Address).	2018-12-18 12:17:47 +01:00
Sybren A. Stüvel	cc827807bb	Bumped version to 2.3-dev2	2018-12-18 10:53:50 +01:00
Sybren A. Stüvel	c5d6b6c6c2	Update changelog	2018-12-18 10:53:41 +01:00
Sybren A. Stüvel	cfe561c79e	Fix T58779: allow lazy status change requests. Status changes can now be marked as 'lazy', in which case they are only applied when the worker has finished its current task. This only required changes to the 'may-I-run' endpoint; it now ignores lazy requests.	2018-12-18 10:51:55 +01:00
Sybren A. Stüvel	e64ffe098d	Compatibility with older MongoDB	2018-12-17 17:18:05 +01:00
Sybren A. Stüvel	0742684326	Update changelog	2018-12-17 17:11:15 +01:00
Sybren A. Stüvel	f15e445baa	Dashboard: make worker blacklist visible	2018-12-17 17:08:17 +01:00
Sybren A. Stüvel	43050af48b	Vue.js: no need for `<tr is="worker-row">` in `<script>` template. The browser would pop a `<worker-row>` element out of a table because it ejects all non-`<tr>` elements there. Apparently this doesn't happen when the template is in a `<script type='text/x-template'>` tag, so we can simplify.	2018-12-17 15:16:47 +01:00
Sybren A. Stüvel	a6e5900f09	Fix T50981: Worker deallocation from job if it fails n tasks. When a Worker notifies the Manager that a task failed, the Manager counts the failed tasks of this worker on this job with the same task type as the currently failed task. If this count is above a threshold, the (worker ID, job ID, task type) tuple is added to the blacklist, which prevents the worker from getting such tasks. Matching failed tasks are re-queued so that they can be executed by another worker. This requires a new setting `blacklist_threshold`, which indicates the number of failed tasks at which the above behaviour is triggered; it defaults to 3. As a consequence, we should probably also increase the TASK_FAIL_JOB_PERCENTAGE constant in Flamenco Server so that it's more lenient towards failure (as excessive failure will trigger requeueing anyway). Note that there is NO starvation detection: if a job has certain tasks that were failed by all available workers (and thus all workers are blacklisted for this job & task type), nothing detects this. As a result, the job will be stuck in 'active' status without ever having a chance of being finished.	2018-12-17 14:28:27 +01:00
Sybren A. Stüvel	014788af1d	Ignore files in default task logs directory	2018-12-17 14:26:15 +01:00
Sybren A. Stüvel	1cf3c1971f	Fixed bug in sleep scheduler. It wouldn't handle implicit end times properly when computing the 'next check' timestamp. They are now correctly interpreted as 'midnight the next day'.	2018-12-17 14:26:08 +01:00
Sybren A. Stüvel	5fb98d6591	Dashboard: shortening more (time display + task ID)	2018-12-17 14:05:19 +01:00
Sybren A. Stüvel	e9d6cb017f	Sleep scheduler: only log at debug level when there is nothing to do	2018-12-14 16:39:18 +01:00
Sybren A. Stüvel	0c217968fa	Formatting	2018-12-14 16:24:33 +01:00
Sybren A. Stüvel	b63917b47c	Sorted services in main.go	2018-12-14 16:13:52 +01:00
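The soft-fail rules described in commits e47780a633, 5f9bc2e2d4 and e9c67553a3 can be sketched in Go as follows. This is a minimal, hypothetical illustration, not the Manager's actual code: only `Task.FailedByWorkers` appears in the log, while the `Worker` and `blacklistKey` types, the `canRun` and `statusAfterFailure` functions, and the "blender-render" task type are assumptions. The sketch records each failing worker once, returns 'failed' when the retry limit (3 by default, per the log) is reached or when no worker remains that supports the task type and is neither blacklisted for it nor a previous failer, and otherwise returns 'soft-failed'.

```go
package main

import "fmt"

// Task is a hypothetical, simplified task record. Only FailedByWorkers is
// named in the commit log; the other fields are illustrative assumptions.
type Task struct {
	ID              string
	JobID           string
	TaskType        string
	FailedByWorkers []string // IDs of workers that reported this task as failed
}

// Worker is a hypothetical, simplified worker record.
type Worker struct {
	ID                 string
	SupportedTaskTypes []string
}

// blacklistKey mirrors the (worker ID, job ID, task type) tuple from the log.
type blacklistKey struct {
	WorkerID, JobID, TaskType string
}

// canRun reports whether worker w may still be given task t: it must support
// the task type, not be blacklisted for this job and task type, and not have
// failed this task before.
func canRun(w Worker, t Task, blacklist map[blacklistKey]bool) bool {
	supports := false
	for _, tt := range w.SupportedTaskTypes {
		if tt == t.TaskType {
			supports = true
			break
		}
	}
	if !supports || blacklist[blacklistKey{w.ID, t.JobID, t.TaskType}] {
		return false
	}
	for _, failed := range t.FailedByWorkers {
		if failed == w.ID {
			return false
		}
	}
	return true
}

// statusAfterFailure records workerID as a failer of t and decides which
// status to store: "failed" or "soft-failed".
func statusAfterFailure(t *Task, workerID string, workers []Worker,
	blacklist map[blacklistKey]bool, maxFailingWorkers int) string {
	alreadyRecorded := false
	for _, id := range t.FailedByWorkers {
		if id == workerID {
			alreadyRecorded = true
			break
		}
	}
	if !alreadyRecorded {
		t.FailedByWorkers = append(t.FailedByWorkers, workerID)
	}

	// Retry limit reached: the task really stays failed.
	if len(t.FailedByWorkers) >= maxFailingWorkers {
		return "failed"
	}
	// Soft-fail only if some other worker can still pick the task up.
	for _, w := range workers {
		if canRun(w, *t, blacklist) {
			return "soft-failed"
		}
	}
	return "failed"
}

func main() {
	workers := []Worker{
		{ID: "w1", SupportedTaskTypes: []string{"blender-render"}},
		{ID: "w2", SupportedTaskTypes: []string{"blender-render"}},
	}
	blacklist := map[blacklistKey]bool{}
	task := &Task{ID: "t1", JobID: "j1", TaskType: "blender-render"}

	fmt.Println(statusAfterFailure(task, "w1", workers, blacklist, 3)) // soft-failed: w2 can retry
	fmt.Println(statusAfterFailure(task, "w2", workers, blacklist, 3)) // failed: no worker left
}
```

Keeping the decision in a pure function makes the starvation case from T59491 easy to unit-test: a blacklist entry for the only remaining capable worker turns the result into 'failed'.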

1958 Commits