Gitea server running out of disk space due to archive downloads #29

Closed
opened 2023-02-13 23:53:28 +01:00 by Brecht Van Lommel · 2 comments

What appears to be happening is that something is downloading a lot of archives, that is, the .tar.gz or .bundle files that can be found, for example, on this page by clicking the "..." menu next to the Git URL:
https://projects.blender.org/blender/blender/

Such archives are being downloaded for various revisions, and each is about 500MB for the Blender repository. Each archive is cached for some time, so they add up quickly.

This cache is supposed to be cleared regularly by cron.archive_cleanup, but that doesn't seem to be working, and older files are being left behind.
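One way to confirm that old archives are being left behind is to list cached files older than the expected retention time. The following is only a sketch: the directory passed in and the retention threshold are assumptions, since the issue does not state where the archives are cached or how long they should be kept.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_hours, now=None):
    """Return (path, size_in_bytes) for files under root older than max_age_hours.

    `root` is a hypothetical cache directory; point it at wherever the
    server actually stores generated archives.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            # A modification time before the cutoff means cleanup missed it.
            if st.st_mtime < cutoff:
                stale.append((path, st.st_size))
    return stale
```

Summing the sizes in the result gives a rough idea of how much disk space a working cleanup job would reclaim.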

Even if it were working, generating 100s of GB every day just to discard it is not good either. So we should block whatever is downloading these archives, or at least block it from downloading projects.blender.org/*/*/archive/*.
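To make the pattern concrete, here is a minimal sketch of a check for URLs of the form projects.blender.org/*/*/archive/*. The function name and regular expression are illustrative assumptions; an actual block would more likely be configured in the reverse proxy or firewall in front of Gitea than written in Python.

```python
import re
from urllib.parse import urlsplit

# Matches /<owner>/<repo>/archive/<anything>, mirroring the
# */*/archive/* pattern described above.
ARCHIVE_PATH = re.compile(r"^/[^/]+/[^/]+/archive/.+")


def is_archive_request(url: str) -> bool:
    """Return True if the URL path points at a repository archive download."""
    path = urlsplit(url).path
    return ARCHIVE_PATH.match(path) is not None
```

The check looks only at the path, so it treats .tar.gz and .bundle downloads the same way and ignores any query string.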

Brecht Van Lommel added the Type Bug and Type Deployment labels 2023-02-13 23:53:28 +01:00

Are you able to check the logs and see what is triggering these archive-generating events? Is it a search crawler, some other automated tooling, or is it user behaviour?

Based on the answers to the above, perhaps some rate limiting could be added, and known crawlers could be blocked from the archive endpoint.
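As a rough sketch of how the log check could be done, the following tallies archive requests per user agent. It assumes the access log uses the common "combined" format written by many reverse proxies; the actual log format on the server is not stated in this thread, so the regular expression may need adjusting.

```python
import re
from collections import Counter

# Assumed "combined" access log layout:
# ... "GET /path HTTP/1.1" 200 1234 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
ARCHIVE_PATH = re.compile(r"^/[^/]+/[^/]+/archive/")


def count_archive_requests(lines):
    """Count requests to /<owner>/<repo>/archive/... grouped by user agent."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and ARCHIVE_PATH.match(m.group("path")):
            counts[m.group("agent")] += 1
    return counts
```

Passing an open log file to `count_archive_requests` and calling `most_common(10)` on the result would show whether a few crawlers account for most of the downloads, which in turn suggests whether blocking specific agents or rate limiting the endpoint is the better fit.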

Author
Owner

Closing as a duplicate of #32.

Reference: infrastructure/blender-projects-platform#29