Issue search problems #2

Closed
opened 2023-02-08 12:10:38 +01:00 by Francesco Siddi · 8 comments

## Tasks

- [x] Deploy meilisearch
- [x] Fix missing results on dashboard ([upstream pull](https://github.com/go-gitea/gitea/pull/24109))
- [x] Pagination to show more than 50 open + closed results ([upstream pull](https://github.com/go-gitea/gitea/pull/22704))
- [x] Pull request search for common terms like "snap" finds nothing ([upstream issue](https://github.com/go-gitea/gitea/issues/24662))

## Analysis

* Search results are always limited to a maximum of 50 (with the 'db' and 'bleve' indexers).
* The db indexer works poorly because it is just a simple LIKE comparison rather than proper full-text search, but it is the only one working now.

Also see #3 regarding this issue.

When searching for anything, the result count appears to be capped at 50. This holds for both the 'ISSUE_INDEXER_TYPE = db' and the 'ISSUE_INDEXER_TYPE = bleve' settings; however, there are significant differences in the details.

### db

With 'db', the total number of results is 50, split across open and closed: for example 48 open/2 closed, or 16 open/34 closed.

This is true even for extremely common search terms like 'linux', 'radeon', 'crash', etc.

No paging is provided for results beyond that. Lunny reports on blender.chat that this is by design. The 'code search' result page does have pagination. We might need to revisit this.

### bleve

The 'bleve' engine does something similar, but it seems to 'forget' results it returned earlier.

The size of the 'issues.bleve' directory also seems to grow slowly during indexing and then suddenly shrink sharply. This cycle repeats a few times while indexing is going on.

First reports seem to indicate this MIGHT be due to concurrent indexer processes trying to manage/fill the issues.bleve cache, corrupting/overwriting it. This needs checking.

Originally created by Arnd.
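
To make the 50-result cap concrete, here is a minimal Python sketch of what paginated results could look like. It is not Gitea's code: `PAGE_SIZE`, `Page` and `search_page` are hypothetical names, and the 120 matches are made-up data. The point is only that a total count plus a page number is enough to reach results beyond the first 50.

```python
from dataclasses import dataclass

PAGE_SIZE = 50  # assumption: mirrors the cap observed in this issue


@dataclass
class Page:
    items: list   # results shown on this page
    total: int    # total number of matches across all pages
    page: int     # 1-based page number

    @property
    def has_next(self) -> bool:
        # There is another page if this one does not reach the last match.
        return self.page * PAGE_SIZE < self.total


def search_page(matches: list, page: int = 1) -> Page:
    """Return one page of an already-computed, ordered list of matches."""
    start = (page - 1) * PAGE_SIZE
    return Page(items=matches[start:start + PAGE_SIZE], total=len(matches), page=page)


# Hypothetical data: 120 matching issues.
matches = [f"issue-{n}" for n in range(1, 121)]
first = search_page(matches, page=1)
last = search_page(matches, page=3)
```

With a hard cap and no paging, only the equivalent of `first.items` is ever reachable; the remaining 70 matches in this example would be invisible to the user.
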
Brecht Van Lommel added the Type/Bug label 2023-02-08 16:56:18 +01:00
Brecht Van Lommel added the Type/Deployment label 2023-02-08 18:22:50 +01:00
Arnd Marijnissen added the Service/Gitea label 2023-02-13 14:32:00 +01:00

While we are waiting for this to be fixed, are issues going to be indexed by search engines? It seems that is disabled now:
https://projects.blender.org/robots.txt

Is there a plan to look into allowing Google to index them? Should there be a separate issue for this? In general, some kind of working search is important for triagers to do their work.
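
For anyone wanting to check what a given robots.txt allows, a small Python sketch using the standard library's `urllib.robotparser` may help. The two sample files and the issue URL below are hypothetical and are not the actual contents of the robots.txt linked above; `can_crawl` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks all crawlers from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only blocks one path prefix.
BLOCK_SOME = """\
User-agent: *
Disallow: /user/
"""

# Hypothetical issue URL, used only for illustration.
ISSUE_URL = "https://projects.blender.org/some/repo/issues/1"


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocked = can_crawl(BLOCK_ALL, ISSUE_URL)
allowed = can_crawl(BLOCK_SOME, ISSUE_URL)
```

To test the live file instead, `RobotFileParser.set_url()` followed by `read()` fetches it over the network before calling `can_fetch()`.
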

Brecht Van Lommel changed title from Search results pagination is limited to 50 reults to Issue search problems 2023-02-16 22:18:59 +01:00

The db engine search works poorly: it is not proper text search, just a simple LIKE comparison.

Maybe we should just try setting up Elasticsearch for issues rather than getting db or bleve to work better, since we know neither of them is great.
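
To show why a LIKE comparison behaves differently from token-based text search, here is a minimal Python sketch using an in-memory SQLite table. It is an illustration only, not the query Gitea actually runs: the table, the titles, and the helpers `like_search`, `build_index` and `token_search` are all hypothetical. Real full-text engines additionally offer ranking, stemming and typo tolerance, none of which is modelled here.

```python
import re
import sqlite3

# Hypothetical issue titles.
TITLES = [
    "Snap to grid ignores increment",
    "Crash when opening snapshot file",
    "Radeon driver crash on Linux",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO issue (title) VALUES (?)", [(t,) for t in TITLES])


def like_search(term: str) -> list:
    """Substring match: any title containing the characters of `term`, unranked."""
    rows = conn.execute(
        "SELECT title FROM issue WHERE title LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in rows]


def build_index(titles: list) -> dict:
    """Tiny inverted index mapping each lowercase word to the titles containing it."""
    index = {}
    for title in titles:
        for token in set(re.findall(r"[a-z0-9]+", title.lower())):
            index.setdefault(token, []).append(title)
    return index


def token_search(index: dict, term: str) -> list:
    """Whole-word match against the inverted index."""
    return index.get(term.lower(), [])


like_hits = like_search("snap")                        # also matches "snapshot"
token_hits = token_search(build_index(TITLES), "snap")  # matches the word "snap" only
```

The LIKE query returns both the "Snap to grid" and the "snapshot" titles with no notion of relevance, while the token lookup returns only the title containing the word itself.
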

Arnd Marijnissen was assigned by Francesco Siddi 2023-02-17 00:19:27 +01:00

Setting up Elasticsearch for both issue and repo indexing seems to be the logical thing to try next. It will require some thought about how to support this properly so that we don't run into trouble with performance and disk space.
Also, given that every change of indexer or parameters requires a restart of Gitea, it's not a good idea to test this on production.

I will invest time in getting a 'test.projects.blender.org' instance up, which will let us test intended changes to styling, webhooks and Gitea itself (app settings, etc.).


@lunny's WIP pagination PR: https://github.com/go-gitea/gitea/pull/22704
@fsiddi's alternative idea: https://github.com/go-gitea/gitea/issues/20665

Adding a new search engine would be done by implementing the indexer interface; an example can be found here: https://sourcegraph.com/github.com/go-gitea/gitea/-/blob/modules/indexer/issues/


WIP meilisearch PR: https://github.com/go-gitea/gitea/pull/23136

It is still WIP because docs are missing and maintainer review is pending.



Support for meilisearch landed in Gitea `main` last week, and is also in `blender-merged-develop` now.

Status update

Fixed:

- Meilisearch support has landed and has been deployed.
- The fix for missing results in cross-repo search with meilisearch was merged upstream and is included in blender-merged.

Waiting:

- Support for more than 50 results seems to be planned for Gitea 1.21, which will go into feature freeze around September. There are no plans to work on this for 1.20.
- The PR search issue relies on a rewrite of the indexer code, also planned for 1.21.

I think we can consider this resolved now.

Reference: infrastructure/blender-projects-platform#2