Improve the ranking/list order of add-ons & themes based on their ratings #145

Closed
opened 2024-05-23 01:46:51 +02:00 by HugeMenace · 2 comments
Contributor

The current approach to sorting, both on the home page and in the general listing, results in a less-than-ideal order. Taking the mean of the ratings will artificially inflate the ranking of add-ons with only a few reviews, possibly burying add-ons that users would consider statistically more trustworthy.

Consider this scenario: add-on A has only one 5-star review, resulting in a mean of 5. On the other hand, add-on B has twenty 5-star reviews and five 4-star reviews, leading to a mean of 4.8. The current algorithm ranks add-on A higher, even though users would consider add-on B "better" overall.

There are many more sophisticated algorithms for determining the ranking of items using a 5-star rating system. One such example is this algorithm documented by Evan Miller: https://www.evanmiller.org/ranking-items-with-star-ratings.html

Here's an example implementation of the algorithm in Python, which I've used to calculate Starsort scores alongside the current mean scores for a few add-ons on the home page.

```python
import math

def starsort(ns):
    """
    Lower confidence bound on the expected star rating.
    http://www.evanmiller.org/ranking-items-with-star-ratings.html
    ns: rating counts ordered from 5-star down to 1-star.
    """
    N = sum(ns)
    K = len(ns)
    s = list(range(K, 0, -1))  # star values: [5, 4, 3, 2, 1]
    s2 = [sk ** 2 for sk in s]
    z = 1.65  # ~95% one-sided confidence

    def f(s, ns):
        return sum(sk * (nk + 1) for sk, nk in zip(s, ns)) / (N + K)

    fsns = f(s, ns)
    return fsns - z * math.sqrt((f(s2, ns) - fsns ** 2) / (N + K + 1))


# Node Wrangler: https://extensions.blender.org/add-ons/node-wrangler/reviews/
# Mean Score: 5.00
# ----> Starsort Score: 3.55
print(starsort([7, 0, 0, 0, 0]))

# ND: https://extensions.blender.org/add-ons/nd/reviews/
# Mean Score: 4.98
# ----> Starsort Score: 4.56
print(starsort([39, 1, 0, 0, 0]))

# Orient and Origin to Selected: https://extensions.blender.org/add-ons/orient-and-origin-to-selected/reviews/
# Mean Score: 4.81
# ----> Starsort Score: 3.77
print(starsort([9, 2, 0, 0, 0]))
```

If the code example isn't clear: the starsort function takes an array of rating counts, ordered from 5 stars down to 1 star. An add-on with twenty 5-star reviews and ten 3-star reviews would therefore be scored as `starsort([20, 0, 10, 0, 0])`.

This formula uses the complete frequency distribution, not just the average number of stars. This is reasonable: an item with ten 5-star and ten 1-star ratings carries more uncertainty than an item with twenty 3-star ratings, even though both have a mean of 3.
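To make the uncertainty point concrete, here's a quick check using the same starsort implementation as above: both distributions have a mean of 3, but the polarized one is penalized harder by the confidence interval.

```python
import math

def starsort(ns):
    # ns: rating counts ordered [5-star, 4-star, 3-star, 2-star, 1-star]
    N = sum(ns)
    K = len(ns)
    s = list(range(K, 0, -1))
    s2 = [sk ** 2 for sk in s]
    z = 1.65  # ~95% one-sided confidence

    def f(s, ns):
        return sum(sk * (nk + 1) for sk, nk in zip(s, ns)) / (N + K)

    fsns = f(s, ns)
    return fsns - z * math.sqrt((f(s2, ns) - fsns ** 2) / (N + K + 1))

polarized = starsort([10, 0, 0, 0, 10])  # ten 5-star, ten 1-star (mean 3.0)
uniform = starsort([0, 0, 20, 0, 0])     # twenty 3-star (mean 3.0)
print(round(polarized, 2))  # 2.39
print(round(uniform, 2))    # 2.80
```

Despite identical means, the all-3-star item ranks higher because its distribution gives us more certainty about where the "true" rating lies.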

Overall, this algorithm aligns more (albeit not identically) with the ranking algorithms employed by sites such as Amazon, eBay, etc., and will allow users to browse add-ons and themes more meaningfully.

Author
Contributor

Further, I suggest the tie-breaking order be: Starsort Score > Last Updated Date > Number of Downloads > Name.

The current order emphasizes arguably irrelevant metadata.

https://projects.blender.org/infrastructure/extensions-website/src/branch/main/extensions/models.py#L206
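As a sketch of the proposed tie-breaking, a composite sort key could look like the following. The record fields here are purely illustrative, not the actual fields in extensions/models.py:

```python
from datetime import date

# Hypothetical extension records; field names are illustrative only.
extensions = [
    {"name": "A", "starsort": 4.56, "last_updated": date(2024, 5, 1), "downloads": 900},
    {"name": "B", "starsort": 4.56, "last_updated": date(2024, 5, 1), "downloads": 1200},
    {"name": "C", "starsort": 3.77, "last_updated": date(2024, 5, 20), "downloads": 5000},
]

# Proposed order: Starsort score, then last-updated date, then downloads
# (all descending), with name ascending as the final tie-breaker.
ranked = sorted(
    extensions,
    key=lambda e: (
        -e["starsort"],
        -e["last_updated"].toordinal(),
        -e["downloads"],
        e["name"],
    ),
)
print([e["name"] for e in ranked])  # ['B', 'A', 'C']
```

A and B tie on score and date, so downloads break the tie; C ranks last despite its download count because the Starsort score dominates.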

Pablo Vazquez added the Type: Suggestion label 2024-05-27 12:27:41 +02:00
Oleg-Komarov self-assigned this 2024-06-06 14:58:34 +02:00
Owner

Thank you for the suggestion!

I've just deployed the updates to the rating sort key.
No tie-breaking is implemented yet: the current distribution of computed values does have some ties, but the biggest bucket is still unrated extensions. Implementing it now would complicate the code structure for what I see as little gain: rated extensions fit in a couple of pages, and it's easy to look through all of them.
But as this changes over time and we get more ratings, I think we will have to revisit this.

Also, we keep `Extension.average_score` and use it to display the average rating, since I've found that exposing `rating_sortkey` in place of `average_score` would be confusing:

`compute_rating_sortkey([5] * 10) = 3.82`

Seeing an average rating lower than the minimum rating raises too many questions.
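For reference, the 3.82 figure is reproducible with the formula from the original suggestion, assuming the sort key takes a list of individual ratings and converts it to per-star counts first (an assumption about the server code, sketched here):

```python
import math
from collections import Counter

def starsort(ns):
    # ns: rating counts ordered [5-star, ..., 1-star]
    N = sum(ns)
    K = len(ns)
    s = list(range(K, 0, -1))
    s2 = [sk ** 2 for sk in s]
    z = 1.65

    def f(s, ns):
        return sum(sk * (nk + 1) for sk, nk in zip(s, ns)) / (N + K)

    fsns = f(s, ns)
    return fsns - z * math.sqrt((f(s2, ns) - fsns ** 2) / (N + K + 1))

def compute_rating_sortkey(ratings):
    # Assumed behavior, not the actual server implementation:
    # turn individual ratings into per-star counts, then apply Starsort.
    counts = Counter(ratings)
    return starsort([counts.get(star, 0) for star in range(5, 0, -1)])

print(round(compute_rating_sortkey([5] * 10), 2))  # 3.82
```

So ten perfect 5-star ratings yield a sort key of 3.82, which is why the sort key is kept internal and the plain average is what gets displayed.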
