Display percentage of registered members that have not rated a Movie

Question

I have the following three tables. See full db<>fiddle here

members

member_id	first_name	last_name
1	Roby	Dauncey
2	Isa	Garfoot
3	Sullivan	Carletto
4	Jacintha	Beacock
5	Mikey	Keat
6	Cindy	Stenett
7	Alexina	Deary
8	Perkin	Bachmann
10	Suzann	Genery
39	Horatius	Baukham
41	Bendicty	Willisch

movies

movie_id	movie_name	movie_genre
10	The Bloody Olive	Comedy,Crime,Film-Noir
56	Attack of The Killer Tomatoes	(no genres listed)

ratings

rating_id	movie_id	member_id	rating
19	10	39	2
10	56	41	1

Now the question is:

Out of the total number registered members, how many have actually left a movie rating? Display the result as a percentage

This is what I have tried:

SELECT CONVERT(VARCHAR,(CONVERT(FLOAT,COUNT([Number of Members])) / CONVERT(FLOAT,COUNT(*)) * 100)) + '%' 
AS 'Members Percentage'
FROM (
    SELECT COUNT(*) AS 'Number of Members'
    FROM members
    WHERE member_id IN (
        SELECT member_id FROM members
        EXCEPT
        SELECT member_id FROM ratings
    )
) MembersNORatings

And my query result is displaying as 100%. Which is obvious that the result is wrong.

Members Percentage
100%

What I figured out was that in the first line of the query: COUNT(*) value is being recognized as the value equivalent to the alias [Number of Members]. That's why it is showing 100%.

I thought of replacing COUNT(*) with SELECT COUNT(*) FROM members but before I try to run the query, it was showing error saying

Incorrect Syntax near SELECT.

What change do I need to make in my existing query in order to get the proper percentage result?

A simple way is use an outer join with count() https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a57a85ce3fc8df99622eef12393ee8a8 — SOS, Apr 02 '22 at 05:22
@SOS ... Using LEFT JOIN makes sense as it returns all the members from members table that are common OR not common with the ratings table. It would be better if you could post it in the answer section. — pro_data, Apr 02 '22 at 06:01
@DaleK - Yeah, DISTINCT is always expensive, even more than APPLY. Though APPLY has higher scan count (table/index scan). A better option would be a cte. Especially if there's an index on member_id and movie_id. With an index on `rating`: Scan count 1, logical reads 2 (cte) the numbers are roughly, Scan count 50, logical reads 100 (your query) and Scan count 2, logical reads 105 (left join), and rough query plan comparison are 19% (cte) vs 23% (your query) and 58% (left join). — SOS, Apr 02 '22 at 06:30
*"..APPLY has higher scan count (table/index scan)"* ... with the index — SOS, Apr 02 '22 at 06:37

Dale K · Answer 1 · 2022-04-02T07:23:04.020

You can use a cross apply to determine using a sub-query whether a given member has left a rating or not (because you can't use a sub-query in an aggregation). Then divide (ensuring you use decimal division, not integer) to get the percentage.

select
    count(*) TotalMembers
    , sum(r.HasRating) TotalWithRatings
    , convert(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
from #members m
cross apply (
    select case when exists (select 1 from #ratings r where r.member_id = m.member_id) then 1 else 0 end
) r (HasRating);

Returns:

TotalMembers	TotalWithRatings	PercentageWithRatings
50	2	4.00

2

Shouldn't the percentage be 4? – SOS Apr 02 '22 at 05:21

SOS · Accepted Answer · 2022-04-04T15:10:17.930

As mentioned in the comments, there are several ways to approach this. For example:

Option #1 - OUTER JOIN + DISTINCT

SELECT TotalMembers
       , TotalMembersWithRatings
       , CAST( 100.0 * TotalMembersWithRatings 
            / NULLIF(TotalMembers, 0 ) 
       AS DECIMAL(10,2)) AS MemberPercentage
FROM (
       SELECT COUNT(DISTINCT m.member_id) AS TotalMembers
            , COUNT(DISTINCT r.member_id) AS TotalMembersWithRatings
       FROM   members m LEFT JOIN ratings r ON r.member_id = m.member_id
     ) t

Option #2 - CTE + ROW_NUMBER()

WITH memberRatings AS (
   SELECT member_id, ROW_NUMBER() OVER(
             PARTITION BY member_id 
             ORDER BY member_id
          ) AS RowNum
   FROM  ratings
)
SELECT COUNT(mr.member_id) AS TotalMembers
       , COUNT(mr.member_id) AS TotalWithRatings
       , CAST( 100.0 * COUNT(mr.member_id) 
                 / NULLIF(COUNT(m.member_id), 0 ) 
       AS DECIMAL(10,2)) AS MemberPercentage
FROM   members m LEFT JOIN memberRatings mr ON mr.member_id = m.member_id
          AND mr.RowNum = 1

Option #3 - CROSS APPLY

SELECT
    COUNT(*) TotalMembers
    , SUM(r.HasRating) TotalWithRatings
    , CONVERT(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
FROM members m
       CROSS APPLY (
          SELECT CASE WHEN exists (select 1 from ratings r where r.member_id = m.member_id) THEN 1 
                      ELSE 0 
                 END
       ) r (HasRating);

Execution Plans - Take #1

There's a LOT more to analyzing execution plans than just comparing a single number. However, high level plans do provide some useful indicators.

With the small data samples provided, the plans suggest options #2 (CTE) and #3 (APPLY) are likely to be the most performant (19%), and option #1 (OUTER JOIN + DISTINCT) the least at (63%), likely due to the count(distinct) which can often be slower than alternative options.

Original Sample Size:

TableName	TotalRows
movies	50
members	50
ratings	50

Execution Plans - Take #2

However, populate the tables with more than a few sample rows of data and the same rough comparison produces a different result. Option #2 (CTE) still seems likely to be the least expensive query (9%), but Option #3 (APPLY) is now the most expensive (76%). You can see the majority of that cost is the index spool used due to how APPLY operates:

New Sample Size

TableName	TotalRows
movies	4105
members	29941
ratings	14866

New Execution Plans

With the increased amount of data, STATISTICS IO shows option #2 has far less logical reads and scans and option #3 (APPLY) which as has the most. While Option #1, which appears to have a lower cost overall (15%) it still has a much higher number of logical reads. (Add a non-clustered index on member_id and movie_id and the numbers, while similar, change once again.) So don't just look at a single number.

New Statistics IO

While overall, option #2 (CTE) would seem likely to be most efficient, there are a lot of factors involved (indexes, data volume, statistics, version, etc), so you should examine the actual execution plans in your own environment.

As with most things, the answer as to which is best is: it depends.

Groiq · Answer 3 · 2022-05-04T16:46:06.187

Late to the party, but you don't need to join the tables if you only want to know how many members made a rating, not who.

What you need is

count entries in members table
count (distinct) members in ratings
get quota of 'rating' members (rating members divided by total members)
to get nonrating members, substract the quota from 1.0
multiply with 100 to get the percent value

This is how you could do the calculation step by step using CTEs:

with count_members as (
    select count(member_id) as member_count from members 
), count_raters as (
    select count(distinct member_id) as rater_count from ratings 
), convert_both as (
    select top 1
        cast(m.member_count as decimal(10,2)) as member_count,
        cast(r.rater_count as decimal(10,2)) as rater_count
    from count_members as m cross join count_raters as r
), calculate_quota as (
    select (rater_count / member_count) as quota from convert_both
), invert_quota as (
    select (1.0 - quota) as quota from calculate_quota
) 
select (quota * 100) as percentage from invert_quota;

Alternatively, that's how you could roll it all into one:

select (
    (1.0 - (
        cast((select count(distinct member_id) from ratings) as decimal(10,2))
        / 
        cast((select count(member_id) from members) as decimal(10,2))
    ) ) * 100
) as percentage;

dbfiddle here

Display percentage of registered members that have not rated a Movie

3 Answers3

Linked