1

I have the following three tables. See full db<>fiddle here

members

member_id first_name last_name
1 Roby Dauncey
2 Isa Garfoot
3 Sullivan Carletto
4 Jacintha Beacock
5 Mikey Keat
6 Cindy Stenett
7 Alexina Deary
8 Perkin Bachmann
10 Suzann Genery
39 Horatius Baukham
41 Bendicty Willisch

movies

movie_id movie_name movie_genre
10 The Bloody Olive Comedy,Crime,Film-Noir
56 Attack of The Killer Tomatoes (no genres listed)

ratings

rating_id movie_id member_id rating
19 10 39 2
10 56 41 1

Now the question is:

Out of the total number registered members, how many have actually left a movie rating? Display the result as a percentage

This is what I have tried:

SELECT CONVERT(VARCHAR,(CONVERT(FLOAT,COUNT([Number of Members])) / CONVERT(FLOAT,COUNT(*)) * 100)) + '%' 
AS 'Members Percentage'
FROM (
    SELECT COUNT(*) AS 'Number of Members'
    FROM members
    WHERE member_id IN (
        SELECT member_id FROM members
        EXCEPT
        SELECT member_id FROM ratings
    )
) MembersNORatings

And my query result is displaying as 100%. Which is obvious that the result is wrong.

Members Percentage
100%

What I figured out was that in the first line of the query: COUNT(*) value is being recognized as the value equivalent to the alias [Number of Members]. That's why it is showing 100%.

I thought of replacing COUNT(*) with SELECT COUNT(*) FROM members but before I try to run the query, it was showing error saying

Incorrect Syntax near SELECT.

What change do I need to make in my existing query in order to get the proper percentage result?

SOS
  • 6,430
  • 2
  • 11
  • 29
pro_data
  • 151
  • 6
  • 1
    A simple way is use an outer join with count() https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a57a85ce3fc8df99622eef12393ee8a8 – SOS Apr 02 '22 at 05:22
  • @SOS ... Using LEFT JOIN makes sense as it returns all the members from members table that are common OR not common with the ratings table. It would be better if you could post it in the answer section. – pro_data Apr 02 '22 at 06:01
  • 1
    @DaleK - Yeah, DISTINCT is always expensive, even more than APPLY. Though APPLY has higher scan count (table/index scan). A better option would be a cte. Especially if there's an index on member_id and movie_id. With an index on `rating`: Scan count 1, logical reads 2 (cte) the numbers are roughly, Scan count 50, logical reads 100 (your query) and Scan count 2, logical reads 105 (left join), and rough query plan comparison are 19% (cte) vs 23% (your query) and 58% (left join). – SOS Apr 02 '22 at 06:30
  • *"..APPLY has higher scan count (table/index scan)"* ... with the index – SOS Apr 02 '22 at 06:37

3 Answers3

3

You can use a cross apply to determine using a sub-query whether a given member has left a rating or not (because you can't use a sub-query in an aggregation). Then divide (ensuring you use decimal division, not integer) to get the percentage.

select
    count(*) TotalMembers
    , sum(r.HasRating) TotalWithRatings
    , convert(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
from #members m
cross apply (
    select case when exists (select 1 from #ratings r where r.member_id = m.member_id) then 1 else 0 end
) r (HasRating);

Returns:

TotalMembers TotalWithRatings PercentageWithRatings
50 2 4.00
Dale K
  • 25,246
  • 15
  • 42
  • 71
2

As mentioned in the comments, there are several ways to approach this. For example:

Option #1 - OUTER JOIN + DISTINCT

SELECT TotalMembers
       , TotalMembersWithRatings
       , CAST( 100.0 * TotalMembersWithRatings 
            / NULLIF(TotalMembers, 0 ) 
       AS DECIMAL(10,2)) AS MemberPercentage
FROM (
       SELECT COUNT(DISTINCT m.member_id) AS TotalMembers
            , COUNT(DISTINCT r.member_id) AS TotalMembersWithRatings
       FROM   members m LEFT JOIN ratings r ON r.member_id = m.member_id
     ) t

Option #2 - CTE + ROW_NUMBER()

WITH memberRatings AS (
   SELECT member_id, ROW_NUMBER() OVER(
             PARTITION BY member_id 
             ORDER BY member_id
          ) AS RowNum
   FROM  ratings
)
SELECT COUNT(mr.member_id) AS TotalMembers
       , COUNT(mr.member_id) AS TotalWithRatings
       , CAST( 100.0 * COUNT(mr.member_id) 
                 / NULLIF(COUNT(m.member_id), 0 ) 
       AS DECIMAL(10,2)) AS MemberPercentage
FROM   members m LEFT JOIN memberRatings mr ON mr.member_id = m.member_id
          AND mr.RowNum = 1

Option #3 - CROSS APPLY

SELECT
    COUNT(*) TotalMembers
    , SUM(r.HasRating) TotalWithRatings
    , CONVERT(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
FROM members m
       CROSS APPLY (
          SELECT CASE WHEN exists (select 1 from ratings r where r.member_id = m.member_id) THEN 1 
                      ELSE 0 
                 END
       ) r (HasRating);

Execution Plans - Take #1

There's a LOT more to analyzing execution plans than just comparing a single number. However, high level plans do provide some useful indicators.

With the small data samples provided, the plans suggest options #2 (CTE) and #3 (APPLY) are likely to be the most performant (19%), and option #1 (OUTER JOIN + DISTINCT) the least at (63%), likely due to the count(distinct) which can often be slower than alternative options.

Original Sample Size:

TableName TotalRows
movies 50
members 50
ratings 50

Execution plan with sample data

Execution Plans - Take #2

However, populate the tables with more than a few sample rows of data and the same rough comparison produces a different result. Option #2 (CTE) still seems likely to be the least expensive query (9%), but Option #3 (APPLY) is now the most expensive (76%). You can see the majority of that cost is the index spool used due to how APPLY operates:

New Sample Size

TableName TotalRows
movies 4105
members 29941
ratings 14866

New Execution Plans

Execution plan with more data

With the increased amount of data, STATISTICS IO shows option #2 has far less logical reads and scans and option #3 (APPLY) which as has the most. While Option #1, which appears to have a lower cost overall (15%) it still has a much higher number of logical reads. (Add a non-clustered index on member_id and movie_id and the numbers, while similar, change once again.) So don't just look at a single number.

New Statistics IO Statistics IO with more data

While overall, option #2 (CTE) would seem likely to be most efficient, there are a lot of factors involved (indexes, data volume, statistics, version, etc), so you should examine the actual execution plans in your own environment.

As with most things, the answer as to which is best is: it depends.

SOS
  • 6,430
  • 2
  • 11
  • 29
0

Late to the party, but you don't need to join the tables if you only want to know how many members made a rating, not who.

What you need is

  • count entries in members table
  • count (distinct) members in ratings
  • get quota of 'rating' members (rating members divided by total members)
  • to get nonrating members, substract the quota from 1.0
  • multiply with 100 to get the percent value

This is how you could do the calculation step by step using CTEs:

with count_members as (
    select count(member_id) as member_count from members 
), count_raters as (
    select count(distinct member_id) as rater_count from ratings 
), convert_both as (
    select top 1
        cast(m.member_count as decimal(10,2)) as member_count,
        cast(r.rater_count as decimal(10,2)) as rater_count
    from count_members as m cross join count_raters as r
), calculate_quota as (
    select (rater_count / member_count) as quota from convert_both
), invert_quota as (
    select (1.0 - quota) as quota from calculate_quota
) 
select (quota * 100) as percentage from invert_quota;

Alternatively, that's how you could roll it all into one:

select (
    (1.0 - (
        cast((select count(distinct member_id) from ratings) as decimal(10,2))
        / 
        cast((select count(member_id) from members) as decimal(10,2))
    ) ) * 100
) as percentage;

dbfiddle here

Groiq
  • 1
  • 2