Using Percentiles to Bucketize in Beast Mode Query

bdx

Hi All,

I have a table with a text dimension (Name) and an associated measure (Score).

I wish to do the following:

  1. Calculate whether the Score for each Name falls in the top 20 percent (highest scores) of all Names.
  2. If the Name is in the top 20 percent, leave the Name unchanged.
  3. If the Name is outside the top 20 percent, change the Name to 'Other'.
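
The three steps above can be sketched outside Domo in plain Python. This is a minimal sketch, assuming one row per Name and reading "top 20th percentile" as "top 20 percent of Names by Score". The function name `bucketize_top_fraction` and the sample data are hypothetical. It uses a running count ordered by Score from highest to lowest, divided by the total count, which is the same idea a windowed Beast Mode can express.

```python
from typing import List, Tuple

Row = Tuple[str, float]


def bucketize_top_fraction(rows: List[Row], top_fraction: float = 0.2) -> List[Row]:
    """Keep Name for rows in the top `top_fraction` by Score; relabel the rest 'Other'."""
    total = len(rows)
    result: List[Row] = []
    for name, score in rows:
        # Running count ordered by Score descending: how many rows score at least
        # this high. Tied scores share the same running count.
        running = sum(1 for _, other in rows if other >= score)
        share = running / total
        # Rows whose running share stays within the top fraction keep their Name.
        result.append((name if share <= top_fraction else "Other", score))
    return result


# Hypothetical sample data: ten Names, so the top 20 percent is two Names.
rows = [("A", 95), ("B", 80), ("C", 72), ("D", 60), ("E", 55),
        ("F", 50), ("G", 41), ("H", 30), ("I", 22), ("J", 10)]
print(bucketize_top_fraction(rows))
```

With this sample, A and B keep their names and the other eight rows become 'Other'.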

TIA

Answers

  • MarkSnodgrass (Portland, Oregon)

    You can do this with Magic ETL. Use the Rank & Window tile to rank your scores from highest to lowest. Add a Group By tile to get a count of the total rows in your dataset, then join it back to the Rank & Window output so that the total row count is a column next to every rank. You can then calculate the percentile (rank divided by total row count) in the ETL or in a Beast Mode. You can also change the Name to 'Other' in the ETL, so you don't need any Beast Modes at all.

    Hope this makes sense.

  • GrantSmith (Indiana)

    Hi @bdx

    You can do it with a Beast Mode, which will respond to any filters you apply to your card (the percentiles are recalculated with respect to your filters):

    CASE
      WHEN SUM(SUM(1)) OVER (ORDER BY `Score` DESC) / SUM(SUM(1)) OVER () > 0.2 THEN 'Other'
      ELSE `Name`
    END

  • jaeW_at_Onyx (Budapest / Portland, OR)

    While @GrantSmith's solution will create the buckets, you will still have one row on the axis for each Name; it will not aggregate to the bucket level. In other words, you'll see 'Other' multiple times on the axis, once per original Name, instead of a single 'Other' row alongside the top Names.

    If you want one row per bucket, you must pre-aggregate the data as Mark suggested or use DataSet Views to get a similar result. Because the data is pre-aggregated and bucketized in ETL or a View, you'll then be able to aggregate on that column.
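
Mark's ETL steps and the pre-aggregation point can be sketched together in plain Python. This is a minimal sketch, not Magic ETL itself. It assumes one row per Name, that the Rank & Window tile ranks Score from highest to lowest with a RANK-style function (ties share the lowest rank), and that the bucketed rows are then summed by Score. The names `etl_bucketize` and `aggregate_by_bucket` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def etl_bucketize(rows: List[Row], top_fraction: float = 0.2) -> List[Row]:
    """Rank, join the total count, compute rank / total, and rename to 'Other'."""
    # Group By step: a single total row count for the whole dataset.
    total_rows = len(rows)
    bucketed: List[Row] = []
    for name, score in rows:
        # Rank step (descending, RANK-style): ties share the lowest rank.
        rank = 1 + sum(1 for _, other in rows if other > score)
        # Join step: total_rows sits next to every rank, so percentile = rank / total.
        percentile = rank / total_rows
        bucketed.append((name if percentile <= top_fraction else "Other", score))
    return bucketed


def aggregate_by_bucket(bucketed: List[Row]) -> Dict[str, float]:
    """Group by the stored bucket column, so 'Other' collapses into one row."""
    totals: Dict[str, float] = defaultdict(float)
    for name, score in bucketed:
        totals[name] += score
    return dict(totals)


# Hypothetical sample data: ten Names, so the top 20 percent is two Names.
rows = [("A", 95), ("B", 80), ("C", 72), ("D", 60), ("E", 55),
        ("F", 50), ("G", 41), ("H", 30), ("I", 22), ("J", 10)]
print(aggregate_by_bucket(etl_bucketize(rows)))
```

Because the bucket is stored as a real column before aggregation, grouping by it yields one 'Other' row (here with a summed Score of 340) next to A and B. With RANK-style ranking, tied scores at the cutoff are all kept, whereas a running-count approach can push all of them into 'Other'. Which behavior is right depends on how you want ties handled.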