How to identify and visualize the Top N percent

sem
sem ⚪️
edited April 20 in Card Building

Given a set of companies and a revenue number associated with each one, sorted by revenue in a descending manner, I'd like to visualize the subset of those companies that comprise the Top-10% of all revenue.

For example, if the set consisted of 90+ companies...

  1. Company1 $5 5%
  2. Company2 $5 5%
  3. Company3 $2 2%
  4. Company4 $1 1%
  5. Company5 $1 1%
  6. ...

I would expect to get back the companies comprising the Top-10%...

  1. Company1 $5 5%
  2. Company2 $5 5%

I know this can be achieved through a data-flow for a fixed set. However, I also want to allow the user to filter the set on-the-fly based on other attributes (e.g. company location), so the set of companies will vary based on various filters. For instance, the user may want to show the companies comprising the Top-10% for the country and then separately the companies comprising the Top-10% for a given state.

  • Note: I have found that there are visualizations (e.g., pie & donut charts) for showing the Top-N companies that also allow one to bucket the remaining companies in an "Other" group. However, the goal here is to show the companies that comprise the Top-N%. When working with percentages, the number of companies may vary depending on the filtered set. The result may just be one company or many companies depending on the filtered set.

I have been exploring various Beast Mode approaches but have not yet found a solution given these requirements.

Thanks in advance for any insights.

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    I'm not sure you will be able to do this in Analyzer/Beast Mode, but I will help you as much as I can with the following information.

    First, you will need to ask your CSM to enable Window Functions in Beast Modes if you haven't already done that.

    To get your percent of total, you would need to create a beast mode like this:

    SUM(`revenue`) / SUM(SUM(`revenue`)) OVER()

    To determine when you cross over a certain percentage line, I think you would need to utilize a LAG function in combination with the percent of total beast mode.

    Here are a couple good posts by @GrantSmith that will be helpful for you.

    You might also want to look into this post by him as well.


  • sem
    sem ⚪️

    Thank you Mark. I'll request that extra capability now and start exploring this approach.

  • sem
    sem ⚪️
    edited April 20

    Success. Thank you Mark for the direction.

    Here is the Beast Mode I settled on, identifying those companies in the Top-10% ...

    ((CASE 
    	WHEN 
                    -- running total (sorted descending) divided by series total is less then or equal to 0.1 (10%)
    		SUM(SUM(`revenue`)) OVER (ORDER BY SUM(`revenue`) DESC) 
    		/ 
    		SUM(SUM(`revenue`)) OVER() <= 0.1 THEN 1
     	ELSE 0
     	END
     )
    

    Sadly though, it appears that aggregations cannot be used in Filters.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Great work @sem ! Glad I was able to help.

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @sem

    There's a new beta available to allow filtering on aggregates. You should be able to utilize that with your beast mode. It is in Beta so you’d need to get signed up for the beta program. Start with taking to your CSM.