Beast Mode help-Case When

I am having an issue in my dataset with a CaseWhen statement. I need write a statement that says, if this amount has the same invoice number, check number, "hceck number" then only pull one of the amounts. Currently my data pulls both amounts and then my data is duplicated. 


  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @user003911 , it's really hard to help you.


    can you post a sample dataset? or some screenshots?


    "same invoice number" as what? another row? or another column?  do you actually have duplicate rows of data in your dataset? "pull one of the amounts" ?

  • I have attached the document below. I have circled all of the categories that are the same, but I only need one of the total amount not both. Is there a case when statement I can create for this. 

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @user003911 


    It appears you have duplicate rows in your dataset. The main reason this happens is because of how you're joining tables together. You have a few options:

    1. Change your query to make sure you're joining on a one to one relationship (most ideal)
    2. Add a DISTINCT keyword to your select statement (quickest)
    3. Utilize an ETL to remove the duplicates (most time consuming and not ideal)
  • I have everything joined correctly, however the issue is when our Accounts payable prints a check and then something on the check is incorrect so they print the check again with the same check number. This creates the duplicate data. Each time a check is printed a unique identifier is coded to the check however all of the check data is the same. Unfortunately there is anything in the raw data that shows the incorrect check, it just shows two checks printed for the same amount. That why I was thinking a Case when statement would be the best here. Stating if the job number, vendor, invoice number, amount and cost code category are all the same and only chose one of the rows of data. 

  • GrantSmith
    GrantSmith Indiana 🔴

    Does your raw data have that unique identifier?  How is your data being infested into the system? If you don't have the unique identifier and using SQL try adding the DISTINCT clause to ignore duplicates before investing into the system


    a beast mode isn't really the right tool in this cas as it analyzes each row individually and not across the entire dataset. 

    you might also be able to use the min (or max) aggregate function on your numbers to the the minimum value (they should all be the same from my understanding) when graphing your numerical fields. 

    ideally though DISTINCT would be your best option if you aren't pulling in the unique identifiers. 

    sorry for formatting. I'm on mobile

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    a CASE statement is not the optimal solution for this problem.

    For maximum dataset reusability, you should implement a Window / Rank function in MagicETL to calculate row_number.  In this case you'll Partition by Check_ID and then ORDER BY date desc or however you identify the 'correct' version of the check.


    Then in your visualization you can filter on row_number = 1 to make sure you only have one row per check_id