I am having an issue in my dataset with a CASE WHEN statement. I need to write a statement that says: if this amount has the same invoice number and check number, then only pull one of the amounts. Currently my data pulls both amounts, so my data is duplicated.
@user003911, it's really hard to help you without more detail.
Can you post a sample dataset, or some screenshots?
"Same invoice number" as what? Another row? Another column? Do you actually have duplicate rows of data in your dataset? And what does "pull one of the amounts" mean?
I have attached the document below. I have circled all of the categories that are the same, but I only need one of the total amounts, not both. Is there a CASE WHEN statement I can create for this?
It appears you have duplicate rows in your dataset. The main reason this happens is how you're joining tables together. You have a few options:
I have everything joined correctly. The issue is that when our Accounts Payable prints a check and something on it is incorrect, they print the check again with the same check number, which creates the duplicate data. Each time a check is printed, a unique identifier is coded to the check, but all of the other check data is the same. Unfortunately, there isn't anything in the raw data that flags the incorrect check; it just shows two checks printed for the same amount. That's why I was thinking a CASE WHEN statement would be best here: if the job number, vendor, invoice number, amount, and cost code category are all the same, only choose one of the rows of data.
Does your raw data have that unique identifier? How is your data being ingested into the system? If you don't have the unique identifier and you're using SQL, try adding the DISTINCT clause to ignore duplicates before ingesting into the system.
A beast mode isn't really the right tool in this case, as it analyzes each row individually and not across the entire dataset.
You might also be able to use the MIN (or MAX) aggregate function on your numerical fields to get the minimum value (they should all be the same, from my understanding) when graphing.
Ideally, though, DISTINCT would be your best option if you aren't pulling in the unique identifiers.
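To sketch what the DISTINCT approach does, here's a runnable example using SQLite via Python (the table and column names are made up for illustration — swap in your actual dataset's fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checks (check_number TEXT, invoice_number TEXT, vendor TEXT, amount REAL);
-- the reprinted check shows up as an exact duplicate row
INSERT INTO checks VALUES
  ('1001', 'INV-1', 'Acme', 250.00),
  ('1001', 'INV-1', 'Acme', 250.00),
  ('1002', 'INV-2', 'Beta', 100.00);
""")

# DISTINCT collapses rows that are identical across every selected column,
# so the reprint disappears as long as you exclude the unique identifier
rows = conn.execute("""
    SELECT DISTINCT check_number, invoice_number, vendor, amount
    FROM checks
""").fetchall()
print(rows)
```

Note this only works if you leave the per-print unique identifier out of the SELECT list — include it and the two prints are no longer identical rows.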
sorry for formatting. I'm on mobile
A CASE statement is not the optimal solution for this problem.
For maximum dataset reusability, you should implement a Window / Rank function in MagicETL to calculate a row_number. In this case you'll partition by Check_ID and then order by date descending, or however you identify the 'correct' version of the check.
Then in your visualization you can filter on row_number = 1 to make sure you only have one row per Check_ID.