Exclusion Join or method to achieve the same type of result

I've got a dataset I'm working with that has a large amount of RepairOrder data, in this particular instance I'm filtering for any RepairOrders that are not closed that have costs applied to them.

This is very straight forward, however I've discovered that in some cases I've got duplicates that exist, one closed and one open.

So I'm only seeing one entry.

The duplicate is not the RepairOrder number, it is what is called the sourcekey which is a unique value from the system the generates the RepairOrder. It is possible to have multiple RepairOrders using the same source key, although it is not accepted practice.

I'd like to be able to somehow eliminate or filter out the records in my dataset that have the same source key ID in the closed Records section. I could possibly do this with a Beast Mode but I'm not sure exactly how I would do that I want to make the right choice on this.

Any help as always will be appreciated.

Thanks

Randy

Comments

  • How to solve it really depends on exactly what you are looking to achieve, and I'm not sure that I am interpreting right, so forgive if this is not spot-on.  

     

    It sounds like you want to see the orders which are not closed, but also don't share the same unique identifier as a closed order.  

     

    ***For both of these solutions, you will want to be careful of what columns you include when you split off to identify dups.  You will not want to include the order number because that won't collapse, but you will need the sourcekey so that you can join back up with the dataset.  

     

    Solution #1 - One way to do this could be to split off your dataflow to group by all fields and count the repair order number.  Then join with the last output before that action, and filter out all of those source keys with a repair order count greater than 1.  This will eliminate all your dups, though, so if you want to include the dups where neither has been closed, you can do something a little different.  

     

    Solution #2 - Use 'Add Constants' action and enter a default value of 1.  Use 'Uncollapse Columns' action to make columns for each status option (Closed, Open, etc.).  The value from your new constant will be what you use to populate your column values.  Use 'Group By' action, grouping by all of your fields, and summing the status columns.  Join this to you last output before these actions, and include only those items whose sourcekey has a value greater than or equal to 1 in the the columns other than 'Closed' AND a null value in the 'Closed' column.  

     

    Let me know if you have any questions, and please remember to mark this as a solution if it helps you.  

     

    Thanks!

  • I think you have a pretty good understanding of what I'm trying to do.

    Your solution 2 is the approach that addresses my issues but how would I make the Uncollapse Column entry if I don't have a static RepairOrderStatus like Closed, Open etc.

    All I have are date fields, so I know the status is closed if the ActualClosedDate field is not null, I know its open if it is.

    It doesn't look like I can use null or is not null in the uncollapse Columns options.

    Thanks for you help so far, appreciate it.

    Randy

  • Hi Randy, 

     

    I am glad we are getting there!  

     

    I think that if you the value mapper, you can make a column called 'Status', with a default value of 'Closed'.  The value to search for in the ActualClosedDate field will be Null Values, and that should enter a value of 'Open'.  

     

    So - The entries in each piece of the transform would be:  ActualClosedDate - Write to new column - Status - Write a default value - Closed - Null - Open

     

    I hope that helps!  ?

     

  • Doesn't look like you can search for Null values in ValueMapper, because its a date field your searching for it is forcing you to input a specific date to search

    Randy