Creating a separate column in transforming data using magic ETL

I am using formula function to add a column that gives out values based on certain condition. I am using case condition for validating the conditions, and I want it to be grouped by Req Id field. because req Id field contains multiple req Ids and have duplication. Idea behind grouping by is that if any of that condition is valid for even one row (Or Req Id) than basically flag all those req id as G or R or R1 etc... based on the condition that it flags against in the case statement.

I wrote the following querry, but it says syntax error. What am I missing?


GROUP BY `Req ID`

(CASE 

WHEN `Req status`='Open' AND (`Offer to be Extended`<>'' OR `Offer Extended` <>'')

THEN 'G'

WHEN `Req status`='Open' AND (`HM applicant review`= 'Is null' AND `Days open`>10)

then 'R1' 

WHEN `Req status`='Open' AND (`Interview`= '' AND `Days open`>15)

THEN 'R2'

WHEN `Req status`='Open' AND (`Req with Offer Accept`= 'Not Specified' AND `Days open`>35)

THEN 'R3'

Else'Y'

End

))

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    The formula tile doesn't recognize the GROUP BY term in your syntax. You also can't flag other rows based on one row in the formula tile. You would want to build your flag with your statement and then join the data back to itself to flag those other rows that meet that criteria. If you are unaware, you can split out a dataset and perform some function and then re-join the data back together. Below is an example.



  • Ok but I am not flagging the row itself. I am just adding another column in the transformation that will basically do the flagging (Based on case values it evaluates in the Case statement). So its just adding a column

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Ok. Then it sounds like you just need the case statement portion of your formula and you just need to remove your group by line and it should do what you want it to do.

  • Ok but as I mentioned earlier, that I have duplication in Req IDs.

    So lets say a Req ID satisfy 1st case condition, then I want all the rows where that ReqID exist to be written with "G" in that new column I am creating. Even if another row (down the lane) that has the same req ID doesn't meet my case 1 condition. Will this work?

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Right. That is why I mentioned you will want to split off the data and then join it back to the main data and apply that value to the other rows with the same req id. If you are unsure how to do this, I can mock something up for you if provide some sample data.

  • Ok so my case condition are logically arranged, meaning if 1st case condition is executed then all other case conditions are invalid, and that req id where 1st condition is met needs not to go through other case conditions, and hence that req Id doesnt need to be checked for next conditions. So In this case assuming if I had just one case condition (1st one) where I need a column to say 'G' if 1st case condition is met, how do I separate and merge my table? So if you look at the data Req Id 5 in row number 3 it should also have the 'G' status because another duplicate req ID 5 is in row number 4, and that one satisfy my 1st case condition. So when my 1st condition executes it should tag both row number 3 and 4 as 'G' in a new column. Even though row number 3 doesn't satisfy my 1st case, but row number 4 does.




  • Consider that request status is open for all of these

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Ok. Here is a screenshot of what I built that should get you what you want and I will explain the steps below.

    The first Add Formula just has your case statement in it and I created a column called Group. (Note: in your case statement, you are using <> ''. You'll know your data best, but I had to change it to IS NOT NULL because that my blank rows are null. You may want to test for that.)

    The Filter Rows tile filters down to just the rows where Group is not equal to 'Y'. This gives me just the Req ID and the group that you are going to later apply to other rows with the same req id.

    In the Group By tile, I am grouping by Req ID and Group and then doing a count of Req IDs because the Group By tile requires me to aggregate something. I am not actually going to use the count.

    In the Join Data tile, I am doing a Left Join with the source spreadsheet and the Group By tile and joining on Req ID. Make sure the source spreadsheet is on the left side. This will put the Group value next to all Req IDs.

    In the second Add Formula tile, I use the following on the Group column to replace any nulls. IFNULL(`Group`,'Y')


    This should do it for you, but I will recommend again to review your original case statement to ensure it is handling nulls and blanks correctly so that it comes up with the correct value.

  • The first Add Formula just has your case statement in it and I created a column called Group. (Note: in your case statement, you are using <> ''. You'll know your data best, but I had to change it to IS NOT NULL because that my blank rows are null. You may want to test for that.)

    I agree with you 'is not null' is a better alternative

    The Filter Rows tile filters down to just the rows where Group is not equal to 'Y'. This gives me just the Req ID and the group that you are going to later apply to other rows with the same req id.

    I didnt quite get it. So lets say if I only had one case condition- WHEN `Req status`='Open' AND (`Offer to be Extended`<>'' OR `Offer Extended` <>'')

    THEN 'G'

    Do you mean that I shall filter out the rows in that new column with 'G' responses? but again as I mentioned earlier how will I make sure that duplicates are taken care off while running this condition?

    In the Group By tile, I am grouping by Req ID and Group and then doing a count of Req IDs because the Group By tile requires me to aggregate something. I am not actually going to use the count.

    So this is where its like Group By in SQL- I can get group by req ID.. I get it

    In the Join Data tile, I am doing a Left Join with the source spreadsheet and the Group By tile and joining on Req ID. Make sure the source spreadsheet is on the left side. This will put the Group value next to all Req IDs.


    Yes if I can make sure that duplicates are taken care off. As I mentioned in my example in above post with data, then I cna do a left join on Req ID to capture that new column that has 'G' as a response.

    In the second Add Formula tile, I use the following on the Group column to replace any nulls. IFNULL(`Group`,'Y')

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    In the Filter tile, I am filtering out the rows that end up with Y, which is what rows get from your ELSE clause in your CASE statement.

    Have you tried implementing what I detailed out? I would suggest you do that and see if the results match what you expect.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @MarkSnodgrass is correct.



    Formula tile with:

    CASE 
    
    WHEN `Req status`='Open' AND (`Offer to be Extended`<>'' OR `Offer Extended` <>'')
    
    THEN '1 - G'
    
    WHEN `Req status`='Open' AND (`HM applicant review`= 'Is null' AND `Days open`>10)
    
    then '2 - R1' 
    
    WHEN `Req status`='Open' AND (`Interview`= '' AND `Days open`>15)
    
    THEN '3- R2'
    
    WHEN `Req status`='Open' AND (`Req with Offer Accept`= 'Not Specified' AND `Days open`>35)
    
    THEN '4 - R3'
    
    Else '5 - Y'
    
    End
    


    From there, with standard SQL you could take a MIN(<beastMode>) OVER (GROUP BY Req_ID )

    Domo doesn't support MIN in a window function, so your GROUP BY -> JOIN pattern gets the job done.


    Alternatively, if you want to do a Window function without the GROUP -> JOIN , you could implement each condition as a binary (1 or 0) as 4 separate columns. then use the Window function to create a separate cumulative sum of each column sum(isBinary1) over (partition by Req_ID), sum(isBinary2) over (partition by Req_ID) then use a CASE statement to remap the binary column to its original name (case when isBinary1 = 1 then 'G' end), then coalesce(isBinary1Rempa, isBinary2Remap) the results. Coalesce will accept the first non zero value.

    it's more steps, but theoretically more efficient because you avoid the JOIN.

  • @MarkSnodgrass can you provide me a screenshot of group by function configuration in this case?

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    @user026529 Here is what your Group By tile should look like:


  • @MarkSnodgrass

    Thanks for this. In group by configuration, I have configured as below with left join on my original table. The only issue is that now I have ReqID as a common column in my original table and my group by table. If I choose to rename it on the group by table, will it impact group by step?



  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤
    edited May 17

    You can drop the column as you have shown in your screenshot. You won't need it after the join because the Req ID will be in the original table.

  • @MarkSnodgrass

    Ok an update- It doesnt work as planned. In my final update table I have my groupby column that is left join on my main table, however the column has only 'G' values (Wherever there are values) and no R1 or R2s etc.. it looks like only 1st statement in the case statement below is executed and not the rest. Where as the data definitely has some of the req Ids that will qualify for R1 and R2 statements.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    You will want to look at your formula tile and review your case statement. Run the preview in the Magic ETL and then click on the Formula tile where your case statement is and click on the preview link. Look through the results and see if you see other values besides G. If you don't see anything besides G, then the issue is your CASE statement logic and you will need to look into what your issue is with that logic.

  • @MarkSnodgrass My case statement doesnt show me any error on GRY, a column that i created with this condition (Screenshot below)

    however I dont have preview available in magic ETL




  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Along the top of Magic ETL, there is a Run Preview button. Click on this. Once there is a green checkmark next all of your tiles, click on the Formula tile and then click on Preview. This will show you your data and will show you the column you created in the formula tile. Look through this column and see if you see values besides G.



  • user026529
    user026529 ⚪️
    edited May 17

    @MarkSnodgrass Yes I did that. I ran preview and then tried to see my data in the case formula operation



  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Next to Configuration, click on Preview. This will let you see all rows.


  • @MarkSnodgrass Its funny that it says no rows match your criteria here, yet I do get my 1st condition that creates 'G' in the final table



  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Did you click on the Run Preview at the top of the ETL like I mentioned and then waited for the green checkmarks to appear?


  • @MarkSnodgrass Yes as you can see my previous post all my operations have a green check against them when I run the preview

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    You should check your filter tile and see if you are losing all your rows there. Click on the Filter tile and then on Preview and see if it shows rows. If it doesn't, you need to adjust that. Formula tiles don't drop rows, so you need to look at the previous tiles before it and see where you are losing your rows.

  • user026529
    user026529 ⚪️
    edited May 18

    @MarkSnodgrass Sorry this approach is not working for me as I am not able to see the preview. In the filter workflow (previous to my case formula) doesnt show me any preview of any rows (It has a green check when I run the preview) and yet it gives the rows perfectly in the output table when I run the dataflow

    I tried @jaeW_at_Onyx approach here, and now I have 5 Columns with name of my case condition Ie- Column G, R1, R2, R3, and Y

    My case conditions gives 1 or 0 depending on the criteria condition is met in each of those columns. So I have only 0s or 1s in those 5 columns based on my condition is met or not.

    I didnt understand this part of the solution-  sum(isBinary1) over (partition by Req_ID), sum(isBinary2) over (partition by Req_ID) then use a CASE statement to remap the binary column to its original name (case when isBinary1 = 1 then 'G' end), then coalesce(isBinary1Rempa, isBinary2Remap) the results. Coalesce will accept the first non zero value.

    So I can do a sum of each column using Sum('G') over (partition by Req_ID)----- (Just as an example of sum in column G by req ID) but do I create a new column? for this formula?

    What I didnt understand is how do I use Coalesce function here am I suppose to use this in beast mode formula?

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    I would encourage you to watch this Magic ETL Overview video in the Domo Video Library that might help you be more comfortable with some of the ETL controls. https://www.domo.com/help-center/videos/watch/xjqpW94Nq28/cat/PLc_JazWEwlVrPxLLzsXziS7tOCaDLxFny Around the 1:40 mark they talk about the preview functionality.

    There are many other videos in that library that are vey helpful in getting you proficient in Magic ETL. All of the videos are 5 minutes or less.