What is the point of a full join in Magic ETL?

So I get Left, right, and inner joins. But, what is the point of a full join if it returns null values for any columns that don't have a match? How is that different from an inner join?


Also, how can I make sure that the non-matched items are included in the resulting dataflow?

Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟣
    Accepted Answer

    That's hard to say without seeing some sample data, but rather than doing a full join, you should consider doing an append and include all columns from both datasets. It is a commonly recommended approach instead of joining the data.

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟣

    A full join is going to give you all the records from both tables and join matching records where it can. An inner join would only give you the matching records and exclude all other rows.

    This KB article has some good visuals that explain the various types along with a video.


  • What are the cases where it can't return the data? because I'm banging my head against a wall trying to figure out why the ETL keeps dropping my data on a full join. Also, the linked video is blocked at my work due to being on box.com.

  • @HowDoIDomo - I will often use the full outer join when creating a dataflow in ETL. I also will include an output dataset after each join. This helps me make sure that I am joining the data correctly and the number of rows are what I expect to find.

    The advantage to the full outer join is exactly what you mentioned in your post. By including all rows of both datasets, I can quickly see that when a field from the left dataset is null then whatever I was joining on was not found in the left dataset. Same with the right dataset. This can be very helpful when you are expecting to find a match for all records, or when you expect that every record in the right dataset should have a match in the left dataset.

    For example, let's say that my left dataset contains invoice data and my right dataset contains sales rep information.

    My assumption would be that each invoice should have a corresponding sales rep, but not all sales reps will have an invoice. This assumption would indicate that I should do a right join. However, by using full outer join, I can get a quick validation that my assumption was correct. After the full outer join, I would add a filter tile to only include any rows where I have invoice data, but no Sales rep data (according to my assumption, this should never happen). I then create an output dataset after that filter tile so that I have a dataset that will only get populated with any invoice data that is NOT linked to a sales rep. I can set an alert on that dataset and subscribe the appropriate people to the alert so that anytime we have an unassigned sale, they can correct it.

    Getting back to the data flow, I would then include a filter that would only show the invoices that were assigned and continue with whatever else I had planned to do after the "right join". This essentially mimics a right join, but with the added benefit of alerting you anytime you have unassigned sales.

  • HowDoIDomo
    HowDoIDomo ⚪️
    edited March 12

    So, I am trying to do the append rows, but then everything else becomes null except for the rows that came back null during my joins. I don't know what I'm doing wrong.


    I tried doing select rows to get rid of anything that would be a duplicate, but then I started having null data issue again for 3 columns.

  • GrantSmith
    GrantSmith Indiana 🟣

    Hi @HowDoIDomo

    Do you have some sample / anonymized data you could post to get an understanding of your data format?

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟣

    @HowDoIDomo

    1) the OUTER JOIN is doing what it's supposed to do. just keep in mind, if you're doing an

    tableA.SalesRep

    OUTER JOIN tableB.SalesRep

    on A.SaleRep = B.SalesRep

    50% of the time SalesRep will have data and 50% of the time the column will be NULL because THERE WAS NO MATCH IN TABLE A. In other words, when TableA.SalesRep is null then TableB.SalesRep will be populated.

    Solution, use a formula tile to

    coalesce(TableA.SalesRep,TableB.SalesRep)
    

    Coalesce will take the first non null value. Apply this pattern to all the JOIN columns.


    MORE IMPORTANTLY.

    I have never encountered a use case in Domo where it was appropriate to do an OUTER JOIN.

    You should probably APPEND your data.

    https://www.youtube.com/watch?v=PVbOeLSae9o

Sign In or Register to comment.

Hey, Stranger!

It looks like you're new here. Those who sign in get access to engage with even MORE fire content. To get involved, click one of these buttons!