What is the point of a full join in Magic ETL?

So I get Left, right, and inner joins. But, what is the point of a full join if it returns null values for any columns that don't have a match? How is that different from an inner join?


Also, how can I make sure that the non-matched items are included in the resulting dataflow?

Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon πŸ₯·
    Answer βœ“

    That's hard to say without seeing some sample data, but rather than doing a full join, you should consider doing an append and include all columns from both datasets. It is a commonly recommended approach instead of joining the data.




    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon πŸ₯·

    A full join is going to give you all the records from both tables and join matching records where it can. An inner join would only give you the matching records and exclude all other rows.

    This KB article has some good visuals that explain the various types along with a video.





    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • What are the cases where it can't return the data? because I'm banging my head against a wall trying to figure out why the ETL keeps dropping my data on a full join. Also, the linked video is blocked at my work due to being on box.com.

  • @HowDoIDomo - I will often use the full outer join when creating a dataflow in ETL. I also will include an output dataset after each join. This helps me make sure that I am joining the data correctly and the number of rows are what I expect to find.

    The advantage to the full outer join is exactly what you mentioned in your post. By including all rows of both datasets, I can quickly see that when a field from the left dataset is null then whatever I was joining on was not found in the left dataset. Same with the right dataset. This can be very helpful when you are expecting to find a match for all records, or when you expect that every record in the right dataset should have a match in the left dataset.

    For example, let's say that my left dataset contains invoice data and my right dataset contains sales rep information.

    My assumption would be that each invoice should have a corresponding sales rep, but not all sales reps will have an invoice. This assumption would indicate that I should do a right join. However, by using full outer join, I can get a quick validation that my assumption was correct. After the full outer join, I would add a filter tile to only include any rows where I have invoice data, but no Sales rep data (according to my assumption, this should never happen). I then create an output dataset after that filter tile so that I have a dataset that will only get populated with any invoice data that is NOT linked to a sales rep. I can set an alert on that dataset and subscribe the appropriate people to the alert so that anytime we have an unassigned sale, they can correct it.

    Getting back to the data flow, I would then include a filter that would only show the invoices that were assigned and continue with whatever else I had planned to do after the "right join". This essentially mimics a right join, but with the added benefit of alerting you anytime you have unassigned sales.


    ______________________________________________________________________________________________
    β€œThere is a superhero in all of us, we just need the courage to put on the cape.” -Superman
    ______________________________________________________________________________________________
  • HowDoIDomo
    HowDoIDomo 🟑
    edited March 2021

    So, I am trying to do the append rows, but then everything else becomes null except for the rows that came back null during my joins. I don't know what I'm doing wrong.


    I tried doing select rows to get rid of anything that would be a duplicate, but then I started having null data issue again for 3 columns.

  • GrantSmith
    GrantSmith Indiana πŸ₯·

    Hi @HowDoIDomo

    Do you have some sample / anonymized data you could post to get an understanding of your data format?



    **Was this post helpful? Click Agree or Like below**

    **Did this solve your problem? Accept it as a solution!**
  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟀

    @HowDoIDomo

    1) the OUTER JOIN is doing what it's supposed to do. just keep in mind, if you're doing an

    tableA.SalesRep

    OUTER JOIN tableB.SalesRep

    on A.SaleRep = B.SalesRep

    50% of the time SalesRep will have data and 50% of the time the column will be NULL because THERE WAS NO MATCH IN TABLE A. In other words, when TableA.SalesRep is null then TableB.SalesRep will be populated.

    Solution, use a formula tile to

    coalesce(TableA.SalesRep,TableB.SalesRep)
    

    Coalesce will take the first non null value. Apply this pattern to all the JOIN columns.


    MORE IMPORTANTLY.

    I have never encountered a use case in Domo where it was appropriate to do an OUTER JOIN.

    You should probably APPEND your data.

    https://www.youtube.com/watch?v=PVbOeLSae9o


    Jae Wilson
    Check out my Domo Training YouTube Channel

    **Say "Thanks" by clicking the heart in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"