Looking up values from secondary data source but for multiple columns in primary data source

pstrauss
pstrauss ⚪️
edited March 28 in Dataflows

So I'm having some trouble wrapping my head around a solution for this scenario in Magic ETL:

-I have a primary data source with multiple columns where values are stored in shorthand like "hc_res" but I want to display reports with pretty names like "Healthcare Resources". The mapping of "hc_res" to "Healthcare Resource" exists in a second data source which updates on a pretty regular basis. The issue is that the shorthand names are stored in several columns in the primary data source (i.e. "primary_inquiry_type", "secondary_inquiry_type"), so I need to do the lookup for multiple columns during the ETL.

Normally, I'd use a join to do a lookup for a single column, but I don't think that would work in this scenario unless I did multiple joins - one for each column where the values need to be mapped. Is that my best bet, or is there a better way to achieve this?

I'm also okay replacing the values instead of doing a join if that's possible.

Comments

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    You would use multiple join tiles and just change the column that you want from your primary datasource to match your secondary datasource. Would look something like this:


    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • Thanks @MarkSnodgrass - that was what I thought I might have to do. It's just that there are like 20 columns where this needs to happen, and new columns get added to the primary data set which need the same sort of mapping on a regular basis. Was hoping for a more scalable solution, but this might be the only approach

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    You could try and use the unpivot tile and then potentially just join once. If you haven't looked into the dynamic unpivot and unpivot tiles, you could see if that would help you long-term. Here's the KB article.

    https://domohelp.domo.com/hc/en-us/articles/360044951294-New-Magic-ETL-Tiles-Pivot

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.
  • I've used the pivot tile for some other ETL work, but never the unpivot ones. I'll take a look and see if that might help.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @MarkSnodgrass + @pstrauss

    before you pivot, make sure to assign a value to how to prioritize the match.

    i.e.

    matchValue =1 if JOIN on Name

    matchValue = 2 if JOIN on Description

    matchValue =3 if JOIN on


    then Pivot

    then JOIN

    then RANK on MatchValue desc

    then filter where Rank = 1 to only keep one version of the row.


    Good lukc!

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • This is what I ended up doing... Maybe it's not ideal... Since that CSV list never has more than 10 values in it, I joined the data from the external lookup table, then split the column into 10 columns, then unpivoted the 10 columns so each one is now a row. Not particularly elegant, but it seems to do the trick.

  • pstrauss
    pstrauss ⚪️

    So this is still driving me nuts - I used this solution for a single column, but I have about 10 columns where I have to do the same thing - do a lookup from an external table to change the values to match those. Is there any simpler way to do this across multiple columns?

  • So doing the multiple column splits and multiple unpivots is causing the ETL to run extremely slowly - I'm looking for any alternative solutions to handling a bunch of columns that took like this:

    "value1,value2,value3,value4,value5"

    And splitting them and putting them into separate rows for a single new column.

    Any other thoughts?

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @pstrauss splitting multiple multi-value columns into rows would cause massive row explosion.

    imagine one row with 3 values in column a and 3 values in column b. you end up with 9 rows. if you do that across multiple columns, huge row growth.


    instead, consider a dimensional approach.

    take each of your multivalue columns in isolation and create your matches.

    then just join the match value back onto your initial row.

    you might explore different approaches with a row ID (just generate a row number using the rank and window function).

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"