Join with Between Clause in Magic ETL

MarkSnodgrass Portland, Oregon 🟀

I am working on converting some Redshift dataflows to Magic ETL 2.0 dataflows. I have a join statement in Redshift that looks like this:

SELECT d."province_state"
, c."Category" as SeverityRelativityCategory
, n."Category" as SeverityRelativityCategoryMinusNY
FROM "ihme_covid_19_population_death_shares" dΒ 
JOIN "severity_relativity_categories" c on d."SeverityRelativity" BETWEEN c."MinAmt" AND c."MaxAmt"
JOIN "severity_relativity_categories" n on d."SeverityRelativityMinusNY"BETWEEN n."MinAmt" AND n."MaxAmt"

I'm not seeing an easy way to handle the BETWEEN clause using the JOIN tile in Magic ETL 2.0. How can I accomplish this?

Best Answer

  • GrantSmith
    GrantSmith Indiana πŸ”΄
    Accepted Answer

    Hi @MarkSnodgrass

    Magic ETL doesn't support a conditional join, only a straight columnar join.

    The option you can do is perform a Cartesian join and then user a filter tile after the fact. To do that, add a constant of 1 to both of your datasets and then join on that column. The Cartesian join causes the number of records to explode which you then have to filter through so it's the most efficient but should get you what you want.


  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟀

    Thanks @GrantSmith ! That did the trick. Here is a visual of what it ended up looking like for anyone else who needs to do the same thing.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟀
    edited February 18

    @MarkSnodgrass a little late for the party, but here's an alternative to doing a full on cartesian product.

    i'm assuming your dataset is relatively small so multiplying Table A times the number of rows in Table B isn't THAT big a deal.

    For a solution that scales (in certain use cases), you can duplicate table C and N by the row by the number of rows between Min and Max Amount

  • SeanPT
    SeanPT 🟠

    @jaeW_at_Onyx 's video saved my butt because I've gone DEEP down the rabbit hole with Magic v2 and ran face first into needing a BETWEEN JOIN.

    That trick will absolutely work for me .... for now. I add 48 rows every day. So, a year from now ... sum=[n*(n+1)]/2

    153,483,960 rows :/

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟀

    @SeanPT thaaaaats gonna get expensive. do you want to post a screenshot of what you've currently got and describe the JOIN?