Join based on filter criteria

ravimohan911
ravimohan911 Montréal ⚪️

Table A (Completed Evaluations w/ dates & locations)

Table B (Locations Universe, all available locations)

Table A has all the same location columns as Table B. Table A also has completed dates and evaluation data,

We want to show locations with completed evaluations for a user specified date period (i.e. using a date filter card)

Card 1 can show the locations completed (easy enough)

Card 2 we want to show the locations that were not completed during the user specified date criteria.

i.e. Card 2 should show all Table B locations minus the locations completed during the user specified date period. The user chooses the date range for Table A in an embedded dashboard. Then somehow (beast mode?) filter Table B based on the locations not in Table A.

Possible?

Tagged:

Answers

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    Initially, @ravimohan911 sounds like you're describing a simple RIGHT JOIN. (keep all the locations from Table B and only rows from Table A if there's a match).


    But that won't let you know that for that date, there were no evaluations at that location (b/c the date column is in Table A, and would be NULL if there was no activity at that Location).

    You want the universe of all Dates AND Locations.

    In order to do that you need to create a bridge table that defines your Universe (use CROSS APPLY / JOIN to get a list of all the possible Date and Locations), then JOIN the bridge table back to Table A.


    We cover the concept in this tutorial video: https://youtu.be/Xb4QgKYgaqg?t=111

  • ravimohan911
    ravimohan911 Montréal ⚪️

    Cheers Jae. I actually used that video for another similar report, it worked great - thanks, your vids are awesome!

    Our challenge with creating the universe is that our locations table (Table B) has 210,000 rows in it (unique locations) and growing daily! If I create a dataset with every location for every day, it will start out with @ 153 million rows (since our "completes" data goes back to April 2019).

    We are looking for a way to avoid such huge tables if possible. Is there even such a thing as I describe?: user filtering the dates for the completes first and then doing some kind of Beast Mode join to figure out and show the not completed locations on the dashboard (can be the same or a different card).

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    You can't JOIN in a Beast Mode.

    If you limit how far back people can look (i.e. just 7 days, OR limit yourself to the end of a week) then your data becomes more manageable.


    The following approach MIGHT work, but the idea would be to build your Dataset on VIEWS instead of ETL so you get instant results.

    User Input: a webform that sets the StartDate

    then you JOIN that StartDate to a list of numbers (let's say 1 to 90) then use DateADD in a VIEW to calculate the days from StartDate to 90 into the future.

    then JOIN that table on Locations to create your Universe.

    then JOIN that UniverseTable to your Transactions.

  • ravimohan911
    ravimohan911 Montréal ⚪️

    Cheers Jae. I'm not sure, using this approach, can the end user (ie a customer via an embedded dashboard) enter the date criteria ( in a webform) and then see a list of completed and not completed locations for the date range they specified right away? Sounds like they have to make the date range "request" and then we manually run it.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    If your pipeline is built on dataset views then no execution is necessary.

  • ravimohan911
    ravimohan911 Montréal ⚪️

    Awesome - here goes...

  • ravimohan911
    ravimohan911 Montréal ⚪️

    1 - Would we be able to specify both the start and end dates in the webform? So we are not hard coding in the the range of dates we want to filter our completed evaluations for.

    2 - Would we be using just 1 dataset view (all Joins in one view) or do we need multiple views?