ETL Missing Data - Assistance Needed

Hello, I am in need of assistance on trying to figure out why my ETL is some data. (I have tried a Data Fusion as well)

 

I have two tables/datasets where it is from tblCoverage left joined to tblProfile. Both tables are linked by a unique vendor code.

 

tblCoverage has a column named 'State' where it shows what state a vendor covers

tblProfile also has a column named 'State' which shows where the vendor is located and not necessarily what state they cover.

 

When I have a card built just using tblProfile, I am able to see values populated in a column named isActive (T/F). However, when a left join with tblCoverage was created through ETL, there are some records missing from these columns. Note that both datasets have a 'isActive' column, so I am using the column from tblProfile and not from tblCoverage. In addition, I have one filter on the card where I am filtering State = 'DC' (state from tblCoverage datset)

 

I confirmed there is no issue with the data source that the data is pulling from, and I think it is something with the join or how ETL was created.

 

Any assistance would be greatly appreciated and please let me know if you have any questions because I don't know if I explained well enough.

Comments

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    Without knowing any more about your depth of knowledge in SQL, it sounds like you may want to brush up on the implications of what a LEFT JOIN does versus an INNER or OUTER join.

     

    Recall, a LEFT JOIN will keep all records on the left side and remove any records on the right that don't have a match (on Vendor Id).

     

    So in your example, if there's a leftTable.VendorID that does not exist in rightTable.VendorId, you'll still keep the row, but the right side of the fusion will contain NULL values.  Therefore, when you are filtering, you should always prioritize filtering on field from the left side because they will always be populated; whereas there are no guarantees that the right side of the table will have matching values.  (if you could guarantee matches, then you could use an INNER join)

     

    Consider what happens if State exists in both tables (first ... don't do that either DROP the rightTable.State or rename them ex. State_WhereActive and State_Headquartered). 

     

    IF you filter on rightTable.State and that vendor doesn't exist in the right table, then there wouldn't be a result in your card because rightTable.State contains a NULL. 

     

  • Hello Jae,

     

    I do have a matching vendor in both tables but they have differing states.

    Left table state = DC
    Right table state = LA

     

    Is there a way I can filter on the left table's state (LA), and have it show me the right table's unique column such as isActive?

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    Sure. In and Analyzer table card,  filter on LeftTable.State and put RightTable.isActive on the axis.  keep in mind, depending on what else you have in your card, you will either return one row per row in your dataset (i.e. you'll see isActive many times) OR if the table defaults to aggregating, you'll have one row for each unique value.

     

    To see the Vendor, of course you must put Vendor on the axis.  Again, IF you filter on LT.State AND there are vendors in the right table that are NOT represented in the Left Table, those vendors would get filtered out because you don't know which LT.State they are in in.

     

    Your test.

    Build a Fusion Joining your table.  If you have a proper 1:M relationship, then your row count should = rows in left times rows in right.  If you have LESS THAN that, then you don't have 100% representation of vendors in both tables and you need to think about my notes about filtering on left side versus right side.  if you have MORE THAN the 'correct number of rows', then you have cases where the same Vendor appears more than once in both tables (i.e. you have a M:M relationship).

  • Hi Jae,

     

    Here is my current setup:

    Filters:

    State = DC (From left table)

     

    Axis:
    VendorCode (left table)

    State (from left table)

    State (from right table)

    isActive (from right table)

     

    The output data is showing VendorCode correctly, however, the State and isActive from the right table is blank. State from the left table is showing 'DC' as expected because I set my filter to 'DC.' I was expecting the State from the right table to show 'LA' because that's where they are based in.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @SLam reread my responses.  The items to check are all there.