Adding data that matches a date to a date dimension table.

mberkeley 🟡

I have HR data that has Start Date and End Date (or null) for each employee. I am looking to be able to see the number of active employees at any given date. I have added the Domo Date Dimension table to my data set, but I can't figure out how to join so that for any date I can get the count of employees who's start date is before the date and ended after or is null.

I've been able to get start/end date counts per day added to the dimension table, but can't figure out how to get that 'active' record count added.

Any ideas?


Best Answer

  • GrantSmith
    GrantSmith Indiana 🥷
    Answer ✓

    You can utilize the Calendar dataset from the Domo Dimensions calendar to do a conditional join on your HR dataset based on the start date and end date. With Magic ETL 2.0 you can't do a straight conditional join but would need to do a cartesian join then do the filtering after the fact. Depending on your HR dataset size this may take a while to process.

    With Magic 2.0:

    • Input Datasets: Calendar and HR
    • For each dataset add a constant "Join Column" with a value of 1
    • Use a Add Formula Tile with "COALESCE(`End Date`, CURRENT_DATE())" to default any NULLs to today for the end date
    • Left join your HR dataset onto the calendar dataset (have calendar on the left side) based on the Join Column
    • Use a filter tile to filter your dataset based on `Start Date` <= `Calendar Date` and `Calendar Date` <= `End Date`
    • Then feed the resulting dataset into a group by and group by the `Calendar Date` and count the number of employees you have employed at the given date.

    Alternatively you could utilize a MySQL dataflow and do something similar except have your join clause be your filter clause (`Calendar Date` BETWEEN `Start Date` AND `End Date`) in your join clause and not worry about doing your cartesian join. This may be faster due to less data being generated.

    SELECT c.`dt` as "Calendar Date", COUNT(*)
    FROM calendar as c
    LEFT JOIN `hr_dataset` as hr on c.`dt` BETWEEN hr.`Start Date` AND COALESCE(hr.`End Date`, CURRENT_DATE())

    Note: This is all untested pseudo code and you may need to tweak based on your table and column names.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**


  • mberkeley
    mberkeley 🟡

    I like the MySQL version more, but have never written SQL ETL, so I worked with the Magic ETL version. It does exactly what I needed. Thank you!

  • GrantSmith
    GrantSmith Indiana 🥷

    @mberkeley If you could accept my answer to your question so others can find it easier in the future I'd appreciate it. Thanks.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**