Fill in missing dates

Ashleigh Florida 🟢

I need to find a way to fill in some missing dates. My data looks like this:

 

Project      Start Date    End Date
Project 1    4/1/2020      5/1/2020
Project 2    4/10/2020     6/1/2020

 

I want to create a new row for every date between the start and end date. I saw a way to do this with R, but I am not familiar with R.

Best Answer

  • jaeW_at_Onyx Budapest / Portland, OR 🟤
    Accepted Answer

    SQL TO THE RESCUE.  Use a JOIN on BETWEEN.

     

    SELECT
        t.*,
        dd.date
    FROM date_dim dd
    JOIN transactions t
        ON dd.date BETWEEN t.startDate AND t.endDate

     

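    EDIT: I can't remember whether MySQL supports BETWEEN (it does), but if your engine doesn't, explicit comparisons do the same job:

    SELECT
        t.*,
        dd.date
    FROM date_dim dd
    JOIN transactions t
        ON dd.date >= t.startDate
        -- <= keeps the end date, matching BETWEEN, which is inclusive
        AND dd.date <= t.endDate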
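To see what the range join above produces, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table names `date_dim` and `transactions` and the columns `startDate` and `endDate` come from the answer; the `project` column, the ISO `YYYY-MM-DD` text dates, and a date dimension covering only 2020 are assumptions made for illustration. In a real pipeline the date dimension would be an existing calendar table.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")

# Sample data from the question, with dates rewritten as ISO text (assumption)
# so that string comparison orders them correctly.
conn.execute("CREATE TABLE transactions (project TEXT, startDate TEXT, endDate TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Project 1", "2020-04-01", "2020-05-01"),
        ("Project 2", "2020-04-10", "2020-06-01"),
    ],
)

# Hypothetical date dimension: one row per calendar day of 2020.
conn.execute("CREATE TABLE date_dim (date TEXT)")
day = date(2020, 1, 1)
days = []
while day.year == 2020:
    days.append((day.isoformat(),))
    day += timedelta(days=1)
conn.executemany("INSERT INTO date_dim VALUES (?)", days)

# The range join: each project row is repeated once per matching calendar day.
rows = conn.execute(
    """
    SELECT t.project, dd.date
    FROM date_dim dd
    JOIN transactions t
      ON dd.date BETWEEN t.startDate AND t.endDate
    ORDER BY t.project, dd.date
    """
).fetchall()

for project, day_text in rows[:3]:
    print(project, day_text)
```

Because BETWEEN is inclusive, both the start date and the end date of each project appear in the result.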

Answers

  • Ashleigh Florida 🟢

    @jaeW_at_Onyx I'm already working in an ETL with a few other manipulations, so I'm trying to keep it in there. I was able to get it working with Python!

  • jaeW_at_Onyx Budapest / Portland, OR 🟤

    Fair enough. If you must use Python, go for it.

     

    If it were me, I would have one ETL where I do the 'easy transformations' in Magic, and then a second ETL where I change the granularity of the data to one row per day.

     

    WHY?

    1) Changing granularity is probably specific to one dashboard's requirements, whereas a cleaned-up version of the data at one row per project would be valuable in many places.

    2) I would avoid Python if possible because it's harder to support, it's a premium feature, and data transfer in and out of Python is comparable to data transfer in and out of MySQL.
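For readers who take the Python route mentioned in this thread, the row expansion can be sketched with the standard library alone. This is not the script Ashleigh used (it was not posted). The function name `expand_to_daily`, the tuple layout, and the `M/D/YYYY` date format are assumptions based on the sample data in the question.

```python
from datetime import datetime, timedelta


def expand_to_daily(rows, fmt="%m/%d/%Y"):
    """Return one (project, date) pair per day from start to end, inclusive.

    `rows` is an iterable of (project, start_text, end_text) tuples.
    A row whose end date precedes its start date produces no output.
    """
    daily = []
    for project, start_text, end_text in rows:
        start = datetime.strptime(start_text, fmt).date()
        end = datetime.strptime(end_text, fmt).date()
        # +1 so that the end date itself is included.
        for offset in range((end - start).days + 1):
            daily.append((project, start + timedelta(days=offset)))
    return daily


# Sample data from the question.
projects = [
    ("Project 1", "4/1/2020", "5/1/2020"),
    ("Project 2", "4/10/2020", "6/1/2020"),
]

daily_rows = expand_to_daily(projects)
for project, day in daily_rows[:3]:
    print(project, day.isoformat())
```

Keeping the expansion in its own function fits the two-stage idea above: the one-row-per-project data stays reusable, and the one-row-per-day version is derived from it only where a dashboard needs it.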