Adding missing day rows

Hi, 

 

I am trying to get data for everyday in a particular date range even if there is no data for that day.

 

For example,

Item

Location

Date

Count

A

USA

Jan 3, 2018

5

A

USA

Feb 10, 2018

12

A

USA

Feb 25, 2018

13

A

Canada

Jan 7, 2018

14

A

Canada

Feb 10, 2018

10

A

Canada

Mar 3, 2018

5

    

 

Desired O/P:

Item

Location

Date

Count

A

USA

Jan 1, 2018

0

A

USA

Jan 2, 2018

0

A

USA

Jan 3, 2018

5

A

USA

Jan 4, 2018

5

A

USA

Jan 5, 2018

5

A

USA

Jan 6, 2018 ….Feb 9,2018

5

A

USA

Feb 10, 2018

12

A

USA

Feb 11 …Feb 24

12

A

USA

Feb 25

13

A

USA

Feb 26… Current Day

13

A

Canada

Jan 1.. Jan 6

0

A

Canada

Jan 7, 2018

14

A

Canada

Jan 8…Feb 9

14

A

Canada

Feb 10, 2018

10

A

Canada

Feb 11 – Mar 2

10

A

Canada

Mar 3

5

A

Canada

Mar 4 – Current Day

5

 

I came up with this SQL Procedure:

CREATE PROCEDURE add_missing_days()
BEGIN

DECLARE DONE INT DEFAULT 0;
DECLARE ITEMID INTEGER;
DECLARE LOCID integer;

DECLARE FND INTEGER;
DECLARE DT DATE;
DECLARE END_DT DATE;
DECLARE PREV_RATE VARCHAR(255);

DECLARE item_ids CURSOR FOR SELECT DISTINCT `ITEM_ID`,`LOCATION_ID` FROM `transform_data_1` ORDER BY `ITEM_ID`,`LOCATION_ID`, `Transaction Date` ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET DONE = 1;

OPEN item_ids;
item_ids_loop : LOOP
FETCH item_ids INTO ITEMID, LOCID;

IF DONE = 1
THEN CLOSE item_ids;
LEAVE item_ids_loop;
END IF;


SET DT = str_to_date('2018-01-01', '%Y-%m-%d');
SET END_DT = CURRENT_DATE;

WHILE DT < END_DT DO
SET PREV_RATE = 0;
SET FND = ( SELECT COUNT(*) FROM `transform_data_1`
WHERE `ITEM_ID` = ITEMID AND `LOCATION_ID` = LOCID and `Transaction Date` = DT );
IF ( FND = 0 ) THEN
INSERT INTO `transform_data_1`( `ITEM_ID`, `LOCATION_ID`, `Transaction Date`, `Inventory Count` )
VALUES( ITEMID, LOCID, DT, PREV_RATE );
ELSE
SET PREV_RATE = ( SELECT `Inventory Count` FROM `transform_data_1`
WHERE `ITEM_ID` = ITEMID AND `LOCATION_ID` = LOCID AND `Transaction Date` = DT );
END IF;

SET DT = DATE_ADD( DT, INTERVAL 1 DAY );
END WHILE;
END LOOP item_ids_loop;
END

This code works for just 1 month Date range . But if I try running it for the entire date range , it times-out and fails. Considering I am working on a big dataset and adding a row for each day for each Item and Location it takes a lot of processing time. For now I have broken the dataflow into different months and eventually combining all the dataflows into one.

However this is not dynamic and I will have to keep adding new dataflows for each month over time. I just want to achieve this in a single dataflow.

So  I was wondering if there was a more efficient way to achieve this.

 

Thanks,

Prajwal

Best Answer

  • Property_Ninja
    Accepted Answer

    Hi Prajwal,

     

    We have created a dim_date dataset that we can use in these situations with all dates from 1/1/2000 to 12/31/2030. When we want all dates even if data does not appear. Then, in your case we would do something like

    select a.`date`
    ,b.Item
    ,b.Location
    from dim_date a
    cross join your_table b
    where a.`date` <= current_date and a.`date` >= (select min(`Date`) from your_table)

    This will give you a table with all possible dates and attributes for each day of each month. You can then do a left join on Date, Item, and Location to get the Count. You can do something like ...

     

    ifnull(Count,0) as Count to mark your days with no activity.

     

    Hope this helps,

     

    Brian

Answers