Recursive ETL

Jones01
Jones01 🟑
edited August 16 in Dataflows

Hi guys,

I am pulling data from our db that looks like

Date|Name|Value

DATE and NAME together form the unique key

I sort of understand that the recursive ETL will append new data and replace old data with it, but I am missing the point slightly. Should the new data be a complete pull from the db or a subset?

I am looking to just get the changes from our db since the last pull and merge those in, rather than keep pulling years' worth of data every time.

Having just watched a Domo Domopalooza session, am I right in thinking my query to our db would just pull changes from, say, the last 10 days to make the set smaller?

Any help would be appreciated.

Best Answer

  • GrantSmith
GrantSmith Indiana πŸ₯·
Answer βœ“

Correct, you'd only pull in the changes that need to be applied, which keeps your data processing quicker (fewer records). You'd need an initial pull of all your data to establish your baseline, but after that you can just pull in the records that have changed since the last time you ran it.
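    The merge step can be sketched as a plain upsert on the composite key. A minimal Python illustration (the row values are hypothetical, purely to show the replace-on-key behavior over Date and Name):

    ```python
    # Baseline: the full dataset from the initial pull, keyed on (Date, Name).
    baseline = {
        ("2024-01-01", "A"): 10,
        ("2024-01-01", "B"): 20,
        ("2024-01-02", "A"): 30,
    }

    # Incremental pull: only rows changed since the last run.
    changes = {
        ("2024-01-02", "A"): 35,  # updated value for an existing key
        ("2024-01-03", "B"): 40,  # brand-new row
    }

    # Recursive-style merge: rows from the new pull replace baseline rows
    # with the same (Date, Name) key; everything else is kept as-is.
    merged = {**baseline, **changes}
    ```

    The same idea in a Domo recursive dataflow: remove baseline rows whose key appears in the incremental pull, then append the incremental rows.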

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**

Answers

  • Jones01
Jones01 🟑

    @GrantSmith great thanks.

    Yes I believe I have this all working now. Pulling changes from the source every 30 mins and checking two keys on the records seems to be working.

  • Jones01
Jones01 🟑

My dataset has about 5.6 million records and the recursive ETL to bring in new records takes 35 seconds.

    Does that sound reasonable?

  • GrantSmith
GrantSmith Indiana πŸ₯·

Yeah, that sounds reasonable. The one caveat with recursive dataflows is that they don't scale well: the larger the dataset grows, the longer the ETL will take to run (more data to transfer means more time).
