How to get Batch_Run_ID with google sheets?

Hallo, I am looking for a solution to a problem where I need to compare data from today with yesterday's. But the date column is not provided in the data. Also, google sheets does not get batch run id and date. So I have no reference point to compare data. I tried recursive data modelling, but could not solve the problem. Could somebody please give me some idea to reach the solution for this problem? I just need to get some reference point on ETL level to compare data for two consecutive days. Thanks.

Best Answers

  • Ashwin_SG
    Ashwin_SG ⚪️
    Accepted Answer

    Hallo @GrantSmith! Yes I did, as I thought it would then get updated automatically everyday and that would be my point of reference for comparison. But it only appears in the first update (or first batch) as I had get rid of the date column for the append to happen in next batch. So, the appending is working but date column is empty. It brought me back to same situation.

  • GrantSmith
    GrantSmith Indiana 🔴
    Accepted Answer

    You'd need to add it to your input dataset as the first step, then when you go to add it to your recursive dataset it should have the different values stored in there for when the input is added. Do you have a screen shot of your ETL?

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤
    Accepted Answer

    @Ashwin_SG if you use a recursive dataflow to datestamp your ETL based on when you run the ETL, it can be done... but comes with risks you need to consider.


    If you assume that you ONLY run your recursive once a day then you kind of defeat the purpose of a recursive (which should allow you to run your dataflow as often as you want without detrimental impact).


    You could extend your logic to keep the last set of rows per day. that isn't part of the standard KB documentation but should be easy enough to implement using a rank / window function after you APPEND historical and new datasets together.

    If it were me, i'd alter the googlesheet to include a now() function. I assume that would always calculate the time the dataset was ingested. it certainly would timestamp the raw dataset based on ingest time instead of ETL execution time.

Answers

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @Ashwin_SG

    When you tried your recursive dataflow did you add a new column / constant contained the current date and time before your output dataset?

  • Thank you Grant. I found the solution for my problem. I don't know how would I solved this problem without your advice. Since I did not had any date related column in my dataset, I built a recursive dataflow and added Date column in there. Then I took that dataset into another ETL and did comparison for today and yesterday's data with reference to that date column. So, as my google sheets gets updated, so does the date column and it appends in the process.

  • GrantSmith
    GrantSmith Indiana 🔴

    @Ashwin_SG Glad you got it sorted out! If you could accept my solution so that others who are searching for your issue could find it more easily I'd appreciate it.