Display row count at every step in Magic ETL

Currently, if you want to know your row count in Magic ETL at any stage other than the output, you have to add a few temporary tiles to get it, i.e., add a Rank and Window tile to establish a Row Number column, then aggregate with a Group By to get the max row number.
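The workaround works because the maximum of a 1-based row number equals the number of rows. Magic ETL is configured with tiles rather than code, so the following is only a hypothetical Python model of those two steps; `add_row_number` and `max_row_number` are illustrative names, not Domo functions.

```python
def add_row_number(rows):
    """Model of a Rank and Window step that adds a 1-based row_number column."""
    return [{**row, "row_number": i} for i, row in enumerate(rows, start=1)]


def max_row_number(rows):
    """Model of the follow-up aggregation that takes the maximum row_number."""
    # default=0 covers an empty step, where there is no row number to take a max of.
    return max((row["row_number"] for row in rows), default=0)


rows = [{"id": n} for n in range(1, 1001)]
print(max_row_number(add_row_number(rows)))  # 1000, the same as len(rows)
```

Two extra tiles just to reproduce `len(rows)` is the overhead this idea asks to remove.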

I would really love it if each tile could show the number of rows it contains, so if you're losing or blowing up data, it's easier to find where the changes are happening. It would save a TON of time.

I realize there may be limitations because Magic can only preview 400k rows, so, just like on a table card when you're over the limit of data that can load, it would be fair to say so. I'd rather have a limit on how much I can know about my data than not get any row counts at all.
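One way the proposed header label could behave, sketched in Python under the assumption that the preview stops loading at 400k rows: when a step reaches the limit, the exact count is unknown, so the label reports a lower bound instead of a number. `row_count_label` and `PREVIEW_LIMIT` are hypothetical names for illustration only.

```python
PREVIEW_LIMIT = 400_000  # assumption: the 400k preview limit mentioned in the post


def row_count_label(rows_in_preview: int, limit: int = PREVIEW_LIMIT) -> str:
    """Build a tile-header label, flagging counts that reach the preview limit."""
    if rows_in_preview >= limit:
        # At the limit, the preview cannot tell whether more rows exist, so say so.
        return f"{limit:,}+ rows (preview limit reached)"
    return f"{rows_in_preview:,} rows"


print(row_count_label(1_000))    # 1,000 rows
print(row_count_label(400_000))  # 400,000+ rows (preview limit reached)
```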

Broadway + Data
12 votes

Active · Last Updated

Comments

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🔴

    @RobynLinden, like... just in the preview? Or as an output dataset? Would you want it as part of the execution details page?


    In the immediate term, I always tell people to use a GROUP BY or REMOVE DUPLICATES tile before they do a JOIN, and that solves any row growth problems... unless there's a NULL in the column. Don't join on NULL.

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

  • Right, in the preview - maybe right here.


    I want to know how many rows are contained in each step. So if I have 1,000 rows, blow it up to 2,000 on a join, and then land at 500 due to a group by, just tell me that in the header of each tile after I run a preview. If 500 is what I wanted, then I'm happy.


    Broadway + Data
  • Ashleigh
    Ashleigh Florida 🟣

    Love this idea! I always have to add output datasets just to check the row count when I'm troubleshooting, to make sure I didn't blow up a join.

  • This would also be helpful to see as an additional column in the run history so we can compare "Rows Processed" to "Output Rows" at each step.
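Why deduplicating before a join stops row growth, as a hypothetical Python model rather than Domo code: if the right-hand side lists each key twice, every matching left row comes out twice. `inner_join` and `remove_duplicates` are illustrative stand-ins for the JOIN and REMOVE DUPLICATES tiles. The join skips `None` keys to follow the "don't join on NULL" advice, since whether null keys match each other differs between engines, and which duplicate a real tile keeps is not assumed here (this sketch keeps the first).

```python
def inner_join(left, right, key):
    """Inner join on `key`; rows whose key is None are skipped on both sides."""
    index = {}
    for right_row in right:
        if right_row[key] is not None:
            index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        if left_row[key] is not None
        for right_row in index.get(left_row[key], [])
    ]


def remove_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


orders = [{"order_id": i, "customer_id": i} for i in range(1, 1001)]
# Every customer is listed twice, e.g. after an upstream append.
customers = [{"customer_id": i, "name": f"Customer {i}"} for i in range(1, 1001)] * 2

print(len(inner_join(orders, customers, "customer_id")))  # 2000
print(len(inner_join(orders, remove_duplicates(customers, "customer_id"), "customer_id")))  # 1000
```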
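The per-step counts requested in this thread (a count in each tile header, plus "Rows Processed" versus "Output Rows" in the run history) can be modeled as recording the row count before and after every step. This is a hypothetical Python sketch, not how Magic ETL works internally; the step functions are illustrative, and for the join "rows processed" is assumed to mean rows from the main input only, not the lookup side. It reproduces the 1,000 → 2,000 → 500 example.

```python
from collections import defaultdict


def join_on_id(rows, lookup):
    """Inner join on "id"; illustrative stand-in for a Join tile."""
    by_id = defaultdict(list)
    for match in lookup:
        by_id[match["id"]].append(match)
    return [{**row, **match} for row in rows for match in by_id.get(row["id"], [])]


def group_by_region(rows):
    """Count rows per region; illustrative stand-in for a Group By tile."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += 1
    return [{"region": region, "rows": n} for region, n in totals.items()]


def run_with_row_counts(rows, steps):
    """Run steps in order, recording (step, rows processed, output rows) for each."""
    history = []
    for name, step in steps:
        output = step(rows)
        history.append((name, len(rows), len(output)))
        rows = output
    return rows, history


source = [{"id": i, "region": i % 500} for i in range(1000)]
lookup = [{"id": i, "variant": v} for i in range(1000) for v in ("a", "b")]  # two matches per id

_, history = run_with_row_counts(source, [
    ("Join", lambda rows: join_on_id(rows, lookup)),
    ("Group By", group_by_region),
])
for name, processed, output in history:
    print(f"{name}: {processed:,} rows processed -> {output:,} output rows")
```

Printing both numbers per step makes the blow-up at the join and the reduction at the Group By visible without extra tiles or output datasets.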