Help on recursive dataflow

I have a recursive dataflow that keeps failing at the join between the recursiveprod input and the main data that is being grouped. The error it shows is DUPLICATE COLUMN NAMES. If I rename the column, the dataflow runs. But the run then writes that new, unique column name back into the recursiveprod input, so when I try to run it again I get the same error. It just keeps going around and around.

I have also tried dropping the column, and the dataflow still runs successfully. The problem is that when I do that, the cost column I am displaying doubles every time the dataflow runs. So it seems I have to rename that column on every run. Below you will see that March and May are significantly inflated, 3 or 4 times higher than the rest of the months.

What I want to find out: is there a way to set up the recursive dataflow so that the column does not need to be renamed on every run? And how can I go back and get March and May to reflect only the single instance in which they were uploaded, without the multiplication?

Snippet of Recursiveprod:

Best Answer

  • MarkSnodgrass
    Accepted Answer

    Here is a screenshot of a recursive dataflow that I have set up and don't have to manipulate each time it runs. Basically, I use the Select Columns tile to keep only the ID column from my recursive dataset, because that is the column I join on. I rename it at the join and then filter for rows where it is null. My Append tile is set to only include shared columns.
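The pattern described in the answer can be sketched outside of Domo to show why it avoids both the duplicate column name and the inflated totals. This is a minimal illustration in plain Python, not Domo's implementation: the function name `recursive_update`, the `hist_` prefix, and the sample columns `id`, `month`, and `cost` are all assumptions made for the example.

```python
def recursive_update(history, new_data, key="id"):
    """Return history plus only those new rows whose key is not already in history."""
    # "Select Columns": keep only the join key from the recursive input.
    history_ids = {row[key] for row in history}

    # Rename the history key at the join so it cannot collide with the
    # key column of the same name in the new data (hypothetical prefix).
    hist_key = "hist_" + key

    # Left join: every new row is kept; the history key is None when unmatched.
    joined = [
        {**row, hist_key: row[key] if row[key] in history_ids else None}
        for row in new_data
    ]

    # Filter: keep only rows where the history key is null (not yet stored).
    fresh = [row for row in joined if row[hist_key] is None]

    if not history:
        # First run: nothing stored yet, so all new rows are kept as-is.
        return [dict(row) for row in new_data]
    if not fresh:
        return [dict(row) for row in history]

    # "Append, shared columns only": the renamed helper column exists only on
    # the new side, so it is dropped and never written back to the history.
    shared = [col for col in history[0] if col in fresh[0]]
    return [{c: row[c] for c in shared} for row in history + fresh]


# Illustrative data only.
history = [{"id": 1, "month": "March", "cost": 100}]
new_data = [
    {"id": 1, "month": "March", "cost": 100},  # already in history
    {"id": 2, "month": "May", "cost": 50},     # new
]

first_run = recursive_update(history, new_data)
second_run = recursive_update(first_run, new_data)  # re-running adds nothing
print(len(first_run), len(second_run), sum(r["cost"] for r in second_run))
```

Because the helper column is dropped at the append step, the stored dataset keeps the same columns from run to run, and because matched rows are filtered out before the append, re-running with the same input does not add the same cost a second time.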