If multiple data sets are joined, for example, each one is sampled individually, so the joined dataset is most likely empty.
Any advice on how to use it better?
When you are testing the logic, I would add a lot of filters to narrow down the data sets. Rather than sales across the globe, focus on one country, or even one city. If you apply the same filters to each dataset while you are "previewing" the data, you should be able to see how the joins are performing. Once you are happy with it, remove those filters and let the data set run. Then validate the output dataset.
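To make the idea concrete, here is a minimal sketch in plain Python rather than in the ETL tool itself. The tables, column names, and the `filter_rows` and `inner_join` helpers are all made up for illustration. The point is only that the same filter is applied to every input before the join, so the narrowed inputs still share join keys.

```python
# Hypothetical toy inputs; in practice these would be the ETL input datasets.
orders = [
    {"order_id": 1, "customer_id": 10, "country": "US"},
    {"order_id": 2, "customer_id": 20, "country": "DE"},
    {"order_id": 3, "customer_id": 30, "country": "DE"},
]
customers = [
    {"customer_id": 10, "name": "Ada", "country": "US"},
    {"customer_id": 20, "name": "Bo", "country": "DE"},
    {"customer_id": 30, "name": "Cy", "country": "DE"},
]


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Apply the SAME filter to each input, then join the narrowed inputs.
de_orders = filter_rows(orders, "country", "DE")
de_customers = filter_rows(customers, "country", "DE")
preview = inner_join(de_orders, de_customers, "customer_id")

for row in preview:
    print(row)
```

Once the join looks right on the narrowed data, the filters come off and the full run is validated separately, as described above.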
If I understand correctly, you advise me to create sampled versions of my input data sets for ETL debugging and avoid using the Run Preview feature altogether? Because that's what we've been doing here, and we are very pleased with how it works. We would, however, find it much smoother to use a Run Preview button that uses some smart sampling mechanism (even random would be better than simply taking the first 100 or so rows).
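For what it's worth, one way to build such sampled inputs by hand is to sample the join keys once and then keep the matching rows in every input. The sketch below is plain Python with made-up data and helper names, not anything Run Preview actually does. It compares that approach with taking the first 100 rows of each input independently.

```python
import random

# Hypothetical inputs: 5000 orders over 1000 customers, with the two
# tables stored in opposite key order so their first rows do not overlap.
orders = [{"order_id": i, "customer_id": i % 1000} for i in range(5000)]
customers = [{"customer_id": c, "name": f"c{c}"} for c in reversed(range(1000))]


def inner_join(left, right, key):
    """Inner join two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Naive preview: first 100 rows of each input, sampled independently.
head_join = inner_join(orders[:100], customers[:100], "customer_id")

# Key-based sampling: pick 10 random join keys once, then keep the
# rows with those keys in BOTH inputs before joining.
rng = random.Random(0)  # fixed seed so the sample is reproducible
all_keys = sorted({c["customer_id"] for c in customers})
sampled_keys = set(rng.sample(all_keys, 10))
small_orders = [o for o in orders if o["customer_id"] in sampled_keys]
small_customers = [c for c in customers if c["customer_id"] in sampled_keys]
keyed_join = inner_join(small_orders, small_customers, "customer_id")

print(len(head_join), len(keyed_join))
```

In this toy setup the independent first-100 samples share no customer ids, so their join is empty, while the key-based sample keeps every order for the 10 chosen customers.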