Effect of adding new column to imported data on an existing dataset/dataflow

Good afternoon! I'm not sure I'm using the correct search terms, but we haven't been able to find anything related to this question from one of our managers. Before changing anything, we wanted to confirm whether it would cause any dataset/dataflow or card issues.

 

We currently upload a file to Box each week. It is imported into a dataset, which feeds a dataflow (the only change there is that certain unused columns are removed), and that output is then joined with another dataflow to produce the final dataflow that serves the cards.

 

Would adding a column to the original Box file that's NOT going to be used in any of the dataflows cause issues, or will it be fine since it's not being utilized (basically, it's going to be imported into the dataset, nothing further)? We're trying to eliminate as much manual manipulation as possible for uploads to Box where an API is not available.

 

If you need anything further to assist on this, please don't hesitate to ask! And as always, thanks for your time and any assistance/information you can provide!

Best Answer

  • Valiant 🔵
    Accepted Answer

    I haven't tested this specifically with a Box dataset. However, we do this quite often with ODBC datasources in our environment. It has yet to cause any issues on our end with cards/dataflows. If we need the newly added columns, it's a matter of opening the dataflows and adding those columns where needed.

     

    Hope that helps,

    ValiantSpur

Answers

  • Thanks, @Valiant, appreciate your reply! We can always download the existing Box file, add the column to see what, if anything, happens, and revert if there are issues, so I think we'll give it a shot based on your experience with ODBC datasets. As always, thanks for your assistance and clarification!

  • @John-Peddle You're very welcome. Glad I could help!

  • I use Box religiously, and have not had any issues when input files have additional fields.  

     

    As @Valiant says, even if you want to use those fields, the update to the dataflow is quick and easy.  

     

    If you have datasets with formats that might change, it would be wise to build in a QC mechanism that alerts you if the input columns change, even though the change doesn't cause an issue. It can be as simple as verifying the number of columns that upload. I also sometimes include QC outputs in my dataflows so that I can check in and make sure everything is shipshape.
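
For anyone who wants to run the column-count check before the file even reaches Box, here is a minimal sketch in Python. It assumes the weekly file is a CSV with a header row; the column names in `EXPECTED_COLUMNS` and the `check_columns` helper are made up for illustration and are not part of Domo or Box.

```python
import csv
import io

# Hypothetical list of the columns the dataflow currently expects.
# Replace with the real header names from your weekly file.
EXPECTED_COLUMNS = ["order_id", "region", "amount"]


def check_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Compare the header row of a CSV against the expected columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows
    return {
        "column_count": len(header),
        "added": [c for c in header if c not in expected],
        "missing": [c for c in expected if c not in header],
    }


# Example: the file gained a "notes" column that no dataflow uses.
sample = "order_id,region,amount,notes\n1,East,10.5,ok\n"
report = check_columns(sample)

if report["added"] or report["missing"]:
    print("Column change detected:", report)
```

An added column only shows up under `added`, so it can be logged as a heads-up, while anything under `missing` is the case more likely to affect a downstream dataflow and is worth stopping the upload for.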