Timing Out Redshift
Backstory: We have a dataset provided by an outside consultant; it's an export from SPSS.
11.3K rows by 1,174 columns.
The columns contain replies to a survey.
The results are categorical values; for example, answers to "How much do you agree with..." are coded as 1 (strongly disagree) through 5 (strongly agree).
The end user of the dataset wants the values recoded to store the answer text rather than the numeric code, and the columns renamed from the question number to the plain text of the question (which breaks my relational-database thinking a bit, but I understand the reasoning).
We have a datamap file which gives the plain-text meaning of the different values, and I have to recode the columns accordingly.
I had to split the dataset into 6 subsets to get the SQL to run at all.
I've managed to get 4 of the 6 working as MySQL dataflows, but 2 of them will not run.
My code is 250-odd scalar subqueries, each selecting the appropriate value from the datamap based on the possible responses.
E.g., a few lines with identifying information removed:
(select "Selection" from "dataset" where "Source Value"="Q50_21" and "Check"='Yes/No') as "Value Q50 From the list below, plea",
(select "Selection" from "dataset" where "Source Value"="Q51" and "Check"='Q51') as "Now please think about",
(select "Selection" from "dataset" where "Source Value"="Q52_1" and "Check"='Appeal') as "Value Q52 X- How appealing do",
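For anyone who wants to reproduce the idea: the per-column scalar subqueries above can usually be rewritten as one LEFT JOIN against the datamap per recoded column, which tends to be far cheaper for the planner. A minimal sketch in Python with SQLite (all table names, column names, and sample values here are made up for illustration; they are not the real schema):

```python
import sqlite3

# In-memory sketch of the recode. "datamap" is a small lookup table
# (code -> answer text per question); "survey" holds the raw coded answers.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE datamap (source_value TEXT, question TEXT, selection TEXT);
INSERT INTO datamap VALUES
  ('1', 'Q51',   'Strongly disagree'),
  ('5', 'Q51',   'Strongly agree'),
  ('1', 'Q52_1', 'Not at all appealing');

CREATE TABLE survey (respondent INTEGER, q51 TEXT, q52_1 TEXT);
INSERT INTO survey VALUES (1, '1', '1'), (2, '5', NULL);
""")

# One LEFT JOIN per recoded column instead of one scalar subquery per column;
# the small datamap is scanned once per join rather than re-queried per row.
rows = con.execute("""
SELECT s.respondent,
       d51.selection AS "Now please think about",
       d52.selection AS "How appealing"
FROM survey s
LEFT JOIN datamap d51 ON d51.question = 'Q51'   AND d51.source_value = s.q51
LEFT JOIN datamap d52 ON d52.question = 'Q52_1' AND d52.source_value = s.q52_1
ORDER BY s.respondent
""").fetchall()
print(rows)
```

The LEFT JOIN keeps rows where the answer is NULL or unmapped, which matches what the scalar subqueries return for missing responses.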
When I run it as MySQL I get: "Row size is wider than MySQL supports. Please reduce the row size or number of columns."
When I run it in Redshift, it runs for an hour, makes 74% progress, and then: "The database reported a syntax error. [Amazon](600001) The server closed the connection."
I'm pretty sure I'm not doing this in the most efficient way possible, but this was a one-off survey that won't be repeated, so the dataset will never change.
I literally need it to run correctly once, and to be honest, after two days I don't care about efficiency anymore; I just want it done.
Any help anyone can suggest would be great.