My first try at using Redshift. Trying to use ORDER BY and not getting the expected result. The output table should be sorted by column "a".
Here are the steps:
1. Create tbl_data
SELECT 3 as "a", 30 as "b" UNION ALL SELECT 3 as "a", 30 as "b" UNION ALL SELECT 2 as "a", 20 as "b" UNION ALL SELECT 2 as "a", 20 as "b" UNION ALL SELECT 1 as "a", 10 as "b" UNION ALL SELECT 1 as "a", 10 as "b"
2. Order and output as pd_a1
SELECT * FROM "tbl_data" ORDER BY "a"
Here is the result:
Thanks for your attention.
This has to be a bug. I am having the same issue that @BruceP is having with the order by clause.
Have you submitted a ticket yet?
I wrote a simple query to order by closedate and got some strange but consistent results. I tried to dig into the database details of how this is being sorted, like sortkey and distkey, but didn't really know what i was doing, and creating a temp table and applying the sortkey/distkey using custom sql didn't seem to make a difference. Any ideas?
Recently I have had the same problem!
My objective was to perform a cumulative sum of an ordered dataset. I tried to run Order By with different kinds of data type (date, string of date, row number column ecc.) but for all the tests the same result = no order.
I choose to switch to the ETL dataflow and perform a window & ranking function.
Maybe there is any window functions that coul help to force the ordering in redshift (?)
I'm not sure why this works. But, I was getting the same odd sorting that you were:
However, if I added a rank function to it, the sort worked:
... or it did when I included one extra column???
Very odd. In conclusion... I would say to cautiously use the ORDER BY in redshift. If you do, you will want to thoroughly check your results. This has to be a bug. I was not able to find any documentation from redshift stating that the order by clause was broken, but it seems to be in whichever version Domo is using for these dataflows.
Hey everyone - here's the explanation provided by the support team at Domo - "The ORDER BY does apply for the purpose of the query that you use it in, but Redshift returns the data back to the preview in whatever order it wants. Unfortunately the ORDER BY only applies to the query itself and not the return data."
I am a Tier II Technical Advisor on the Domo Support team, and I actually field this question about Redshift a lot.
What you observe about the behavior with ORDER BY in Redshift is actually expected:
When Redshift processes a query that contains an ORDER BY clause, it does honor the specified order while processing the logic of that query (i.e. RANK() functions and the like do work properly, and as expected). However, when it has finished running the query, it re-orders the data to optimize the actual storage of the data (Redshift stores data tables distributed across many nodes, and splits the data up according to it's own storage optimization methods).
When Domo pulls up a preview of the data, we load it out of the table after it has been re-ordered and stored by Redshift.
When working with Redshift, it's important to include an ORDER BY clause in every individual query that relies on that specific ordering of the data, and never to rely on the order persisting from a prior transform.
In fact, this is best-practice when working with MySQL as well, in case somebody alters your dataflow without understanding that the order of a transform's output is important.
EDIT: I'm also getting this information added to our Dataflow Troubleshooting Knowledge Base article here: https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/DataFlow_and_DataFusion_Troubleshooting_...
Thanks for bringing it up for discussion, and happy data processing!