Redshift order by

Reply
Highlighted
Yellow Belt

Redshift order by

My first try at using Redshift.  Trying to use ORDER BY and not getting the expected result.  The output table should be sorted by column "a".

 

Here are the steps:

 

1. Create tbl_data

SELECT 3 as "a", 30 as "b"
UNION ALL
SELECT 3 as "a", 30 as "b"
UNION ALL
SELECT 2 as "a", 20 as "b"
UNION ALL
SELECT 2 as "a", 20 as "b"
UNION ALL
SELECT 1 as "a", 10 as "b"
UNION ALL
SELECT 1 as "a", 10 as "b"

2. Order and output as pd_a1

SELECT *
FROM "tbl_data"
ORDER BY "a"

Here is the result:

https://drive.google.com/open?id=1s6L4LDPijFJY6Hjrp9F6Nsh9JBeo6yH3

 

Thanks for your attention.

Highlighted
Major Blue Belt

Re: Redshift order by

I have the same problem, but it seems to only be a problem in the preview as the end result comes out correct.

Highlighted
Yellow Belt

Re: Redshift order by

For me the preview and the output are the same.

Highlighted
Black Belt

Re: Redshift order by

This has to be a bug.  I am having the same issue that @BruceP  is having with the order by clause.

 

Have you submitted a ticket yet?  


______________________________________________________________________________________________
“There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
______________________________________________________________________________________________
Highlighted
White Belt

Re: Redshift order by

I wrote a simple query to order by closedate and got some strange but consistent results. I tried to dig into the database details of how this is being sorted, like sortkey and distkey, but didn't really know what i was doing, and creating a temp table and applying the sortkey/distkey using custom sql didn't seem to make a difference. Any ideas? orderby date.PNG

Highlighted
Yellow Belt

Re: Redshift order by

I am having issues with the order_by too,
is it a bug, eventually? any solution?

Highlighted
White Belt

Re: Redshift order by

Recently I have had the same problem!

My objective was to perform a cumulative sum of an ordered dataset. I tried to run Order By with different kinds of data type (date, string of date, row number column ecc.) but for all the tests the same result = no order.

 

I choose to switch to the ETL dataflow and perform a window & ranking function.

Maybe there is any window functions that coul help to force the ordering  in redshift (?)

Highlighted
Black Belt

Re: Redshift order by

Hi Kevin,

 

I'm not sure why this works.  But, I was getting the same odd sorting that you were:redshift order by.png

 

 

However, if I added a rank function to it, the sort worked:

redshift rank.png

 

 

... or it did when I included one extra column???

redshift rank final.png

 

 

Very odd.  In conclusion... I would say to cautiously use the ORDER BY in redshift.  If you do, you will want to thoroughly check your results.  This has to be a bug.  I was not able to find any documentation from redshift stating that the order by clause was broken, but it seems to be in whichever version Domo is using for these dataflows.


______________________________________________________________________________________________
“There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
______________________________________________________________________________________________
Highlighted
Yellow Belt

Re: Redshift order by

Hey everyone - here's the explanation provided by the support team at Domo - "The ORDER BY does apply for the purpose of the query that you use it in, but Redshift returns the data back to the preview in whatever order it wants. Unfortunately the ORDER BY only applies to the query itself and not the return data."

Highlighted
Purple Belt

Re: Redshift order by

Hi everyone,

 

I am a Tier II Technical Advisor on the Domo Support team, and I actually field this question about Redshift a lot.

 

What you observe about the behavior with ORDER BY in Redshift is actually expected:

 

When Redshift processes a query that contains an ORDER BY clause, it does honor the specified order while processing the logic of that query (i.e. RANK() functions and the like do work properly, and as expected). However, when it has finished running the query, it re-orders the data to optimize the actual storage of the data (Redshift stores data tables distributed across many nodes, and splits the data up according to it's own storage optimization methods).

 

When Domo pulls up a preview of the data, we load it out of the table after it has been re-ordered and stored by Redshift.

 

When working with Redshift, it's important to include an ORDER BY clause in every individual query that relies on that specific ordering of the data, and never to rely on the order persisting from a prior transform.

 

In fact, this is best-practice when working with MySQL as well, in case somebody alters your dataflow without understanding that the order of a transform's output is important.

 

EDIT: I'm also getting this information added to our Dataflow Troubleshooting Knowledge Base article here: https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/DataFlow_and_DataFusion_Troubleshooting_...

 

Thanks for bringing it up for discussion, and happy data processing!

 

Starlord
**Say “Thanks” by clicking the “heart” in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"
Announcements
Coming Soon: Weekly Virtual User Meet-ups to answer all your questions! Stay tuned for more details to come this week.