[Bizgres-general] Sort performance improvements..

Simon Riggs simon at 2ndquadrant.com
Wed Mar 29 08:30:40 GMT 2006


On Tue, 2006-03-28 at 21:28 -0800, Ayush Parashar wrote:

> Simon, can you please throw some more light on the enhancements made..
> 
> I have included some performance test results for external sort that proves
> the dramatic improvements.
> 
> Query: 
> select count(*) from (select L_PARTKEY from lineitem order by L_PARTKEY) as
> a;
> 
> Time taken in seconds for work_mem in MBytes:
> ------------------------------------------------------------------------
> work_mem         Bizgres 0.9        Bizgres 0.8
> ------------------------------------------------------------------------
> 16            788.9            1176.7
> 32            531.8            986.7
> 64            439.4            931.9
> 128            383.5            912.1
> 256            391.0            924.7
> 512            407.4            1011.3
> ------------------------------------------------------------------------

These results are great and give a good indication of the overall
performance improvements achieved. Improvements explained below.

The user need not take any action to take advantage of these new tuning
features.

Overall, remember to use partitioning to reduce the potential size of
sorted data and set work_mem as high as possible - since that may allow
the optimizer to choose a more efficient plan than sorting if one
exists.

- - -

There are some subtle effects in the above test results that need
analysis before we can draw recommendations for general use: The above
results seem to show that 128 MB is the optimal amount of memory for
sorting, which it is *for that query*. If you don't know how big a query
is ahead of time, then giving it more than 128 MB memory can still be
beneficial especially since unused memory will not be kept in use any
longer than necessary now. Giving it less might be OK too - the above
test is performed on just under 1 GB data, so we might expect that a
sort of 100 MB data would have a less noticeable effect from using lower
work_mem values. 

There are a number of recent improvements to sort:
- algorithmic improvements allow a dynamically increasing number of runs
as work_mem increases, minimizing overall sort duration for large sorts
that are larger than 6*work_mem. Lots of specific tuning, but overall
the main result is: give sorts big work_mem and they will go faster

- the first column in the sort key is placed at the front of sort data
during sorting. This will greatly improve sorts where the first column
provides high differentiation (i.e. is somewhat unique), or there is
only one column.

- the optimizer now recognises when it needs to prepare the sort result
for random access and when it does not; sort-merge joins and scrollable
cursors require this - other SELECT statements are now better optimised

- sort optimizer costs have changed to reflect algorithmic changes, so
in some cases plans may change for existing large queries

If all your sort data fits within work_mem then Bizgres will perform a
fast, in-memory qsort. If data is larger than memory we use a 3-phase
external sort:
1. initial run building
2. intermediate merge 
3. final merge. 
During initial run building the more work_mem you have the less work
you'll do in the second phase - possibly none at all. Intermediate
merging will only be required when the amount of data to be sorted
exceeds the half the square of work_mem: 
	0.5*(work_mem)^2
This is *not* the same thing as saying that the optimum work_mem to use
for sorting is the square root of the sort data, but its a strong hint
that big data needs big work_mem settings.

The final merge phase may overlap with other sorts when there are more
than one sort per SELECT statement. At this stage we know exactly how
much data has been sorted, so we are able to use that to optimize the
memory usage during final merge. This greatly reduces the amount of
physical memory in use by a query and allows more concurrent users the
chance to execute with large work_mem also. (Currently, a Bizgres only
feature).

Sorts are frequently required for ORDER BY, GROUP BY, DISTINCT,
count(DISTINCT) and merge joins, so multiple sorts in large queries are
frequently possible.

Best Regards, Simon Riggs



More information about the Bizgres-general mailing list