[Bizgres-general] Off-list Re: [ENG] Re: Statement Queuing take II - Resource Scheduling (Running with cost and cursors)

Mark Kirkwood mkirkwood at greenplum.com
Mon Aug 7 02:22:20 UTC 2006


Luke Lonergan wrote:
>
> The purpose of having our own timeslicing scheduler instead of just
> using the one built into the OS is to avoid the I/O conflicts that the
> OS is not aware of, but we are.
>
>   
Right, but if our time slicing conflicts with or confuses the OS's 
scheduler, or does not really lessen the resource load, then we are losing out.

> Point by point:
> - Postgres can and will only use O(1GB) of RAM per query within its
> process segment and we can establish the amount beforehand.  Therefore
> we can control the number of runnables in the queues to ensure that
> there will not be swapping.
>
>   

If we have enough resources to let K elements of a size N queue be time 
sliced, we have enough to let K be active (i.e. unsliced - managed by the 
queue's "active count" limit). The reasoning for this is essentially my 
point last time - time slicing K queries, each of which needs M of a 
resource, is typically going to use KM of the resource - and letting 'em 
run as usual will typically use KM of the resource too - probably more 
efficiently. 
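
To make the KM argument concrete, here is a rough sketch (a toy model, not anything from the actual queue code). It assumes a query holds its M units from its first slice until it finishes, i.e. being descheduled frees nothing. The function names and the numbers for K, M, work and slice length are all made up for illustration.

```python
from collections import deque


def peak_held_time_sliced(k, m, work, slice_len):
    """Round-robin k queries; return the peak amount of the resource held.

    Assumption: a query holds m units from its first slice until it
    finishes, because being descheduled does not free its state.
    """
    run_queue = deque((q, work) for q in range(k))
    holding = set()
    peak = 0
    while run_queue:
        q, remaining = run_queue.popleft()
        holding.add(q)                      # query acquires (or keeps) its m units
        peak = max(peak, len(holding) * m)
        remaining -= slice_len
        if remaining > 0:
            run_queue.append((q, remaining))  # back of the queue, still holding m
        else:
            holding.discard(q)              # finished: resource released
    return peak


def peak_held_unsliced(k, m):
    """k queries simply active at once under the queue's active count limit."""
    return k * m


K, M = 4, 1024  # hypothetical: 4 queries needing 1024 MB each
sliced_peak = peak_held_time_sliced(K, M, work=120, slice_len=30)
unsliced_peak = peak_held_unsliced(K, M)
# Slices long enough to finish a query degenerate into sequential execution.
sequential_peak = peak_held_time_sliced(K, M, work=30, slice_len=30)
```

Under these assumptions the sliced and unsliced peaks come out the same (KM), and only fully sequential execution gets down to M.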

Now we could save to disk (i.e. serialize or create a continuation for) the 
resource(s) at the end of each query's time slice (i.e. the sort set, 
executor state and data for the query progress, plus the partially 
generated tuple result sets, hash joins, loops etc.) - but this could 
well cause more problems than we are solving (e.g. it might take KM of 
the resource to save each query's M of it!).  However, even assuming it 
is not that bad, for some resources (e.g. memory) it may not be 
released back to the OS in time for the next few time slices to use! 
(I've just noticed that Gavin has discussed this issue more completely.)

> - I/O readahead buffering is O(16MB) and is retained in the OS I/O cache
> unless there is memory pressure.  Even if an I/O buffer is flushed, it
> takes milliseconds to refill one, so our intra-queue timeslicing can be
> chosen at a number like 30 seconds to ensure that we're not wasting
> work.
>
>   
With respect to read-ahead, ISTM that one of the main reasons for this 
queuing work is that there is typically memory pressure. With 
respect to the buffer cache, 30 seconds is enough time for query N to 
completely flush the buffer cache records that query N-k was using, thus 
every query will typically see a completely cold cache for its 
time slice. This is going to increase system resource usage and 
generally result in poorer performance than executing the queries 
sequentially.
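
A toy simulation of the cold cache effect, for what it is worth. It assumes a plain LRU cache (the real buffer cache and OS page cache use different replacement policies) and that one slice is long enough for a query to touch a working set as large as the whole cache. The class, the function and all the sizes are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU page cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)         # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently used


def count_misses(schedule, working_set, passes_per_slice, capacity):
    """Run the queries in `schedule` order; each slice scans the query's
    working set `passes_per_slice` times. Returns total cache misses."""
    cache = LRUCache(capacity)
    for query in schedule:
        for _ in range(passes_per_slice):
            for page in range(working_set):
                cache.access((query, page))
    return cache.misses


# Hypothetical workload: 3 queries, each scanning its own 100 pages 4 times,
# sharing a 100-page cache.
sequential_misses = count_misses([0, 1, 2], working_set=100,
                                 passes_per_slice=4, capacity=100)
sliced_misses = count_misses([0, 1, 2] * 4, working_set=100,
                             passes_per_slice=1, capacity=100)
```

Both schedules do the same 1200 page accesses, but in this model the sliced one misses on every access because each slice starts with the previous query's pages in the cache, while the sequential one only misses on each query's first pass.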

> Do these address the issues you identified?
>
>
>   

I don't think so (despite the fact that it would be fun to program :-) ).

I think the suggestion for defining queues of statement workload is 
sound, and the current queue design with its various limits (cost, 
active count, etc.) should work well with it, and do pretty much what you 
are wanting to achieve. We probably need to thrash around the whole user 
-> queue vs statement -> queue association a bit more to see if we can 
somehow get the best of both approaches! (I think Gavin hinted at this 
too...)


Cheers

Mark

