[Bizgres-general] Performance - WAL bypass + parse

Luke Lonergan llonergan at greenplum.com
Thu Jun 2 04:39:25 GMT 2005


Mark,

On 6/1/05 6:56 PM, "Mark Kirkwood" <markir at paradise.net.nz> wrote:

> Decided to check this out after seeing the message on -hackers,
> very nice, more speed is always good!

Cool!  Glad to see someone else who has the pain...
 
> One thing I am wondering about - what do you see if you load files
> bigger than the RAM size of the machine (e.g. 4G in your case)? Does the
> performance difference still persist? I am raising this because:

Sure - often.
 
> i) It is a more realistic DW scenario
> ii) 146M is only 6x the disk cache of 3 drives (assuming 8M for each)
> iii) You don't get much chance to measure the impact of the bgwriter or
> checkpointer (speed may be throttled on these?)

Understood.  We routinely load large data (100GB+).  We've traced the issues
to CPU consumption without question, but I'm happy to prove it.

The big issue is: we're not anywhere near saturating the I/O subsystem for
loading or scanning data with Postgres.  It has nothing to do with the
Executor or the I/O interface; it's just a lack of optimized code paths in
unexpected places.  We're going to fix that :-)
  
> If you guys have not got the time to do some experiments along these
> lines, I could look into it, however I don't have such flash HW ... :-).

Well - the fastest case is the single column case with parse improvements
and WAL bypass, so I'll run that on a 3GB file (1.5x memory, it's a 2GB
machine).

Input file size: 2,909,128,332 bytes

Sample row:
card following server to includes 128 mesh to any away free 2 Therefore the
turn visual includes can Find a drastically fast com Digital Mbyte

------- with fast parse and WAL bypass -----------
Database directory size after loading: 3,757,084,000 bytes

Time to load using psql copy: 104.191 seconds
Rate = 2909 MB / 104.191 s = 27.92 MB/s
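
For anyone who wants to redo the arithmetic on their own runs, here is a
minimal sketch in C.  The function names are made up for illustration and
are not part of the attached test program.  It assumes decimal megabytes
(10^6 bytes), which is what the 2909 figure above corresponds to.

```c
/* Load throughput in decimal megabytes (10^6 bytes) per second. */
double load_rate_mb_per_s(double input_bytes, double seconds)
{
    return input_bytes / 1e6 / seconds;
}

/* On-disk size after loading divided by raw input size.
 * Only a rough indicator: the database directory also holds
 * things other than the loaded table. */
double expansion_ratio(double db_bytes, double input_bytes)
{
    return db_bytes / input_bytes;
}
```

With the numbers above, load_rate_mb_per_s(2909128332.0, 104.191) comes out
at about 27.92, and expansion_ratio(3757084000.0, 2909128332.0) at about
1.29.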

I've attached the test program we use FYI.  The data generator defaults to
15 columns; if you want to change that, edit the file data-generator/main.c
and change the lines that look like this:

  numcols = 1;
  col_types[0]  = VARCHAR; col_mins[0]  = 24; col_maxes[0]  = 26;

To suit your needs.  You should also change the table definition in
create_db.sh and the ctl file generation in load_data.sh if you change the
number of columns.
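
As an illustration only, a two-column variant might look roughly like the
sketch below.  The enum, the array declarations, MAX_COLS and both function
names are hypothetical stand-ins so that the fragment compiles on its own;
the real definitions are in data-generator/main.c, which is not reproduced
here.  The sketch also guesses that col_mins/col_maxes bound the number of
words per value (the sample row above has 25 words, which fits 24..26), but
that reading is an assumption.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for definitions in data-generator/main.c. */
enum col_type { VARCHAR };
#define MAX_COLS 15

int numcols;
enum col_type col_types[MAX_COLS];
int col_mins[MAX_COLS];
int col_maxes[MAX_COLS];

/* Configure two VARCHAR columns instead of one.
 * The bounds for column 1 are arbitrary example values. */
void configure_columns(void)
{
    numcols = 2;
    col_types[0] = VARCHAR; col_mins[0] = 24; col_maxes[0] = 26;
    col_types[1] = VARCHAR; col_mins[1] = 8;  col_maxes[1] = 12;
}

/* Pick a count for column i within [col_mins[i], col_maxes[i]],
 * treating the bounds as inclusive (an assumption). */
int pick_count(int i)
{
    int span = col_maxes[i] - col_mins[i] + 1;
    return col_mins[i] + rand() % span;
}
```

With two columns configured like this, the table definition in create_db.sh
and the ctl file generated by load_data.sh would each need a second column
as well.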

- Luke

-------------- next part --------------
A non-text attachment was scrubbed...
Name: IVP.tgz
Type: application/octet-stream
Size: 12962 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/bizgres-general/attachments/20050602/28e5281c/IVP.obj

