[Bizgres-general] Optimizer: Default assumption of table size

Luke Lonergan llonergan at greenplum.com
Sun Oct 16 04:47:43 GMT 2005


All you optimizer guys: Is there a downside to making the assumption that
tables are 2x the size of available RAM (in ipc/shmem)?  If so, what is it?

>From what we (Joan Hoenow and I) can tell, the current PostgreSQL assumption
that tables are size 1 row unless proven otherwise is appropriate for an
OLTP type environment where tables start at 0 and grow over time (or not)
and the nested loop join plan would be the best for tiny tables.

In the BI/DW use-case though, the assumption of tiny tables can be
disastrous because tables are often created large (CTAS or SELECT INTO or a
big COPY).  A nested loop join on large tables may never complete, and often
causes an application to hang.

So - it seems that if there is no disastrous downside, the appropriate thing
to do is assume that tables start out large, say some low multiple of shared
memory.  That would prevent the well-known and often catastrophic case where
a nested loop join is chosen on large tables, while only making some small
table queries run a bit too slow.  ANALYZE becomes an optional and
infrequent mechanism, instead of the current "do or die" requirement.

Even better is keeping fresh COUNT statistics on all tables, a topic that
has been discussed a lot on the postgres hackers list.

I think the big table assumption is a must-have feature for BI/DW and we
should implement it.

- Luke




More information about the Bizgres-general mailing list