[Bizgres-general] 8.0.3 update applied to bizgres, patch for better plan costing

Luke Lonergan llonergan at greenplum.com
Sat Jul 23 15:34:27 GMT 2005


I've applied the 8.0.3 update to bring the bizgres postgresql source from
8.0.2 to 8.0.3.  I've also applied a patch (Tom Lane's) that improves plan
costing to reduce emphasis on startup cost (Fuzzy_path_cost improvement, see
discussion below).

Testing and bugfixes will continue through next week prior to another Beta
release.

-----------------------------------------------------------
Summary of postgres development patches in Bizgres:
-----------------------------------------------------------
- Bitmap scan
- Fuzzy_path_cost improvement

-----------------------------------------------------------
Features in Bizgres and not (yet?) in postgres development:
-----------------------------------------------------------
- COPY speed improvements
- Partitioning 1, Constraint Exclusion

-----------------------------------------------------------
Features exclusive to Bizgres:
-----------------------------------------------------------
- KETL - the open source (Extract/Transform/Load) system
- Loader - the java-based data loading tool

-----------------------------------------------------------
Features bundled with Bizgres:
-----------------------------------------------------------
- JasperReports - the open source reporting development
  library

-----------------------------------------------------------
Known issues:
-----------------------------------------------------------
- we've identified a bug in bitmap scan for which we'll be issuing a patch
for both postgres devel and bizgres.

- Luke


-----------------------------------------------------------
Fuzzy_path_cost discussion
-----------------------------------------------------------

Tom Lane wrote:
> Dawid Kuroczko <qnex42 at gmail.com> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:

>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT
1;

>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
>> rows=1 loops=1)
>> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
>> width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out
what is going on here: the total costs involved are large enough that
the still-fairly-high startup cost of the hash is disregarded by
compare_fuzzy_path_costs(), and so the nestloop is discarded as not
having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz
factor for both startup and total cost comparisons, and therefore
propose the attached patch.

========== Comments from testers ==============
Works great!!!

With LIMIT below 4 000 000 rows (its 47-milion row table) it prefers
nested loops, then it starts to introduce merge joins.

I've just built a patched version as well and it appears to be doing
what I think is the right thing now.  I.e. actually picking the
plan with the lower cost.

> Looks good. I think it explains a few other wierd perf reports also.

Could be.  I went back to look at Sam Mason's report about three weeks
ago, and it definitely seems to explain his issue.  The "fuzzy cost
comparison" logic is new in 8.0 so it hasn't had all that much
testing...

            regards, tom lane




More information about the Bizgres-general mailing list