[Bizgres-general] A bitmap index access method is available in
bizgres CVS tip
Mark Kirkwood
markir at paradise.net.nz
Mon Dec 19 02:13:30 GMT 2005
Jie Zhang wrote:
> A bitmap index access method is now available in bizgres CVS tip. The
> bitmap index is a highly efficient structure to index high dimensional
> data, especially when the attributes to be indexed have very low
> cardinalities.
>
> To create a bitmap index on desirable attributes, use the following command:
>
> create index <index_name> on <table_name> using bitmap(<att_list>);
>
> Here, <att_list> can be a single attribute or multiple attributes in
> table <table_name>. Multiple attributes are separated by a comma.
>
> Note: to use this feature, you need to do initDB.
>
Very cool,
I thought I would have a play (see below).
What I am seeing is that vanilla Pg 8.1.1 with btree indexes on several
low cardinality columns seems to be considerably faster than Bizgres
with on-disk bitmaps (for my test case). I would have thought that it
was an ideal candidate to show the advantages of on-disk representation
(several low cardinality columns) - any ideas?
The only contributors I can think of that are helping 8.1.1 are the
count optimization and the virtual tuples (or are these in Bizgres
already?), however previous testing did *not* show a factor of 2 boost
from these two alone.
Both products were built with the same flags on FreeBSD 6.0, and the
servers were configured with the same postgresql.conf (essential changes
being shared_buffers=12000, work_mem=20480, effective_cache_size=100000).
Bizgres was CVS HEAD from Dec 15 (NZST).
Cheers
Mark
The test
========
The table is pretty basic:
--
-- Create a table with columns for testing bitmap indexes.
--
CREATE TABLE bitmaptest (
id INTEGER NOT NULL,
val0 INTEGER,
val1 INTEGER,
val2 INTEGER,
fil TEXT
);
Indexes:
--
-- Index the table.
--
CREATE INDEX bitmaptest_id ON bitmaptest (id);
CREATE INDEX bitmaptest_val0 ON bitmaptest USING bitmap (val0);
CREATE INDEX bitmaptest_val1 ON bitmaptest USING bitmap (val1);
CREATE INDEX bitmaptest_val2 ON bitmaptest USING bitmap (val2);
Interestingly, after index creation you see:
bitmap=# \d
List of relations
Schema | Name | Type | Owner
--------+------------------------------------+-------+----------
public | bitmaptest | table | postgres
public | bm_internal_t2200bitmaptest_val0_0 | table | postgres
public | bm_internal_t2200bitmaptest_val1_0 | table | postgres
public | bm_internal_t2200bitmaptest_val2_0 | table | postgres
(4 rows)
Once loaded with data (default 10000000 rows), the (hopefully relevant)
column statistics look like;
bitmap=# SELECT attname, n_distinct, correlation FROM pg_stats
WHERE tablename LIKE 'bitmaptest';
attname | n_distinct | correlation
---------+------------+-------------
id | -1 | 1
val0 | 10 | 0.0923104
val1 | 10 | 0.105123
val2 | 100 | 0.0475281
fil | 1 | 1
(5 rows)
So this is a table with several low cardinality columns, each of which
has a pretty low correlation.
The test query is:
--
-- Select from a table using many non-selective conditions.
--
SELECT count(*)
FROM bitmaptest
WHERE val0 = 1
AND val1 = 4
AND val2 = 79
;
The estimated selectivity = 0.1 * 0.1 * 0.01 = 0.0001, so we are
expecting 10000000 * 0.0001 = 1000 rows (assuming no cross column
correlation, which is true - data specifically generated to avoid this.
The real number is 1020).
Plan is (For Bizgres current and 8.1.1):
QUERY PLAN
---------------------------------------------------------------------------------------------------
Aggregate (cost=16098.20..16098.20 rows=1 width=0)
-> Bitmap Heap Scan on bitmaptest (cost=12291.38..16095.62
rows=1033 width=0)
Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
-> BitmapAnd (cost=12291.38..12291.38 rows=1033 width=0)
-> Bitmap Index Scan on bitmaptest_val2
(cost=0.00..1017.21 rows=96631 width=0)
Index Cond: (val2 = 79)
-> Bitmap Index Scan on bitmaptest_val0
(cost=0.00..5202.34 rows=956668 width=0)
Index Cond: (val0 = 1)
-> Bitmap Index Scan on bitmaptest_val1
(cost=0.00..6071.34 rows=1116668 width=0)
Index Cond: (val1 = 4)
(10 rows)
Performance:
Completely uncached (after filesystem umount and mount):
Bizgres current 17.5 s
Postgres 8.1.1 9.1 s
Here are the EXPLAIN ANALYZEs for the two products:
Bizgres current:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
--------------------------
Aggregate (cost=16098.20..16098.20 rows=1 width=0) (actual
time=17954.446..17954.452 rows=1 loops=1)
-> Bitmap Heap Scan on bitmaptest (cost=12291.38..16095.62
rows=1033 width=0) (actual time=10925.598..17943.158 rows=1019
loops=1)
Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
-> BitmapAnd (cost=12291.38..12291.38 rows=1033 width=0)
(actual time=10900.404..10900.404 rows=0 loops=1)
-> Bitmap Index Scan on bitmaptest_val2
(cost=0.00..1017.21 rows=96631 width=0) (actual time=5236.011..5236.0
11 rows=100251 loops=1)
Index Cond: (val2 = 79)
-> Bitmap Index Scan on bitmaptest_val0
(cost=0.00..5202.34 rows=956668 width=0) (actual time=2718.016..2718.
016 rows=999883 loops=1)
Index Cond: (val0 = 1)
-> Bitmap Index Scan on bitmaptest_val1
(cost=0.00..6071.34 rows=1116668 width=0) (actual time=2852.007..2852
.007 rows=999790 loops=1)
Index Cond: (val1 = 4)
Total runtime: 17958.290 ms
(11 rows)
Postgres 8.1.1:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
------------------------
Aggregate (cost=15744.90..15744.91 rows=1 width=0) (actual
time=8810.296..8810.301 rows=1 loops=1)
-> Bitmap Heap Scan on bitmaptest (cost=12119.68..15742.44
rows=984 width=0) (actual time=2240.410..8801.211 rows=1019 lo
ops=1)
Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
-> BitmapAnd (cost=12119.68..12119.68 rows=984 width=0)
(actual time=2228.319..2228.319 rows=0 loops=1)
-> Bitmap Index Scan on bitmaptest_val2
(cost=0.00..546.48 rows=95852 width=0) (actual time=213.143..213.143
rows=100251 loops=1)
Index Cond: (val2 = 79)
-> Bitmap Index Scan on bitmaptest_val0
(cost=0.00..5293.02 rows=930004 width=0) (actual time=979.228..979.22
8 rows=999883 loops=1)
Index Cond: (val0 = 1)
-> Bitmap Index Scan on bitmaptest_val1
(cost=0.00..6279.69 rows=1103339 width=0) (actual time=946.390..946.3
90 rows=999790 loops=1)
Index Cond: (val1 = 4)
Total runtime: 8813.490 ms
(11 rows)
More information about the Bizgres-general
mailing list