[Bizgres-general] A bitmap index access method is available in bizgres CVS tip

Mark Kirkwood markir at paradise.net.nz
Mon Dec 19 02:13:30 GMT 2005


Jie Zhang wrote:
> A bitmap index access method is now available in bizgres CVS tip. The
> bitmap index is a highly efficient structure to index high dimensional
> data, especially when the attributes to be indexed have very low
> cardinalities.
> 
> To create a bitmap index on desirable attributes, use the following command:
> 
> create index <index_name> on <table_name> using bitmap(<att_list>);
> 
> Here, <att_list> can be a single attribute or multiple attributes in
> table <table_name>. Multiple attributes are separated by a comma.
> 
> Note: to use this feature, you need to do initDB.
> 

Very cool,

I thought I would have a play (see below).

What I am seeing is that vanilla Pg 8.1.1 with btree indexes on several 
low cardinality columns seems to be considerably faster than Bizgres 
with on-disk bitmaps (for my test case). I would have thought that it 
was an ideal candidate to show the advantages of on-disk representation 
(several low cardinality columns) - any ideas?

The only contributors I can think of that are helping 8.1.1 are the 
count optimization and the virtual tuples (or are these in Bizgres 
already?), however previous testing did *not* show a factor of 2 boost 
from these two alone.

Both products were built with the same flags on FreeBSD 6.0, and the 
servers were configured with the same postgresql.conf (essential changes 
being shared_buffers=12000, work_mem=20480, effective_cache_size=100000).

Bizgres was CVS HEAD from Dec 15 (NZST).

Cheers

Mark


The test
========

The table is pretty basic:

--
-- Create a table with columns for testing bitmap indexes.
--
CREATE TABLE bitmaptest (
   id   INTEGER NOT NULL,
   val0 INTEGER,
   val1 INTEGER,
   val2 INTEGER,
   fil  TEXT
);


Indexes:

--
-- Index the table.
--
CREATE INDEX bitmaptest_id ON bitmaptest (id);
CREATE INDEX bitmaptest_val0 ON bitmaptest USING bitmap (val0);
CREATE INDEX bitmaptest_val1 ON bitmaptest USING bitmap (val1);
CREATE INDEX bitmaptest_val2 ON bitmaptest USING bitmap (val2);

Interestingly, after index creation you see:

bitmap=# \d
                        List of relations
  Schema |                Name                | Type  |  Owner
--------+------------------------------------+-------+----------
  public | bitmaptest                         | table | postgres
  public | bm_internal_t2200bitmaptest_val0_0 | table | postgres
  public | bm_internal_t2200bitmaptest_val1_0 | table | postgres
  public | bm_internal_t2200bitmaptest_val2_0 | table | postgres
(4 rows)


Once loaded with data (default 10000000 rows), the (hopefully relevant) 
column statistics look like;

bitmap=# SELECT attname, n_distinct, correlation FROM pg_stats
          WHERE tablename LIKE 'bitmaptest';
  attname | n_distinct | correlation
---------+------------+-------------
  id      |         -1 |           1
  val0    |         10 |   0.0923104
  val1    |         10 |    0.105123
  val2    |        100 |   0.0475281
  fil     |          1 |           1
(5 rows)


So this is a table with several low cardinality columns, each of which 
has a pretty low correlation.


The test query is:


--
-- Select from a table using many non-selective conditions.
--
SELECT count(*)
FROM bitmaptest
WHERE val0 = 1
AND   val1 = 4
AND   val2 = 79
;

The estimated selectivity = 0.1 * 0.1 * 0.01 = 0.0001, so we are 
expecting 10000000 * 0.0001 = 1000 rows (assuming no cross column 
correlation, which is true - data specifically generated to avoid this. 
The real number is 1020).


Plan is (For Bizgres current and 8.1.1):

                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Aggregate  (cost=16098.20..16098.20 rows=1 width=0)
    ->  Bitmap Heap Scan on bitmaptest  (cost=12291.38..16095.62 
rows=1033 width=0)
          Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
          ->  BitmapAnd  (cost=12291.38..12291.38 rows=1033 width=0)
                ->  Bitmap Index Scan on bitmaptest_val2 
(cost=0.00..1017.21 rows=96631 width=0)
                      Index Cond: (val2 = 79)
                ->  Bitmap Index Scan on bitmaptest_val0 
(cost=0.00..5202.34 rows=956668 width=0)
                      Index Cond: (val0 = 1)
                ->  Bitmap Index Scan on bitmaptest_val1 
(cost=0.00..6071.34 rows=1116668 width=0)
                      Index Cond: (val1 = 4)
(10 rows)


Performance:


Completely uncached (after filesystem umount and mount):

Bizgres current  17.5 s
Postgres 8.1.1    9.1 s

Here are the EXPLAIN ANALYZEs for the two products:

Bizgres current:

 
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
--------------------------
  Aggregate  (cost=16098.20..16098.20 rows=1 width=0) (actual 
time=17954.446..17954.452 rows=1 loops=1)
    ->  Bitmap Heap Scan on bitmaptest  (cost=12291.38..16095.62 
rows=1033 width=0) (actual time=10925.598..17943.158 rows=1019
  loops=1)
          Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
          ->  BitmapAnd  (cost=12291.38..12291.38 rows=1033 width=0) 
(actual time=10900.404..10900.404 rows=0 loops=1)
                ->  Bitmap Index Scan on bitmaptest_val2 
(cost=0.00..1017.21 rows=96631 width=0) (actual time=5236.011..5236.0
11 rows=100251 loops=1)
                      Index Cond: (val2 = 79)
                ->  Bitmap Index Scan on bitmaptest_val0 
(cost=0.00..5202.34 rows=956668 width=0) (actual time=2718.016..2718.
016 rows=999883 loops=1)
                      Index Cond: (val0 = 1)
                ->  Bitmap Index Scan on bitmaptest_val1 
(cost=0.00..6071.34 rows=1116668 width=0) (actual time=2852.007..2852 
.007 rows=999790 loops=1)
                      Index Cond: (val1 = 4)
  Total runtime: 17958.290 ms
(11 rows)


Postgres 8.1.1:

 
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
------------------------
  Aggregate  (cost=15744.90..15744.91 rows=1 width=0) (actual 
time=8810.296..8810.301 rows=1 loops=1)
    ->  Bitmap Heap Scan on bitmaptest  (cost=12119.68..15742.44 
rows=984 width=0) (actual time=2240.410..8801.211 rows=1019 lo
ops=1)
          Recheck Cond: ((val2 = 79) AND (val0 = 1) AND (val1 = 4))
          ->  BitmapAnd  (cost=12119.68..12119.68 rows=984 width=0) 
(actual time=2228.319..2228.319 rows=0 loops=1)
                ->  Bitmap Index Scan on bitmaptest_val2 
(cost=0.00..546.48 rows=95852 width=0) (actual time=213.143..213.143
rows=100251 loops=1)
                      Index Cond: (val2 = 79)
                ->  Bitmap Index Scan on bitmaptest_val0 
(cost=0.00..5293.02 rows=930004 width=0) (actual time=979.228..979.22
8 rows=999883 loops=1)
                      Index Cond: (val0 = 1)
                ->  Bitmap Index Scan on bitmaptest_val1 
(cost=0.00..6279.69 rows=1103339 width=0) (actual time=946.390..946.3
90 rows=999790 loops=1)
                      Index Cond: (val1 = 4)
  Total runtime: 8813.490 ms
(11 rows)




More information about the Bizgres-general mailing list