[Bizgres-general] A bitmap index access method is available in bizgres CVS tip

Simon Riggs simon at 2ndquadrant.com
Wed Dec 21 20:10:42 GMT 2005


On Tue, 2005-12-20 at 22:31 -0800, Jie Zhang wrote:
> 
> On 12/18/05 10:12 PM, "Mark Kirkwood" <markir at paradise.net.nz> wrote:
> 
> > Mark Kirkwood wrote:
> > 
> > Looking at the original setup again (the 3 indexes), a quick display of
> > the pagesizes of the various relations shows:
> > 
> > Bizgres current:
> > bitmap=# select relname,relpages from pg_class
> >           where relname like 'bitmaptest%'
> >           or relname like 'bm_internal_t2200bitmaptest%';
> >                relname               | relpages
> > ------------------------------------+----------
> >   bitmaptest                         |    93458
> >   bitmaptest_id                      |    21899
> >   bitmaptest_val0                    |    19363
> >   bitmaptest_val1                    |    19360
> >   bitmaptest_val2                    |    70125
> >   bm_internal_t2200bitmaptest_val0_0 |        1
> >   bm_internal_t2200bitmaptest_val1_0 |        1
> >   bm_internal_t2200bitmaptest_val2_0 |        1
> > (8 rows)
> > 
> > 
> > Postgres 8.1:
> > bitmap=# select relname,relpages from pg_class
> >           where relname like 'bitmaptest%'
> > bitmap-# ;
> >       relname                        | relpages
> > ------------------------------------+----------
> >   bitmaptest                         |    88496
> >   bitmaptest_id                      |    21899
> >   bitmaptest_val0                    |    21899
> >   bitmaptest_val1                    |    21899
> >   bitmaptest_val2                    |    21899
> > (5 rows)
> > 
> > 
> > That (bitmapped) bitmaptest_val2 is relatively big at ~600Mb, so I'm
> > wondering if the 100 distinct values is just at the point where the
> > bitmap is getting inflated. Thoughts?
> > 
> 
> For each distinct value of val2, the number of bitmap pages can be roughly
> calculated as follows.
> 
> There are total 10,000,000 uncompressed bits for each value. Consider the
> word size is 32, then there are 10,000,000/32 = 312,500 words. Since the
> values are evenly distributed in the table, then every 4 words(120 bits)
> will be compressed into about 2 words, because bit 1 will happen in every
> 100 bits. So there are 312,500/2 = 156,250 words in a compressed bit vector.
> Currently, each bitmap page can store about 225 words. Therefore, the number
> of bitmap pages for a value can be 156,250/225=695. That is about
> 695*8192=5.693M. So for 100 distinct values, the bitmap index will take up
> about 569M, which matches what you get there, and this should be the worse
> case.

...adding on to this...

This is definitely a worse case situation because of the assumption of
uniform yet random distribution.

The size of the index varies not just according to the distinct values,
but also the distribution of those values.

Real data is seldom uniformly randomly distributed when there are more
than a few values. The most frequently occurring natural distribution is
Zipf. With that, as we add more distinct values the frequency of
occurrence of those additional values reduces and therefore the
compressibility of the additional bitmaps increases also. Think about
how big the bitmap for a value would be if only 1 row had that value.

Best Regards, Simon Riggs



More information about the Bizgres-general mailing list