[Bizgres-general] A bitmap index access method is available in bizgres CVS tip

Simon Riggs simon at 2ndquadrant.com
Wed Dec 21 22:44:22 GMT 2005


On Thu, 2005-12-22 at 10:57 +1300, Mark Kirkwood wrote:

> > Real data is seldom uniformly randomly distributed when there are more
> > than a few values. The most frequently occurring natural distribution is
> > Zipf. With that, as we add more distinct values the frequency of
> > occurrence of those additional values reduces and therefore the
> > compressibility of the additional bitmaps increases also.
> 
> 
> > Think about
> > how big the bitmap for a value would be if only 1 row had that value.
> > 
> 
> Err - for a conventional bitmap it's always the same, so matter how many 
> rows actually have the value (e.g. for 10e7 rows like the example):
> 
> bitmap for 1 value size = 10000000/8 ~ 1.1Mb

Key point: the bitmaps are compressed on disk...
In memory, you're right: all bitmaps are the same size, no matter how
many rows have that value.

So the size on-disk depends upon the compressibility of the bit map. The
compressibility depends upon the number of rows that have that value,
and their distribution within the bitmap. The size is not linked to the
size of the data being indexed, as it is with B-trees because each index
row has the actual data value on it.

So I think the test case you gave was good, in that it shows up the
least optimal case so we all understand it. But it also allows us to
understand that on-disk index bloat isn't a problem for 90% of real
world data distributions.

Best Regards, Simon Riggs




More information about the Bizgres-general mailing list