[Bizgres-general] Re: Mapping from bitmap array position to heap location (was : A bitmap index access method is available...)

Jie Zhang jzhang at greenplum.com
Mon Jan 2 19:06:40 GMT 2006


On 1/1/06 8:07 PM, "Mark Kirkwood" <markir at paradise.net.nz> wrote:

> Hi,
> 
> I've been reading the code - and am a little confused about how the
> mapping from a given bitmap bit-position to its corresponding location
> in the heap relation is being done.
> 
>  From the code (src/backend/access/bitmap/bitmapinsert.c) it seems that
> the mapping is:
> 
>    tidoffset = ((uint64)blocknum *
>                          (1<<(8*sizeof(OffsetNumber))))
>                          + ((uint64)offset);
> 
> with inverse (src/backend/access/bitmap/bitmapsearch.c):
> 
>    blocknum = tidoffset / (1<<(8*sizeof(OffsetNumber)));
>    offset = tidoffset % (1<<(8*sizeof(OffsetNumber)));
> 
> Now on my system 1<<(8*sizeof(OffsetNumber)) = 1<<16 = 65536.
> 
> Now ISTM that this makes for a very sparse mapping (i.e. a great many
> possible bit positions do not correspond to actual blocknum|offset
> combinations). I also thought (incorrectly as it transpires) that this
> could make the bitmap a lot bigger than it needs to be.

That's right. This does make the bitmap bigger than it needs to be. However,
by using this sparse mapping -- introducing gaps between two pages, we do
not need to store tids in the bitmap index, which significantly reduces the
overall size of the index, and increases the query performance.

> 
> 
> To experiment I patched these files to use a different mapping:
> 
>    tidoffset = ((uint64)blocknum * (MaxOffsetNumber + 1))
>                          + ((uint64)offset);
> 
> with inverse:
> 
>    blocknum = tidoffset / (MaxOffsetNumber + 1);
>    offset = tidoffset % (MaxOffsetNumber + 1);
> 
> Since I thought this would result in smaller bitmaps (as MaxOffsetNumber
> is 2048 for this system). Well, despite that fact that it worked ok, the
> resulting index sizes and query execution times were slightly *bigger*
> and *slower* respectively, so I'm guessing that the original mapping
> works better with compression (or perhaps |, & operations)... or am I
> missing something else (or even quite a lot else)?

Current implementation sets the word size to 32-bit, which compresses each
gap (either 65536 or 2048) into one word. So we should get the same index
size for using 65526 or 2048 as the gap between two pages. The reason you
get slightly bigger index with 2048, I guess, is that you use

>    tidoffset = ((uint64)blocknum * (MaxOffsetNumber + 1))
>                          + ((uint64)offset);

Instead  of

    tidoffset = ((uint64)blocknum * (MaxOffsetNumber))
                          + ((uint64)offset);

The former one adds one more "0" into the bitmap for each gap.

One interesting point is that you choose 2048 as max offset number. Is this
true for all cases? If so, we can set the word size to 8-bit to further
increase the bitmap compression rate.

Best,
Jie




More information about the Bizgres-general mailing list