[Bizgres-general] A bitmap index access method is available
in bizgres CVS tip
Jie Zhang
jzhang at greenplum.com
Thu Dec 22 06:11:37 GMT 2005
On 12/21/05 2:37 AM, "Luke Lonergan" <llonergan at greenplum.com> wrote:
> Jie,
>
> On 12/20/05 10:31 PM, "Jie Zhang" <jzhang at greenplum.com> wrote:
>
>> There are total 10,000,000 uncompressed bits for each value. Consider the
>> word size is 32, then there are 10,000,000/32 = 312,500 words. Since the
>> values are evenly distributed in the table, then every 4 words(120 bits)
>> will be compressed into about 2 words, because bit 1 will happen in every
>> 100 bits. So there are 312,500/2 = 156,250 words in a compressed bit vector.
>> Currently, each bitmap page can store about 225 words. Therefore, the number
>> of bitmap pages for a value can be 156,250/225=695. That is about
>> 695*8192=5.693M. So for 100 distinct values, the bitmap index will take up
>> about 569M, which matches what you get there, and this should be the worse
>> case.
>
> Yup - thx for the calculation.
>
> We should try the 8-bit words to increase the compression - though the
> access speed might suffer a bit as per the above:
>
> 8-bit words:
> 13 words = 104 bits
> Compressed size is 2 words
>
> The disadvantage is that our header word now only carries 4 references, so
> the header overhead would be 1 header per 4 words instead of the current 1
> per 16 - but it sounds like we'd gain big overall storage efficiency for
> these low cardinality cases.
This is definitely worth a try. I will get the code ready soon.
Best,
Jie
More information about the Bizgres-general
mailing list