[Bizgres-general] A bitmap index access method is available in bizgres CVS tip

Mark Kirkwood markir at paradise.net.nz
Thu Dec 22 00:52:48 GMT 2005


Simon Riggs wrote:

> 
> So the size on-disk depends upon the compressibility of the bit map. The
> compressibility depends upon the number of rows that have that value,
> and their distribution within the bitmap. The size is not linked to the
> size of the data being indexed, as it is with B-trees because each index
> row has the actual data value on it.
> 
> So I think the test case you gave was good, in that it shows up the
> least optimal case so we all understand it. But it also allows us to
> understand that on-disk index bloat isn't a problem for 90% of real
> world data distributions.
> 

Simon - Good points. My test is a little too far away from real world.

I made a modification to generate Zipfian frequencies - however this did 
not help that much (val2 index size decreased 600Mb -> 400Mb), I suspect 
due to the data distribution (I've uploaded the modified version of the 
test in case you guys want to try it out). So distribution looks to be a 
very significant factor.

Anyway, I think it's time for a different test - the TCP-H data that (I 
think) you guys used (see attachment for details):


Some interesting points to note:


I definitely see Bizgres doing bitmap index creation faster than 
Postgres 8.1.1 does the equivalent btree - for the *larger* indexes 
(Postgres 8.1.1 is quicker for the smaller ones, but that is not really 
significant).

Bizgres bitmaps are smaller than Postgres 8.1.1 btrees for the *larger* 
indexes - Bizgres total index pages at approx 78000 vs Postgres 8.1.1 at 
150000. Notice the nationkey index is bigger in Bizgres, however this is 
a very small index, so again, probably not significant here.

So for creation time and sizes I think I am seeing the same sort of 
thing that your own testing did.

For query performance however, Postgres 8.1.1 is the same speed or 
faster in all cases! Note that the filesystem was umount|mounted as well 
as the databases restarted before running the queries, to ensure 
retrieval from disk was tested. Also note that the count patch made 
little if any difference.


Finally to check that neither Postgres nor Bizgres where disadvantaged 
by unfortunate disk locations, I ran a number of 'SELECT count(*) FROM 
...' type queries. With the count patch applied, both Bizgres and 
Postgres take about the same time to perform these (8.1.1 is *slightly* 
faster, but that is expected).

Best wishes

Mark
-------------- next part --------------
Bitmap Indexes Using TCP-H Dataset
==================================

Mark Kirkwood 21-12-2005

Intro
-----

The dataset is two tables from the TCP-H dataset: customer, orders.
I am contrasting Bizgres current with Postgres 8.1.1.
In addition, a patched version of Bizgres was tested that included the count
optimization from Postgres 8.1, just in case count cpu time was significant.


Setup
-----

Compile dbgen with EOL_HANDLING defined. Generate customer and orders with 
scalefactor 10:

$ ./dbgen -s 10 -T c
$ ./dbgen -s 10 -T O

Create the tables from dss.ddl (comment out everything except customers and 
orders).
Create the following indexes (omit USING bitmap for 8.1.1):


CREATE INDEX customer_c_custkey ON customer (c_custkey);
CREATE INDEX customer_c_nationkey ON customer USING bitmap (c_nationkey);
CREATE INDEX customer_c_mktsegment ON customer USING bitmap (c_mktsegment);

CREATE INDEX orders_o_custkey ON orders (o_custkey);
CREATE INDEX orders_o_orderstatus ON orders USING bitmap (o_orderstatus);
CREATE INDEX orders_o_orderpriority ON orders USING bitmap (o_orderpriority);


Script the queries to run:


-- Customer.
SELECT count(distinct(c_name)) 
FROM customer customer_alias 
WHERE c_nationkey=5;

SELECT count(distinct(c_name)) 
FROM customer customer_alias 
WHERE c_nationkey>8;

SELECT count(distinct(c_name)) 
FROM customer customer_alias 
WHERE c_mktsegment IN ('BUILDING', 'FURNITURE', 'HOUSEHOLD');

SELECT count(distinct(c_name)) 
FROM customer customer_alias 
WHERE c_nationkey=4 
AND c_mktsegment='AUTOMOBILE';


-- Orders.
SELECT count(*) 
FROM orders orders_alias 
WHERE o_orderstatus='P';

SELECT count(distinct(o_orderkey)) 
FROM orders orders_alias 
WHERE o_orderstatus= 'F';

SELECT count(distinct(o_orderkey)) 
FROM orders orders_alias 
WHERE o_orderpriority IN ('1-URGENT', '4-NOT SPECIFIED','5-LOW');

SELECT count(distinct(o_orderkey)) 
FROM orders orders_alias 
WHERE o_orderpriority IN ('1-URGENT', '3-MEDIUM') AND o_orderstatus='P';


-- Customers and Orders.
SELECT count(*) 
FROM customer customer_alias, orders orders_alias 
WHERE o_orderpriority='2-HIGH' AND c_mktsegment='MACHINERY' 
AND c_custkey=o_custkey;

SELECT count(*) 
FROM customer customer_alias, orders orders_alias 
WHERE o_orderstatus='O' AND c_nationkey=8 AND c_custkey=o_custkey;

SELECT count(*) 
FROM customer customer_alias, orders orders_alias 
WHERE (o_orderstatus='F' OR o_orderstatus='P') 
AND (c_nationkey=5 OR c_nationkey=17) AND c_custkey=o_custkey; 

SELECT count(distinct(o_orderkey)) 
FROM orders orders_alias 
WHERE o_orderpriority ='1-URGENT' AND o_orderstatus='P';


Results
-------

Index creation time:

                                    Bizgres current  Postgres 8.1.1
                                                 (s)            (s)
customer_c_custkey                               20              24
customer_c_nationkey                             42              14
customer_c_mktsegment                            33              21
orders_o_custkey                                366             416
orders_o_orderstatus                            295             275
orders_o_orderpriority                          337             408


Index sizes:

                                    Bizgres current  Postgres 8.1.1
                                         (8K pages)      (8K pages)
customer_c_custkey                             3288            3288
customer_c_mktsegment                          2406            5779
customer_c_nationkey                           7682            3288
orders_o_custkey                              32847           32847
orders_o_orderpriority                        20878           65876
orders_o_orderstatus                          10700           41131
(7 rows)



Queries (started with clean cache):


                     Bizgres current  Bizgres patch  Postgres 8.1.1
                                 (s)            (s)             (s)
Customers
1                                  8              8               7
2                                 18             18              17
3                                 17             17              16
4                                  3              3               1

Orders
1                                 44             43              41
2                                 87             85              86
3                                109            106              98
4                                 61             63              55 

Customer Orders
1                                 85             83              50 
2                                 65             64              59
3                                 81             80              60
4                                 46             46              46 


More information about the Bizgres-general mailing list