[Bizgres-general] A bitmap index access method is available in
bizgres CVS tip
Mark Kirkwood
markir at paradise.net.nz
Thu Dec 22 00:52:48 GMT 2005
Simon Riggs wrote:
>
> So the size on-disk depends upon the compressibility of the bit map. The
> compressibility depends upon the number of rows that have that value,
> and their distribution within the bitmap. The size is not linked to the
> size of the data being indexed, as it is with B-trees because each index
> row has the actual data value on it.
>
> So I think the test case you gave was good, in that it shows up the
> least optimal case so we all understand it. But it also allows us to
> understand that on-disk index bloat isn't a problem for 90% of real
> world data distributions.
>
Simon - Good points. My test is a little too far away from real world.
I made a modification to generate Zipfian frequencies - however this did
not help that much (val2 index size decreased 600Mb -> 400Mb), I suspect
due to the data distribution (I've uploaded the modified version of the
test in case you guys want to try it out). So distribution looks to be a
very significant factor.
Anyway, I think it's time for a different test - the TCP-H data that (I
think) you guys used (see attachment for details):
Some interesting points to note:
I definitely see Bizgres doing bitmap index creation faster than
Postgres 8.1.1 does the equivalent btree - for the *larger* indexes
(Postgres 8.1.1 is quicker for the smaller ones, but that is not really
significant).
Bizgres bitmaps are smaller than Postgres 8.1.1 btrees for the *larger*
indexes - Bizgres total index pages at approx 78000 vs Postgres 8.1.1 at
150000. Notice the nationkey index is bigger in Bizgres, however this is
a very small index, so again, probably not significant here.
So for creation time and sizes I think I am seeing the same sort of
thing that your own testing did.
For query performance however, Postgres 8.1.1 is the same speed or
faster in all cases! Note that the filesystem was umount|mounted as well
as the databases restarted before running the queries, to ensure
retrieval from disk was tested. Also note that the count patch made
little if any difference.
Finally to check that neither Postgres nor Bizgres where disadvantaged
by unfortunate disk locations, I ran a number of 'SELECT count(*) FROM
...' type queries. With the count patch applied, both Bizgres and
Postgres take about the same time to perform these (8.1.1 is *slightly*
faster, but that is expected).
Best wishes
Mark
-------------- next part --------------
Bitmap Indexes Using TCP-H Dataset
==================================
Mark Kirkwood 21-12-2005
Intro
-----
The dataset is two tables from the TCP-H dataset: customer, orders.
I am contrasting Bizgres current with Postgres 8.1.1.
In addition, a patched version of Bizgres was tested that included the count
optimization from Postgres 8.1, just in case count cpu time was significant.
Setup
-----
Compile dbgen with EOL_HANDLING defined. Generate customer and orders with
scalefactor 10:
$ ./dbgen -s 10 -T c
$ ./dbgen -s 10 -T O
Create the tables from dss.ddl (comment out everything except customers and
orders).
Create the following indexes (omit USING bitmap for 8.1.1):
CREATE INDEX customer_c_custkey ON customer (c_custkey);
CREATE INDEX customer_c_nationkey ON customer USING bitmap (c_nationkey);
CREATE INDEX customer_c_mktsegment ON customer USING bitmap (c_mktsegment);
CREATE INDEX orders_o_custkey ON orders (o_custkey);
CREATE INDEX orders_o_orderstatus ON orders USING bitmap (o_orderstatus);
CREATE INDEX orders_o_orderpriority ON orders USING bitmap (o_orderpriority);
Script the queries to run:
-- Customer.
SELECT count(distinct(c_name))
FROM customer customer_alias
WHERE c_nationkey=5;
SELECT count(distinct(c_name))
FROM customer customer_alias
WHERE c_nationkey>8;
SELECT count(distinct(c_name))
FROM customer customer_alias
WHERE c_mktsegment IN ('BUILDING', 'FURNITURE', 'HOUSEHOLD');
SELECT count(distinct(c_name))
FROM customer customer_alias
WHERE c_nationkey=4
AND c_mktsegment='AUTOMOBILE';
-- Orders.
SELECT count(*)
FROM orders orders_alias
WHERE o_orderstatus='P';
SELECT count(distinct(o_orderkey))
FROM orders orders_alias
WHERE o_orderstatus= 'F';
SELECT count(distinct(o_orderkey))
FROM orders orders_alias
WHERE o_orderpriority IN ('1-URGENT', '4-NOT SPECIFIED','5-LOW');
SELECT count(distinct(o_orderkey))
FROM orders orders_alias
WHERE o_orderpriority IN ('1-URGENT', '3-MEDIUM') AND o_orderstatus='P';
-- Customers and Orders.
SELECT count(*)
FROM customer customer_alias, orders orders_alias
WHERE o_orderpriority='2-HIGH' AND c_mktsegment='MACHINERY'
AND c_custkey=o_custkey;
SELECT count(*)
FROM customer customer_alias, orders orders_alias
WHERE o_orderstatus='O' AND c_nationkey=8 AND c_custkey=o_custkey;
SELECT count(*)
FROM customer customer_alias, orders orders_alias
WHERE (o_orderstatus='F' OR o_orderstatus='P')
AND (c_nationkey=5 OR c_nationkey=17) AND c_custkey=o_custkey;
SELECT count(distinct(o_orderkey))
FROM orders orders_alias
WHERE o_orderpriority ='1-URGENT' AND o_orderstatus='P';
Results
-------
Index creation time:
Bizgres current Postgres 8.1.1
(s) (s)
customer_c_custkey 20 24
customer_c_nationkey 42 14
customer_c_mktsegment 33 21
orders_o_custkey 366 416
orders_o_orderstatus 295 275
orders_o_orderpriority 337 408
Index sizes:
Bizgres current Postgres 8.1.1
(8K pages) (8K pages)
customer_c_custkey 3288 3288
customer_c_mktsegment 2406 5779
customer_c_nationkey 7682 3288
orders_o_custkey 32847 32847
orders_o_orderpriority 20878 65876
orders_o_orderstatus 10700 41131
(7 rows)
Queries (started with clean cache):
Bizgres current Bizgres patch Postgres 8.1.1
(s) (s) (s)
Customers
1 8 8 7
2 18 18 17
3 17 17 16
4 3 3 1
Orders
1 44 43 41
2 87 85 86
3 109 106 98
4 61 63 55
Customer Orders
1 85 83 50
2 65 64 59
3 81 80 60
4 46 46 46
More information about the Bizgres-general
mailing list