The case I'm looking at is for a hypothetical ticket booking agency, for very large events (rock bands playing large stadiums, etc.). There's three basic tables:
- seat_info
- booking_info
- booking_seats
booking_info is a list of the bookings, so far, and whatever metadata is required.
booking_seats relates seats to bookings, and events.
booking_seats is the table that's interesting; it will be the largest, and hottest, by far. The schema elements for booking_seats is below:
create table booking_seats (
booking_id varchar(10) not null,
seat_number smallint not null,
seat_row int not null,
event_id int not null,
constraint booking_seats_unique_hash unique (seat_number, seat_row, event_id)
);
partition table booking_seats on column seat_row;
create index event_booking_hash_idx on booking_seats (event_id);
booking_id is a foreign key for booking_info.
seat_number and seat_row form a composite key into the seat_info table.
event_id will be a foreign key relationship "in the future".
seat_row is used for partitioning the booking_seats table.
The VoltDB user manual has an example of a flight booking schema that's been partitioned by flight_id, analogous to event_id in my example. The use case here is different, and so the partitioning scheme needs to be different.
- event_id would cause all of the traffic for the ticket booking system would be directed to a single partition when tickets for that event were put on sale - which sort of defeats the purpose of partitioning the data in the first place.
- booking_id could be used for partitioning, but as it's not a part of the unique constraint, the constraint would be expensive to enforce.
- seat_number isn't a good choice for partitioning because buying multiple tickets is normal, and each transaction should be kept within a single partition, if possible.
- seat_row is my choice for partitioning, because most bookings tend to be in the same row, and because it's a part of the unique constraint that prevents seats being double-booked.
In theory booking_id, or even event_id, may make a sound choice for partitioning. For this example I really would like to have aggressive partitioning of the data to
This is a "not bad" schema; but the query for finding available seats is a bit ugly. I added the index on event_id, to optimise the query for available seats for a given event, but testing is really essential to see if it helps or hinders performance overall.