Thursday, December 11, 2014

VoltDB first steps - schema

The first, and most critical, step for working with VoltDB is figuring out what table(s) will be partitioned, and how. So, the schema...

The case I'm looking at is for a hypothetical ticket booking agency, for very large events (rock bands playing large stadiums, etc.). There's three basic tables:
  1. seat_info
  2. booking_info
  3. booking_seats
seat_info is a list of all of the seats, and whatever metadata is required.
booking_info is a list of the bookings, so far, and whatever metadata is required.
booking_seats relates seats to bookings, and events.

booking_seats is the table that's interesting; it will be the largest, and hottest, by far. The schema elements for booking_seats is below:

create table booking_seats (
  booking_id varchar(10) not null,
  seat_number smallint not null,
  seat_row int not null,
  event_id int not null,
  constraint booking_seats_unique_hash unique (seat_number, seat_row, event_id)
);

partition table booking_seats on column seat_row;

create index event_booking_hash_idx on booking_seats (event_id);


booking_id is a foreign key for booking_info.
seat_number and seat_row form a composite key into the seat_info table.
event_id will be a foreign key relationship "in the future".

seat_row is used for partitioning the booking_seats table.

The VoltDB user manual has an example of a flight booking schema that's been partitioned by flight_id, analogous to event_id in my example.  The use case here is different, and so the partitioning scheme needs to be different.

  • event_id would cause all of the traffic for the ticket booking system would be directed to a single partition when tickets for that event were put on sale - which sort of defeats the purpose of partitioning the data in the first place.
  • booking_id could be used for partitioning, but as it's not a part of the unique constraint, the constraint would be expensive to enforce.
  • seat_number isn't a good choice for partitioning because buying multiple tickets is normal, and each transaction should be kept within a single partition, if possible.
  • seat_row is my choice for partitioning, because most bookings tend to be in the same row, and because it's a part of the unique constraint that prevents seats being double-booked.

In theory booking_id, or even event_id, may make a sound choice for partitioning. For this example I really would like to have aggressive partitioning of the data to see what breaks stress VoltDB's capabilities.

This is a "not bad" schema; but the query for finding available seats is a bit ugly. I added the index on event_id, to optimise the query for available seats for a given event, but testing is really essential to see if it helps or hinders performance overall.

Sunday, December 07, 2014

Explorations of VoltDB

I've started on a little project to see how VoltDB lines up against its promises.

The Unique Selling Point (sort of) of VoltDB is that you can horiztonally scale an OLTP database, in a shared-nothing environment. That's quite a trick.

Horizontally-scalable OLAP systems are well-known, even passe; but getting an OLTP system that scales elegantly has been hard/impossible for a reasonable cost. Could you scale out an Oracle RAC? Sure, if you have a few million bucks for licences and hardware. Can you scale out MySQL? Sure, if you have an army of programmers willing and able to implement sharding in their applications.

The idea is that VoltDB takes the headache of managing sharding and scalable storage, from the children developers, and hands it to the crazy people DBAs.

VoltDB has a base price of Free, which I like, so I decided to give it a go. The idea is to come up with a little benchmark problem, to stress its ACIDity. My example case is for a ticketing agency, such as those types who manage major events like rock bands that will sell-out a 100,000 person stadium. At this point, I'm just mocking up a little data, and a few queries. I hope I'll get around to building a little erlvolt app to go with it.

Getting up and running was a breeze. The documentation and build is excellent in this respect, I can't fault it. I took the hardest possible route and compiled the code from scratch (on Ubuntu) without reading the documentation (which I've since read) - and it went brilliantly well, with an idiot-resistant build that coaxed me towards the correct answer when I got things wrong (generally missing dependencies which are "ant" and "g++").

For sane people, you can just register on their website, download a binary, and install it. There's an (oh-so-cool) docker image available, but I got bored of playing with that in less time than it took to download, so install it, it's easy.

I've found that by being badly behaved, I can crash the VoltDB server (and/or my browser) on my ancient little laptop. This worries me (only) a little. I have been pretty rough on the DB, throwing all kinds of broken rubbish at it, but a DB really does need to be bullet-proof. In terms of running queries, procedures, etc. voltdb just fine, and does what it says on the tin.

If anyone from VoltDB reads this, please address bug ENG-2526 - it forces developers to build misleading interfaces, and that's a really bad idea.

VoltDB looks promising, so far, and has all kinds of interesting features. I'd like to see VoltDB extend with more data types, and functions, and maybe referential integrity enforcement... and a pot of gold, and a pony...


Friday, August 08, 2014

So.... Google made a mess of your AdWords profile?

Google makes a mess of accounts, pretty regularly. With a reliability that comes with extensive automation, Google also ignores any and all appeals for intervention to fix busted accounts.

My problem: My Google account can do all kinds of stuff on Google, except manage the ads on my blog. I had to create a new account, exclusively to manage ads on my account.

Salvation came in the form of fiddling about in the AdWords/AdSense console. I poked about here, on the basis of "follow the money". While publishers provide content, which is nice; advertisers provide cash money; which is life itself, for Google. In a not-at-all-surprising twist, the console for placing ads (rather than hosting) has heaps of functions, is beautiful, and actually works! One of the AdWords functions is to allow multiple users for a given AdWords account, and additional users can be administrators! Woo!

To re-enable my account with AdSense/AdWords, I followed the following steps.:

  1. If you haven't already, go make a new Google account.
  2. If you haven't already, go make a blog. It may need a published post, but the content doesn't matter.
  3. Login to http://adwords.google.com/ using your new account.
  4. Go to the settings menu (under that little cog thing in the top-right corner of the screen).
  5. Go to the "Access and Authorization" menu (listed down the left side of the screen).
  6. Go to the "User Management" menu (listed down the left side of the screen).
  7. Add the Email address for your broken GMail account. DO NOT check the "Administrator" checkbox, yet.
  8. In your GMail account, open the email, and accept the invitation.
  9. Back in the "User Management" menu (for the new Google account), you can now tick the "Administrator" checkbox new to your older, previously broken, Google account.
You'll need to have two browsers (not just windows, but separate browsers, like IE and Chrome) to make this easier, one for each Google account. You can do this with one browser, but there'll be a crazy amount of logout/login processes, and that gets boring, and can cause weird problems.

I hope this works for you. After this, I now have One Google account managing all of my blog stuff, including AdWords/AdSense; which is nice.

Wednesday, August 06, 2014

Adventures in Ruby and Capistrano

My my current project has seen a revisit of an old role - automating the deployment of distributed systems. Owing to the vagaries of consultancy, I found myself the developer and maintainer of a Ruby-themed deployment environment designed by someone else. I was thrown in the deep end, which is ok - the water was warm, and the current, gentle.

The (non-Ruby) components:
GitLab - centralised git repository management.
Jenkins - schedules, catalogues, and organises builds, and test runs.
Puppet - setup infrastructure and install underlying software. I'm aware that Puppet is built using Ruby, but as a user, this is well-hidden.

The Ruby bits:
rvm - The Ruby Version Manager. Installs and maintains the Ruby environment.
gem - a packaged up Ruby library. Analogous to a lightweight rpm, or (more of a stretch) msi file.
geminabox - server and caching proxy for gems built, and used.
bundle - a tool for managing gems.
rake - make in Ruby, can use gems.
Capistrano - a set of tools to do stuff (mostly rake) on other machines (via ssh).

Thursday, July 10, 2014

Light touch configuration

This is a great little blog post on how to automatically fill out templates for system configuration. What it lacks in 'completeness', it makes up for in flexibility and simplicity. Sadly, it's Ruby-based, but that's ok on my current project.

http://findingscience.com/linux/sysadmin/ruby/2010/10/27/config-template-class.html


Tuesday, June 03, 2014

Oracle JDK on Linux, done right!

http://d.stavrovski.net/blog/post/how-to-install-and-setup-oracle-java-jdk-in-centos-6

Wooo! There's a way forward.