An interesting new player in the Database world
I stumbled over a very interesting announcement a couple of minutes ago.
A company named Zvents just released an implementation of Google's distributed database specification, "BigTable".
You can read the paper written by nine top Google engineers, "Bigtable: A Distributed Storage System for Structured Data", which dates from 2006.
The Zvents implementation is called HyperTable and is still declared "beta".
I won't go into too much detail, since I didn't know anything about it yesterday, but I find this highly interesting. On a geeky level first: each time I have had to develop a really database-intensive product, the question of scalability and replication has come to mind.
I'm a PostgreSQL fan, but I have never had to set up replication for it. I did set up a MySQL cluster some years ago, though. For PostgreSQL, I've heard good things about Command Prompt's Mammoth PostgreSQL + Replication implementation, but never gave it a go.
What I didn't like in those cases is that it's a "master - slave" setup.
You have a master server which dispatches queries to the registered slaves and returns the results from the slaves. It works OK when all the servers are on the same network segment and high bandwidth is available between them.
But what if you need a decentralized system?
Imagine that you want an application to be really scalable, and you plan to deploy it across multiple data centers...
The whole "master - slave" scheme is an aberration.
What you would need is a "grape" of independent nodes that can replicate their content without the burden of a centralized server federating every operation.
That way, any node can work alone, balancing its load, and every grape would be able to work independently of the others.
HyperTable looks very promising, and I will take some time in the next few weeks to dig into it.
I'm currently putting the finishing touches on a website screenshot service API (not quite ready yet, but I'll blog about it soon), and having multiple instances working independently would be perfect. They would be able to share the cached screenshots without being tied to a single database entry point. As I have already abstracted the database enough to port it from PostgreSQL to MySQL with a single switch, I'd very much like to be able to implement this in my service.
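To make the idea a bit more concrete, here is a rough Python sketch of nodes replicating to each other without any master. It is only an illustration under my own assumptions: the `Node` class, the `sync_with` method and the "last write wins" rule are invented for this post and say nothing about how HyperTable or BigTable actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical independent node holding its own copy of the data."""
    name: str
    clock: int = 0                              # simple logical clock
    store: dict = field(default_factory=dict)   # key -> (clock, node name, value)

    def put(self, key, value):
        # A node accepts writes on its own, without asking any master.
        self.clock += 1
        self.store[key] = (self.clock, self.name, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[2] if entry else None

    def sync_with(self, peer):
        # Two-way exchange: for every key, both sides keep the newest entry.
        # Ties on the clock are broken by node name so both sides agree.
        for key in set(self.store) | set(peer.store):
            entries = [e for e in (self.store.get(key), peer.store.get(key)) if e]
            winner = max(entries, key=lambda e: (e[0], e[1]))
            self.store[key] = winner
            peer.store[key] = winner
        # Align the clocks so that later writes count as newer on both sides.
        self.clock = peer.clock = max(self.clock, peer.clock)


paris = Node("paris")
tokyo = Node("tokyo")
paris.put("shot:example.com", "cached screenshot A")
tokyo.put("shot:example.org", "cached screenshot B")
paris.sync_with(tokyo)   # no central server involved
```

After the sync, each node can serve both entries on its own. A real system would also have to deal with deletions, conflicting concurrent writes and network failures, which this sketch ignores.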
Oh boy, I like this...
Comments

Truly high-scale database applications are in a league of their own. We’ve been having trouble with distributed queries in SQL Server not being delegated to the foreign server to perform joins. Instead, an entire table is copied into temporary storage and parsed using a table scan. We’ve started doing some innovative things to fight the issue … but that distracts you from writing the actual functionality you’re working on.
I’m very eager to hear the results of your testing.
Hypertable is still in Alpha release
It looks like they’re leveraging the Hadoop infrastructure for this project. It’s nice to see such large-scale open-source projects implementing systems that have proved successful in the industry.
I don’t think you’d be able to use this database system to replace PostgreSQL or MySQL, Thierry. I’m sure for some applications, it would work well but you wouldn’t be able to (AFAICT) join tables and perform complex queries because the model they expose is not relational. In fact, you’d even lose constraints on columns and referential integrity. I think Bigtable and HyperTable are attempts at solving a smaller, specialized set of problems in a more efficient way than a relational database.
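As a rough illustration of that non-relational model: the Bigtable paper describes a table as a sparse, sorted map indexed by row key, column key and timestamp. The Python sketch below only mimics that shape. `TinyTable` and its methods are hypothetical names made up for this example; they are not the API of Bigtable or HyperTable.

```python
class TinyTable:
    """Toy map of (row key, column key, timestamp) -> value."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), oldest first
        self.cells = {}

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])

    def get(self, row, column, timestamp=None):
        # Return the newest value, or the newest one at or before `timestamp`.
        versions = self.cells.get((row, column), [])
        matching = [value for ts, value in versions
                    if timestamp is None or ts <= timestamp]
        return matching[-1] if matching else None

    def scan(self, start_row, end_row):
        # Rows are visited in sorted key order; there are no joins,
        # no column constraints and no foreign keys to enforce.
        for row, column in sorted(self.cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)


table = TinyTable()
table.put("com.cnn.www", "contents:", "<html>v1", timestamp=1)
table.put("com.cnn.www", "contents:", "<html>v2", timestamp=2)
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN", timestamp=2)
latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=1)
```

Everything is a lookup or a range scan by key, which is why a query that joins two tables has to be written by hand in the application.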
It’s not exactly the idea I had in mind…
My idea was more to have several database nodes (I would call them “grapes”) that were independent and able to replicate to other “grapes”.
A bit like the way peer-to-peer sharing software works, aggregating several clients.
As I stated, I’ve designed a web page screenshotting service (I’m waiting for a real server to be up before announcing it, as I don’t want to hammer my virtual server), and I specifically thought about the picture storage here.
I imagined that having independent implementations that do the capture and store the picture in a distributed database would allow
A) To have independent nodes that can serve requests
B) To allow the exchange of cached shots between several grapes, to save resources.
I’m nowhere near this, but thinking about it makes me wonder whether anything like this exists today.
The more I read about it and try to visualize how to implement it, the more I see that it is a very specialized area, and that it would not work the way I would like, but this makes me wonder nonetheless…
It especially reminds me of PeerCast, which would relay web radio streams through the gnutella peer-to-peer network protocol.
The idea was really good, and I liked it very much.
It was a bit like BitTorrent before its time, distributing radio bandwidth between its listeners.