Databases, by their own nature, become large beasts. This is a fact of life and whilst a lot of databases don’t encroach on “unusable” sizes, some do.
There is a practise called sharding that allows the database content to be split across servers.
If you were to take a very simple example, take a customer database and look at the main table, “customers” – this is going to be an ever growing table of customers details. If the business in question is very successful, the database may grow very quickly indeed. Historicaly you would have all the details in one table. Customer names, addresses, phone numbers, etc. This isn’t the most efficient way of storing a lot of data.
We have two methods of making it more efficient. Horizontal and Vertical partitioning. You may have heard of partitioning before in relation to disk drives, and its loosely based on the same principle. Putting data in different places to keep things tidy and more efficient.
- Horizontal Partitioning is known as sharding. This involves splitting the data and tables and putting some data in one and some in others. In our example, we could have all customers starting with the letters A-M in one table, and the N-Z customers in another. Now potentially the tables are halved in size. And then you just have to change your lookup service to distinguish which table to get the data from. You could split it even further, down to Country, or Post Code, etc.
- Vertical Partitioning is where you take the customers table, and split the fields out into separate tables. So you might have all the static details for the customer (stuff that doesn’t change very often, like name, address, email, etc) in one table, and data that changes often in another table. The static tables get cached and therefore become quicker, and the table size shrinks as it hasn’t got all the data in it. This is also sometimes called database normalisation.
That was just a small explanation about Database Sharding for you.
3 thought on “What is database sharding?”
Be caruful about “database normalisation”.
I’m not 100% sure here, but db normalisation is something slightly different – it’s not connected to the way you store the data (on physical device(s)).
Normalisation is saying how to store data correctly in relational database model (how to store without redundancies, what shoud primary and foreign keys look like etc).
Normal forms might (and might not) be also valid for objective databases (working without primitive types like int or date – http://en.wikipedia.org/wiki/Object_database). Such databases are really complicated to build while many things that are quite siple in relational databases are getting very complicated on objective databases (imagine large object used as a primary key of a row).
I didn’t mean that vertical partitioning was the same, I meant that splitting up the table and moving fields to other tables was also known as normalisation. They are loosely related.
Chris its its your turn to make a tea