Convert a MySQL database to UTF8

This is a “script” to convert your latin1 myisam database to utf8 innodb
Dump the database

mysqldump -u root -p database_to_convert > db.latin1.sql

Make a copy of the dump

cp db.latin1.sql db.utf8.sql

Change any default table charset to be utf8

sed -i 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' db.utf8.sql

Now, remove all field collations for Latin1

sed -i 's/COLLATE=latin1_general_ci//g' db.utf8.sql

Change any MyISAM engines to InnoDB

sed -i 's/ENGINE=MyISAM/ENGINE=InnoDB/g' db.utf8.sql

Re-import the database

mysql -u root -p database_to_convert < db.utf8.sql

And thats it.

What is database sharding?

Databases, by their own nature, become large beasts. This is a fact of life and whilst a lot of databases don’t encroach on “unusable” sizes, some do.

There is a practise called sharding that allows the database content to be split across servers.

If you were to take a very simple example, take a customer database and look at the main table, “customers” – this is going to be an ever growing table of customers details. If the business in question is very successful, the database may grow very quickly indeed. Historicaly you would have all the details in one table. Customer names, addresses, phone numbers, etc. This isn’t the most efficient way of storing a lot of data.

We have two methods of making it more efficient. Horizontal and Vertical partitioning. You may have heard of partitioning before in relation to disk drives, and its loosely based on the same principle. Putting data in different places to keep things tidy and more efficient.

  1. Horizontal Partitioning is known as sharding. This involves splitting the data and tables and putting some data in one and some in others. In our example, we could have all customers starting with the letters A-M in one table, and the N-Z customers in another. Now potentially the tables are halved in size. And then you just have to change your lookup service to distinguish which table to get the data from. You could split it even further, down to Country, or Post Code, etc.
  2. Vertical Partitioning is where you take the customers table, and split the fields out into separate tables. So you might have all the static details for the customer (stuff that doesn’t change very often, like name, address, email, etc) in one table, and data that changes often in another table. The static tables get cached and therefore become quicker, and the table size shrinks as it hasn’t got all the data in it. This is also sometimes called database normalisation.

That was just a small explanation about Database Sharding for you.