Databases 101: How to create a schema that doesn't suck!

Discussion in 'Resources' started by Geoff Winans, Jan 5, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    Geoff Winans

    In almost every database engine, indexes offer a tremendous performance boost. In fact, I can't think of a single database that doesn't gain from having an index.

    What is an index?

    The most succinct analogy I've ever heard for Index is the following:

    An INDEX is the equivalent of a Library Call Number.

    You can either walk into the library and roam through the various sections until you stumble upon the book you want or you can visit the card catalog and find the book via the Call Number. This is not intended to be an in-depth tutorial on how to normalize data or how to even structure your tables. This is intended to give you good reason for selecting Indexes.

    ===========================

    As far as questions relating to "why should I use InnoDB instead of MyISAM?"..

    Here's a bit of DBA-funny:

    MyISAM tables support "Crash With Corruption".
    InnoDB tables support "Crash with Recovery".

    You should never use the MyISAM engine unless you absolutely must have a fulltext index. InnoDB offers damn-near the same performance on selects without read/write locks causing contention and it's a transaction-safe engine. What you put into it, you're getting back out of it even if the MySQL Daemon eats dirt.

    Below, I will detail a fairly generic table with 1 million rows of (fake) data.

    ===========================

    Code:
    CREATE TABLE `test_table`
    (
        t1 varchar(32),
        t2 INT
    ) ENGINE=InnoDB;
    
    Now, we need to fill this table with some random data. I did it with the following PHP Script.

    Code:
    #!/usr/bin/php
    <?php
    for($i = 0; $i < 1000000; $i++):
            echo "insert into `test`.`test_table` values ('user_{$i}', $i);\n";
    endfor;
    ?>
    
    I actually ran the script by doing this:

    Code:
    --([email protected])-(/root)--
    --(10:53PM:Jan 05, 2011:$)-- sh test.php | mysql -u${user} -p${pass} -S /var/lib/mysql/mysql.sock test
    
    This created one million rows of data for this test. Now, lets see how "fast" a basic select is along with an EXPLAIN against that query.

    Code:
    --([email protected])-(/root)--
    --(10:53PM:Jan 05, 2011:$)-- mysql -u$user -p$pass -S /var/lib/mysql.sock test -e "SELECT * FROM test_table WHERE t1 = 'user_12832'"
    
    +----------+------+
    | t1       | t2   |
    +----------+------+
    | user_982 |  982 |
    +----------+------+
    1 row in set (0.41 sec)
    
    0.41 seconds? Hah, that's fast!

    WRONG. Lets take a look at an EXPLAIN of that query.

    Code:
    mysql> explain select * from test_table where t1 = 'user_987232';
    +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | test_table | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | Using where |
    +----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
    
    Lets look at the good parts:
    1.) We're using a WHERE clause!

    Lets look at the bad parts.

    1.) We are not using an index (one doesn't exist!)
    2.) We're doing a full table scan with each query. This is bad. Every time a query hits this table for info, it has to spin through 1 million rows of data to find 1 entry.

    We know we aren't using an INDEX because possible_keys and keys is null. So, how do we fix this? It's simple.

    Code:
    ALTER TABLE `test_table` ADD INDEX(`t1`);
    
    Now, lets look at an EXPLAIN on that same query.

    Code:
    explain select * from test_table where t1 = 'user_987232';
    +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref   | rows | Extra       |
    +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
    |  1 | SIMPLE      | test_table | ref  | t1            | t1   | 35      | const |    1 | Using where |
    +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
    
    HUGE difference. We're examining 999,999 less rows of data. How long does the query take now?

    Code:
    select * from test_table where t1 = 'user_9832';
    +-----------+------+
    | t1        | t2   |
    +-----------+------+
    | user_9832 | 9832 |
    +-----------+------+
    1 row in set (0.00 sec)
    
    So, what does this all come down to?

    INDEXes will make or break your application in regards to performance.

    If you're going to use a WHERE clause against the field, INDEX IT!


    Book Recommendations:

    SQL Antipatterns: Avoiding the Pitfalls of Database Programming


    MySQL 5.0 Certification Study Guide
    - This is actually very gentle and very good introduction to MySQL.

    High Performance MySQL
    - A fairly advanced book dealing with ... performance! Must have for any (aspiring) DBA.

    MySQL Internals
    - A very, very difficult read unless you're also very interested in how the guts of MySQL work. This is a must for high-level DBAs.

    Any PostgreSQL book. Why? You can still learn things from other DB engines.
     
    obnoxint, root, Lamp and 3 others like this.
  2. Offline

    EvilSeph Retired Staff

    Very nice writeup. Plugin developers should read this and heed his advice when using databases. :)
     
  3. Offline

    Atroxide

  4. Offline

    Killie01

    where do you do this
    is it a program/is this an already made db
    is this java (

    haha lol: No experience with it :D
     
  5. Offline

    lego6245

    All you've done for me is serve to confuse. If this is supposed to be a 101 tutorial, then it fails in that regard.
    I know a little bit about MySQL, though not terms like "schema" and "index", and it's a little difficult to infer what your actions are doing. What does setting an index do? What if I wanted to search for something related to t2 instead of t1? Would I still have the time savings? How does setting an index allow you to avoid searching through 999,999 rows of data?

    Give a man a fish and he'll eat for a day. Teach a man to fish and he'll eat for a lifetime.
    --- merged: Jan 6, 2011 1:28 PM ---
    This is done in PHP, and MySQL, using something like PHPMyAdmin to run commands on the database.
     
  6. Offline

    Killie01

    a ok
    also a sqlite database is easy
    just like a : acces 2007 database?
    --- merged: Jan 6, 2011 1:32 PM ---
    nvm i know it can bd db3 files
     
  7. Offline

    Geoff Winans

    "Schema" is little more than the description of the Table.

    aka;

    Code:
    CREATE TABLE test_table
    (
        t1 varchar(32),
        t2 int
    );
    
    The above is a schema. Also referred to as "DDL" (Data Definition Language).

    The easiest way I've seen Indexes explained is this:

    When you walk into a library with a certain book in mind but no idea where to look for it, you have two options:

    Roam the library ("Scan the entire table") or look at the Index that most libraries have and find the exact location (Floor, shelf, Call Number).

    It is inherently more efficient to use the index. I'm not going to spend the time going into the technical details of Indexes since that alone is quite a rough read. If you really want to know - look it up.
    --- merged: Jan 6, 2011 2:18 PM ---
    The only real issue I have with this post is his recommendation to use Views. Views are slow and can be quite unreliable. They also fall WAY outside the scope of plugins that people are going to develop for this game.

    If you would really like to geek out on MySQL, read blogs like:
    http://mysqlperformanceblog.com (Run by the Percona guys)
    http://dom.as/ (Run by one of the lead MySQL guys @ Facebook)
     
  8. Offline

    matejdro

    Thank you! It looks like I'm going to learn something even before bukkit is out so i can start working on my plugin.
     
  9. Offline

    root

    Good tutorial, but I think you should get a bit more up about the basic stuff that plugin authors need to know. I mean, they really should start elsewhere but there's a few important intermediate-level concepts like database relations and normalisation, using standard procedures whenever possible (keep it simple, stupid rule!) and the caveats/reasons for choosing each table type. Also, probably most importantly, sanitizing and structuring data before it's passed to the DB (though I'm pretty sure Java makes that easy, it'll probably be more important to plugins written in other languages).
     
  10. Offline

    AnonymousJohn

    This is a great topic (I sort-of had to infer for myself what the indexes do, but the library metaphor works pretty well), and I think I'll check it out for myself. I'll probably be incorporating this into my plugins, and that'll probably increase the speed of it by a lot. Thanks!
     
  11. Offline

    sunkid

    I think you have the best intentions with this, but you set yourself a very lofty goal with your title. What you have described so far is how to create an index for a table. This is far from a "schema that doesn't suck".

    That said, your book references are very good, although I think they may be a little overkill for this audience (simply judging by the questions you got). This may also be true for the topic itself...

    FWIW, a schema is actually the description of the tables and their relationships in a database and, while you can use the data description language (DDL) aspect of SQL to express what your schema looks like, you can also use UML or natural language to do so. The DDL and the schema are two different things. Schema optimization goes well beyond setting up a single index for a single table! For example, it entails normalization, use of numeric primary keys, foreign key indexing, and much more.

    Indexes are indeed a powerful way to speed up queries over one or more tables. Like you point out, they are much like the old card indexes used in libraries that make finding a specific book very easy by providing an alphabetized and condensed catalog of titles and authors for example. This makes searching the index very fast and once you find the listing for the book that you are looking for, it tells you where to find the actual book. In the database context, the index essentially stores the column you specified to be indexed in a format that is organized in a specific way together with the number(s) of the row(s) where that specific piece of information actually can be found.

    @Killie01: check out http://www.mysql.com/ for more on this open-source relational database management system. You'll find downloads for all major OSs that include both server as well as client software. Java and PHP (as well as most other major programming languages) provide client APIs for accessing mysql database servers (or any of the other free or commercial RDBMSs).
     
  12. Offline

    resba

    Protip: When PHP/MySQL application developing, PHPMyAdmin is your friend. Trust me.
     
  13. Offline

    Geoff Winans

    Protip: Learning how to use MySQL without a web-based interface is your friend. Get decent with it and phpMyAdmin will actually be a slowdown :p
     
    EvilSeph likes this.
  14. Offline

    resba

    Has been defeated. Geoff gained 9001 exp points
    *gets slapped for stupid internet meme refrence*

    Guess they don't call you the Bukkit DB Monkey for nothing :p
     
  15. Offline

    feverdream

    Are you saying that people will need to support db's for plugins? I'm seeing a lot I dont agree with.

    Because some hosts charge people money for such services.. and it adds complication and memory usage that otherwise would not hinder a server... and I really love flat files as they are so simple to modify, etc.
     
  16. Offline

    root

    It depends on whether you want to target the larger servers/more technical and serious server operators or the small-party home users. Personally, I think it's best to implement both methods so the end-user has a choice.
     
  17. Offline

    feverdream

    Exactly; and for the record the more technical ordedicated may just not have the option of using SQL.

    I run a dedicated linux box for my minecraft server; I also don't want to mess with adding services I don't otherwise use and possibly opening a security hole I dont know about from non-mc software bugs/etc.
     
  18. Offline

    croxis

    As I said in the other forum, SQLite is flatfile. You are thinking of MySQL, which is a database server application. SQL is Structured Query Language. It is a language, nothing more.

    Also, considering many, many web applications run on SQL-type databases and some are open source, I am sure security is quite tight.
     
  19. Offline

    Geoff Winans

    MySQL security can be fairly easily locked down with either firewall rules or using "skip-networking" in the config and only connect to the DB using the socket.

    Either that or bind-address=127.0.0.1

    In any case, SQLite will win the day, and if/when bukkit gets popular (WHEN!), hosts will allow sqlite at the least. They won't really have a choice if they want to keep customers.
     
  20. Offline

    feverdream

    I'm a dev, I know what SQL is.. thank you anyway, as no doubt you were just trying to help, and I appreciate that.

    Either way, I dont want to have to run a full database just to run Bukkit/MC, as that would mean installing things on my server I do not already have installed and do not otherwise need. It just reeks of bloat. And why complicate things? All you are storing is simple key->value pairs and usually such things are singleton objects they dont get read by once, so argument about speed are invalid.
     
  21. Offline

    croxis

    You do not need to install anything additional for SQLite other than a small library, which will probably come with bukkit anyway.
     
  22. Offline

    Blue_Flame

    Is it programmed in Java? Or something similar? Java and VB are the only languages I know.
     
  23. Offline

    Obsidian

    Developing, sure, but not administrating. Sometimes it's easier to read through the data in the table with PMA, if only because it uses a lot better visual separation (HTML tables and such).

    SQL is actually its own language and is used within other languages when working with a database.
     
  24. Offline

    Isabaellchen

    That is really nice, but how do i include the database drivers into my server?
    Where should i put the mysql-connector-java-bin.jar so my plugin can find the driver when i run the server.
     
  25. Offline

    phondeux

    I asked this same question in another thread; you put it in the same directory/level as the craftbukkit.jar. I haven't actually tried this yet but will tonight as my plugin is going to make extensive use of the database.
     
  26. Offline

    Joshua Burt

Thread Status:
Not open for further replies.

Share This Page