What is with you kids and DBs?

Discussion in 'Plugin Development' started by hash, Feb 8, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    Borch

    Yeah so for example I'm running a vBulletin forum which's DB is 300MiB. It fits into RAM, so I should not use mysql to manage it. Instead I should come up with a neat flatfile format for all the shit that it stores, write a parser that loads it all up, represent everything in memory, implement all the algorithms that do the "cool stuff" like JOINs and GROUPs. And after 2 years of hard work I have my forum running again just using memory, no stupid database.
     
  2. Offline

    phaed

    The main premise of the OPs post is 100% dead on. Exceptions can be made of course (BigBrother) but lets not forget that close to zero other bukkit plugins have the scope which *require* an RDBMS. You can find edge case examples to disprove this yes, but dont forget that you are just setting up strawman arguments. It's use is overkill on almost all current plugins implementing one, as most are trading performance for convenience.
     
  3. Offline

    Plague

    What does this have to do with DB in bukkit plugins?
     
  4. Offline

    hash

    Somebody already brought up the point of web services based on languages like PHP which, being interpreted languages, run approximately 300 times slower than languages like java and C. We also discussed the point that PHP is subject to a number of other interesting constraints, namely that when run over apache the administrator is given almost zero control over multi-thread and/or multi-process execution. For the same reason, it is essentially impossible to actually -do- what I originally said about caching the data in memory, since PHP scripts tend to terminate after processing a single request. The conversation on this thread has already treated this entire subject quite completely: yes, there are other strictures in that situation that make databases somewhat tasty. (I considered editing my original post to include those caveats, because they are indeed valid and I should have thought of them to begin with, but I decided to assume people would read the entire thread if interested in the subject and I didn't feel the need to edit history to make myself look smarter. I am considering writing an article on this subject elsewhere on the interwebs and integrating some of this feedback, though.)

    And at hundreds of megs with an expectation of growth, your example is also already qualified as being in the scale where databases make sense regardless of language, because you're already in an area where those relational operators will quite probably make it possible for the database to load less than the entire dataset into memory.


    Oh, also, this in particular:
    It's interesting that you bring this up. I see all of those as wheels that shouldn't have to be reinvented. I have seen them reinvented, though, and many times, including by myself... this is a waste that should be stopped.

    At this point, I've collected a fairly large library of java code I use to take care of all of my serialization needs, and it uses some fairly advanced reflection techniques to translate any object to types representing generic maps and arrays, and through another independently configurable interface translate that data to either json or a more efficiently parsable binary length-delimited format. I'm beginning to think I need to write up a tutorial to its APIs and release it, because hearing that people are resorting to databases just because they're not willing to deal with the guts of parsing is really disappointing. (This, I guess, is an official shout-out to any developers out there who would be interested in joining me in testing and documenting this project and adding YAML, which seems to be more the defacto standard around here than JSON.)
     
  5. Offline

    Byteflux

    While I don't disagree with your topic, you're over-exaggerating how slow interpreted languages are in comparison to Java.

    Python (CPython) and Perl are probably the fastest among them, about 20 times slower than Java on average. Given that the native PHP and Ruby implementations are generally 3 to 4 times slower than Python and Perl, 300 times slower is an extreme overstatement.
     
  6. Offline

    hash

    Mmmm... well, I didn't intend to pull a number out of my ass; I do believe I recollect seeing graphs I trusted the source of that showed numbers in that ballpark (and this was back when PHP was my language of choice, so if anything I'd have been biased to under-report the difference), but it might take me a while to dig those reports back up. I could be off, though.

    I also suppose I ought to put a few more conditions on any statement like that: it depends immensely on what version of PHP you're using and what kind of precompilation/code-caching accelerating sort of stuff might be involved.

    60 to 80 times slower than the closer-to-metal languages databases tend to be written in is still way more than enough of a bias to make it impossible for someone writing in the interpreted language to be able to match the db's perfomance on even simple things like the hashmaps and sorting you'd use instead of a relational select, though. (More conditions: depends on how much your interpreted language passes those off to native code; I'm pretty sure PHP's sort() method just hands the work off to a native implementation of quicksort, for example.)
     
  7. Offline

    Byteflux

    I wouldn't be surprised if Java was 300 times or even a thousand times faster at specific operations, so the graph you saw may have been special case examples where Java is exponentially faster.

    RE PHP: Indeed. PHP 5.3 introduced a nearly 30% average performance increase to existing code.
     
  8. Offline

    \\slashies

    I'm lazy, but I'm not so lazy that I won't think.

    It seemed to me that the logical solution to data persistence was to have a nice little library to do it for me. I also knew what goals I had when writing for minecraft.

    What I came up with happened to be backed by a database. Reason being was that I decided that I wanted to store objects wholesale using Java's Serialize functions. This generates binary data and I didn't want to put it into flat files.

    I wrote a few hundred lines of code that holds a cache of the data, writes to the DB when I tell it to, loads from the DB when needed, and has 3 damn calls for regular use.

    3 calls that I can use anywhere I want to persist minecraft data. I also made the DB more of a key value store. Each "family" has it's own keyspace so that if you use a natural key (like a player's name) it won't conflict.

    Simple as pie, fast since the cache is a hashtree. It even uses a separate thread for the DB writer instead of waiting around for the disk and coalesces individual updates and inserts into transactions that are committed any time it can (by looking to see if the "pipe" is empty and then committing only then so that if the server is having a bad day for the disk we don't add overhead by committing every little thing). It even waits at the end of the java process for the disk committer to quiesce!

    The whole exercise wasn't based on a holy war or anything like that, I just didn't want to write a whole new getter / setter pair for every god damned thing I wanted to put to a disk. Still use flat files for config, but everything else ends up in the SQLite backed key-value store.

    The only improvements I could see were making it work with flat files and giving the option of storing non-obect data for more flexibility. This is on my github page if anyone is interested, but I'm not here to hawk that.

    My point is: no matter where you store data, shouldn't there be a standard that you can reuse or (joyful day) is standard?

    Lets not argue about this, lets make a solution that just works, works fast, is easy to use, and gives everyone's arguments consideration?
     
  9. Offline

    Borch

    Nothing.
    To be honest, I did not read the whole thread, just the first page and random posts from the others. I took the time now to read all your statements made later, and now you sound much more reasonable.
    But your OP really sounded like "everyone using a DB for less than a few GB of data is stupid" to me.
    Of course there is still no doubt that using a database for a list of warp locations is pointless.

    But even if those concurrency issues would not exist in my example, if the PHP processes would not terminate (let's say its a ROR fcgi app), and even if I knew there would only be at most 10 users browsing my forum, I'd still choose a DB engine. Just because its simple to use, well tested, probably more bug-free then a library that does all this in memory that one guy wrote and most people never heard of.
    The fact that it would be slower than keeping data in memory directly in your application is also not that important (even for a Minecraft plugin), as long as you do not have a couple of queries every second, or don't fire any crazy SELECT statements that process heaps of data and don't use indexes. Because otherwise you could also argue that using Java is nonsense, since you can write faster and less memory consuming code that does the same in C or assembler.

    To get back to bukkit and Minecraft, in my case, I'm writing a simple statistics plugin that currently just has two tables, one containing all players, with stats like online time and a permission bitmask, another one that I log player positions to (every 10 seconds). Now this could clearly be solved by using flatfiles and a logfile, which I did in the beginning, but I wanted to be able to review data in a simple manner, so I converted everything to using MySQL and wrote a simple php script where I can see stuff in a nice table in my browser and search through the movement data and filter by player, limit the time frame, or the area for which I want to see movements.
    Of course I could have parsed the flatfile-log in my php script every time I'm looking for something (which just sounds quite inefficient), or even created some interface in my plugin so that the php script can request the data from it (which sounded like more work than switching to mysql). I think especially if you already have a DBS running, it is not *that much* of on overkill to just use it and save some time.
     
  10. Offline

    Plague

    All the users responding to the OP, I presume.
     
  11. Offline

    \\slashies

    How many times per second does digging tick? I've seen plugins fire SQL on that. Terrible, I know. Server lags like it was put together by Dr. Doom and the Grinch and they wanted to ruin your day.

    At many points there is data that deserves to be held in memory for MC plugins. Surely, not all data and surely not all by the same means, but for a majority of the raw data (as opposed to configuration) for most plugins should be stored in memory but synced to disk. There are system out that that are horrible overkill for this, but so often people roll their own. It often works, but it also has a lot of subtle ways that errors sneak in or it over-complicates the whole plugin because of poor foresight.

    Some can do it, those are the really good plugins as I laud everyone who handles persistence elegantly. Too many don't. Lets demonstrate to them the best ways to do these things, DB or not.
     
  12. Offline

    ShoTro

    I also found it extremely funny that all these small subsystem plug-ins were using databases, "Oh, but it is for persistent data!", that is all fine and good, but JSON can handle that, or comma deliminated files, or any type of flat file. It is completely overkill loading a huge (relatively) database system onto your servers RAM and have it utilize a moderate amount of your CPU as well...

    But I have to agree with the OP, why the hell are people using YML and other somewhat ineffective, not to mention not very realistic, flat file types. I understand YML is easy to understand for *some* configuration purposes, but is a relatively unknown technology. It is a slightly better alternative to XML, but... as the OP said, and I support as a seasoned Enterprise level developer at my job, JSON. As it is just leagues above and beyond useful for these things and would accomplish so much more on the dynamic configuration level... and that is ignoring the fact it is a highly useful thing to learn for real development experience as well.

    I appreciate the understanding being put into databases on these forums, it is worth doing, but only once stable large systems are available. Take Stats, Achievements, and iConomy for larger servers, THOSE make sense... eventually. Right now on my private server I tested MySQL and SQLite on my machine, but... flat-files are just lower cost and basically easier to use even if they are "ahem" slower to load initially the trade-off is fairly substantial right now. Again, I think that is exactly what the OP and those who support the OP are saying. If people agreed to use something more flexible like JSON instead of YML (which could be flexible, but I haven't seen it used that way so far) then people could start using a single file or take all the files and load all the configurations into a single shared memory object drastically reducing the number of systems the server needs to manage... THEN, we can move onto a database what would function appropriately.

    Again, I am an outsider looking in. I work with data warehouses and huge rational database for a living, but don't see the need for them at this low level, for now. Bukkit is too young to really utilize things this way effectively especially if the plug-ins are not sufficiently large enough.

    BUT... and this is a big BUT, over time, I can see it being used and I don't think the community should simply ignore databases. Let those who use them use them, but for a plug-in to HAVE to support them... flat-files should be the requirement. Requiring databases sounds crazy to me, but is what you often see in the plug-ins forum.

    Excuse me if I am repeating what others have stated, from a professional stand-point I can't see how anyone could develop an effective database with what Bukkit has to offer right now, unless you need ways to query data dynamically.
     
  13. Offline

    Uristqwerty

    In general, if a plug-in has database support, it should be an option that is initially disabled, instead using a regular file format.

    Although there are some advantages to using a database (if the database is running in a separate process, if minecraft crashes the data is safe; easy cross-program data sharing, without having to reimplement the data file format for each one, and all of the synchronization systems that would be needed along with it), none of it is worth the overhead of adding a database to store what a simple hashtable and a class or two could do just as easily.

    Would you add a virtual machine running a specific version of windows 95, just to get a particular date format function that doesn't exist elsewhere? Almost certainly not! (And if you would, I would not use your plug-ins.)
     
  14. Offline

    ShoTro

    That might just be the single best point I have seen for DB support so far, but I just SVN my server and commit changes daily or after adding/removing plugins, or do development (overkill, yes, but fantastic, and has no bearing on performance). Although, couldn't there just be a simple flat-file process running side-by-side bukkit? (ponders/googles) This might be a decent solution to this whole argument. Lightweight x-path-like configuration management and data storage. There are several solutions to this floating around and don't take nearly as many resources.

    (Minutes later) From my understanding this is exactly what SQLite is, a flat-file service that utilizes SQL... (I retract my statements in terms of SQLite, but not MySQL)

    http://en.wikipedia.org/wiki/Flat_file_database

    Holy crap there is a lot of good options available... why the heck use MySQL?
     
  15. Offline

    Plague

    But that is the original point of this thread. SQL is not an easy thing to implement and run. Parsing being the smallest thing, but you have concurrency, locks, atomicity to think about in databases and all this produces bloaty and/or slow programs. Of course there are positive things about using it but not in a freaking plugin for bukkit that does nearly nothing.
     
  16. Offline

    Valrix

    The general point is that DBs should be used when they need to be and Flat files when not. If each user needs their own config that'll be small, a flat file with their name would work just fine. If you need to integrate it with PHP or another web language, a DB is needed. Also, if you have a plugin like BigBrother or the like a DB is also more beneficial to use. Instead of nitpicking every little thing, why not just simply agree that people just need to use their head and choose the best option for the end user? Now I'm usually one all for DBs, but I'm also sure to try and make my plugin as easy on the server as I can. Less code = faster code and less resources being spent each time it needs to run, that's why I refuse to add a few suggestions to Netstats because what they ask is crazy and would do too many DB calls. This argument is a perfect example of the phrase, "There's a time and place for everything."
     
  17. Offline

    ShoTro

    SQLite isn't a SQL database. Don't get confused, there are not threads therefore no synchronizations, therefore no DB style locks. Read up. It locks the DB because, that is what the system does while writing flat-files (which is what it does), but it does queue these processes, which is a fairly huge benefit. Again, I am not suggesting that it is the way to go, quite the opposite, but it isn't as bad an option if you "NEED" a DB to use. More so than MySQL in this discussion.

    @Valrix I concur.
     
  18. Offline

    hash

    There's an issue closely related to part of the database question that really gets my goat.

    People seem to be re-inventing YAML parsers.

    1. Why?! There are libraries for this already! Google "java yaml library" for the love of god!
    2. Most of these reinventions aren't full YAML parsers anyway.
    2a. This leads to people publishing strictures to their config file format that are inconsistent with other plugins, because what they're claiming is a YAML format quite simply isn't.
    3. People claiming that it's sooooooo haaaard to parse things unless they use a database are also failing at this whole reuse-existing-solutions thing, really really badly.



    Define "safe". I'm not sure people are thinking about concurrency control correctly; you're not at all the first person to say something like this that doesn't really hold up well to close scrutiny. The fundamental issue that folks are missing is this:

    What happens when minecraft crashes and has a different state committed to disk than your database does? For example, what if somebody made a portal and your database committed its coordinates, but then minecraft crashed before it saved the chunk with the obsidian ring or whatever in it? Your database bought you nothing but an illusion of safety.

    You -can't- solve that synchronization problem just by throwing a database at things! You actually need serious external coordination: you have to pause minecraft saves, pause all your plugin saves, make sure everything's committed to one, single snapshop of the universe at one EXACT single tick, and then back it all up, and then resume everything in the same order. And nobody's even come close to having an API to do that.

    I'll say again: having a database gives you atomicity of operations and some durability with regard to your own dataset. 1: this doesn't fix anything at all when you consider the entire system, because if your dataset gets out of sync with other datasets, you lose anyway; 2: you can do the same thing with simple filesystem operations, because believe it or not people smarter than you thought about this shit when they designed operating systems.


    Honestly, I thought we beat that one to death already. There are these things called "libraries". Seriously, has no one noticed json.org or yaml.org or just plain googled this stuff? There are like a hundred implementations of the json standard for more languages than I ever knew existed!


    Uh, been there, done that, it takes about 20 seconds.

    I have absolutely no idea why one would want to bother with that. You're not really buying anything at all in terms of stability; the utterly simple write-then-move approach gives... well, 100% of it that there is to get.


    That I could get behind. I think it would be terribly interesting if some folks got together to develop an interface so that you could switch between flat files and databases by using a different class, but still have the exact same sort of put(*) and get(*) methods. It would be a little rough around the edges, I imagine, since things like "foreign keys" don't translate very well, but it would be an interesting exercise.


    And amen to that. I haven't really said anything so far regarding how downright badly written some of the plugins using databases seem to be, but... it seems that it just doesn't occur to a lot of developers that hey, those SQL queries are executable! There are plugins all over the place that are doing absolutely no sanity-checking of their inputs at all, because apparently their authors are too naive to have heard of SQL injection attacks and they haven't really done their homework regarding the system they're using. Doing SQL right can actually be much more complex than just using a format like json.

    Oh, we've also yet to discuss the fact that SQL isn't actually that well standardized. I mean, yeah, there've been some standards published on it... but the stark reality of it is that MySQL supports a slightly different system than PostgreSQL supports a slightly different system than Oracle 11g supports a slightly different system than SQLite supports a slightly different system than... well, the list just goes on and on. I find this... odd to say the least for a system that people put in mission-critical situations.



    (As far as the yaml-vs-json discussion... well, it's off topic, but for what it's worth, I'm most indifferent to the two.
    Yeeeah, no offence, but I never said anything like that. The only thing I'll really rally on is that it can get a little ridiculous when people re-invent their own formats. Anything that's standardized is fine by me.

    I like json better because the standard seems vastly simpler to me, and because forbidding tabs is utterly insane. The subset of yaml that people actually use seems a little more human-friendly, sorta. XML I find somewhat distasteful in its redundancy, and also for its odd distinction between attributes and nested values (because the way I see things, at the end of the day you have arrays and you have maps, and xml has two different ways of doing maps that does nothing but complicate things), but whatever; it's a standard too.)
     
  19. Offline

    Plague

    Wikipedia: SQLite 2.0 replaced gdbm with a custom B-tree implementation, adding support for transactions




    Oh, hash? tl;dr
     
  20. Offline

    ShoTro

    There are API's available to to what is required, current thread discussions are moving to fix this very problem. Patience. Rome wasn't built in a day. But threads about object persistence have been around and offer suggestions... so yes people are working on this idea.

    Interfaced APIs that utilize constructed models (objects with privatized variables and getters/setters). Sure they are readily available concepts but first everyone needs to agree on how they want the interfaces to "map" (not the official term) the data in their calls, increasing support for different drivers would have to be taken into account and the type associations. I don't think anyone is disagreeing about this. See the persistence threads.

    Thanks for dissecting one line from my comments without taking the rest into account then supporting pretty much exactly what I said. Glad we agree.

    Yeah, like operating systems filesystems. Allows access to particular sections of the indexed files. Cool stuff. If this is in reference to the "locked" comments I am going off of what the official site said. If it allows concurrency that isn't what they are advertising.
     
  21. Offline

    hash

    Well, I didn't mean to be contrary and by and large we do, but I don't want to be construed as being anti-YAML. I'm not a huge fan of it either, but eh.


    Indeed.
     
  22. Offline

    ShoTro

    No big deal.

    Posted a bit to the Bukkit team discussions about persistence APIs and how to solve this whole, "everyone" wants different solutions thing. Seriously, if people want to use databases then let them, but I am suggesting ways to make it so that the Bukkit team doesn't have to write the support for them at all, let the community do that, but provide a single way to model that data, so everyone can use it how they like to.
     
  23. Offline

    Uristqwerty

    Any plugin that wants to use a database to store locations of gameworld objects is already an utter failure. However, if you are storing warp destinations, email-like messaging, logging of any sort, or data about a player, then you generally don't want it to be lost because minecraft crashed. For the warp, you would do a sanity check on server start that each warp leads to a location that hasn't been completely filled by blocks, magma, or has had a 100-block drop built under it, since those things can happen even without a crash to desynchronize the database and the map.

    Honestly, saying that a database being used to provide a safer location for data if minecraft crashes is worthless, at least with the argument you used, is a very narrow opinion based on a small subset of any possible situation, a subset that any good plugin would try to avoid anyway, since otherwise it would be yet another bad plugin.

    But still, a database is wasteful overkill for just about any plugin, so even including one as an option should be a choice that is made after hours of painstaking research and years of experience, and a complete understanding of everything that providing optional database support implies. Anything less, and you are just contributing to unnessecary lag and troubles for server owners. And probably introducing countless security flaws that can be exploited.
     
  24. Offline

    hash

    Agreed. And you're totally correct in your examples of messaging systems and the like not needing to be worried about absolute synchronization with the game world; I just holler about the other cases because I think it's good design practice for a developer to start with the assumption that they do need that level of synchronization until they have firmly convinced themselves otherwise, and a lot of the less thoughtful database proponents seem to have never considered it.

    Although in the case of a warp I don't think I'd do exactly that kind of checks because some of that behavior could well be intentional. I'm actually wondering now how much work it would be to design a portal plugin that stored nothing outside of world itself... what would the cost be to write a loop that scans all signposts in the world upon game startup and pick out the ones that say "portal" on the top line (and then look in the immediate vicinity for the obsidian frame and such, and then assume the second line is this portal's name and the third line is the destination name)? I haven't looked into the guts of bukkit coding enough to know how reasonable that is, but I feel like if you could do that just once and have it take a few seconds at game startup to build a list of them then you could have name-based portal system with zero external storage. Similarly, I feel like a lot of the plugins that do things like protect inventories could do a tiny bit of sign-based storage and pretty much completely get rid of the need for external storage. A developer could even solve the permissions question by always having the name of the placer on the fourth line, and forbid everyone from placing signs that look like plugin-data with anything other than that user's own name on that line.

    That in particular I totally agree with and I probably should have said outright from the very beginning, since the majority of times that a database seriously gets in my way during relatively normal server operations is right there. (Yeah, I'm counting MCEdit/WorldEdit as part of relatively normal operation; call me crazy if you wish.)
     
  25. Offline

    Bolerodan

    Wow I never realized SQLite would be heavily frowned upon. How much overhead does SQLite actually do? I wouldnt think it would be that much since its serverless compared to say mysql... and is just a fancy flat file wrapper with SQL commands (then again I dont know _that_ much about SQLite)

    Granted I haven't used SQLite really in depth so its interesting to know these things before deciding to use something to solve a problem. This is an interesting thread read. I've been meaning to look at JSON as a way for data storage instead. Its definitely very cool.
     
  26. Offline

    Plague

    At least you have to actually parse the SQL command, which you don't when using a flatfile. Then there is some code to make it more modular and manageable, which is all good abut adds some more overhead.
    Sure it's not that much, sqlite is pretty fast and all, but the overhead is still there. And some people (like me) just hate unnecessary overhead).
     
  27. Offline

    hash

    Yeah, it's not devastating overhead... but it's overhead. My main complaint is just that many people seem to think it's somehow magically faster than just using the filesystem directly, and any time you have to load all of your data anyway, that's just not physically possible. And I really do firmly believe that plain text is good simply because people can get at it easily. It's what the core of linux is built around for a reason.
     
  28. Offline

    TnT

    @hash
    Thanks for this thread - I've been saying the same as long as I've been on this forum, but not as clearly and elegantly as you've stated. DB's being the hammer and everything looks like nails statement is spot on.

    I'd also argue, that even with the BigBrother plugin (the only one I've seen that makes sense to use a DB), it could use a properly managed flatfile, with good data pruning, to handle the load instead of a DB.

    Also, people should note that for really fast access to the data, large companies don't use just SQL. They use other software such as memcached to handle the fast access to data. (Facebook is a good example of using memcached)
     
  29. Offline

    Metatoaster

    I'm gonna join in and thank the OP for this informative thread. It's good to read a serious discussion on the subject since internet hearsay can be a bit confusing at times.

    Regarding YAML, are various plugin developers really implementing their own parser with YAML subsets (Yuck!)? This wouldn't make sense since the SnakeYAML parser is included with bukkit, isn't it?
     
  30. Offline

    Valrix

    What's funny is that I completely re-wrote how one of my plugins works to get rid of DBs and it got such a horrible backlash that I'm now having to revert it or have everyone who uses it just drop it. Even though it took a week and the speed was better and nearly all of it was automated, since it seems like a number of users don't have the website on the same server as their minecraft server it wasn't useful for them so now I'm having to change it all back over and do all my updates over, but for a DB. Luckily I learned a few things though to at least speed things up a bit and reduce some overhead from the MySQL DB. That's all I can really do at this point.
     
Thread Status:
Not open for further replies.

Share This Page