How can I optimize Java on my server?

Discussion in 'Bukkit Help' started by odielag, Feb 10, 2011.

Thread Status:
Not open for further replies.
  1.

    odielag

    UPDATE: Check out this post for an excellent guide: http://forums.bukkit.org/threads/3967/page-2#post-184208


    After the server has been running for a few hours with 26+ people online, CPU usage hits 100% on a core (8 HT cores). And instead of slowly growing in RAM usage until all 16 GB allocated to CraftBukkit is used, it hovers around 4 GB out of the 16 GB allocated to the process.

    I have the 500 MB world folder on a 3 GB tmpfs (ramdrive) mount, and I'm running McMyAdmin with the -server flag.

    I could upload a video... but people move fluidly for 3 seconds, then everything pauses for 3 seconds, then the person warps to where they were walking and it's smooth for 3 seconds... and the process repeats.

    Looking at CPU usage, it swings from 60% to 100% over and over again.

    I'm wondering how we can lower the CPU usage and use more of the 16 GB of RAM to increase performance... The main thing I'm looking for is smoother gameplay...

    There seem to be a lot of threads for the Minecraft server process. Below is a copy from my server advertisement that gives some of the server's info from http://www.planetminecraft.com/serv...elistedbukkitantigriefdozen-plugins24gb-ddr3/
    --- merged: Feb 11, 2011 12:30 AM ---
    I found one thing that lowered performance significantly.... McMyAdmin.

    McMyAdmin was great when I was running a vanilla server (it added commands, backups, and banners), but it doesn't properly support Bukkit yet. I hear that after Bukkit is officially released, McMyAdmin is supposed to support it fully, with functionality such as adding add-ons from the web interface.

    So now I'm looking for a dynamic banner that at least shows how many people are playing on the server. Wish me luck.

    PS: With McMyAdmin the server would shudder every two seconds; now it's running smoothly so far.
     
  2.

    Toasty

    Replace -XmsXXXXM with -Xincgc, and use the -XX:ParallelGCThreads=X flag. The latter flag sets a definite number of threads for garbage collection; by default, GC will use as many cores as it can grab. I believe that for now Bukkit and vanilla Minecraft are bound to one CPU, though that could change in the future.
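    To put a number on "as many cores as it can grab": the HotSpot tuning documentation describes the default GC thread count for the parallel collectors as one thread per hardware thread up to 8, and roughly 5/8 of the hardware threads beyond that. The sketch below reproduces that rule of thumb in shell arithmetic; the function name is hypothetical, and the exact default may differ between JVM versions.

```sh
#!/bin/sh
# Approximate HotSpot default for -XX:ParallelGCThreads (rule of thumb):
# N threads for N <= 8 hardware threads, otherwise 8 plus 5/8 of the rest.
default_gc_threads() {
    ncpus=$1
    if [ "$ncpus" -le 8 ]; then
        echo "$ncpus"
    else
        echo $(( 8 + (ncpus - 8) * 5 / 8 ))
    fi
}

# Example: a quad-core i7 with Hyper-Threading exposes 8 hardware threads.
default_gc_threads 8
# On Linux, `nproc` (coreutils) reports the available processors:
#   default_gc_threads "$(nproc)"
```

    With 8 hardware threads the default already equals the thread count, which is the motivation for capping it lower.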

    I would set -XX:ParallelGCThreads to 4 or 5.
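    A minimal launch script following this suggestion might look like the sketch below. The heap size, thread count, and jar name are placeholders (assumptions, not values from this thread), and the server only starts when the script is run with the argument `run`, so the option string can be checked without launching Java.

```sh
#!/bin/sh
# Placeholder values: adjust for your machine.
MC_HEAP_MAX="${MC_HEAP_MAX:-4G}"
MC_GC_THREADS="${MC_GC_THREADS:-4}"
MC_JAR="${MC_JAR:-craftbukkit.jar}"

# No -Xms, per the suggestion above; -Xincgc plus a capped GC thread count.
JAVA_OPTS="-Xmx$MC_HEAP_MAX -Xincgc -XX:ParallelGCThreads=$MC_GC_THREADS"

if [ "${1:-}" = "run" ]; then
    # $JAVA_OPTS is deliberately unquoted so it splits into separate arguments.
    exec java $JAVA_OPTS -jar "$MC_JAR"
fi
```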

    As for a banner, try minestatus.net with minequery. It's not a live indicator, but it updates frequently enough.
     
  3.

    odielag

    yay for minequery!
     
  4.

    TnT

    With that much RAM, just put your entire MC folder into a ramdisk, and use SQLite (or flatfiles).
    Other than that, try playing with the different RAM flags: -d64, remove the -server flag, and if you dare, give this one a try:
    Code:
    -XX:+AggressiveOpts
    (I would mention -Xincgc, but Toasty already stated that one).
    I've also seen people say Java 7 (1.7) has better performance, so you can give that a try too.
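    The ramdisk approach comes down to: mount a tmpfs, copy the server folder into it, and copy it back to disk regularly, because tmpfs contents are lost on reboot or crash. The sketch below assumes Linux with rsync installed; the paths and size are hypothetical. It defaults to a dry run that only prints the commands; set DRY_RUN=0 (as root) to execute them.

```sh
#!/bin/sh
# Hypothetical paths and size; adjust to your setup.
MC_DISK="${MC_DISK:-/srv/minecraft}"    # persistent copy on disk
MC_RAM="${MC_RAM:-/mnt/mc-ram}"         # tmpfs mount point
MC_RAM_SIZE="${MC_RAM_SIZE:-3g}"        # tmpfs size limit
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

load_to_ram() {
    run mkdir -p "$MC_RAM"
    run mount -t tmpfs -o size="$MC_RAM_SIZE" tmpfs "$MC_RAM"
    run rsync -a "$MC_DISK/" "$MC_RAM/"
}

save_to_disk() {
    # Run regularly (e.g. from cron, after a save-all) and before shutdown.
    # No --delete: if the tmpfs were ever empty, --delete would wipe the disk copy.
    run rsync -a "$MC_RAM/" "$MC_DISK/"
}
```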
     
  5.

    GmK

    MCMA does not have any influence on your server's performance; I would not go around blaming it.

    The only way running it can cause harm is if you set the GC settings wrong in MCMA's config.
     
  6.

    PhonicUK

    McMyAdmin cannot affect a server's performance, as it doesn't manipulate the workings of the Minecraft server.

    Most performance issues when using MCMA are due to users (ab)using the javaopts setting or similar.

    If you want to see what McMyAdmin is doing, set loglevel=0 in your config file, and on server start it'll show you the complete list of command-line arguments it's using to start the server. Most people don't use the same arguments when they 'test' their server, and then come to the wrong conclusion when things work differently.
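    Independently of McMyAdmin's log, on Linux you can read the full command line of any running process from /proc, and the JDK's `jps -lv` lists running JVMs together with their JVM arguments. The helper below is a small sketch of the /proc approach; the function name is hypothetical.

```sh
#!/bin/sh
# Print the command line of a running process (Linux only).
# /proc/<pid>/cmdline stores the arguments separated by NUL bytes.
show_cmdline() {
    pid=$1
    if [ ! -r "/proc/$pid/cmdline" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
}

# Usage: show the arguments of every running java process.
#   for pid in $(pgrep java); do show_cmdline "$pid"; done
```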

     
  7.

    odielag

    I have tried javaopts before, but for days I had just used the -server option in the McMyAdmin config file.
    There were strange things happening with McMyAdmin... if I stopped it from the McMyAdmin web interface or console, it wouldn't "save-all" beforehand, so people missed things or got duplicates. I had to manually save-all in game for things to be OK.... Why does McMyAdmin have to save-all every 15 minutes? I have the world on a ramdisk (tmpfs mount), so I'm fine with it interacting with that often...

    All my players witnessed 2 seconds of smooth play and 2 seconds of standstill, repeatedly. The CPU usage would swing from 70%ish to 100%+ and back about every second or two... I would look at the stars and see them jitter back and forth all over the place... It got worse every hour the server was up.

    All I did was get rid of McMyAdmin's start.sh (renamed it) and make a simple script to start CraftBukkit inside a screen session (so a backup script would work)... and the problems went away immediately. All of my players (20ish online during the day) notice a huge difference in performance. My server has been up since my last post here (minus some plugin additions) with NO star jitter. None. The server performance is like night and day. I don't even know why people have posted here saying it's not McMyAdmin.

    For more info, feel free to message me or post here (though I practically never check this thread).

    If you use a current Bukkit build with a high-powered server, I suggest not using McMyAdmin at this time. This is coming from someone who has been on the McMyAdmin IRC channel and has used it since shortly after 1.2 was released.
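    Running CraftBukkit inside a screen session also makes it easy to send `save-all` before stopping, which addresses the lost-items problem described in this post. The sketch below uses GNU screen's `-dmS` (start detached, named session) and `-X stuff` (type text into the session's window). The session name, jar, and heap size are assumptions, and it defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them.

```sh
#!/bin/sh
# Hypothetical defaults; adjust to your server.
MC_SESSION="${MC_SESSION:-minecraft}"
MC_JAR="${MC_JAR:-craftbukkit.jar}"
MC_OPTS="${MC_OPTS:--Xmx4G}"
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_server() {
    # -d -m: start detached; -S: name the session so other scripts can find it.
    # $MC_OPTS is left unquoted on purpose so it splits into separate flags.
    run screen -dmS "$MC_SESSION" java $MC_OPTS -jar "$MC_JAR"
}

send_console() {
    # "stuff" types the text into window 0 of the session; \r acts as Enter.
    run screen -S "$MC_SESSION" -p 0 -X stuff "$1$(printf '\r')"
}

stop_server() {
    send_console "save-all"
    sleep "${SAVE_WAIT:-5}"    # give the save time to finish
    send_console "stop"
}
```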
     
  8.

    Vanderburg

    I created my own server banner using PHP's GD library and McMyAdmin's API. It's fancy-shmancy!
     
  9.

    TnT

    Can you try something else for us while you're at it? Can you try running CraftBukkit with the same commands that mcmyadmin uses? It might be the commands, or it might be something else in mcmyadmin. Hard to know without testing.

    I'd also like to know if it did the same thing running it through mcmyadmin without any plugins at all.
     
  10.

    Phaedrus

    It definitely sounds like garbage collection pauses, but I don't think your problem is not using enough RAM. Rather, I think the problem is that your heap size is extremely large, and you're using the incremental garbage collector (-Xincgc), which is still going to take quite a while to traverse such a large heap, even with multiple threads. And if it can't make it through the heap in time, and one of your generation sizes is not set right, it will trigger a full garbage collection, which pauses the world until it completes.

    Instead, try using the concurrent low pause collector (-XX:+UseConcMarkSweepGC). It will use multiple threads to scan the heap as well, but it will do so concurrently with the application thread, and only pause the main app for extremely brief time windows. It will also use multiple threads for the young generation garbage collector, whereas the incremental collector will not.

    Try this line:
    Code:
    java -server -Xmn4G -Xms8G -Xmx16G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=5 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:MaxGCPauseMillis=500 -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=90 -jar craftbukkit.jar
    
    I'll break it down.
    -server -Xmn4G -Xms8G -Xmx16G
    Specifies to run in server mode, and sets 4 GB of RAM for the young generation, 8 GB for the initial heap, and 16 GB for the maximum heap. You could set -Xms equal to -Xmx right off the bat, but it may be unnecessarily large. However, if the heap is going to grow that large anyway, it may be best to set them equal. You'll have to experiment.

    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=5 -XX:+CMSParallelRemarkEnabled
    Specifies to use the concurrent low pause collector, to use parallel threads for the young generation collection, and to use 5 worker threads for garbage collection (a quad-core i7 with Hyper-Threading should have 8 hardware threads available, but you can experiment with different values, like 3). Finally, we specify parallel threads for the remark phase of garbage collection as well.

    -XX:+DisableExplicitGC -XX:MaxGCPauseMillis=500 -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=90
    Disables System.gc() calls, which would trigger a full garbage collection; we don't want that because it pauses the main app thread for the duration. We then ask the collector to aim for pauses under 500 milliseconds (a soft goal, not a guarantee). Half a second max shouldn't be noticeable; it's when you get into pauses of a full second or more that you'll get stutters. SurvivorRatio sets the ratio of eden to each of the two survivor spaces in the young generation (16 means eden is 16 times the size of one survivor space). The survivor spaces let new objects live through a few collections before being tenured into the old generation, which may keep more short-lived objects from being promoted into the old generation, where garbage collection is more expensive. TargetSurvivorRatio raises the target occupancy of the survivor space from the default 50% to 90%, further increasing its usage before objects get promoted.
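    To make the young-generation flags concrete: the young generation (-Xmn) is split into eden plus two equally sized survivor spaces, so each survivor space is roughly young / (SurvivorRatio + 2). The shell arithmetic below is an approximation (the JVM also rounds to alignment boundaries, and the function name is hypothetical) that compares the values used here with a SurvivorRatio of 8.

```sh
#!/bin/sh
# Approximate young-generation layout (sizes in MB, integer arithmetic).
#   survivor = young / (SurvivorRatio + 2), eden = young - 2 * survivor
#   target   = survivor * TargetSurvivorRatio / 100
#              (desired survivor occupancy after a minor GC)
young_split() {
    young_mb=$1
    ratio=$2
    target_pct=$3
    survivor_mb=$(( young_mb / (ratio + 2) ))
    eden_mb=$(( young_mb - 2 * survivor_mb ))
    target_mb=$(( survivor_mb * target_pct / 100 ))
    echo "eden=${eden_mb}MB survivor=${survivor_mb}MB (x2) target=${target_mb}MB"
}

young_split 4096 16 90   # -Xmn4G -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=90
young_split 4096 8 90    # same, with SurvivorRatio=8 for comparison
```

    With -Xmn4G, going from SurvivorRatio=8 to 16 shrinks each survivor space from about 409 MB to about 227 MB and gives the difference to eden, so -verbose:gc is the way to check which setting keeps more objects out of the old generation on a given server.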

    I'm interested to know whether this helps you out or not. The JVM is a highly tunable environment; it has to be, due to the nature of interpreted code abstraction in Java. Garbage collection isn't a one-size-fits-all kind of thing; it has to be tuned to match the requirements of the application, and Minecraft is still pretty new. It's going to take a lot of experimentation to get it tuned just right. BTW, -verbose:gc might help you identify whether garbage collection really is causing your spikes, and if so, which generation's collection is responsible.
     
  11.

    brotherbillo

    Jason, thanks for that detailed explanation. I would like to try this out on my server to see if it improves performance! I'm currently running an i7 980 with 24 GB myself. Also, I'm running McMyAdmin, and I just wanted to say that I have not experienced enough lag to make a post about it. Sometimes there is a second or two of block lag, but relogging seems to fix it for most of my players, which leads me to believe it's client-side. And sure, it lags during the forced saves (which are configurable; they don't have to be every 15 minutes), but I think it's better to have a little lag for a few seconds and keep your world backed up than not do it at all.
     
  12.

    TnT

    Will this work with the -Xincgc flag? He might as well start on that before switching to those other flags. It might not be a GC problem.
     
  13.

    Phaedrus

    Yes, -verbose:gc will work with any collector. It will print a message to the console each time a GC is performed.

    It looks like this:
    Code:
    [GC 1043004K->99525K(3087488K), 0.0221768 secs]
    [GC 1031621K->113636K(3087488K), 0.0247093 secs]
    [Full GC 1045732K->113804K(3087488K), 1.0221445 secs]
    
    The first two collections are minor, and the third is a full collection.
    The first two numbers show the heap occupancy before and after the collection. The number in parentheses is the total heap size. The final number is the time the collection took.
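    For longer runs, counting collections by eye gets tedious. The sketch below summarises output in the simple -verbose:gc format shown above; it assumes each line starts with `[GC` or `[Full GC` and ends with `<seconds> secs]` (other collectors or logging flags may print different formats), and the function name is hypothetical.

```sh
#!/bin/sh
# Summarise -verbose:gc lines such as:
#   [GC 1043004K->99525K(3087488K), 0.0221768 secs]
#   [Full GC 1045732K->113804K(3087488K), 1.0221445 secs]
# Reads the files given as arguments, or standard input if none.
gc_summary() {
    awk '
    /^\[(Full )?GC / {
        secs = $(NF - 1) + 0        # pause time is the field before "secs]"
        if ($1 == "[Full") full++; else minor++
        if (secs > max) max = secs
    }
    END {
        printf "minor=%d full=%d max_pause=%.4fs\n", minor, full, max
    }' "$@"
}
```

    Usage: `gc_summary console.log`, or pipe the server output into `gc_summary`.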

    With the command line I posted above I NEVER see a full collection, and minor collections typically take a fraction of a second.

    That said, I still disagree with using -Xincgc because it is an incremental garbage collector rather than concurrent. For a very busy server with a heavy load the garbage collection pauses are going to introduce more noticeable lag. The concurrent collector allows you to better manage those pauses. That's what it's designed for.
    --- merged: Feb 11, 2011 7:38 PM ---
    You can try using these arguments as well, though I'm still experimenting with their usage.

    Code:
    -XX:+UseAdaptiveGCBoundary -XX:-UseGCOverheadLimit -Xnoclassgc -XX:UseSSE=3 -XX:PermSize=128m -XX:+UseLargePages -XX:LargePageSizeInBytes=4m
    
    -XX:+UseAdaptiveGCBoundary The garbage collector is allowed to move the boundary between the tenured generation and the young generation as needed (within prescribed limits) to better achieve performance goals. This mechanism is off by default.
    -XX:-UseGCOverheadLimit Disables the policy that throws an OutOfMemoryError when the VM spends too much of its time in GC (by default, more than 98% of the time while recovering less than 2% of the heap).
    -Xnoclassgc Disables unloading of unreferenced classes during a full garbage collection. There's really no need to discard classes in Minecraft, so you might as well keep them loaded. Minor savings at best.
    -XX:UseSSE=3 Lets the JVM use SSE instructions up to SSE3. So far there is only anecdotal evidence of this improving things, and you'd think the JVM ergonomics would enable it by default if present on the CPU.
    -XX:PermSize=128m Increases the initial size of the permanent generation to 128 MB. Defaults vary by JVM version and platform (roughly 16 MB initial and 64 MB maximum). The permanent generation holds class metadata and similar long-lived VM data, and is sized separately from the -Xms/-Xmx heap.
    -XX:+UseLargePages The goal of large page support is to optimize the processor's Translation-Lookaside Buffer (TLB). A TLB miss can be costly, as the processor must then read from the hierarchical page table, which may require multiple memory accesses. With a bigger page size, a single TLB entry can cover a larger memory range, so there is less pressure on the TLB and memory-intensive applications may perform better. -XX:LargePageSizeInBytes sets the large page size to use.
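    On Linux, -XX:+UseLargePages depends on the OS having huge pages reserved (for example through the vm.nr_hugepages sysctl); with none reserved, the JVM cannot get large pages. The sketch below reads the HugePages_Total and Hugepagesize fields from /proc/meminfo (or a file in the same format) as a quick check; the function name is hypothetical, and this is not a full setup guide.

```sh
#!/bin/sh
# Report huge page availability from /proc/meminfo (Linux).
# Optional argument: another file in the same format (useful for testing).
hugepage_report() {
    awk '
    /^HugePages_Total:/ { total = $2 }
    /^Hugepagesize:/    { size = $2 " " $3 }
    END {
        if (total + 0 == 0)
            print "no huge pages reserved (HugePages_Total is 0)"
        else
            print total " huge pages of " size " reserved"
    }' "${1:-/proc/meminfo}"
}
```

    At 2048 kB per page, covering a 24 GB heap would take on the order of 12288 huge pages.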
     
  14.

    Kainzo

    Server Specs:
    32gb ram
    Dual hexcore (2.6ghz)
    Ramdisk
    Net= 25/25
    Users: 70-100 active

    Hmm, I'm actually really interested in learning more about which flags to set. I'll be referring to this post to decide what to add/remove.
    --- merged: Feb 11, 2011 10:19 PM ---
    Also, what would the full command line be for launch.sh? I'm currently running this:

    Code:
    #!/bin/sh
    java -Xms12G -Xmx12G -jar bmod.jar
    but I was running this:
    Code:
    #!/bin/sh
    java -XX:+UseParallelOldGC -XX:ParallelGCThreads=3 -XX:SurvivorRatio=32 -Xms1G -Xmx12G -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=25 -jar bmod.jar
     
  15.

    Phaedrus

    Try this and report back.

    Code:
    #!/bin/sh
    java -server -verbose:gc -Xmn4G -Xms12G -Xmx24G -XX:PermSize=128m -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=6 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:MaxGCPauseMillis=500 -XX:SurvivorRatio=16 -XX:TargetSurvivorRatio=90 -XX:+UseAdaptiveGCBoundary -XX:-UseGCOverheadLimit -Xnoclassgc -XX:UseSSE=3 -XX:+UseLargePages -jar craftbukkit.jar
    
    You've got loads of RAM, so I've allocated a max of 24 GB with an initial size of 12 GB, 4 GB of which is for the young generation.
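    Since this box also runs a ramdisk, it is worth checking that the heap maximum, the permanent generation, and the ramdisk together still leave room for the OS and the JVM's own overhead (thread stacks, code cache, and so on). The arithmetic below uses the 32 GB of RAM, -Xmx24G, and -XX:PermSize=128m from this thread; the 3 GB ramdisk size is an assumption borrowed from the original poster's tmpfs, since Kainzo's ramdisk size isn't given. The function name is hypothetical, and tmpfs only consumes memory for data actually stored in it.

```sh
#!/bin/sh
# Rough memory budget in MB. Returns non-zero if the total is overcommitted.
mem_budget() {
    total_mb=$1
    heap_max_mb=$2
    perm_mb=$3
    ramdisk_mb=$4
    left_mb=$(( total_mb - heap_max_mb - perm_mb - ramdisk_mb ))
    echo "left for OS and JVM overhead: ${left_mb}MB"
    [ "$left_mb" -gt 0 ]
}

# 32 GB RAM, -Xmx24G, -XX:PermSize=128m, assumed 3 GB ramdisk
mem_budget 32768 24576 128 3072
```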
     
  16.

    TnT

    I'm eager to see the differences those flags make, with -verbose:gc displaying the output.
     
  17.

    Phaedrus

    Me too. The JVM is a highly tunable construct; it just takes some experimentation. If I had a server with 32 GB of RAM and 60 users I'd test it all myself, but unfortunately I don't. I've tuned other JVMs through my work, and I used Java in computer science courses at university. So hopefully we can come together and see what works best for Minecraft/Bukkit.
     
  18.

    Afforess

    I just switched from Sun Java 6 (x86) to Sun Java 7 (x64) (Preview b129), and my server startup time went from about 2 seconds to less than 1.

    It used to say:
    "Loading Spawn Area 12%"
    "Loading Spawn Area 34%"
    "Loading Spawn Area 56%"
    "Loading Spawn Area 78%"
    "Loading Spawn Area 87%"
    ...

    With Java 7 it went:

    "Loading Spawn Area 57%"
    "Loading Spawn Area 96%"
    ...

    I was pretty impressed.
     
  19.

    Phaedrus

    Nice. Java has been very focused on performance improvements in the past few releases, 6 especially. It's traditionally been seen as a slow, lumbering, and inefficient language in exchange for ease of use and flexibility, but now, with efficiency improvements and modern hardware, it's really coming into its own.

    I'll have to check out Java 7 to see what's new.
     
  20.

    TnT

    I've heard the same thing from lots of people. Java 7 seems to really kick things up a notch.

    Regarding the GC flags, I haven't had any issues with lag/pauses using -Xincgc, but I don't run the server with more than a 4G max heap size. I would say it's a good flag to use for smaller heap sizes, but one may want to use the other GC flags above a certain server size. Would you agree?
     
  21.

    Phaedrus

    Both are designed to be low-pause collectors. -Xincgc is tuned more towards throughput than the concurrent mark-sweep collector. For a smaller server it probably won't make a big difference. -verbose:gc will tell you the duration of your collection pauses and whether you're hitting a lot of full collections.
     
  22.

    TnT

    I have it enabled; GC is usually in the 0.00x second range. Do you know of a way to redirect the verbose:gc output to its own file? I don't like having it spam my console, but I want to run it for a while to examine the output.
     
  23.

    Phaedrus

    Sorry I don't think it can be redirected.
     
  24.

    TnT

    Hmm, too bad. It's not collected in the server.log either.
     
  25.

    Chojin

    Perhaps use the script command (under Linux) to launch the server and log all output to a file.
    I am trying your parameters.
    I also have lag issues with 50+ players (I run almost 80-90).
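    Two ways to capture the console are sketched below: util-linux's `script -q -c "command" file` records everything the command writes to the terminal, while piping through `tee` copies the output to a file and still shows it on screen. Separately, HotSpot JVMs accept `-Xloggc:<file>`, which writes the GC log (with timestamps) to its own file. The helper name and file names are hypothetical.

```sh
#!/bin/sh
# Run a command, showing its output while also appending it to a log file.
# Both stdout and stderr are captured; stdin is not redirected.
log_run() {
    logfile=$1
    shift
    "$@" 2>&1 | tee -a "$logfile"
}

# Example (file names are placeholders):
#   log_run console.log java -verbose:gc -jar craftbukkit.jar
# Alternatives:
#   script -q -c "java -verbose:gc -jar craftbukkit.jar" console.log
#   java -Xloggc:gc.log -jar craftbukkit.jar    # GC log only, to its own file
```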
     
  26.

    TnT

    It's fun to watch the GC go when you're doing a dynmap fullrender.

    Yes, I realize I just said it's fun to watch a bunch of numbers scroll by. lol
     
  27.

    Nathan C

    One thing I learned that helps.

    RamDISK...

    Also, turn HT off... it is horrid for Minecraft. I have a server similar to yours and am working on shutting HT off, but my host has to do it.

    EDIT: Wait what? You are getting the Minecraft server to utilize more than one core?
    My server only uses one core, as far as I know.
     
  28.

    TnT

    OP stated he has a ramdisk (tmpfs drive, damn near the same thing).
     
  29.

    Phaedrus

    Minecraft is single-threaded, but Java is multithreaded. Garbage collection, for instance, can be sped up by using multiple threads.
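    One way to see this on Linux is to list a JVM's threads with their CPU usage, one line per thread, so a single busy server thread stands out from the GC workers. The sketch below uses procps `ps -L`; the function name and the pgrep pattern in the usage comment are placeholders. To match a busy thread ID to a Java thread, `jstack <pid>` prints each thread's native ID in hex as `nid=0x...`.

```sh
#!/bin/sh
# List the threads of a process with their CPU usage, busiest first (procps ps).
thread_cpu() {
    pid=$1
    ps -L -p "$pid" -o tid=,pcpu=,comm= | sort -k2 -rn
}

# Usage (pattern is a placeholder):
#   thread_cpu "$(pgrep -f craftbukkit.jar | head -n 1)"
```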
     
  30.

    odielag

    I took your advice, because of your experience and because one other person said their CPU usage dropped from 70% to 20%... So I installed the same version of Java 7 as above, and am now running CraftBukkit without problems.

    I was hoping the CPU usage would go down on the main virtual core and maybe be spread across more cores... but it doesn't seem like the CPU usage went down as drastically as I'd hoped.

    Question: are there Java options for Java 7 that will distribute the process across more of the virtual cores? That's what I'm wondering...

    PS: Since I have a virtual KVM to my unmanaged dedicated host, I had gone into the BIOS before and disabled SMT. I was proud and advertised that I had changed the setting... then a person who seems to know more than me explained that SMT is more sophisticated than I thought. Until then I'd only known about Intel Turbo Boost (dynamic overclocking of up to 4 GHz on a core based on load and temperature).

    Here is a reply I got on reddit about SMT:

    Baughn: To understand how SMT works, you have to realize that CPU cores are (VERY roughly) split into several sections: units that fetch instructions from memory, decode them, and generally handle control program flow, and a variety of execution units - things that do arithmetic, boolean logic, whatever.
    Even before SMT, each core would have many of each of these; three or four instruction decoders, probably twenty-plus execution units of various kinds. This is because the CPU actually does instruction-level parallelism, carefully teasing apart program commands to figure out what has to be done in a certain order and what doesn't.
    When this works fine, great; all or most of the execution units can be kept busy, and things speed on, completing multiple instructions per cycle.
    When it can't.. if you have highly serial code where the next action always depends on the result of the last one (branch mispredictions, etc.. but never mind), the execution units will end up being idle for a while. This slows things down, and this is where SMT steps in: By keeping track of a second thread in a single core, the execution units can swap to executing that second thread in case of stalls, and keep higher performance overall. As I said, 40-70% increase; that's from measurements of, oh, tens of thousands of machines running this. I won't tell you where.
    What should not be happening is SMT slowing things down, ever, but there's one scenario where I can see that happening: If the OS kernel is unaware of SMT, it might try to schedule two threads on a single core while leaving the other ones unused.
    And yes, that does in fact happen... in Windows XP and below. I'm pretty sure W7 fixes it, and Linux has avoided the problem since the time of the P4.
    But you're not running server software on windows, right?
    P.S.: On a sidenote, the P4 was badly misdesigned; its implementation of SMT sucked to the point that it would slow things down even when the kernel was doing everything right.. which, on a single-core CPU, is basically trivial. Anything you read about SMT based on the P4 no longer applies.
    --- merged: Feb 13, 2011 8:06 PM ---
    ALSO, with Java 6u23 and now Java 7 b129, I get spammed in the log with "Has system time changed"... Core usage is at ~110%, but everyone says the server is running smoothly.
     