Server stops functioning correctly when "Can't keep up" appears?

Discussion in 'Bukkit Help' started by Jdbye, Oct 21, 2011.

Thread Status:
Not open for further replies.
  1. Offline

    Jdbye

    It started happening on b1185 or b1240 (not sure which, but it never happened before MC 1.8), and I'm not sure what's causing it, as it seemed to start suddenly. It might be an issue with my server, but what I don't get is that whenever that error starts appearing in the console, things stop working. Sign text stops loading, JSONAPI stops responding to queries properly (it logs the queries but doesn't respond), LogBlock rollbacks stop working (they're queued but nothing happens), and sometimes chunks stop loading or mobs stop moving.

    What I'm basically wondering is whether anyone else has had this problem and knows what causes it. Judging by how many things simply stop working properly, it seems like some of Minecraft's and the plugins' threads are just dying, yet there are no errors in the console apart from "Can't keep up". Sometimes it starts happening an hour after a server restart, sometimes it takes a full day, but it consistently happens at least once per day. I usually have to restart the server every morning when I wake up or shortly thereafter.

    I'm posting my Java parameters in case they're somehow related, though I doubt it, as I've had those since 1.7.3, though I might have made one or two additions afterwards. The last modification was on October 6th, 3 days after b1240 was released, so it might be related; however, I didn't update to b1240 until 2 days after that. I believe what I added last was -XX:+UseBiasedLocking and -XX:+UseLargePages. However, UseLargePages causes an error about memory allocation on start. I'm not sure if it's of any significance, because the server still starts and runs properly, at least for a while, so I kept it. I found most of the parameters in posts that explained what they did, and added only the ones I thought would be useful. I can't seem to find those posts again, however.
    Code:
    java -server -Xincgc -Xmx6144M -Xms2048M -Xmn512M -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC -XX:+CMSIncrementalPacing -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+CMSParallelRemarkEnabled -XX:+AggressiveOpts -XX:MaxGCPauseMillis=50 -XX:-UseGCOverheadLimit -XX:+UseBiasedLocking -XX:+UseStringCache -XX:+UseCompressedStrings -XX:+OptimizeStringConcat -XX:+UseFastAccessorMethods -XX:+AggressiveOpts -XX:+UseLargePages -XX:UseSSE=3 -Djline.terminal=jline.UnsupportedTerminal -cp craftbukkit.jar org.bukkit.craftbukkit.Main
    Edit: It wasn't UseLargePages or UseBiasedLocking. Pretty sure it's not related to the java parameters.
     
  2. Offline

    Jdbye

    Bump. I'm getting kind of desperate to fix this. It's getting more and more frequent, and some plugins seem to be able to trigger it instantly when they're used. My server is plenty powerful (Core i7 930 2.8 GHz, 8 GB RAM), so it doesn't make sense that it happens, but what makes even less sense is that a lot of things, such as LogBlock, just don't work when it starts happening.

    It can literally happen within 2 minutes of restarting the server now. Basically makes LogBlock useless.
     
  3. Offline

    LilacTheEspeon

    Did you update your plugins when you updated your CraftBukkit? Are some plugins malfunctioning?
     
  4. Offline

    Jdbye

    I keep all my plugins up to date, and as far as I know none of them has thrown an error lately. However, like I said, some plugins stop working correctly when the server starts showing "Can't keep up" every 2 seconds, but there are no errors in the console.
    For the record, when it happens, the server's ticks per second is always around 8. Otherwise it usually stays at 20 TPS with no problem.

    LogBlock simply stops processing the rollback queue, rollbacks are added to the queue but nothing happens. JSONAPI stops responding to requests (it gets them, as evident from the JSON query messages in console, but doesn't return a result, just sits there). BomberCraft simply won't let you start the game with /bc start, nothing happens when you do. Those are the only plugins I've noticed so far that don't work properly when it happens, but there are probably more that I just haven't noticed yet.

    It also seems like they DO actually work: another admin on the server tried to roll something back while the "Can't keep up" every 2 seconds was happening, and it took 5 minutes before it actually rolled back, but it did roll back. So it seems like something is causing the server to suddenly slow down 100-fold, so that something that should take 3 seconds actually takes 5 minutes. Either there's something seriously wrong with Minecraft, Java or my server. I used to lean towards blaming Minecraft when there was lag, but now I'm not so sure anymore.
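
    To put rough numbers on that, here is a minimal Java sketch, not anything taken from Bukkit itself. It assumes only the normal 20 TPS target; the class and method names (TickRateSketch, ticksPerSecond, slowdownFactor) are made up for illustration. It shows that a drop from 20 to 8 TPS would by itself stretch tick-bound work by about 2.5x, which is far short of the roughly 100-fold slowdown (3 seconds to 5 minutes) seen with the rollback.

```java
public class TickRateSketch {
    /** Minecraft's normal target: 20 ticks per second (50 ms per tick). */
    public static final double TARGET_TPS = 20.0;

    /** Ticks per second, given how many ticks completed in elapsedMillis. */
    public static double ticksPerSecond(long ticks, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return ticks * 1000.0 / elapsedMillis;
    }

    /** How many times longer purely tick-bound work takes at the measured TPS. */
    public static double slowdownFactor(double measuredTps) {
        if (measuredTps <= 0) {
            throw new IllegalArgumentException("measuredTps must be positive");
        }
        return TARGET_TPS / measuredTps;
    }

    public static void main(String[] args) {
        // Hypothetical measurement: 80 ticks completed over 10 seconds.
        double tps = ticksPerSecond(80, 10_000);
        System.out.println("TPS: " + tps);
        System.out.println("Slowdown for tick-bound work: " + slowdownFactor(tps) + "x");
        // Observed slowdown of the rollback: 5 minutes instead of 3 seconds.
        System.out.println("Observed rollback slowdown: " + (5 * 60) / 3.0 + "x");
    }
}
```

    If the rollback were purely tick-bound, 3 seconds of work at 8 TPS should take about 7.5 seconds, so the tick rate on its own does not account for the 5 minutes.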

    I tried the Java server VM (from the JDK) and the normal Java VM (both 1.7) and they both did the same thing, but I haven't tried it with 1.6 to see if it occurs there too. My guess is it would, since 1.7 is supposed to be superior in terms of speed.
     
  5. Offline

    LilacTheEspeon

    Hmm, unless the RAM is being used up somewhere else, or a plugin is conflicting with another plugin and that is causing the lag, I can't think of another reason. Maybe take it plugin by plugin on a test server and see what is causing it. Or take it slow and go thoroughly through your server file and look for something missing. This is quite the quagmire. :p
     
  6. Offline

    Jdbye

    Well, it's not my server configuration, because I got a new, better server and the exact same thing happened there too. It fixed itself after a few minutes though, but then came back again a few minutes later. It's probably a plugin like you say, but I have a LOT of plugins... It's going to be a pain to check them all for conflicts. What do you mean by "server file"?

    Edit: So far on the test server with the exact same set of plugins, the "bug" hasn't occurred yet. I'm confused.
    Edit2: Turns out it was Spout. I installed that around b1185 and it has happened ever since then but I didn't think that might be it, since so many other people are using it and seemingly without any problems. Seems I was wrong, it wasn't Spout, that was just a temporary fix. Same with LWC. It seems like disabling some plugins can stop the issues from occurring temporarily, but nothing I've tried disabling so far has been a permanent fix so I'm giving up on that being the problem. I'm just hoping that when we start with fresh worlds in MC 1.9, the issue won't happen anymore, since it didn't on the test server.

    Edit3: It's not related to the lag message, because now it's happening without that message even appearing. I'm puzzled as to what's causing it.
     
  7. Offline

    Jdbye

    Bump. The "can't keep up" message no longer appears, but the issue is still present. If it's one of my plugins causing it, I've given up on trying to figure out which one, because every attempt so far only seemed to solve it and then the issue occurred again a few minutes or hours later.

    It seems to be affecting server ticks a bit, because I'm only getting 17.9 TPS and it's usually a solid 20, so it's probably related to the "can't keep up" messages, but they don't necessarily have to appear. I'm starting to think there's just something wrong with my worlds, something that's causing the server to choke, since it didn't happen on a test server with fresh worlds and unloading plugins didn't seem to help on the main server. Whatever it is, it never happened until 1.8. Hopefully either 1.9 or fresh worlds fixes it again.
     
  8. Offline

    TnT

    You're using too many GC flags. They're going to conflict/cause you problems. Try this:
    Code:
    java -Xincgc -Xmx6G -jar craftbukkit.jar
     
  9. Offline

    Jdbye

    I've read that post before. I had a lot of flags when I read it and removed most of them, but later found a post that suggested adding some flags and explained very well what they did. For example, there have been reports of the JVM not using SSE by default, so it was suggested to add that flag. Some of the garbage collector flags were suggested in the very post you linked. MaxGCPauseMillis was suggested to avoid the GC delaying world updates, and FastAccessorMethods makes sense since it simply uses faster versions of some methods. I don't remember exactly what they all did, but it made sense to add them based on the descriptions.

    I don't think the flags are related to the issue for two reasons: First, I've been using most of those flags for a long time and never had any issues until recently, and second, the test server with the same flags had no issues.
    It's true that the JVM might enable some of these flags automatically, but enabling them manually shouldn't hurt. Some are scheduled to be enabled by default in future Java versions, but aren't currently - it makes sense to enable these since they provide some sort of performance benefits.
    I don't remember the details on what all the flags do, but I read what each one did before I added them and they made perfect sense to add. Unless, of course, the descriptions on oracle.com are not necessarily true in every case.
    It might look like I just threw a bunch of parameters in there but I actually read on oracle.com what each one did.

    However, I'll try removing all of them and see if the issue goes away (something I doubt)
    By the way, testing with these flags now:
    (-Djline.terminal is required by RemoteToolkit)
    Edit: Also note most of the flags I'm using aren't actually GC related. However, -UseGCOverheadLimit simply lets the GC run for longer before it throws an Out Of Memory error - and obviously you don't want the server to run out of memory, so it makes sense to have that.

    It's possible that one specific recently added flag caused it. I added several flags not too long ago (mostly GC related), like the SurvivorRatio and TenuringThreshold flags. I'm pretty sure most of them are fine, however. CMSClassUnloadingEnabled is self-explanatory (it enables unloading classes no longer in use, which isn't done by default), and I'm not sure about its usefulness for Bukkit, but at the very least it shouldn't do any harm. CMSParallelRemarkEnabled is meant to reduce remark pauses, which technically should reduce the chance of world updates not happening at 20 TPS (aka lag). As for CMSIncrementalPacing, I'm actually not sure what it does, because I'm not sure what "This flag enables automatic adjustment of the incremental mode duty cycle based on statistics collected while the JVM is running." means, and I'm not sure why I added that flag. All the other ones make sense though.
    Edit: I looked up what the incremental mode does and it's basically a way to reduce GC-induced pauses by dividing the work the GC does into smaller chunks, so it sounds like a useful parameter.
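
    One way to check whether the GC is actually the bottleneck, rather than going by the flag descriptions, would be to read the JVM's own collector counters. The sketch below is only an illustration using the standard java.lang.management API; the class name GcStatsSketch and its helper methods are hypothetical, and it reports on whichever JVM it runs in, so to say anything about the server it would have to run inside the server process (for example from a plugin), which is assumed here and not shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStatsSketch {
    /** Total milliseconds all collectors report having spent since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of uptime spent in GC; 0.0 if uptime is not positive. */
    public static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        // Flags the JVM was actually started with, to confirm what is in effect.
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());

        // Per-collector counts and accumulated collection time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }

        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("Fraction of uptime in GC: "
                + gcFraction(totalGcMillis(), uptime));
    }
}
```

    If the fraction stayed small while the slowdown was happening, that would point away from the GC flags; if it climbed sharply, it would point towards them.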

    I'm actually starting to think it's the SurvivorRatio or MaxTenuringThreshold. Those were some of the ones I added last and while I don't see why they would be causing problems, I already tried removing the other ones I added last and it made no difference. (and the other ones weren't GC related)
    I found a lot of the parameter suggestions here: http://www.oracle.com/technetwork/java/tuning-139912.html#section4.2.5, including SurvivorRatio and MaxTenuringThreshold.

    Edit:
    Using these flags and no issues so far, but they sometimes take hours to appear so I don't know for sure yet if it's fixed, I'm hopeful though.

    Nope, that wasn't it. Using the flags you recommended (plus -server) and the issue is still occurring.

    EDIT by Moderator: merged posts, please use the edit button instead of double posting.
     
    Last edited by a moderator: May 20, 2016