Solved URLDecoder.decode not working with umlauts

Discussion in 'Plugin Development' started by V10lator, Oct 9, 2012.

Thread Status:
Not open for further replies.
  1. I have a problem with BukkitHTTPD's chat that I can't squash:
    When I write "test: möp" in the web chat, I get "test: m�p", no matter where I print it (directly with System.out.print(), through the logger, in the in-game chat, or sent back to the browser). This is how I decode the POST data:
    Code:java
    // raw: the request body as a String; post: a Map<String, String> (uses java.net.URLDecoder)
    if (raw.startsWith("?"))
        raw = raw.substring(1);
    // strip carriage returns and NUL characters (literal replace, no regex needed)
    raw = raw.replace("\r", "").replace("\0", "");
    for (String arg : raw.split("&")) {
        String[] split = arg.split("=");
        if (split.length < 2)
            continue; // skip fields without a value
        post.put(URLDecoder.decode(split[0], "UTF-8"), URLDecoder.decode(split[1], "UTF-8"));
        System.out.print(URLDecoder.decode(split[1], "UTF-8")); // Just for debugging...
    }

    BTW: Without URLDecoder.decode I get this: test%3A+m%F6p
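The raw body already explains the symptom: %F6 is a single byte, which is how ISO-8859-1 encodes ö, whereas UTF-8 encodes ö as the two bytes %C3%B6. The minimal Java sketch below decodes the quoted raw string both ways and shows what a UTF-8 submission would have looked like. The class name UmlautDecodeDemo and the helper decodeAs are illustrative only, not part of BukkitHTTPD.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UmlautDecodeDemo {
    // The raw (still percent-encoded) body quoted in the post.
    static final String RAW = "test%3A+m%F6p";

    // Hypothetical helper: decode the same raw string with an explicitly chosen charset.
    static String decodeAs(String raw, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // %F6 is one byte. In ISO-8859-1, byte 0xF6 is U+00F6 ('ö');
        // in UTF-8 a lone 0xF6 is malformed and is replaced by U+FFFD.
        System.out.println(decodeAs(RAW, "UTF-8").equals("test: m\uFFFDp"));      // true
        System.out.println(decodeAs(RAW, "ISO-8859-1").equals("test: m\u00F6p")); // true

        // A UTF-8 form submission would have sent 'ö' as two bytes instead.
        System.out.println(URLEncoder.encode("test: m\u00F6p", "UTF-8"));        // test%3A+m%C3%B6p
    }
}
```

So the decoder itself works; it is being told the wrong charset for the bytes the browser actually sent.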
     
  2. one4me

    Wherever the string was originally encoded, it was encoded as "ISO-8859-1".
    So if you change "UTF-8" to "ISO-8859-1", the string should print correctly.

    Also, if you never set the charset to "ISO-8859-1", the server may simply have defaulted to it. If that's the case and you want to support more than the first 256 Unicode characters, you should probably set the default charset to "UTF-8" or have the server encode the string in "UTF-8".
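To try this suggestion without hard-coding either charset, the parsing loop from the first post can take the charset as a parameter. This is a sketch under assumptions: the class FormParser, the method parseForm and the field name msg are hypothetical, and decoding as ISO-8859-1 only gives the right result while the browser really submits in that charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormParser {
    // Same parsing steps as the loop in the first post, but the charset is a
    // parameter instead of being hard-coded to "UTF-8".
    static Map<String, String> parseForm(String raw, String charset)
            throws UnsupportedEncodingException {
        Map<String, String> post = new LinkedHashMap<>();
        if (raw.startsWith("?"))
            raw = raw.substring(1);
        raw = raw.replace("\r", "").replace("\0", "");
        for (String arg : raw.split("&")) {
            String[] split = arg.split("=");
            if (split.length < 2)
                continue; // skip fields without a value
            post.put(URLDecoder.decode(split[0], charset), URLDecoder.decode(split[1], charset));
        }
        return post;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical body: a field "msg" carrying the bytes Firefox sent.
        Map<String, String> post = parseForm("msg=test%3A+m%F6p", "ISO-8859-1");
        System.out.println(post.get("msg").equals("test: m\u00F6p")); // true
    }
}
```

Passing "UTF-8" instead gives the same result once the browser submits UTF-8 (ö as %C3%B6), so the charset argument has to match whatever the client actually uses.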
     
  3. @one4me So the client uses the wrong charset? (The server is the plugin... :p)
    But in the html file I have:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    I also just uploaded this test file (saved as UTF-8), which shows "Täscht" in the browser:
    Code:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
    <body>
    Täscht
    </body>
    </html>
    It works just fine.
    So if the client reads the HTML file as UTF-8, why does it send the form data as ISO-8859-1? (And shouldn't UTF-8 be able to represent ISO-8859-1 characters?)
    Just tested: with Chrome it works just fine, so is Firefox not W3C-conformant?!?

    //EDIT: I printed out all the headers Firefox sent:
    Code:
    17:20:54 [INFO] Content-type: application/x-www-form-urlencoded
    17:20:54 [INFO] Host: [censored]
    17:20:54 [INFO] Accept-encoding: gzip, deflate
    17:20:54 [INFO] Content-length: 71
    17:20:54 [INFO] Connection: keep-alive
    17:20:54 [INFO] Accept-language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
    17:20:54 [INFO] Referer: [censored]/chat/chat.v10
    17:20:54 [INFO] User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1
    17:20:54 [INFO] Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Nothing in there tells us to decode as ISO-8859-1. How the hell do other servers handle this? -.-

    //EDIT²: Adding accept-charset="utf-8" to the form in the HTML file fixes this. But this still isn't W3C-compliant. Time to file a bug for Firefox.
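Because the Content-Type header in the log carries no charset parameter, the server has to pick a default itself. A minimal sketch of that decision, assuming the server honours an explicit charset parameter when one is present and otherwise falls back to a default it chooses (UTF-8 here, matching accept-charset="utf-8" on the form). The class RequestCharset and the method charsetFromContentType are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharset {
    // Returns the charset named in a Content-Type header such as
    // "application/x-www-form-urlencoded; charset=UTF-8", or the fallback
    // when the header has no usable charset parameter.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null)
            return fallback;
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // strip optional quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\""))
                    name = name.substring(1, name.length() - 1);
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The header from the log above has no charset, so the fallback is used.
        System.out.println(charsetFromContentType(
                "application/x-www-form-urlencoded", StandardCharsets.UTF_8)); // UTF-8
        System.out.println(charsetFromContentType(
                "application/x-www-form-urlencoded; charset=ISO-8859-1", StandardCharsets.UTF_8)); // ISO-8859-1
    }
}
```

The fallback is deliberately explicit: when the request does not say which charset was used, the server default and the form's accept-charset have to agree.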
     
  4. one4me

    When you said web chat I thought you had a server set up that was encoding the messages which would then be sent to the Minecraft server. (I didn't really read into how BukkitHTTPD works, but the encoding/decoding issue should have still been the same.)

    UTF-8 and ISO-8859-1 both support the first 256 Unicode characters, but past the first 128 (the ASCII range) they encode them differently: ISO-8859-1 uses one byte per character, UTF-8 uses two. Past 256 characters, ISO-8859-1 offers no support at all. This Stack Overflow post explains it a lot better.
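A small Java sketch comparing the bytes each charset produces for one ASCII character (A), one character inside the first 256 (ö) and one beyond them (€). The class EncodingCompare and its hex helper are illustrative names. ISO-8859-1 cannot represent €, so String.getBytes substitutes the encoder's replacement byte '?' (0x3F).

```java
import java.nio.charset.StandardCharsets;

public class EncodingCompare {
    // Formats bytes as space-separated hex, e.g. {0xC3, 0xB6} -> "C3 B6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0)
                sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' (U+0041), 'ö' (U+00F6), '€' (U+20AC)
        String[] samples = {"A", "\u00F6", "\u20AC"};
        for (String s : samples) {
            String latin1 = hex(s.getBytes(StandardCharsets.ISO_8859_1));
            String utf8 = hex(s.getBytes(StandardCharsets.UTF_8));
            System.out.println("U+" + String.format("%04X", (int) s.charAt(0))
                    + "  ISO-8859-1: " + latin1 + "  UTF-8: " + utf8);
        }
        // Prints:
        // U+0041  ISO-8859-1: 41  UTF-8: 41
        // U+00F6  ISO-8859-1: F6  UTF-8: C3 B6
        // U+20AC  ISO-8859-1: 3F  UTF-8: E2 82 AC
    }
}
```

This is also why decoding %F6 as UTF-8 fails: UTF-8 can represent ö, but not with that single byte.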

    As for the Firefox issue, it looks like you got it figured out (I don't know too much about HTML so I couldn't help much with that anyways).
     
  5. Njol

    IIRC (URL)InputStream has a getEncoding() method, and there's also a static method guessEncoding that guesses the encoding of a stream from the first few bytes.
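As far as I know, the JDK has no method called guessEncoding. The closest standard calls are InputStreamReader.getEncoding(), which reports the charset the reader was constructed with rather than detecting one, and URLConnection.guessContentTypeFromStream(), which guesses a MIME type, not a charset. Byte sniffing also can't help with this particular problem, because a url-encoded form body is plain ASCII: the ö arrives as the three ASCII characters %F6. A short sketch illustrating both points (EncodingProbe, readerEncoding and isAscii are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingProbe {
    // Reports the (historical) name of the charset the reader was created
    // with; the bytes themselves are never examined for this.
    static String readerEncoding(byte[] body, Charset cs) throws IOException {
        try (InputStreamReader r = new InputStreamReader(new ByteArrayInputStream(body), cs)) {
            return r.getEncoding(); // evaluated before the reader is closed
        }
    }

    // True if every byte is 7-bit ASCII.
    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes)
            if ((b & 0xFF) >= 0x80)
                return false;
        return true;
    }

    public static void main(String[] args) throws IOException {
        // The url-encoded body from the thread: the umlaut travels as "%F6".
        byte[] body = "test%3A+m%F6p".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isAscii(body)); // true: nothing for a byte sniffer to detect

        // Same bytes, two different answers: getEncoding() only echoes the constructor argument.
        System.out.println(readerEncoding(body, StandardCharsets.UTF_8));
        System.out.println(readerEncoding(body, StandardCharsets.ISO_8859_1));

        // Guesses a MIME type (or returns null), never a charset.
        InputStream in = new ByteArrayInputStream(
                "<html><body>T\u00E4scht</body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(URLConnection.guessContentTypeFromStream(in));
    }
}
```

In practice the charset has to come from an agreement between page and server (accept-charset on the form plus a matching server default) or from an explicit charset parameter, not from inspecting the body.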
     