The other day, I was reading in an HTML file and manipulating the text in it.
String str1 = "hoge hoge";
String str2 = anyElement; // "hoge hoge" read from the HTML file
System.out.println(str1.equals(str2));
// false
To my surprise, it printed false. Eh, false...? At first I wondered whether it was a bug in String#equals, but that seemed unlikely, so I looked into it.
When I converted the two strings above to UTF-8 bytes with String#getBytes, I got the following.
str1 : [104, 111, 103, 101, 32, 104, 111, 103, 101]
str2 : [104, 111, 103, 101, -62, -96, 104, 111, 103, 101]
Hmm?
-62, -96...!?
***What is this!?***
(0xC2, 0xA0)
[Non-breaking space - Wikipedia](https://ja.wikipedia.org/wiki/%E3%83%8E%E3%83%BC%E3%83%96%E3%83%AC%E3%83%BC%E3%82%AF%E3%82%B9%E3%83%9A%E3%83%BC%E3%82%B9)
↑ According to this, the two bytes 0xC2 0xA0 are the UTF-8 encoding of the no-break space (U+00A0).
In other words, an &nbsp; in an HTML file is not the usual half-width space (0x20). In UTF-8 it is represented by the two bytes 0xC2 0xA0.
Even when it is printed to standard output, it looks just like an ordinary half-width space. It's a trap...
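To make the difference concrete, here is a minimal sketch that reproduces the byte dump above without an HTML file. The class name NbspBytesDemo and the helpers utf8Bytes and containsNbsp are names I made up for illustration, and the no-break space is written as the escape \u00A0 so that it is visible in the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NbspBytesDemo {
    // Returns the UTF-8 bytes of s formatted like "[104, 111]".
    static String utf8Bytes(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.UTF_8));
    }

    // True if s contains at least one no-break space (U+00A0).
    static boolean containsNbsp(String s) {
        return s.indexOf('\u00A0') >= 0;
    }

    public static void main(String[] args) {
        String plain = "hoge hoge";      // ordinary space, U+0020
        String nbsp = "hoge\u00A0hoge";  // no-break space, U+00A0

        System.out.println(plain.equals(nbsp));  // false
        System.out.println(utf8Bytes(plain));    // contains 32
        System.out.println(utf8Bytes(nbsp));     // contains -62, -96
        System.out.println(containsNbsp(nbsp));  // true
    }
}
```

The negative numbers appear only because Java's byte is signed: 0xC2 is 194, which prints as -62, and 0xA0 is 160, which prints as -96.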
This can cause problems when processing strings. In the following examples, the spaces inside the strings are all no-break spaces.
String[] hoge = "a\u00A0b\u00A0c".split(" "); // \u00A0 is the no-break space
// hoge = ["a b c"]
// Not ["a", "b", "c"]...?
String fuga = "a\u00A0b\u00A0c".replaceAll(" ", "d");
// fuga = "a b c"
// Not "adbdc"...?
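When split or replaceAll silently does nothing like this, it can help to print the string with anything outside printable ASCII made visible. The following is only a rough sketch of that idea. The class VisibleChars and the method reveal are hypothetical names, and the <U+XXXX> notation is just a format I picked for readability.

```java
class VisibleChars {
    // Replaces every code point outside printable ASCII (0x20 to 0x7E)
    // with a tag such as <U+00A0>, leaving everything else unchanged.
    static String reveal(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp < 0x7F) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("<U+%04X>", cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first gap is a no-break space, the second an ordinary space.
        System.out.println(reveal("a\u00A0b c")); // a<U+00A0>b c
    }
}
```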
It's a trap... However, there is no problem if you do something like the following.
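// UTF-8 bytes of the no-break space (U+00A0)
public static final byte[] NBSP = {(byte) 0xC2, (byte) 0xA0};
// Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets) instead of relying on the platform default charset.
// Inside a character class, | is a literal pipe, not alternation, so it is left out.
String[] hoge = "a\u00A0b\u00A0c".split("[ " + new String(NBSP, StandardCharsets.UTF_8) + "]");
// hoge = ["a", "b", "c"]
String fuga = "a\u00A0b\u00A0c"
.replaceAll("[ " + new String(NBSP, StandardCharsets.UTF_8) + "]", "d");
// fuga = "adbdc"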
Both replaceAll and split take a regular expression as their argument, so if you specify a character class containing both the half-width space and the no-break space, either one will match.
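As a variation on the same idea, the no-break space can also be written directly as the escape \u00A0, which avoids the byte array altogether. The sketch below assumes Java 8 or later, where java.util.regex.Pattern also provides \h for horizontal whitespace (which includes U+00A0). The class NbspRegexDemo and its method names are hypothetical, chosen only for this example.

```java
import java.util.Arrays;

class NbspRegexDemo {
    // Character class matching an ordinary space or a no-break space.
    static final String SPACE_OR_NBSP = "[ \u00A0]";

    // Splits on either kind of space, one separator at a time.
    static String[] splitOnSpaces(String s) {
        return s.split(SPACE_OR_NBSP);
    }

    // Splits on runs of horizontal whitespace (space, tab, no-break space, ...).
    static String[] splitOnHorizontalWhitespace(String s) {
        return s.split("\\h+");
    }

    // Turns every no-break space into an ordinary space, e.g. before equals().
    static String normalizeSpaces(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnSpaces("a\u00A0b c")));                  // [a, b, c]
        System.out.println(Arrays.toString(splitOnHorizontalWhitespace("a\u00A0 b\tc")));  // [a, b, c]
        System.out.println(normalizeSpaces("hoge\u00A0hoge").equals("hoge hoge"));         // true
    }
}
```

Normalizing first is probably the simpler choice when the goal is a comparison like the str1.equals(str2) check at the top, while the character class is enough when you only need to split or replace.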
No more being afraid of no-break spaces...