Re: Reading characters from socket weirdness
Characters are converted to and from numeric codes via mapping tables (Charsets in Java), and systems may differ (or be configured differently) in which one they use by default. Two common ones are ISO-8859-1 (aka Latin-1) and UTF-8. These encode the numeric codes 0-127 identically (covering most common English text and Java source), but differ in how they encode other characters. For example, the acute accent character (U+00B4) is encoded as the single byte 0xB4 (180) in Latin-1, but as the two-byte sequence 0xC2 0xB4 (194 180) in UTF-8. If one end of the connection writes with one charset and the other end reads with a different one, such characters come out garbled.
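To see the difference for yourself, here is a small sketch that encodes the acute accent with both charsets and then deliberately decodes the UTF-8 bytes as Latin-1. The class name CharsetDemo and the helpers hex and codePoints are made up for this illustration; only the standard java.nio.charset.StandardCharsets constants are used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the original post.
class CharsetDemo {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats each char of a string as U+XXXX, so the result does not
    // depend on the console's own encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String accent = "\u00B4"; // acute accent

        byte[] latin1 = accent.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = accent.getBytes(StandardCharsets.UTF_8);
        System.out.println("Latin-1: " + hex(latin1)); // B4
        System.out.println("UTF-8:   " + hex(utf8));   // C2 B4

        // Sender used UTF-8, receiver assumes Latin-1: one character becomes two.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("Misread: " + codePoints(misread)); // U+00C2 U+00B4
    }
}
```

The last line is the kind of thing that typically shows up as a stray extra character in front of the one you expected.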
If your intent is just to transfer raw data across the socket, stay away from anything involving characters and character sets (Readers, Writers, Strings), and just send byte sequences through the socket's InputStream and OutputStream.
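A minimal sketch of the byte-only approach might look like the following. The class RawBytesDemo, the methods sendBytes and receiveBytes, and the 4-byte length prefix are all my own assumptions for illustration, not something from the original question. In real code you would pass socket.getOutputStream() and socket.getInputStream(); the demo uses in-memory streams so it runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical demo class, not from the original post.
class RawBytesDemo {
    // Writes a 4-byte length prefix (an assumed framing scheme),
    // then the payload bytes exactly as given. No charset is involved.
    static void sendBytes(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads the length prefix, then exactly that many payload bytes.
    static byte[] receiveBytes(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until all bytes have arrived
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Includes values above 127, which are the ones charsets disagree on.
        byte[] original = { (byte) 0xB4, (byte) 0xC2, 0x00, (byte) 0xFF };

        // Stand-ins for socket.getOutputStream() / socket.getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendBytes(wire, original);
        byte[] received = receiveBytes(new ByteArrayInputStream(wire.toByteArray()));

        System.out.println(Arrays.equals(original, received)); // true
    }
}
```

Because nothing here ever turns bytes into chars, the default charset on either machine cannot affect what arrives.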