Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   Java (http://www.chiefdelphi.com/forums/forumdisplay.php?f=184)
-   -   Reading characters from socket wierdness (http://www.chiefdelphi.com/forums/showthread.php?t=93531)

drakesword 13-03-2011 12:23

Reading characters from socket wierdness
 
Anyone care to explain this one to me?

Send a character value of 1, receive a 1:
1 -> 1
2 -> 2
and so on, until I tried 150. Then it receives 194 150 as two separate characters:
150 -> 194 150
175 -> 194 175
180 -> 194 180

Tried the BufferedReader without the cRio (on Windows and Linux): send a 180, receive a 180.

buchanan 13-03-2011 15:59

Re: Reading characters from socket wierdness
 
Characters are converted to and from numeric codes via mapping tables (Charsets in Java), and systems may differ (or be configured differently) in which one they use by default. Two common ones are ISO-8859-1 (aka Latin-1) and UTF-8. These use the same mappings for numeric codes 0-127 (covering most common English text and Java source), but differ in their encoding of other characters. For example, the acute accent character that's encoded as a single byte with value 0xB4 (180) in Latin-1 is encoded as the two-byte sequence 0xC2 0xB4 (194 180) in UTF-8.

If your intent is just to transfer raw data across the socket, stay away from anything involving characters and character sets, and just send byte sequences.
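
To see the difference without a socket, here is a minimal sketch that encodes a single char under both charsets and prints the resulting byte values. The class and method names (EncodingBytes, encode) are made up for illustration, and it assumes a desktop JVM with java.nio.charset.StandardCharsets (Java 7 or later), not the cRIO's runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingBytes {
    // Encodes one char with the given charset and returns the bytes as values 0..255.
    static int[] encode(char c, Charset cs) {
        byte[] raw = String.valueOf(c).getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // mask so bytes above 127 show as 128..255
        }
        return out;
    }

    public static void main(String[] args) {
        char[] samples = {100, 150, 180};
        for (char c : samples) {
            System.out.println((int) c
                + " -> Latin-1 " + Arrays.toString(encode(c, StandardCharsets.ISO_8859_1))
                + ", UTF-8 " + Arrays.toString(encode(c, StandardCharsets.UTF_8)));
        }
        // prints:
        // 100 -> Latin-1 [100], UTF-8 [100]
        // 150 -> Latin-1 [150], UTF-8 [194, 150]
        // 180 -> Latin-1 [180], UTF-8 [194, 180]
    }
}
```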

drakesword 13-03-2011 23:21

Re: Reading characters from socket wierdness
 
I understand that. I was sending a byte value wrapped in character form to avoid signing. I tried sending the data as bytes and reading as bytes and still got the same result.

buchanan 14-03-2011 09:55

Re: Reading characters from socket wierdness
 
Want to post some code?

drakesword 14-03-2011 10:08

Re: Reading characters from socket wierdness
 
Sure

On the robot.
Code:

*snip*
client = scn.acceptAndOpen();
reader = new BufferedReader(new InputStreamReader(client.openInputStream()));
*snip*
char[] data = reader.readLine().toCharArray();

for(lcv = 0; lcv < data.length; lcv++)
{
    System.out.print((int)data[lcv] + " ");
}
System.out.println();


On the computer

Code:

*snip*
out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())); // "socket" is an assumed name for the connected Socket from the snipped code
*snip*
out.write(new char[]{1,2,3,100,150,180});
out.write("\n");
out.flush();

So I am sending 1 2 3 100 150 180
robot reports it is receiving 1 2 3 100 194 150 194 180

Even larger numbers have other strangeness with them. Sent a 200 and the robot received a 131 199
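
The 194 150 / 194 180 pattern can be reproduced in memory, with no robot involved. The sketch below is only one combination that produces the same numbers: it assumes the writing side encodes as UTF-8 and the reading side decodes as ISO-8859-1, which the thread does not confirm. MismatchDemo and roundTrip are hypothetical names, and a ByteArrayOutputStream stands in for the socket.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MismatchDemo {
    // Writes the chars plus a newline in writerCs, reads one line back in readerCs,
    // and returns the char codes the reader ends up with.
    static int[] roundTrip(char[] data, Charset writerCs, Charset readerCs) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stand-in for the socket
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(wire, writerCs));
            out.write(data);
            out.write("\n");
            out.flush();

            BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire.toByteArray()), readerCs));
            char[] received = reader.readLine().toCharArray();
            int[] codes = new int[received.length];
            for (int i = 0; i < received.length; i++) {
                codes[i] = received[i];
            }
            return codes;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        char[] sent = {1, 2, 3, 100, 150, 180};
        // Assumed mismatch: UTF-8 writer, ISO-8859-1 reader
        System.out.println(Arrays.toString(
            roundTrip(sent, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
        // prints [1, 2, 3, 100, 194, 150, 194, 180]

        // Matching encodings on both ends
        System.out.println(Arrays.toString(
            roundTrip(sent, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));
        // prints [1, 2, 3, 100, 150, 180]
    }
}
```

Under the same assumption a 200 would arrive as 195 136, so this sketch does not account for the 131 199 reported above.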

buchanan 14-03-2011 13:03

Re: Reading characters from socket wierdness
 
In Java, a char is a 16-bit value (a UTF-16 code unit) that usually represents a Unicode character or "code point". This is an internal representation, and any time you do I/O through encoding-aware classes (InputStreamReader, OutputStreamWriter) it gets converted to or from an external representation. The encoding can be either explicitly specified or taken from the platform's default. If the reader doesn't use the same encoding as the writer, mismatches occur and the reader doesn't get out the same internal "char" values the writer put in. Both UTF-8 and forms of Latin-1/ISO-8859-1 are in common use as defaults, so relying on defaults is dangerous when passing data between dissimilar machines. What's insidious is that these two encodings, though strictly speaking incompatible, map 0-127 the same way, so programs that only pass code points in this range appear to work even when they're mismatched.

Below is some code you can play with to observe the various interactions, but the takeaways are 1) Don't use encoding-aware APIs unless what you're passing really is text data, and 2) If you are passing encoded text between different platforms, specify the encoding explicitly.
Code:

/////////////////////////
// Put each class below in its own .java file and repeat these imports at the top of each.
import java.net.*;
import java.io.*;
import java.nio.charset.*;

public class Reader
{
        // reads a character sequence presumed to be in the platform's default character encoding
        public static void main(String[] argv)
        {
                try {
                        System.out.println(Charset.defaultCharset());
                        // run w/ -Dfile.encoding=UTF-8 or -Dfile.encoding=ISO-8859-1 on the command line to change the above
       
                        ServerSocket ss = new ServerSocket(0);
                        System.out.println(ss.getLocalPort()); // pass this port to Writer in argv[0]
                        Socket s = ss.accept();
                        System.out.println(s.getPort());

                        BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream())); // uses Charset.defaultCharset()
                        //BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream(), "ISO-8859-1")); // explicitly specifies encoding
                        char[] data = reader.readLine().toCharArray(); // convert the incoming encoded sequence assuming it's in "our" encoding
                        // if our encoding matched the writer's (whatever it was) all is well
                        // if there's a mismatch, we get various kinds of garbage, depending on who used what
                        // for UTF-8/ISO-8859-1 mismatches, the garbage only shows up in code points > 127, since their encodings happen to match for 0-127
                        for(int i = 0; i < data.length; i++) {
                                System.out.print((int)data[i] + " ");
                        }
                        System.out.println();
                }
                catch (Exception ex) {
                        ex.printStackTrace();
                }
        }
}
/////////////////////////
public class Writer
{
        // writes a character sequence in the platform's default character encoding
        public static void main(String[] argv) // supply the Reader port number in argv[0]
        {
                try {
                        System.out.println(Charset.defaultCharset());
                        // run w/ -Dfile.encoding=UTF-8 or -Dfile.encoding=ISO-8859-1 on the command line to change the above
               
                        Socket s = new Socket(InetAddress.getLocalHost(), Integer.parseInt(argv[0]));
                        System.out.println(s.getLocalPort());

                        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(s.getOutputStream())); // uses Charset.defaultCharset()
                        //BufferedWriter out = new BufferedWriter(new OutputStreamWriter(s.getOutputStream(), "ISO-8859-1")); // explicitly specifies encoding
                        char[] data = {1, 2, 3, 100, 150, 180}; // a "char" is a 16-bit unicode "code point"
                        out.write(data); // the OutputStreamWriter encodes the chars in its charset
                        // under UTF-8, this writes the bytes 1 2 3 100 194 150 194 180 (four 1-byte characters and two 2-byte sequences)
                        // under ISO-8859-1 it's 1 2 3 100 150 180 (all 8-bit values)
                        out.write("\n"); // writes a 10 (in either encoding)
                        out.flush();
                }
                catch (Exception ex) {
                        ex.printStackTrace();
                }
        }
}
/////////////////////////
public class RawReader
{
        // reads a stream of bytes; nothing here is affected by the JVM's default encoding
        public static void main(String[] argv)
        {
                try {
                        System.out.println(Charset.defaultCharset());
               
                        ServerSocket ss = new ServerSocket(0);
                        System.out.println(ss.getLocalPort()); // pass this port to RawWriter in argv[0]
                        Socket s = ss.accept();
                        System.out.println(s.getPort()); // the connecting client's port

                        InputStream in = s.getInputStream();
                        for (int i = in.read(); i != -1; i = in.read()) {
                                System.out.print(i + " ");
                        }
                        System.out.println();
                }
                catch (Exception ex) {
                        ex.printStackTrace();
                }
        }
}
/////////////////////////
public class RawWriter
{
        // writes a stream of bytes; nothing here is affected by the JVM's default encoding
        public static void main(String[] argv) // supply the RawReader port number in argv[0]
        {
                try {
                        System.out.println(Charset.defaultCharset());
               
                        Socket s = new Socket(InetAddress.getLocalHost(), Integer.parseInt(argv[0]));
                        System.out.println(s.getLocalPort());

                        OutputStream out = s.getOutputStream();
                        byte[] data = {(byte)1, (byte)2, (byte)3, (byte)100, (byte)150, (byte)180}; // Java bytes are signed, so (byte)150 is -106 and (byte)180 is -76; the 8 bits sent on the wire are still 150 and 180
                        out.write(data); // no encoding happens here
                        //out.write((byte)10); // if we add the EOL (10) here we can duplicate the output of -Dfile.encoding=ISO-8859-1 Writer
                        out.close();
                }
                catch (Exception ex) {
                        ex.printStackTrace();
                }
        }
}
/////////////////////////
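
As a rough illustration of takeaway 1, the sketch below pushes every byte value 0..255 through a decode/encode cycle and checks whether the bytes come back unchanged. RawVsText and survives are hypothetical names, and the sketch assumes a desktop JVM with StandardCharsets. ISO-8859-1 happens to map all 256 byte values one-to-one, while UTF-8 and US-ASCII replace byte sequences they cannot decode, so arbitrary binary data does not survive them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RawVsText {
    // Returns true if all 256 byte values survive bytes -> String -> bytes in this charset.
    static boolean survives(Charset cs) {
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }
        byte[] back = new String(original, cs).getBytes(cs);
        return Arrays.equals(original, back);
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + survives(StandardCharsets.ISO_8859_1)); // true
        System.out.println("UTF-8:      " + survives(StandardCharsets.UTF_8));      // false
        System.out.println("US-ASCII:   " + survives(StandardCharsets.US_ASCII));   // false
    }
}
```

Even where an encoding does round-trip, plain InputStream/OutputStream avoids the question entirely.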


derekwhite 14-03-2011 14:17

Dealing with UNsigned values in Java
 
I agree that reading and writing bytes is what you want here. To get around the "sign extension" problem requires an extra operation...

To recap, most integral types in Java are signed:
byte - signed 8-bit (-128..127)
short - signed 16-bit (-32768..32767)
char - unsigned 16-bit (0..65535)
(mostly for unicode character set, but you can use them as a numeric type)
int - signed 32-bit (-2147483648..2147483647)
long - signed 64-bit (0x8000000000000000L..0x7fffffffffffffffL) (the decimal values are getting less useful here!)

Compared to C, Java is nice in that the limits for each type are the same on every platform. BUT - it's much more of a pain to deal with unsigned values in Java. Also note that Java's "char" is nothing like a C "char".

To deal with unsigned values in Java you need to promote the type to a "larger" type, then mask the value back to an 8 (or 16, or 32) bit range.

BYTE:
int unsignedVal = signedValue & 0xFF;
SHORT:
int unsignedVal = signedValue & 0xFFFF;
INT:
long unsignedVal = signedValue & 0x0FFFFFFFFL;
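
A small sketch of the three masks above applied to values that do not fit the signed range. UnsignedMask and its method names are invented for this example.

```java
class UnsignedMask {
    static int unsignedByte(byte signedValue) {
        return signedValue & 0xFF;         // byte is promoted to int, then masked to 8 bits
    }

    static int unsignedShort(short signedValue) {
        return signedValue & 0xFFFF;       // short is promoted to int, then masked to 16 bits
    }

    static long unsignedInt(int signedValue) {
        return signedValue & 0xFFFFFFFFL;  // int is promoted to long, then masked to 32 bits
    }

    public static void main(String[] args) {
        byte b = (byte) 180;               // stored as -76
        short s = (short) 40000;           // stored as -25536
        int i = (int) 3000000000L;         // stored as -1294967296

        System.out.println(b + " -> " + unsignedByte(b));   // -76 -> 180
        System.out.println(s + " -> " + unsignedShort(s));  // -25536 -> 40000
        System.out.println(i + " -> " + unsignedInt(i));    // -1294967296 -> 3000000000
    }
}
```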

To extend buchanan's RawReader example:

Code:

InputStream in = s.getInputStream();
for (int i = in.read(); i != -1; i = in.read()) { // read() already returns 0..255, or -1 at end of stream
        i = i & 0xFF;                                      // no-op here; the mask is needed when the value comes from a (signed) byte or byte[]
        System.out.print(i + " ");
}
System.out.println();
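
The no-argument InputStream.read() hands back an int in the range 0..255, so the mask mostly matters once data lands in a byte[]. Here is a sketch of that case, with a ByteArrayInputStream standing in for the socket's stream. BufferedRawRead and readUnsigned are hypothetical names, and the 64-byte buffer size is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class BufferedRawRead {
    // Drains the stream in chunks and returns every byte as a value 0..255.
    static int[] readUnsigned(InputStream in) {
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                collected.write(buf, 0, n);   // only the first n bytes of buf are valid
            }
            byte[] raw = collected.toByteArray();
            int[] values = new int[raw.length];
            for (int i = 0; i < raw.length; i++) {
                values[i] = raw[i] & 0xFF;    // raw[i] is signed; mask to get 0..255
            }
            return values;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 1, (byte) 2, (byte) 3, (byte) 100, (byte) 150, (byte) 180};
        System.out.println(Arrays.toString(wire));
        // prints [1, 2, 3, 100, -106, -76]
        System.out.println(Arrays.toString(readUnsigned(new ByteArrayInputStream(wire))));
        // prints [1, 2, 3, 100, 150, 180]
    }
}
```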


drakesword 14-03-2011 23:18

Re: Reading characters from socket wierdness
 
I had at one point sent raw byte data as well. But the fact remains that when you use read() or readLine(), the API returns 2 bytes of data for the larger values, whereas the documentation says it returns an int between 0 and 255.
. . . I will try again when I have access to the robot.


So essentially I need to select the "other" encoding. To cut down on back-and-forth debugging, does anyone know what encoding the cRio natively reads, so I can set it in the OutputStreamWriter?

