Java - using Unicode Characters in Variable Names

This is completely not critical at all. If you are here, and have literally anything important to do, leave now and go do it. Don’t waste your time on this question.

OK. To those who remain:

We’re attempting to make variable names in our Java code out of Unicode characters (because reasons).

I was under the impression javac does support Unicode in source files, but we’re getting errors on build:

error: illegal character: '\u02dc'

This Stack Overflow post makes me think there’s a flag we need to add to the javac calls that actually build the source. However, my Gradle skills aren’t good enough to know where it would need to be injected. So, two questions:

  1. Anyone know for sure if that -encoding flag is the secret sauce to making javac happy?
  2. Anyone know the best-practice way to add the flag to the FRC Gradle build?
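For what it’s worth, the first error looks like a classic encoding mix-up. Assume (this isn’t confirmed anywhere in the thread) that the offending character was the same ☃ (U+2603) that appears in the later error, and that javac was falling back to windows-1252, a common platform default on English-language Windows. The three UTF-8 bytes of ☃ (E2 98 83) then decode as three separate characters, and the middle one is exactly U+02DC. This sketch reproduces that decoding; the class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8 (how the editor saved the file), then
    // decode those bytes as windows-1252 (the assumed platform default javac fell back to).
    static String misreadAsCp1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String misread = misreadAsCp1252("int \u2603 = 1;");
        // Report only the non-ASCII chars javac would see, and whether each may appear in an identifier
        for (char c : misread.toCharArray()) {
            if (c > 0x7F) {
                System.out.printf("U+%04X identifierPart=%b%n", (int) c, Character.isJavaIdentifierPart(c));
            }
        }
    }
}
```

Running it prints U+00E2 (â) and U+0192 (ƒ) as allowed, but U+02DC (˜, a modifier symbol) as not allowed, which is the character javac complained about. If that reading is right, it would also explain why the error changed to '\u2603' once -encoding UTF-8 was in place: the file was finally being read correctly, and the remaining problem was the snowman itself.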

There is a way to pass arguments to javac using Gradle. I haven’t tried to do this, but maybe this will help you:

You can then use the information there to pass an -encoding flag.

So maybe add

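compileJava {
    // Flag and value go in as separate arguments; '-encoding UTF8' as one string
    // would likely reach javac as a single, unrecognized flag
    options.compilerArgs.addAll(['-encoding', 'UTF-8'])
}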

Into your build.gradle. Again, I haven’t tried this, but maybe it will work.
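To test question 1 outside of Gradle, one option is the JDK’s javax.tools API, which runs javac in-process. This sketch (EncodingFlagDemo and compileUtf8 are hypothetical names; it needs a full JDK, since ToolProvider.getSystemJavaCompiler() returns null without one) writes two tiny UTF-8 source files into a temp directory and compiles each with -encoding UTF-8, passing the flag and its value as separate arguments:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class EncodingFlagDemo {
    // Hypothetical helper: write "public class <className> { <body> }" as UTF-8 into a
    // fresh temp directory, compile it with -encoding UTF-8, and return javac's exit code.
    static int compileUtf8(String className, String body) {
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path src = dir.resolve(className + ".java");
            String code = "public class " + className + " { " + body + " }";
            Files.write(src, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("No system Java compiler; run this on a JDK, not a JRE");
            }
            // Flag and value are separate arguments, exactly as on the command line.
            // Null streams mean javac uses System.in, System.out, and System.err.
            return javac.run(null, null, null,
                    "-encoding", "UTF-8", "-d", dir.toString(), src.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+2603 (snowman) as data inside a string literal
        System.out.println("string literal: " + compileUtf8("InString", "String s = \"\u2603\";"));
        // U+2603 as a variable name
        System.out.println("identifier: " + compileUtf8("AsName", "int \u2603 = 1;"));
    }
}
```

The first compile should return 0, and the second should fail with an illegal character error: with the right encoding, the snowman is fine as data, just not as a name.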

Add this to build.gradle:

tasks.withType(JavaCompile) {
    options.encoding = "UTF-8"
}

This is what we used in Shuffleboard, albeit with the Kotlin DSL. It should work the same in the Groovy DSL.

Hmmm, no dice so far. Adding @SamCarlberg’s flag does get the option into the compiler arguments:

2020-01-25T15:25:11.490-0600 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationExecutor] Build operation 'Compile Java for :compileJava' started
2020-01-25T15:25:11.507-0600 [DEBUG] [org.gradle.api.internal.tasks.compile.NormalizingJavaCompiler] Compiler arguments: -source 11 -target 11 -d C:\Users\IRONCHEF\Documents\Git\RobotCasserole2020\RobotCasserole2020\RobotCasserole2020\build\classes\java\main -encoding UTF-8 -g -sourcepath  -proc:none -s C:\Users\IRONCHEF\Documents\Git\RobotCasserole2020\RobotCasserole2020\RobotCasserole2020\build\generated\sources\annotationProcessor\java\main -XDuseUnsharedTable=true -classpath C:\Users\IRONCHEF\.gradle\caches\modules-2\files-2.1\com.googlecode.json-simple\json-simple\1.1.1\c9ad4a0850ab676c5c64461a05ca524cdfff59f1\json-simple-1.1.1.jar ... (continues for quite some time)

But the build still fails. The specific error is now:

> Task :compileJava FAILED
C:\Users\IRONCHEF\Documents\Git\RobotCasserole2020\RobotCasserole2020\RobotCasserole2020\src\main\java\frc\robot\ControlPanel\ControlPanelColor.java:23: error: illegal character: '\u2603'
        int ☃ = 1;

Still experimenting.

Are you following the Java rules for variable names? I don’t think a variable name can be made of just any Unicode character.

Good question.

In my head, I had lumped ☃ and 😊 into the same category as a, x, and y, but on further consideration I realized I had no real justification for that.

Inspired to RTFM, I found this:

Not knowing the ins and outs of Unicode, I began to wonder if “Letters and Digits” indicated some particular subset of Unicode. It seemed reasonable.

For example, this compiles:

int the_౫_variable = 5;

But this produces illegal character:

int the_😊_variable = 5;
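The two examples above can be checked without a compile cycle. Here is a minimal sketch (IdentifierCheck and usesLegalIdentifierChars are hypothetical names) that applies Character.isJavaIdentifierStart to the first code point and Character.isJavaIdentifierPart to the rest. It walks the string by code point rather than by char, since 😊 is stored as two UTF-16 chars, and it deliberately skips the keyword check (int, class, and so on) that real identifiers also need. The JDK’s javax.lang.model.SourceVersion.isIdentifier does a similar character-level check if you’d rather not roll your own.

```java
public class IdentifierCheck {
    // Hypothetical helper: true if name's characters satisfy Java's identifier rules.
    // Keywords and literals like "int" or "true" are NOT rejected here.
    static boolean usesLegalIdentifierChars(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        // Iterate by code point so a surrogate pair (like an emoji) is judged as one character
        return Character.isJavaIdentifierStart(first)
                && name.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        // Telugu digit five, smiling emoji (surrogate pair), snowman, plain ASCII
        String[] candidates = {"the_\u0C6B_variable", "the_\uD83D\uDE0A_variable", "\u2603", "x"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + usesLegalIdentifierChars(candidate));
        }
    }
}
```

This prints true, false, false, true, matching what javac did with the two lines above.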

While looking for that answer, I found a more spec-like set of documentation with the following:

Say what you want about kids these days, but I think it’s hard to defend that 😊 is part of anyone’s native written language.
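As far as I can tell from the java.lang.Character Javadoc, “letters and digits” boil down to Unicode general categories: isJavaIdentifierStart accepts letters, letter numbers, currency symbols, and connecting punctuation such as _, and isJavaIdentifierPart additionally accepts digits, combining and non-spacing marks, and a few ignorable characters. The snowman and the emoji both fall in So (other symbol), which neither method accepts. A small sketch that prints the category for the characters in this thread (CategoryPeek and categoryName are made-up names, and the helper only names the categories that show up here):

```java
public class CategoryPeek {
    // Hypothetical helper: readable names for the handful of general categories in this thread
    static String categoryName(int codePoint) {
        int type = Character.getType(codePoint);
        switch (type) {
            case Character.LOWERCASE_LETTER:
                return "Ll (lowercase letter)";
            case Character.DECIMAL_DIGIT_NUMBER:
                return "Nd (decimal digit)";
            case Character.MODIFIER_SYMBOL:
                return "Sk (modifier symbol)";
            case Character.OTHER_SYMBOL:
                return "So (other symbol)";
            default:
                return "category #" + type;
        }
    }

    public static void main(String[] args) {
        // a, small tilde (first error), Telugu digit five, snowman (second error), smiling emoji
        int[] samples = {'a', 0x02DC, 0x0C6B, 0x2603, 0x1F60A};
        for (int cp : samples) {
            System.out.printf("U+%04X %-22s start=%b part=%b%n",
                    cp, categoryName(cp),
                    Character.isJavaIdentifierStart(cp), Character.isJavaIdentifierPart(cp));
        }
    }
}
```

Only a and the Telugu digit pass, and the digit only as a part, not a start, which is consistent with the_౫_variable compiling.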

Regardless, there’s still this “letters and digits” thing that I can’t quite seem to nail down from online investigation. So, since Java conveniently provides an API to detect whether a particular character is allowed inside of (or as the start of) an identifier, I thought, why not just dump a handy table?

package frc.robot;

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TestUnicode {

    public static void main(String[] args) {
        // try-with-resources closes the file even if a write fails partway through
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("./unicodeTest.txt"), StandardCharsets.UTF_8))) {

            // Walk the 16-bit Basic Multilingual Plane (U+0020 up to, not including, U+FFFF).
            // Emoji such as U+1F60A live above U+FFFF, so they are not covered here.
            for (int i = 0x0020; i < 0xFFFF; i++) {
                StringBuilder info = new StringBuilder(String.format("0x%04X: %c ", i, (char) i));

                // Can this character appear anywhere in an identifier?
                info.append(Character.isJavaIdentifierPart(i) ? " ID_PART " : "         ");

                // Can it be the first character of an identifier?
                info.append(Character.isJavaIdentifierStart(i) ? " ID_START " : "          ");

                info.append('\n');
                out.write(info.toString());
            }

        } catch (IOException e) {
            System.out.println("What, you actually expected reasonable error handling?");
            System.out.println(e);
        }
    }
}

Which produces the following results: unicodeTest.txt (2.0 MB)
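One caveat on the table: the loop stops below U+FFFF and casts to char, so it only covers the Basic Multilingual Plane. 😊 is U+1F60A, so it never appears in the file at all. A sketch that scans the full range up to Character.MAX_CODE_POINT (FullRangeScan and countIdentifierChars are hypothetical names) shows that the Emoticons block contributes nothing, while some letters outside the BMP, like U+1D400 MATHEMATICAL BOLD CAPITAL A, are still allowed:

```java
public class FullRangeScan {
    // Hypothetical helper: count code points in [from, to] that Java accepts
    // as an identifier start (index 0) and as an identifier part (index 1).
    static int[] countIdentifierChars(int from, int to) {
        int start = 0;
        int part = 0;
        for (int cp = from; cp <= to; cp++) {
            if (Character.isJavaIdentifierStart(cp)) {
                start++;
            }
            if (Character.isJavaIdentifierPart(cp)) {
                part++;
            }
        }
        return new int[] {start, part};
    }

    public static void main(String[] args) {
        // Everything above U+FFFF, which the 16-bit table never reaches
        int[] supplementary = countIdentifierChars(0x10000, Character.MAX_CODE_POINT);
        System.out.println("Supplementary ID_START: " + supplementary[0] + ", ID_PART: " + supplementary[1]);

        // The Emoticons block, U+1F600 through U+1F64F, home of U+1F60A
        int[] emoticons = countIdentifierChars(0x1F600, 0x1F64F);
        System.out.println("Emoticons ID_START: " + emoticons[0] + ", ID_PART: " + emoticons[1]);

        // MATHEMATICAL BOLD CAPITAL A: a letter outside the BMP, so it is allowed
        System.out.println("U+1D400 allowed as start: " + Character.isJavaIdentifierStart(0x1D400));
    }
}
```

The supplementary totals depend on the Unicode version bundled with your JDK, so no numbers are quoted here, but the Emoticons counts should both come out as 0.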

After perusing the list, I am genuinely impressed by which characters Java does support, but it’s missing some critical ones: namely, the fun ones. We’ll have to find some other way to fulfill FIRST’s Core Value of “fun”.

TL;DR - Java is no fun.

