Improbable Java slow down in combining doubles and strings

We’ve been chasing a bug all year that has been driving us crazy. I think I may have found it, but my result feels so improbable that I’m having a hard time believing it; I’d like to describe it here in the hopes that someone else can corroborate or point out the obvious error in my conclusion.

That is, we use string constructs like this:
"Header " + m_some_double_variable
We use that in debug messages, and in a header to a csv file we generate.

However, the first use of such a string construct seems to take an astronomical amount of time.

For example, this code:
https://gist.github.com/jwhite66/c799cc8bca57f165d5d70907749fd9bc

Gives the following result:

********** Robot program starting **********
NT: server: client CONNECTED: 10.28.23.162 port 59765
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Normal string operations took: 1.0367089999999999
Don’t slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
String builder operations took: 0.0016929999999999445

That is, a full second just to print a string!

My best guess is that there is some sort of exception that occurs with some sort of JIT compilation. I haven’t probed a lot further; it is the case that the second use of a string such as that does not take as long.

Anyone else have any insight on this one?

1 Like

When I was working with Team 8 last week we were noticing incredible slowdowns with printing (System.out.println()) and reverted to dumping everything to files instead. Nothing on the order of a second, but sometimes several ms.

The delay seems proportional to the number of doubles you print; the more, the longer the delay. It also seems to be catastrophically bad only once, and then just slow after that.

I too am wondering if JIT compilation has anything to do with these strange delays (printing, CAN, various netcomm things it seems) in general. We noticed our first auto periodic loop taking 250ms and then it dipping to between 20-40ms with occasional spikes up to 60ms. Double checked and couldn’t find any one-off initialization stuff (that was all handled in auto init).

Here’s another thread from last week on loop timing issues:

How slow after the first time?

If you swap the two operations (StringBuilder and string operations) does the result change? What if you use normal Java timer functions instead of getFPGATimestamp? What if you don’t include the println but measure only the string part of the operation?

JIT generally takes a few thousand iterations to kick in.

1 Like

No change. New code is here:
https://gist.github.com/jwhite66/2e28319273727f6ffb84cb9fa02ce7fa

Second and subsequent calls seem not to take any slow down. I have an instinct that the ‘gremlin’ we have been chasing is worst at start up, but then we also see it periodically, which is what leads me to believe there is something else causing it. The smaller hiccups could just be Garbage collection, which I was also independently pursuing; there seems to be more GC than I would expect, but those hiccups are only on the order of ~ 20ms, which our control systems survive.

I did also try adding a ‘-Xint’, and that just made it worse. I’m largely a Java neophyte, so I’m just pushing buttons in the dark :-/.

I also tried moving it to teleopInit(), in the theory that there was a startup pain of some kind, and that also made no material difference.

For completeness, here is the output of my last run, with the code give above:

Results:
********** Robot program starting **********
NT: server: client CONNECTED: 10.28.23.162 port 62630
Don’t slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
fast time: 0.026999950408935547
slow time: 0.8399999141693115
slow time (again): 9.999275207519531E-4
slow time (after changing): 9.999275207519531E-4

Not sure what is going on, but here are a couple thoughts.

The one thing that is different about appending a double (or a float) to a StringBuilder is that it will cause the FloatingDecimal class to load if it has not already. Perhaps that is the issue with the first run. In addition, it is possible that a native library is being loaded for the first time along with the class. The only native I could definitely see it reaching is System.arrayCopy but I would be surprised if that was not already loaded.

A couple of things to try:

  1. Change the doubles to ints to see if the first time performance improves.

  2. Try using FloatingDecimal directly first to force it to load. In particular, try to hit the FloatingDecimal.appendTo(double, Appendable) method since that is what StringBuilder uses to append a double. Or, just do a single double append to a StringBuilder and then run your test concatenations.

I hope this helps,
Steve

I would avoid concatenating strings with plus.
I would bet that the code that gets generated looks something like this
new String( new String( new String( … ) …)))))
It’s supposed to be using StringBuilder, but it also might not be doing it efficiently.
I would use StringBuilder, or String.format(" %f … ") and measure that.

Sure; there are a variety of best practices for strings in Java.

But 1 second? That feels wrong in some more substantive way.

Further, this did not happen in previous years, so it feels like some part of the changes this year (Java 11?) triggered this change. It strikes me as wise for us to figure out why this is happening so we can correctly solve it, not just work around it.

(And, arguably, no one rational should use Java for timing critical code, but Java is what they teach in many high schools, so…)

1 Like

Have you measured how bad it was previously?

No, there was never a noticeable or even measurable impact.

We didn’t have the watchdog timers checking values before 2019, so its very possible this wasn’t noticed in previous years, but was still an issue. Did you have other timing checks that were checking how good timing was?

The nature of the code is:

  • Start a motion planner with a period (usually 5-20 ms)
  • Write a header line to a CSV
  • Until motion plan complete
    • Compute the setpoint, set setpoint, run pid loop, write a line to the CSV

We then plot the resulting csv to tune the motion planner. Until this year, that always worked, and our CSVs were completely rational.

This year, the result has been that the first two data points are severely delayed, presumably because the string generation that goes into the csv writes is taking a long time. That’s novel this year, and not something we’ve seen in the past. (And mechanical behavior with 1 second delays in control loops are wicked fun :slight_smile: ).

Also, I often encourage the kids to debug by printing, and I know that we’re often spamming code with System.out.println(blah + blah), and I feel certain that we would have noticed substantial delays as a result of that spamming.

I cannot look at it this week, but I’ll investigate if there’s a difference between the openjdk java and the oracle java.
Obviously there’s something wrong with the ‘+’ string concat.

It seems most likely this is a regression bug in Java 11. We changed from the Oracle 8 JVM in 2017 to the Zulu 8 JVM in 2018 to the OpenJDK 11 JVM in 2019. Note that Java 11 Oracle builds now are identical to OpenJDK, and the OpenJDK 11 JVM ARM JIT engine is supposed to be the same as the old Oracle 8 one. Some detailed profiling work within the JVM will likely be necessary to ferret this out. Fortunately it’s all open source, but it is a massive piece of software. This is the repo we use to build OpenJDK 11 for the RoboRIO in case anyone is interested: https://github.com/wpilibsuite/frc-openjdk-roborio

I’ll take a poke, but it’s way outside my area of expertise. Can I experiment on a Pi? That might be easier (Robot requires travel, Pi sitting right here… :slight_smile: ).

One more data point - I tried building a second string the ‘slow’ way, and that was quick, so it seems to be a first time only problem. I haven’t tested this in my current code, but I’m pretty sure that the problem occurs once per thread, not just once per program.

If the Pi shows the same problem, sure! :slight_smile: For the FRCVision rPi image, we build our own OpenJDK 11 from the same sources, although we build the server JVM instead of the client JVM.

Of course, it’s even better if it repro’s on desktop…

This issue is specific to Java 11 “target” class files; if you target to Java 8 class files (but still run with Java 11), there’s no slowdown. I changed the test to be standalone to make it easy to run. The following is on desktop with Oracle-distributed OpenJDK 11.0.1.

javac Slow3.java (or javac -source 1.8 Slow3.java), java Slow3 result:

Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
 fast time: 9.999275207519531E-4
 slow time: 0.06799983978271484
 slow time (again): 0.0
 slow time (after changing): 0.0

javac -source 1.8 -target 1.8 Slow3.java, java Slow3 result:

Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
 fast time: 0.0010001659393310547
 slow time: 0.0
 slow time (again): 0.0
 slow time (after changing): 0.0

Disassembling the two class files shows the difference.

With Java 8 target class file, the compiler turns the +'s into a StringBuilder and builds the string that way.

With Java 11 target class file, the compiler makes a single InvokeDynamic call to a function called makeConcatWithConstants.

My guess is the first time InvokeDynamic is called on this function, a lot of work has to be done to create an optimized function. Later calls are much faster because the function has already been created. I’m challenged to call this a “bug” in the typical sense. It’s certainly a lot slower on first invocation, but it’s likely a lot faster on later invocations. There’s lots of cases of this throughout Java. E.g. the first time you use a function it may need to load a new .class file and that can take a lot of time. This does seem to be a particularly extreme case.

If you really care about this, you can change the string concatenation method with a parameter to the java runtime.
“java -Djava.lang.invoke.stringConcat=BC_SB Slow3” with the Java 11 class file results in:

Don't slow me down, m_a: 1.0, m_b: 2.0, m_c: 3.0, m_d: 4.0, m_e: 5.0, m_f: 6.0, m_g: 7.0, m_h: 8.0, m_i: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 3.0, m_b: 4.0, m_b: 5.0, m_b: 6.0, m_b: 7.0, m_b: 8.0, m_b: 9.0
Slow me down, m_a: 1.0, m_b: 2.0, m_b: 2.0, m_b: 4.0, m_b: 5.5, m_b: 4.2, m_b: 7.0, m_b: 8.0, m_b: 9.0
 fast time: 0.0019998550415039062
 slow time: 0.003000020980834961
 slow time (again): 0.0
 slow time (after changing): 0.0

Some more information on this can be found on Stack Overflow: https://stackoverflow.com/questions/46512888/how-is-string-concatenation-implemented-in-java-9

4 Likes