Our deploy process now often results in two copies of the java process on the RoboRIO. I dimly recall other folks complaining about this same issue a while ago, but I can’t find those threads. Does anyone remember those cases?
One weird clue: the FRC_UserProgram.pid file is missing, so the frcKillRobot.sh script never kills the old process. I would expect subsequent deploys to somehow recover from that state, e.g. via the “kill -9” route, but they don’t seem to.
We’re running the dev version, if that matters.
Can you clarify what you mean by “running the dev version”? I can think of a few different things you might mean.
WPILib is in the process of making breaking changes for the new season and things aren’t always in sync across all repositories and tools. In addition, vendors haven’t made compatible updates. I very much expect things to be broken right now, but not sure that I’d expect it to exhibit what you’re seeing.
sorry, i meant this:
https://github.com/wpilibsuite/allwpilib/blob/main/DevelopmentBuilds.md
is there another kind of “dev version”?
We use this version to avoid the disruption of library porting at the start of the build season.
Right, I’m not sure how the contents of the jar would somehow break the roboRIO end of the deploy process in this way. A student noticed today that the new process seems unable to “kill -9” the old one (the output from this line never appears); maybe that’s a clue for folks more knowledgeable about how all this works than I am. Interestingly, the cpp referenced here uses a different “pid file” than the frcKillRobot.sh script, which is kinda weird to me, but whatever.
I found a few threads that mentioned replacing lvrt with a shell script. We’ll try that, since our actual requirements seem a lot simpler than what the code here seems to support.
https://gist.github.com/PeterJohnson/2702b331ee3c236188ec76ecf499c333
Our team was experiencing the same issue, and we followed these steps as a temporary fix. It wasn’t too much of a problem because it only happened on deploy.
- SSH into the RoboRIO (ssh admin@10.[first 2 digits of team number].[last 2 digits of team number].2)
- Run ps -ef to list the running processes
- Find the java process that was stalling out
- kill -9 the process
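For team numbers that aren't four digits, the address in the first step is less obvious. Here is a minimal Java sketch of the arithmetic, assuming the roboRIO sits at the conventional 10.TE.AM.2 static address. The class and method names are made up for illustration and are not part of WPILib.

```java
/** Hypothetical helper: builds the conventional 10.TE.AM.2 address for a team number. */
final class RoboRioAddress {
    private RoboRioAddress() {}

    static String forTeam(int teamNumber) {
        // Assumption: TE = teamNumber / 100 must fit in one IPv4 octet (0..255).
        if (teamNumber < 1 || teamNumber > 25599) {
            throw new IllegalArgumentException("unsupported team number: " + teamNumber);
        }
        int te = teamNumber / 100; // leading digits, e.g. 12 for team 1234
        int am = teamNumber % 100; // last two digits, e.g. 34 for team 1234
        return "10." + te + "." + am + ".2";
    }

    public static void main(String[] args) {
        // Prints the ssh target for an example team number.
        System.out.println("ssh admin@" + forTeam(1234));
    }
}
```

With a three-digit number such as 254, the same arithmetic gives 10.2.54.2, so "first 2 digits" really means "everything except the last two digits".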
Yeah, we did that too. Did the problem go away for you? Or are you still doing this on every deploy?
Still doing it on every deploy. It’s been an issue for the past 6 months or so, and we haven’t found a fix online yet.
ohh noooooo!
Generally, problems like this are caused by bugs in team code. If you’re using custom threading anywhere, or not handling InterruptedExceptions correctly, it can cause this behavior. If you’re not doing either of those, check whether your vendor libraries are.
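For reference, here is a minimal sketch of what "handling InterruptedException correctly" usually means in plain Java. It is not taken from any team's or vendor's code. The PollingWorker class and its 20 ms period are invented for illustration, and whether a pattern like this has anything to do with the leftover process is an assumption. The worker restores the interrupt flag and returns instead of swallowing the exception, and it runs on a daemon thread so it cannot keep the JVM alive by itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical background task that stops promptly when its thread is interrupted. */
final class PollingWorker implements Runnable {
    private final AtomicInteger iterations = new AtomicInteger();

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                iterations.incrementAndGet(); // stand-in for real periodic work
                Thread.sleep(20);
            } catch (InterruptedException e) {
                // Do not swallow the interrupt: restore the flag and leave the loop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    int iterations() {
        return iterations.get();
    }

    /** Starts the worker on a daemon thread so it never blocks JVM shutdown. */
    static Thread startDaemon(PollingWorker worker) {
        Thread thread = new Thread(worker, "polling-worker");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread thread = startDaemon(new PollingWorker());
        Thread.sleep(100);
        thread.interrupt();
        thread.join(1000);
        System.out.println("worker alive after interrupt: " + thread.isAlive());
    }
}
```

The pattern to look for in team or vendor code is the opposite: an empty catch block around sleep() or wait() inside a loop, on a non-daemon thread.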
Could you say more about that? We’re not doing anything with threads, etc, and not using any vendor code that everybody else doesn’t also use. How would a threading issue result in the erasure of the pid file so that the process killer can’t find the process to kill?