Chief Delphi

Chief Delphi (http://www.chiefdelphi.com/forums/index.php)
-   Control System (http://www.chiefdelphi.com/forums/forumdisplay.php?f=177)
-   -   Robot Restart and Reset Issues (http://www.chiefdelphi.com/forums/showthread.php?t=135329)

Jared 01-03-2015 20:28

Robot Restart and Reset Issues
 
We're having a very strange problem with our autonomous code and we really need some help fixing it. Our code is here:
https://github.com/dicarlo236/2015-Robot

Our autonomous code generates several arrays of data during the robotInit method. This procedure takes around 400-500 ms, and uses 10 MB of RAM. These arrays are then read by two threads to control the wheels of our robot as we enter autonomous mode. This process is very sensitive to timing, as we are actively controlling position, velocity, and acceleration of the two wheels independently.

If we power on the robot and run our autonomous mode, we drive too quickly and crash into things. The drive movement is jerky and uncontrolled in autonomous mode.

If we power on the robot, wait for the code to start, and push the reset button on the roboRIO, the autonomous code works very well and we pick up three totes.

If we power on the robot, wait for the code to start, and click the reboot roboRIO or restart robot code button on the driver station, autonomous mode works perfectly.

If we power on the robot, compile and download new software, it works perfectly too.

I have replicated this issue many times, and I am sure that the cold boot of the roboRIO is causing our problem.

This weekend, we missed doing the reset trick once during one of our elimination matches and autonomous mode went wrong.

Next time we get to robot, I'll try running our auto setup code in a static initialization block (we're using Java) in the main robot class, or having it initialize auto mode 5 seconds after the user code starts running, but I do not know if this will fix the problem.

We also experience high CPU usage and a bit of lag with the controls before we've reset the robot code. The FTA's at the event could not explain the problem, and kindly allowed us to restart the code and waited for us, but we aren't happy with this solution.

I'm fairly convinced it's a bug with the framework as there should be no difference between a cold boot and a reset button boot.

Any suggestions?

GeeTwo 01-03-2015 21:35

Re: Robot Restart and Reset Issues
 
The most likely thing to cause this sort of behavior is if some piece of hardware (probably a sensor) other than the 'RIO needs a few hundred milliseconds to initialize. Once this initialization is done, the 'RIO initialization works.

Have you tried inspecting your 10MB data block to see what is different between a successful and a jerky startup?

Also you indicated that
Quote:

Our autonomous code generates several arrays of data during the robotInit method.
If this data is only used for autonomous, and you can spare the half second, try moving this array generation to the beginning of autonomousInit() rather than robotInit().

virtuald 01-03-2015 22:48

Re: Robot Restart and Reset Issues
 
If I were to guess at this problem without looking at your code, I would say it has something to do with how you're dealing with time. In particular, this sounds like a system that doesn't allow 'missed frames', and tries to catch up to where it wanted to be, so it keeps executing them until it's caught up.

After looking at it, I noticed that your Updater thing uses a Timer.scheduleAtFixedRate, which looks like it might have a behavior similar to what I'm describing.

"If an execution is delayed for any reason (such as garbage collection or other background activity), two or more executions will occur in rapid succession to "catch up." In the long run, the frequency of execution will be exactly the reciprocal of the specified period (assuming the system clock underlying Object.wait(long) is accurate). As a consequence of the above, if the scheduled first time is in the past, then any "missed" executions will be scheduled for immediate "catch up" execution."

That sounds exactly like what's happening.

virtuald 01-03-2015 23:07

Re: Robot Restart and Reset Issues
 
After thinking some more, one thing that ties into this, is that the RoboRIO doesn't have a battery holding the system's time. So when your RoboRIO starts up it thinks the time is... some time in the past. Maybe an hour, maybe a day, who knows.

However, when the DS connects to your roboRIO, the time is synchronized to the DS's time. So time all of a sudden goes from 0 to 1 billion something.

If your code started before the DS synchronized time, then the fixed time delay mechanism would kick in and try to 'catch up'.

There's a few ways to fix this. One way that pops into my head is only start your updater thing upon the robot being enabled for something. Other solutions involve basing the time delay on something that isn't tied to the robot's datetime... I'm sure there's a source somewhere.

MrRoboSteve 01-03-2015 23:10

Re: Robot Restart and Reset Issues
 
virtuald makes good points. You could log the actual execution time and easily see whether that is happening.

Has anyone documented the GC settings on the roboRIO? I've wondered, but we're a C++ team and a bit busy right now to go figure it out. -XX:MaxGCPauseMillis would be really useful.

I'd move the autonomous selection logic from disabledPeriodic to autonomousInit().

Can you do your profile generation earlier (e.g., via an instance initializer) and see if that helps?

It looks like you know the size of the list of Elements wrapped by Profile in advance. Consider modifying Profile's constructor to receive an initial capacity, and use that to set the size of the ArrayList<Element>. This will help your perf.

Jared 02-03-2015 07:34

Re: Robot Restart and Reset Issues
 
Quote:

Originally Posted by GeeTwo (Post 1451817)
The most likely thing to cause this sort of behavior is if some piece of hardware (probably a sensor) other than the 'RIO needs a few hundred milliseconds to initialize. Once this initialization is done, the 'RIO initialization works.

Have you tried inspecting your 10MB data block to see what is different between a successful and a jerky startup?

Also you indicated that

If this data is only used for autonomous, and you can spare the half second, try moving this array generation to the beginning of autonomousInit() rather than robotInit().

We can't spare the time, as the generation has occasionally taken 1000+ ms to generate, and we need that time to back into the auto zone and drop the stack. However, I think I'll try having the generation happen a few seconds after the controller boots up instead.

Quote:

Originally Posted by virtuald (Post 1451857)
If I were to guess at this problem without looking at your code, I would say it has something to do with how you're dealing with time. In particular, this sounds like a system that doesn't allow 'missed frames', and tries to catch up to where it wanted to be, so it keeps executing them until it's caught up.

After looking at it, I noticed that your Updater thing uses a Timer.scheduleAtFixedRate, which looks like it might have a behavior similar to what I'm describing.

"If an execution is delayed for any reason (such as garbage collection or other background activity), two or more executions will occur in rapid succession to "catch up." In the long run, the frequency of execution will be exactly the reciprocal of the specified period (assuming the system clock underlying Object.wait(long) is accurate). As a consequence of the above, if the scheduled first time is in the past, then any "missed" executions will be scheduled for immediate "catch up" execution."

That sounds exactly like what's happening.

I agree that we're "missing/skipping frames". However, this happens even 20 minutes after robot startup. What is different about background tasks 20 minutes after a cold boot and 20 minutes after pressing the restart code or reset buttons?

Quote:

Originally Posted by virtuald (Post 1451859)
After thinking some more, one thing that ties into this, is that the RoboRIO doesn't have a battery holding the system's time. So when your RoboRIO starts up it thinks the time is... some time in the past. Maybe an hour, maybe a day, who knows.

However, when the DS connects to your roboRIO, the time is synchronized to the DS's time. So time all of a sudden goes from 0 to 1 billion something.

If your code started before the DS synchronized time, then the fixed time delay mechanism would kick in and try to 'catch up'.

There's a few ways to fix this. One way that pops into my head is only start your updater thing upon the robot being enabled for something. Other solutions involve basing the time delay on something that isn't tied to the robot's datetime... I'm sure there's a source somewhere.

I think you're right. The other code that I used timer.scheduleAtFixedRate() for won't have any issues with running too fast, but the autonomous driving stuff relies on that timing being spot on. The way updater works now is it starts the timer tasks on robot startup, which is unnecessary. We could likely have the updater not call timer.scheduleAtFixedRate() until the first updatable is passed to it, which would be as autonomous mode starts.

Quote:

Originally Posted by MrRoboSteve (Post 1451864)
virtuald makes good points. You could log the actual execution time and easily see whether that is happening.

Has anyone documented the GC settings on the roboRIO? I've wondered, but we're a C++ team and a bit busy right now to go figure it out. -XX:MaxGCPauseMillis would be really useful.

I'd move the autonomous selection logic from disabledPeriodic to autonomousInit().

Can you do your profile generation earlier (e.g., via an instance initializer) and see if that helps?

It looks like you know the size of the list of Elements wrapped by Profile in advance. Consider modifying Profile's constructor to receive an initial capacity, and use that to set the size of the ArrayList<Element>. This will help your perf.

We could figure out the size of the arraylist right now, but if we needed to change the auto mode to avoid colliding with other robots, the length of the list would change. We don't have a spare roboRIO, and our current one is in the bag, so I won't have be able to test any of my changes for a few weeks.

Thanks to everybody for the great advice- I'm really hoping we can fix this.

Mark McLeod 02-03-2015 09:33

Re: Robot Restart and Reset Issues
 
Initial date/time when roboRIO is booted seems to be 2/17/1970, 12:40am.

When the Driver Station connects, the time on the roboRIO will get sync'ed to the DS.
So timestamps may vary accordingly if time is taken before/after the DS connection is established or files are written/created before/after the DS connects after a roboRIO boot

MrRoboSteve 02-03-2015 11:07

Re: Robot Restart and Reset Issues
 
Quote:

Originally Posted by Jared (Post 1451983)
We could figure out the size of the arraylist right now, but if we needed to change the auto mode to avoid colliding with other robots, the length of the list would change. We don't have a spare roboRIO, and our current one is in the bag, so I won't have be able to test any of my changes for a few weeks.

You don't need the exact size of the arraylist when you allocate, as it will be expanded even if you set an initial size. Your goal here is to reduce the number of times the underlying array is grown.

I think this is the current source for ArrayList. Note that the size starts out at 10 and grows by size >> 1. Each time it grows, there's a fully copy of the array made. (see method grow()). In your scenario, I'd set it to an estimate of your maximum size plus a buffer (say 10%).

virtuald 02-03-2015 12:02

Re: Robot Restart and Reset Issues
 
Quote:

Originally Posted by Jared (Post 1451983)
I agree that we're "missing/skipping frames". However, this happens even 20 minutes after robot startup. What is different about background tasks 20 minutes after a cold boot and 20 minutes after pressing the restart code or reset buttons?

On a warm reboot the time won't get reset back to 1970. You indicated above that deploying new code also fixed the problem -- which also won't reset the time back to 1970.

As a workaround, if you hit the 'reset robot code' button on the driver station, that would be equivalent to a redeploy, but much faster than a warm reboot.

Better to find a solution that doesn't use the Timer object, or look at its source code and see if you can override it and provide a different non-datetime dependent time source to it.

Coach#3536 02-03-2015 14:58

Re: Robot Restart and Reset Issues
 
If you are using the smart dashboard and data logging it messes up the field control. We experienced it until the FTA and our team found the backlog dumping and timing out. We suggest that you turn off the logging when playing on the field. It doesn't show up in practice mode and wont show up with a quick connect after start up on field. but if match start is delayed logging compiles and hold until communication start and then floods the system until it overwhelms field control and drops connection. We Followed WPI documents to the letter and it is just the way it is.

Greg McKaskle 02-03-2015 15:44

Re: Robot Restart and Reset Issues
 
Jared, can you find two log files of the two states? One that is the correct procedure with the warm boot and another with the code start.

Post all three files, .dslog, .dsevent, and .pdplog. If it isn't easy to post to CD, PM me and I'll give you a place. I think they may help indicate what ran differently.

Greg McKaskle

Jared 04-03-2015 15:53

Re: Robot Restart and Reset Issues
 
Quote:

Originally Posted by Greg McKaskle (Post 1452286)
Jared, can you find two log files of the two states? One that is the correct procedure with the warm boot and another with the code start.

Post all three files, .dslog, .dsevent, and .pdplog. If it isn't easy to post to CD, PM me and I'll give you a place. I think they may help indicate what ran differently.

Greg McKaskle

Currently, I've only got the log files I've saved to my home computer.

From those, I've found one sequence of events that may help.

The first log file (2015_03_01 13_06_11 Sun) shows the robot connecting while tethered (cold boot), and enabled in teleop for a few seconds to adjust the elevator. There was a significant amount of control lag here, and if we had run autonomous, we would have had a problem. The dsevents file shows that I requested robot code restart at the end.

The next log file is when the code restarts and connects while I was tethered and in line, but is saved at the school computer.

Finally, I have the connection to the field (warm boot), where we play our match and things run well. This is the 2015_03_01 13_23_59 Sun file.

It makes sense that the CPU usage is high, as the controller is attempting to run the PID and drive controllers as fast as possible to catch up.

https://drive.google.com/folderview?...1U&usp=sharing

ozrien 04-03-2015 23:43

Re: Robot Restart and Reset Issues
 
So when you power boot (like in the *_06_11_Sun capture), does it hold CPU +90 % indefinitely? Or does it eventually settle? It sounds like you watched it for 20min. Does it do this everytime? (I can't seem to reproduce it).

Robot setup looks like a bunch of PWMs, a few encoders, PDP on CAN bus (no other CAN devices), and four joysticks (Logitech Attack 3,Logitech Attack 3,Gamepad F310 (Controller),Logitech Dual Action) on the DS right?

ozrien 04-03-2015 23:51

Re: Robot Restart and Reset Issues
 
Does this happen when tethered as well? Usb or Eth?

ozrien 05-03-2015 00:05

Re: Robot Restart and Reset Issues
 
1 Attachment(s)
I tried to reproduce what's in your capture, deplyed, power boot and cycled teleOpEn and teleOpDis. Screenshot attached. The +90CPU seems to drop off quickly, maybe I'm missing something.


All times are GMT -5. The time now is 07:39.

Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Copyright © Chief Delphi