The 8.2 (or 8.3) Battery Voltage Bug


eugenebrooks
03-03-2006, 01:48
We are using a combination of Kevin Watson's camera code (only the sensor reporting, not the servo functionality) and a timer/sensor interrupt to regulate the speed of our ball shooting wheel. We see the camera go offline sporadically, with some of the printf() statements ceasing to appear on the upload line. When this happens, the battery voltage shown on the OI is 8.2 (or 8.3) even though the main battery voltage is just fine.

We are backing out of both the camera code, which depends on interrupts for its serial line support, and our own interrupt-based feedback for the ball wheel speed, but we are interested in knowing whether any other teams have seen this problem. It started off being very sporadic, but with some minor code adjustments it can hit almost 100% of the time.

We have integrated the new library code that fixes an interrupt problem, but this did not fix the bug for us.

Eugene

Mark McLeod
03-03-2006, 07:07
We had the same symptoms before our robot went in the box. All our other interrupt lines were detached and only one timer interrupt was running, and it still occurred.

I was hoping the recompiled libraries would solve that one. We didn't have enough time with the robot for general programming, much less systems debugging.

We have an upgraded RC returning in the mail either today or Monday just so we can explore the new RC issues. Let us know what you uncover.

DjAlamose
03-03-2006, 07:09
I'm not even sure this is a "bug". The voltage of our batteries when fully charged is usually around 8 for the backup and 13.2 for the main battery. This is caused by the charger and is actually helpful because it makes the battery last longer. I'm no big electrical person, but I do know that the batteries will not be exactly what they are rated at.

Al Skierkiewicz
03-03-2006, 07:16
Gene and Mark,
Does the OI report normal battery when not running the code? This is very interesting since the RC is on the hairy edge of reset at 8.2 due to the dropout restrictions of the internal regulator. Are you running the IFI backup battery charger circuit as well? When you measure the main battery are you at the battery or at the RC terminals?

Mark McLeod
03-03-2006, 07:58
All this is by my recollection, not directly from my test notes which are in our Robotics lab, so I may have forgotten some details.

We weren't able to run comparison tests using the default IFI code or other variations. Ours was based on Kevin's streamlined camera code. We were running a mixture of drive-train-only code and shooter/gimbal code.

The OI displayed the normal ups and downs of the battery voltage until it suddenly showed exactly 8.2 V.

We ran with the backup battery trickle charger circuit and a charged backup battery. We did disconnect it for one of the tests, but we were looking at the sporadic controls problem and I don't remember whether the 8.2 V symptom also occurred when it failed. All this while the drivers and mechanics were simultaneously working out their own problems, of course. There were no servos or other drains on the backup battery; we even moved the camera-only power to an analog pin.

The main battery voltage, measured at the battery, was in the 11 to 12 volt range at the time under a light load (disabled). Under stress and after very long running times, the battery voltage was driven down to 6 volts and the RC transitioned to the backup battery fine.

Gdeaver
03-03-2006, 08:19
The PICs will keep running below 5 volts unless IFI implemented the voltage watchdog. The RS-232, however, is very sensitive to voltage dropout, which might explain the missing printf output. Could you have an intermittent power connection to the RC? Try swapping in a different breaker too.

Mike Bortfeldt
03-03-2006, 09:51
We had the same issue with the OI displaying an 8.2/8.3 battery voltage (the reading changed by 0.1 V when disabled). It was sporadic for the most part. One of the last times we had the issue, we were able to make a number of code changes, download, and try to isolate the problem. It seemed to be related to a particular (custom) code module we were using. We commented out all the code in the module, but left in the file's variable list, and still had the issue. One by one, we removed the variables until we removed a specific structure and the problem went away (we had several variables of the same structure type and some were still defined, so it wasn't the structure type itself). After we thought we had identified the cause, we started adding things back in until we were back to the original state of the module, and we couldn't recreate the problem (so was it the structure, or did it just fix itself?).

The only idea we had at the time was that shortening the variable section caused the compiler to place the data in a different memory bank. It sure looked like some of the IFI variables were getting stomped (txdata, rxdata maybe?).

After this, we switched to a spare 2004 controller we had and made the slight modifications necessary to get our code to run on it. We had no problems during the last week before ship using the old controller. I was really hoping that the new libraries were going to solve the problem, but it will be hard to tell with it being so intermittent. We have since had the 2004 controller upgraded by IFI and will try some testing on it this weekend.

One other item of note: the first thing we did when we saw the 8.2 voltage problem was disable all our user interrupts. We also deleted all the code from the InterruptHandlerLow to ensure that if we still had one active it would generate a red code error condition. It didn't help.

Mike

eugenebrooks
04-03-2006, 02:05
We measured our main battery voltage during one of these events and it was 12.6 volts at the power terminals of the RC.

We are not running the IFI backup charger circuit; we just swap in a fresh backup battery every match. It could very well be the backup battery voltage that is appearing on the user display, but that line of code is not in our program. We are using the user mode display to show height information from the camera and RPM information, not the backup battery voltage.

As noted by others above, the problem seems related to some kind of memory stomping: small changes to the code that move things around in memory make the problem come and go. We have seen weirdness on this front even when using non-interrupt, open-loop code for all controls. We have a "static int counter" in our main loop, used to print once every 40 trips. When we used

if(counter % 40 == 2) { printf(...); }

we lost our prints. When the counter was explicitly initialized as "static int counter = 0;" the prints came back, and they would come and go with the "= 0" in the declaration. Either the compiler is not correctly initializing static variables that are not explicitly initialized, or moving the variable into a different segment (the one with explicit initializers) rearranged memory enough to make a memory-stomping problem go away. Remember, this was with the new default code and simple open-loop control, with no timers or interrupts beyond what is in the default code.

Given this experience, we went through Kevin's camera code and explicitly initialized all of the non-array static variables. The resulting binary locked up our RC with the programming light on, but it would not load a program. Holding down the program button for as long as it normally takes to get the programming light allowed us to get the non-interrupt code back in place.

So either the compiler or the RC is not properly initializing implicitly initialized static memory, or memory stomping is occurring even with the default code. We are pretty skittish about changing the code at this point, given that we need the robot working for matches at the Portland regional, but given the fault that has appeared with the static counter in the default code, we may spend some time exploring how the fault is affected by adding variables before and after the counter to move it around in memory.

I would really like to be using the 2005 controller. We have some really nice automatic aiming and ranging code that uses the camera, but we can't trust that it will run reliably enough to use in a match. For us to trust the interrupt-based code on the 2006 controller in a competition match, it would have to get through an entire Thursday, with code changes being made, without any faults of this kind.

Eugene

Mike Bortfeldt
04-03-2006, 19:07
Eugene,

I've been doing some testing on our recently IFI-upgraded controller and have run into an interesting situation. It seems that if you set bit 7 of User_Byte3 (User_Byte3 = 128), you can get the yellow "Low Main Batt." LED on the OI to turn on. I've tried two completely different code projects with the same result. This definitely doesn't seem right, and I was wondering if you or anyone else could confirm that this happens on another 2006 controller before I try contacting IFI. (Tethered, no I/O attached, no backup battery.)

Mike

chris31
04-03-2006, 20:05
We were getting this problem also. Unfortunately, it cost us 2 matches at the NASA/VCU regional in Richmond, as we were unable to move! We didn't have a lot of time to debug the issue. When we were on tether it worked perfectly, but when running through the competition port we had issues. We will post more as we find it out.

EDIT: Al Skierkiewicz was asking some questions and we have answers. We were not running the backup battery charger circuit. The OI was reporting a voltage of 8.2, but the low battery indicator LED was not turning on. This struck us as odd.

EDIT2: Also, this issue only arose after we went back to an older code base that was based on Kevin Watson's streamlined camera code.

Sachiel7
04-03-2006, 20:36
Yes, as Chris31 said above, this bug cost us two whole matches today at VCU.
We were quite frustrated in that we could not isolate the problem.

What the "bug" does is return a constant 8.2 volts on the OI, YET the low main battery light does NOT come on. In addition, the 8.2 volts does not change when you do manage to get systems running (it should drop with current draw).

We had interesting behaviour when our RC was running this bug. First off, our autonomous mode did not run. Second, pwm03 worked properly, but 04, 05, 13, and 15 stopped. Our relay based loader continued to function.

The thing about the bug is that it was very sporadic. It would randomly cut in or out when you started up the bot, yet I don't think it ever changed while the bot was in operation.

-Sigh-
This bug cost us two whole matches when we finally had our robot and autonomous very functional.

Just a note to teams about to compete out there: keep your eyes on this one, it will kill matches....

chris31
04-03-2006, 20:45
I want to test our code with a new version of the camera code. Kevin Watson used interrupts to handle serial port data, I believe, and if I recall correctly that is the only place we use them. If I take that code out and don't use any interrupts, will this still happen? We will just have to wait and see. Just a warning to all other teams: watch out. It's a match killer.

Sachiel7
04-03-2006, 20:56
Yes, if you want to see some video of this bug in action, watch us (1132) in these matches:
http://soap.circuitrunners.com/2006/movies/virginia/va_071.wmv
http://soap.circuitrunners.com/2006/movies/virginia/va_079.wmv

You might notice a few balls popping weakly out of our bot, because some systems were still working.
Robot reset button and OI Reset button did nothing.
Don't get caught by this guys...

chris31
04-03-2006, 21:00
After having this happen to us, I am very interested in making sure that it doesn't happen to other teams. I am working on some code as we speak that would run the camera without the interrupts Kevin used. Unfortunately I do not have the bot, so if someone would like to test code for me, please PM me or say something here.

Kevin Watson
04-03-2006, 21:13
Just FYI, the camera code uses a serial port driver that teams have been using for over a year now without a problem. The camera, once initialized, only sends a few hundred bytes of information per second, which is easily handled by the PIC microcontroller.

-Kevin

gobeavs
04-03-2006, 21:16
That bug hit us once and disabled us for a round at PNW, but we have found a way to avoid it. When we turn our robot on at the beginning of the round we have someone at the OI looking at the battery voltage to give us a thumbs up or down depending on whether it is showing ~12 or 8.2. If we get 8.2 we just power cycle (not just reset, because that screws up the camera). It saved us at least once later on.

chris31
04-03-2006, 21:22
Thank you for the information. Outside of the interrupt-driven code you wrote, we did not use any interrupts this year. I am trying to debug the issue to save other teams from having this happen to them, but all I have is info from debugging today (the last day of our regional), the code we used, and some video of the bot. IFI has acknowledged a problem with the interrupts, and that was why I was quick to jump to that conclusion. I wish I had the RC with me to test more, but I don't. I haven't been able to find out why this would just start happening. The code we were running when we got this error was your camera code with a modified default routine. Sachiel7 can confirm that for us. Sachiel7, if you have any more info, please tell us.

EDIT: gobeavs, that has worked for us too, but it is a temporary solution. It doesn't solve the root of the issue, which is what we would really like to do. Can you tell us more about the code you were running?

chris31
04-03-2006, 21:27
I will continue to be a "Gracious Professional" and not make any comments. Chief Delphi is a great community and flaming rarely occurs. Please do not make comments like that to me and to others who spend their time trying to help people.

EDIT: D.J. Fluck, thank you for removing his post.

Mike Bortfeldt
04-03-2006, 21:28
I was just lucky (or unlucky) enough that this bug kicked in while I was bench testing. It seems that the 8.2 battery voltage is really the reading of the p4_wheel: as you move the joystick 4 wheel, the OI battery voltage changes. The p3_wheel is mapped to the backup battery voltage (as shown by the data in the dashboard), and the p1_x axis now maps to the joystick 3 switches. I can't say exactly what works and what doesn't, as I was just reading what I was sending back from the RC to the OI through the dashboard and didn't map all the joystick data to the user_bytes/pwm values. A reset doesn't fix it, but a full power-down did (although I know from previous testing that it doesn't always). I do know that one of the joystick ports seemed to be mapped to the "disable" flag (not sure which one), so it's quite likely that the "autonomous" flag was also messed up.

Mike

Correction - Rather than the p3_wheel & p4_wheel, it is actually the p3_aux and p4_aux.

chris31
04-03-2006, 21:31
Interesting, I'll have a look. We decided early today to use our port 4 wheel for the launcher wheel PWM. This could be part of the problem. I'll try to check that out.

chris31
04-03-2006, 21:34
Uncalled for noobs

Maybe I'm just a little stressed from competition today, but you just joined today and made 5 rude posts that aren't bettering the community in any way. If you don't stop, I expect to see an IP ban soon.

Sachiel7
04-03-2006, 22:15
To get back on topic real quick: has anyone had this error with the v2 camera code?
Just wondering....
We used v1 because, for some reason, we were having a few issues getting v2 going at competition, so we reverted to the streamlined v1 because that's what we had working.

gobeavs
04-03-2006, 22:29
We are running code based off of Kevin's camera code, I think v1.

Kevin Watson
04-03-2006, 23:25
What issues did you have with version two?

-Kevin

eugenebrooks
04-03-2006, 23:45
The problem appears to be some form of memory stomping, either in the IFI default code or in the controller chip itself. I don't believe that it is a bug in the camera code per se, although using the camera code increases the severity of the problem. We backed out of all of our interrupt/timer-based control at the Portland regional in response to this problem, as there was no way we could keep our robot running otherwise. We have seen problems in very simple code. Below are the only lines of custom code, added to the recently published default code.

At one point, whether or not the periodic printf statements appeared on the download console depended on whether we had

static int counter;

or

static int counter = 0;

for the declaration of "counter". This problem comes and goes over a day: in the morning it is consistent, but when trying to reproduce it with the same code in the afternoon, one can't, even after several tries. I am presuming that the declaration change moves "counter" between the implicitly initialized segment and the explicitly initialized segment, and that this changes the arrangement of memory. Whether or not memory actually gets stomped is something a little more random in the controller.

An open question is whether, given the problems with the 2006 controller, FIRST would be willing to allow the use of the 2005 controller at regional events. We have not seen any of these problems on the 2005 controller.

Eugene



/*******************************************************************************
* FUNCTION NAME: Process_Data_From_Master_uP
* PURPOSE:       Executes every 26.2ms when it gets new data from the master
*                microprocessor.
* CALLED FROM:   main.c
* ARGUMENTS:     none
* RETURNS:       void
*******************************************************************************/

struct rpmpower {
    int rpm;
    int power;
};

/* This table is constructed by using the manual RPM control to set
   RPM values manually and then reading off the required power for that RPM.
   Entries must stay sorted by increasing rpm.
*/
#define RPMPOWERTABLESIZE 9
struct rpmpower rpmpowertable[RPMPOWERTABLESIZE] = {
    {1300, 160},
    {1400, 161},
    {1500, 167},
    {1600, 168},
    {1700, 170},
    {1800, 177},
    {1900, 181},
    {2000, 189},
    {2100, 193}
};

/* Convert a target RPM to a PWM power value: clamp to the first/last entry
   outside the table, otherwise interpolate linearly between the two
   entries that bracket rpm. */
int RPM2Power(int rpm) {
    int RetPower;
    int i;

    if (rpm <= rpmpowertable[0].rpm) return rpmpowertable[0].power;
    if (rpm >= rpmpowertable[RPMPOWERTABLESIZE-1].rpm) return rpmpowertable[RPMPOWERTABLESIZE-1].power;

    for (i = 0; i < (RPMPOWERTABLESIZE-1); i += 1) {
        if (rpmpowertable[i].rpm < rpm && rpmpowertable[i+1].rpm >= rpm) {
            RetPower = rpmpowertable[i].power
                     + ((rpm - rpmpowertable[i].rpm)
                        * (rpmpowertable[i+1].power - rpmpowertable[i].power))
                       / (rpmpowertable[i+1].rpm - rpmpowertable[i].rpm);
            break;
        }
    }
    return RetPower;
}


void Process_Data_From_Master_uP(void)
{
    int Target;
    int BallWheelPower;
    static int counter; /* Does not work if not initialized to zero! */

    Getdata(&rxdata);

    /* Control of the drive motors.
    */
    if (rc_dig_in08 == 1) {
        pwm01 = p1_y;
        pwm02 = p3_y;
        pwm03 = p1_y;
        pwm04 = p3_y;
    }
    else {
        pwm01 = STOP;
        pwm02 = STOP;
        pwm03 = STOP;
        pwm04 = STOP;
    }

    /* Pan Control
    */
    if (p3_sw_trig == 1 && rc_dig_in12 == 0) {
        pwm06 = STOP + 20;
    }
    else if (p2_sw_aux2 == 1 && rc_dig_in11 == 0) {
        pwm06 = STOP - 12;
    }
    else {
        pwm06 = STOP;
    }
    if (rc_dig_in06 == 0) {
        pwm06 = STOP;
    }

    /* Ball lift control.
    */
    if (p2_sw_top == 1 && rc_dig_in07 == 1) {
        pwm05 = 254;
    }
    else if (p2_sw_trig == 1 && rc_dig_in07 == 1) {
        pwm05 = 1;
    }
    else {
        pwm05 = STOP;
    }

    /* Ball wheel speed control.
    */
    Target = (int)1200 + (int)3 * (int)p4_x;
    BallWheelPower = RPM2Power(Target);
    if (rc_dig_in10 == 0) {
        pwm07 = STOP;
    }
    else {
        pwm07 = BallWheelPower;
    }

    /* Ball shooter fire control
    */
    if (p2_sw_aux1 == 1 && rc_dig_in09 == 1) {
        pwm08 = 254;
    }
    else {
        pwm08 = STOP;
    }


    if (counter % 40 == 2) {
        printf("p1_y = %d, p3_y = %d, BWP = %d, TargetRPM = %d, p4_x = %d\r",
               (int)p1_y, (int)p3_y, (int)BallWheelPower, (int)Target, (int)p4_x);
    }

    /* With a 16-bit int, this overflows after 32,767 passes (about 14 minutes
       at 26.2 ms); if it wraps negative, counter % 40 can never equal 2. */
    counter += 1;

    if (user_display_mode == 0) {
        if (rc_dig_in12 == 1) {   /* Right pan limit. */
            Switch1_LED = 1;
        }
        else {
            Switch1_LED = 0;
        }
        if (rc_dig_in11 == 1) {   /* Left pan limit. */
            Switch2_LED = 1;
        }
        else {
            Switch2_LED = 0;
        }
    }
    else {
        User_Mode_byte = backup_voltage * 10;
    }

    Generate_Pwms(pwm13,pwm14,pwm15,pwm16);

    Putdata(&txdata);   /* DO NOT CHANGE! */
}

chris31
04-03-2006, 23:54
Not really any issues. v2 is only for the bells and whistles; as its name says, it has a lot of features, and we weren't using those features, so the extra code was a waste and took a while to load onto the RC. So we just went back to using the streamlined version.

devicenull
04-03-2006, 23:56
What happens if you move the "static int counter" outside of the Process_Data function?

Also, you may want to add the "rom" keyword to the rpmpowertable declaration. That makes it one less thing that could be overwriting other variables, as it would be stored with the actual program rather than with the other variables.

kaszeta
05-03-2006, 14:09
Our robot died three times during autonomous at BAE, and this was one of the symptoms at the time (main battery voltage reading as 7.75 volts on the OI). Eventually, the IFI rep requested that we (a) double-check our code (see other thread), and (b) use a loaner RC.

Interestingly, when I was double-checking the code after the first autonomous failure (still on our original 2006 RC), one of the first things I did was revert to a code version that doesn't use any of the camera code (but does use Kevin's ADC code), and we still had this problem. We also saw it on one of the loaner RCs (an upgraded 2004), using both our code and the most recent default code, to which we had added a single printf() reporting the battery voltages. The final loaner RC didn't show any problems.

So I've seen this problem on both our original 2006 RC *without* the camera code (using a fresh copy of Kevin's gyro code with our default routine and autonomous added), and even on one loaner RC with default code (although other teams had reported problems with this loaner RC). So I'm guessing it's something with the interrupt handler (since we're using Timer 2 and the two sets of serial interrupts)? The serial code is common between the camera code and the gyro code.

I'm wondering if there is some sort of memory stomping happening, because I've also seen some random bits flicking on the OI status LEDs (the PWM and relay LEDs). We had disabled the default routine's LED feedback on these to make a little bar graph showing how good the camera lock was, but after we pulled the camera I pulled that code as well, and we still saw random bits flicker there.

I'm interested in solving this, since we had three matches that we lost basically since we died during autonomous (and were non-functional the rest of the match), and a fourth in which we competed with a 100% dead bot (since we pulled the RC to replace it with a loaner, but the first loaner was DOA, and we didn't have time to put the original RC back in). We still got five points with our bot that round (an alliance bot pushed us up on the platform...), but considering that we have a really good ball launcher, it rather stank that we couldn't run most of the time.

Unlike most teams, I now actually have the 2006 RC in my possession (a replacement is coming from IFI), although it's going to IFI this week once I get an RMA. Anything I should try checking with it?

In any case, if someone would like a copy of our code (that showed the voltage bug on two different RCs), or has some ideas, I'd like to know.

Kevin Watson
05-03-2006, 14:31
When you say "default code", do you mean the default code from IFI's website or the code from my website?

-Kevin

kaszeta
05-03-2006, 14:55
A fresh copy that I downloaded from IFI's website during the competition (http://www.ifirobotics.com/docs/frc-code-2-28-2006.zip). However, that RC was suspect (every once in a while the printf output would be garbage).

Keith Watson
05-03-2006, 16:02
Regarding static initialization: the compiler creates a section of code in the binary where all of the static initializers are called. I have always assumed any sort of processor reset would call this block of code before calling main().

Regarding the power-cycle-instead-of-reset workaround: we encountered behavior where the symptom is that the static initializers are not being called by a reset under certain conditions. See the recent thread Camera does not search in autonomous mode after reset (http://www.chiefdelphi.com/forums/showthread.php?t=44975) for a full description of those conditions.

Mike Bortfeldt
05-03-2006, 21:00
I've been able to successfully cause the 8.2 battery voltage bug to occur on demand by sequentially downloading 2 different versions of code (both based off the default code). I haven't narrowed down the exact cause, but I will see what I can find out tonight and call IFI tomorrow. With a way to reproduce the problem, they will hopefully be able to come up with a fix or workaround.

Mike

Ryan Meador
06-03-2006, 00:42
I have some information that may be relevant. There is a problem with the silicon in the PIC chip that involves the interrupts not properly restoring register values upon return. I'm not sure if this could cause the problems you're all reporting, but the possibility exists - it may be this problem that the new libraries are supposed to fix.

You can read about this and other errors in the silicon of this line of chips here: http://ww1.microchip.com/downloads/en/devicedoc/80221b.pdf

kaszeta
06-03-2006, 08:10
At least with Team 95's code (which does use an interrupt handler, but just for the Timer2 and ADC), the new libraries don't fix this problem.

chris31
06-03-2006, 09:56
The new libraries didn't fix our problem. And the only interrupts used are in Kevin's code.

steven114
06-03-2006, 11:02
We had this problem too - our workaround was to stop using structures. It seemed that even having a simple struct of a few integers would cause rampant corruption and crashing. I'd suggest that as something to try if you're at your wit's end...

kaszeta
06-03-2006, 11:03
I just talked with IFI (and was talking to the same IFI rep that was at BAE), and they are looking at the issue and are going to call me back.

Kevin Watson
06-03-2006, 11:08
Just FYI, IFI uses the high-priority interrupt for SPI communications between the master and slave processors. You just can't get away from those pesky interrupts <grin>.

-Kevin

Sachiel7
06-03-2006, 12:16
We were using a struct too, in our auto routines....
Has everyone here with this problem used a struct? It'd be funny if it was something as simple as that, but just checking...

eugenebrooks
06-03-2006, 12:43
We were using a struct as well, but I believe that the IFI code
also uses structs. Remember that in an earlier post in this thread
the problem has been demonstrated to occur with the default code.

The problem comes and goes with small changes to your code, so
things like "getting away from structs" might make the problem go away,
but it will come back later. Unless there is a decisive cure, the only
way out is the 2005 controller, plus some squeezing to fit if your code
has gotten too big...

Eugene

yoyodyne
06-03-2006, 12:46
We were using a struct too, in our auto routines....
Has everyone here with this problem used a struct? It'd be funny if it was something as simple as that, but just checking...

We had this problem crop up at the VCU regional on Friday, and it almost cost us a number of matches. Each time, we reverted to some older version of our code, and at one point we made a few mods to the default code just to get operator control of the drive motors and the herder.

Our code:

Uses a lot of structures with mixed char, int, and long types.

Uses Kevin Watson's serial port code.

Uses the Timer 4 interrupt for a 5 ms real-time clock

Uses Kevin Watson's ADC code with 5 analog channels, each sampled at 200 Hz, for a 1 kHz Timer 2 interrupt

Points to structures and calls routines through pointers

Uses two additional shaft encoder interrupts

Assembles a lot of instrumentation data and sends that to the OI user variables, LED variables, and unused PWM outputs for data logging
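For readers checking the ADC numbers in the list above: if each timer tick samples the next channel in turn, 1000 Hz / 5 channels gives 200 Hz per channel. The sketch below assumes exactly that round-robin scheme. It is portable C for illustration, `read_channel()` is a hypothetical stand-in for the hardware conversion, and it is not Kevin Watson's actual ADC code.

```c
#include <stdint.h>

#define NUM_ADC_CHANNELS 5u  /* from the post: 5 analog channels */
#define TIMER_TICK_HZ 1000u  /* from the post: 1 kHz Timer 2 interrupt */

/* Hypothetical stand-in for a hardware A/D conversion; returns a fake
   reading so the scheduling logic can run on a desktop compiler. */
static uint16_t read_channel(uint8_t channel)
{
    return (uint16_t)(channel * 100u);
}

static uint8_t next_channel = 0;
static uint16_t latest[NUM_ADC_CHANNELS];        /* most recent reading per channel */
static uint32_t sample_count[NUM_ADC_CHANNELS];  /* samples taken per channel */

/* Call once per timer tick (on the robot this would run from the timer
   interrupt). Samples one channel, then advances to the next one. */
void adc_timer_tick(void)
{
    latest[next_channel] = read_channel(next_channel);
    sample_count[next_channel]++;
    next_channel = (uint8_t)((next_channel + 1u) % NUM_ADC_CHANNELS);
}

/* Each channel is visited once every NUM_ADC_CHANNELS ticks. */
unsigned per_channel_rate_hz(void)
{
    return TIMER_TICK_HZ / NUM_ADC_CHANNELS;
}
```

Running 1000 ticks (one simulated second) leaves each of the five channels with 200 samples.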

Process_data_from_local_IO always runs at least once every 10 ms: we set a flag if two or more 5 ms timer ticks have occurred, so I don't think this is an excessive loading issue. A printf with multiple arguments will cause this delay, however.
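A minimal sketch of the loop-timing check described above, assuming a timer interrupt that increments a tick counter every 5 ms and a main loop that reads and clears it on each pass. The names `timer_5ms_isr`, `check_loop_timing`, and `loop_overrun` are hypothetical, and on the real PIC the read-and-clear would need interrupts disabled around it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Incremented every 5 ms by a (hypothetical) timer interrupt. */
static volatile uint8_t tick_count = 0;
/* Latched once the main loop has ever fallen 10 ms or more behind. */
static bool loop_overrun = false;

void timer_5ms_isr(void)
{
    tick_count++;
}

/* Call at the top of each main-loop pass. Reads and clears the tick
   count and latches loop_overrun if two or more 5 ms ticks elapsed
   since the previous pass. On the PIC, interrupts would have to be
   disabled around the read-and-clear; this desktop sketch omits that. */
uint8_t check_loop_timing(void)
{
    uint8_t elapsed = tick_count;
    tick_count = 0;
    if (elapsed >= 2u)
        loop_overrun = true;
    return elapsed;
}
```

Latching the flag (rather than clearing it each pass) means a single slow pass, such as one caused by a long printf, still shows up later when you look for it.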

When we removed support for the shaft encoders and yanked out all the code that was not absolutely necessary toward the end of the qualification rounds on Saturday, the problem went away, but it is clear to me that it will come back. Code that was "bad" (causing the 8.2 volt problem) on Friday night worked first thing Saturday morning! (Power was removed for an extended period; however, the backup battery was connected overnight.) When we reloaded the same code, it started failing again.

My instincts and experience with these problems are leading me to take a good look at the map file to see if there is something fishy going on with data section allocation. The problem is that I don't have a "known good" hex/map file to use as a reference. We had no problems until Friday, and I was wondering if it was the new IFI libraries, but I can see from this thread that the problem can occur with the old libraries.

I posted more details to IFI - waiting for a reply.

If all else fails, I am inclined to trim the code to fit in the 2005 controller, though I'm not sure whether it is legal to use it yet.

Greg

kaszeta
06-03-2006, 12:49
We were using a struct too, in our auto routines....
Has everyone here with this problem used a struct? It'd be funny if it was something as simple as that, but just checking...

No structs here (well, it should've used them, but our team's students are just learning about those things).

kaszeta
06-03-2006, 12:51
We were using a struct as well, but I believe that the IFI code
also uses structs.

Indeed, take a look at ifi-aliases.h and the pic18 header files and you'll see that most everything you use for variables provided by IFI is a struct.

kaszeta
06-03-2006, 12:55
Our code:

Uses a lot of structures with mixed char, int, and long types.

Uses Kevin Watson's serial port code.

Uses the Timer 4 interrupt for a 5 ms real-time clock

Uses Kevin Watson's ADC code with 5 analog channels, each sampled at 200 Hz, for a 1 kHz Timer 2 interrupt

Points to structures and calls routines through pointers

Uses two additional shaft encoder interrupts

Assembles a lot of instrumentation data and sends that to the OI user variables, LED variables, and unused PWM outputs for data logging


Our original code was very complex, but by competition we had narrowed it down to code that


Uses Kevin's gyro code (which includes his serial code and adc code)
Uses Timer2 and the ADC interrupt
Uses two analog channels (the gyro and a selector switch for autonomous strategy)
No shaft encoders, weird pointers, or anything like that.


(although I like the idea of using a timer to make sure that you aren't bogging down on the loop execution)

yoyodyne
06-03-2006, 13:30
The compiler creates a section of code in the binary where all of the static initializers are called. I have always assumed any sort of processor reset would call this block of code before calling main().

We encountered behavior where the symptom is that the static initializers are not being called by a reset under certain conditions. See the recent thread Camera does not search in autonomous mode after reset (http://www.chiefdelphi.com/forums/showthread.php?t=44975) for a full description of those conditions.

We don't initialize any variables in their declarations; instead, each file has an explicit initialization routine, such as void InitializeRealTimeClock(void), that performs the initialization. If you write code this way, you don't have to worry about what the loader will or will not do to initialize bss and data. We also put all the init calls used for autonomous just above the inner loop, so we could toggle the competition port pin to test the autonomous routine over and over without hitting reset. This didn't help with the "8.2" problem, though.
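The explicit-initialization style described above might look like the following sketch. Only the name `InitializeRealTimeClock` comes from the post; the fields, `RealTimeClockTick`, and the 200-ticks-per-second constant (assuming a 5 ms tick) are illustrative assumptions.

```c
#include <stdint.h>

#define RTC_TICKS_PER_SECOND 200u  /* assumed: one tick every 5 ms */

/* Deliberately declared without initializers. Standard C would zero
   these at startup, but the point of this style is not to rely on the
   startup code: InitializeRealTimeClock() sets every field itself. */
static uint32_t rtc_ticks;
static uint16_t rtc_seconds;
static uint8_t rtc_ticks_this_second;

void InitializeRealTimeClock(void)
{
    rtc_ticks = 0;
    rtc_seconds = 0;
    rtc_ticks_this_second = 0;
}

/* Would be called from the 5 ms timer interrupt on the robot. */
void RealTimeClockTick(void)
{
    rtc_ticks++;
    if (++rtc_ticks_this_second >= RTC_TICKS_PER_SECOND) {
        rtc_ticks_this_second = 0;
        rtc_seconds++;
    }
}
```

Because the init routine resets every field, it can also be called again just above the autonomous loop to rerun a routine without pressing reset, as the post describes.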

Eldarion
06-03-2006, 13:32
I'm wondering if there is some sort of memory stomping happening, because I've also seen some random bits flicking on the OI status leds (the pwm and relay leds; we had disabled the default routine LED feedback on these to make a little bar graph showing how good the camera lock was, but after we pulled the camera I pulled the code for this as well, and we still would see random bits flicker in there).
I had the intermittent data "glitches" with the LEDs as well; the cure was to save sections .tmpdata and MATH_DATA as well as the default PROD in the interrupt handler #pragma. For example:
#pragma interruptlow InterruptHandlerLow save=PROD,section("MATH_DATA"),section(".tmpdata")

Also, this may be a stupid question, but is everyone here using the latest version of the IFI loader? The previous version had a problem writing to FLASH bank 1, which might explain why it works just fine as long as it can fit in the 2005 controller. :)

EDIT: What version of the compiler is everyone using?

Kevin Watson
06-03-2006, 14:23
...What version of the compiler is everyone using?

You beat me to this question. I'm using 2.44 (version can be found in readme.c18) and have seen *none* of the problems teams are seeing. All of the code on my website has been compiled with the 2.44 version. Has anyone had a problem with the pre-compiled code in my .zip files?

-Kevin

kaszeta
06-03-2006, 14:28
You beat me to this question. I'm using 2.44 (version can be found in readme.c18) and have seen *none* of the problems teams are seeing. All of the code on my website has been compiled with the 2.44 version. Has anyone had a problem with the pre-compiled code in my .zip files?

Yes. I can regenerate the 8.2V problem with the gyro.hex file from a freshly-downloaded frc_gyro.zip from http://www.kevin.org/frc/, although it doesn't do it every time I upload it.

So far, the IFI folks think that ADC_Int_Handler() is too long, but I'm not sure I agree (since I'm not seeing the RLOD, and it works flawlessly on my 2004 RC).

Oh, and to answer your question, I'm using mcc18 v2.40. (Is an update available to us?)

Sachiel7
06-03-2006, 14:50
Yes, we are using v2.40 as well; this is the version that was supplied in the kit. Did we miss an update? I don't recall seeing anything come across the board...
Perhaps that is the root of our issues.
From the IFI Website:

NOTE: To Compile 2004, 2005 or 2006 code, use MPLAB ver 7.20 and C18 Compiler ver 2.40 (newer versions can not be used)

Kevin...you didn't give us base code compiled with 2.44, did you...? :rolleyes:
Even if so, we do appreciate all the hard work you do to provide us with the code each year, so Thank You! :)

Eldarion
06-03-2006, 15:07
You beat me to this question. I'm using 2.44 (version can be found in readme.c18) and have seen *none* of the problems teams are seeing. All of the code on my website has been compiled with the 2.44 version. Has anyone had a problem with the pre-compiled code in my .zip files?

-Kevin
Hmm, I am using 2.40. I wonder if the Microchip folks know about the problem, and have fixed it in 2.44? Where did you get 2.44?

Here's an idea for someone with access to a 2006 controller:
Try compiling your code for the PIC18F8520 instead of the PIC18F8722 and see if that cures the problem. It'll still run on the '8520, but I wonder if it might fix some of the problems (if it is truly a compiler bug, maybe it is restricted to the '8722 compiler). Just a hunch... :)

yoyodyne
06-03-2006, 16:10
You beat me to this question. I'm using 2.44 (version can be found in readme.c18) and have seen *none* of the problems teams are seeing. All of the code on my website has been compiled with the 2.44 version. Has anyone had a problem with the pre-compiled code in my .zip files?

-Kevin

Kevin, we are using v2.40 (17 November 2004). I didn't think we were allowed to use any newer versions.

Saturday morning I was showing Dave Lavery the problem at the VCU regional, and he went over and talked to the "IFI guy," who said that it was not possible for the user processor code to corrupt the battery voltage reading or the RC-to-OI comms. That makes sense to me unless memory is shared between the processors, or the MP passes the rxdata structure through transparently, or something like that. Is there any documentation on the interaction between the user processor and the master processor? The only things I know are the tx and rxdata structures and the format of the dashboard data, which provides clues as to what the OTA format between them is.

Our one and only 2006 controller is locked in a crate awaiting the Atlanta regional, so I can't do any testing, but I do have code that parses the dashboard output and generates a file I can pull into Excel for analysis. I would be interested in what actually comes out of the dashboard port when this is happening, to see if the data is skewed by a byte or two. I know that when we have this problem, control input that should affect PWM5 actually gets coupled to PWM1 or 2 (they used to drive the same motor pair). I can give more details if anyone is interested.
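If the dashboard log does show skewed data, one way to put a number on "skewed by a byte or two" is to compare a logged frame against the frame you expect and find the byte shift with the best match. This is a generic sketch with a hypothetical function name; it assumes you can build an expected frame (for example, with joysticks held at known positions) and knows nothing about the actual IFI dashboard format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the shift s (between -max_shift and +max_shift) for which
   logged[i + s] best matches expected[i], scored as the fraction of
   matching bytes over the overlapping region. Ties go to the shift
   closest to zero. */
int best_alignment(const uint8_t *expected, const uint8_t *logged,
                   size_t len, int max_shift)
{
    int best_shift = 0;
    double best_score = -1.0;
    for (int shift = -max_shift; shift <= max_shift; shift++) {
        size_t compared = 0, matches = 0;
        for (size_t i = 0; i < len; i++) {
            long j = (long)i + shift;
            if (j < 0 || j >= (long)len)
                continue;  /* outside the overlap for this shift */
            compared++;
            if (logged[j] == expected[i])
                matches++;
        }
        if (compared == 0)
            continue;
        double score = (double)matches / (double)compared;
        if (score > best_score ||
            (score == best_score && abs(shift) < abs(best_shift))) {
            best_score = score;
            best_shift = shift;
        }
    }
    return best_shift;
}
```

Frames should be much longer than `max_shift`, since large shifts leave only a small overlap to score; repeated values such as many neutral bytes can also produce spurious matches.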

Greg

Eldarion
06-03-2006, 16:13
Thanks to the Wayback Machine (www.archive.org), I was able to find the 2.42 upgrade file here:
http://ww1.microchip.com/downloads/en/DeviceDoc/C18_242.zip

Aha! I found the 2.44 upgrade file here:
http://ww1.microchip.com/downloads/en/devicedoc/mplab-c18-upgrade-v2_44.exe

I found this interesting section in the Readme (emphasis mine):
"The following parts have updates and/or corrections:
18F6520, 18F6620, 18F8520, and 18F8620"

Also, it makes mention of the 18F8722 having a problem or two... :)

Eldarion
06-03-2006, 16:24
Kevin, we are using v2.40 17 November 2004. I didn't think we were allowed to use any newer versions. Saturday morning I was showing Dave Lavery the problem at the VCU regional and he went over and talked to the "IFI guy" who said that it was not possible for the user processor code to corrupt the battery voltage reading or the RC to OI comms. And that makes sense to me unless memory is shared between the processors or the MP passes the rxdata structure transparently or something like that. Is there any documentation on the interaction between the user processor and the master processor? The only thing I know is the tx and rxdata structures and the format of the dashboard data which provides clues as to what the OTA format between them is. Our one and only 2006 controller is locked in a crate awaiting the Atlanta regional so I can't do any testing but I do have code that parses through the dashboard output and generates an output file I can suck into excel for analysis. I would be interested in what actually comes out of the dashboard port when this is happening to see if data is skewed by a byte or two. I know when we have this problem, control input that should effect PWM5 turns actually gets coupled to PWM1 or 2 (they used to drive the same motor pair) I can give more details if anyone is interested.

Greg
That makes sense, however why didn't we see this last year if the only change was the user processor? :confused:

Has anyone tried a master update in case some of the RCs have a corrupted master firmware?

kaszeta
06-03-2006, 16:51
Kevin, we are using v2.40 17 November 2004. I didn't think we were allowed to use any newer versions. Saturday morning I was showing Dave Lavery the problem at the VCU regional and he went over and talked to the "IFI guy" who said that it was not possible for the user processor code to corrupt the battery voltage reading or the RC to OI comms.

At the BAE regional, the IFI person (Corey) at first thought we had a user code bug, but later said that he was of the belief that something was wrong with the RC or the Master CPU code, for the same reasons (user processor code shouldn't be able to corrupt the comms or battery voltage readings). I've given them my code, FWIW. I haven't gotten the 8.2V bug to show up on our 2006 RC with the pure default code dated 2/24 (but I *did* get this code to show the problem with a loaner RC at BAE), but using gyro.hex from frc_gyro.zip I can get this to show up simply by loading the hex code a few times.

yoyodyne
06-03-2006, 17:06
At the BAE regional, the IFI person (Corey) at first thought we had a user code bug, but later said that he was of the belief that something was wrong with the RC or the Master CPU code, for the same reasons (user processor code shouldn't be able to corrupt the comms or battery voltage readings). I've given them my code, FWIW. I haven't gotten the 8.2V bug to show up on our 2006 RC with the pure default code dated 2/24 (but I *did* get this code to show the problem with a loaner RC at BAE), but using gyro.hex from frc_gyro.zip I can get this to show up simply by loading the hex code a few times.

If I had the 2006 processor I would reload the master processor bin file. So many things that I normally would have done to narrow this problem down did not happen in the heat of battle.

I asked IFI if they knew what master processor version was shipped but I haven't gotten a reply yet.

Part of my post to IFI last night (new problem, under gsmith):
*******
I ran into a problem at the VCU regional starting last Friday where the communications between the OI and RC were getting corrupted. It looks like the data structure between the master processor and the OI is shifted by a byte or so. We are using the 2006 RC and OI received in the FIRST KOP. We have not re-flashed the master processor and I don’t know what firmware revision it is although I noticed that the date of the Frc_master_v12.bin file, the latest?, is 10/27/2005 so I assume we have the latest master code.

*****

Kevin Watson
06-03-2006, 18:01
To Compile 2004, 2005 or 2006 code, use MPLAB ver 7.20 and C18 Compiler ver 2.40 (newer versions can not be used)

They're referring to version 3.0 of the compiler, which cannot be used for a few different reasons.

Kevin...you didn't give us base code compiled with 2.44, did you...? :rolleyes:

Yep, I certainly did. It's not a problem.

Even if so, we do appreciate all the hard work you do to provide us with the code each year, so Thank You! :)

As long as you learn something from reading/using the code, it's my pleasure.

-Kevin

Eldarion
06-03-2006, 18:18
*******
I ran into a problem at the VCU regional starting last Friday where the communications between the OI and RC were getting corrupted. It looks like the data structure between the master processor and the OI is shifted by a byte or so. We are using the 2006 RC and OI received in the FIRST KOP. We have not re-flashed the master processor and I don’t know what firmware revision it is although I noticed that the date of the Frc_master_v12.bin file, the latest?, is 10/27/2005 so I assume we have the latest master code.

*****
Where did you find the Version 12 Update file? The latest on IFI's website is Version 11, dated 4-15-2005.

devicenull
06-03-2006, 18:27
I'm using 2.40 and I haven't seen any of the problems stated here. The real test will come on Thursday, when I see whether any of these problems have appeared since we put it in the box...

I'm going to check tonight to see if we have a spare 2005/2004 RC that we can bring, but under the current FIRST rules, they are not allowed. Maybe someone should post on the Q&A forums explaining the situation; maybe we could get the 2005 RC cleared for use until this problem is resolved. (It DOES work with the field control system; we were using it at the scrimmage with no problems.) It wouldn't even be a tight fit for me: 55% usage on the 2005 controller :)


Someone with access to a 2006 RC: Disable compiler optimizations... Project->Build Options->Project->MPLAB C18->Optimization (Dropdown), Disable.

If that fixes it, then it's a compiler bug (but it can be worked around!)

I can tell you right now I've had all the optimizations enabled since the second week of build, and haven't seen this problem.

yoyodyne
06-03-2006, 18:37
Where did you find the Version 12 Update file? The latest on IFI's website is Version 11, dated 4-15-2005.

It's in the

"2006 RC Code (zip,21-28-2006) Contains both Default/User and Master Code, contains new libraries below."

Zip file.

Eldarion
06-03-2006, 18:45
It's in the

"2006 RC Code (zip,21-28-2006) Contains both Default/User and Master Code, contains new libraries below."

Zip file.
Thanks! :)

Bharat Nain
06-03-2006, 19:10
I have a 2004 RC upgraded to a 2006 RC sitting at my house. I will try to get a battery and reproduce the results. Since we were also having similar problems through the day, I would be glad to test anything suspicious. Please get in contact with me if you think you have a possible solution to this problem or a way to get there. I never spent time at NJ trying to correct the bug; I just replaced the battery. Thanks
-Bharat

Eldarion
06-03-2006, 19:17
I have a 2004 upgraded to a 2006 RC sitting at my house. I will try to get a battery and try to reproduce the results. Since we were also having similar problems through the day, I would be glad to test out anything suspectible. Please get in contact with me if you think you have a possible solution to this problem or a way to get there. I never spent time at NJ trying to correct the bug. I just replaced the battery. Thanks
-Bharat
Try downgrading your controller to the v11 master code. I'm wondering if something got messed up in the v12 master code.

Eldarion
06-03-2006, 20:56
OK, I might have figured this out:
Has anyone noticed if the red "Code Error" LED on the OI is steadily on during the 8.3-volt bug?
I was doing some experimenting with the 2005 RC upgraded to v12 of the master firmware. Taking the advice of another person in this thread, I tried downloading frc_gyro.hex to the controller, at which point I was greeted with the 8.3V display and a solid red Code Error light. The interesting thing about this was that the Program State LED on the RC was steadily orange and the RC was ready to accept programming? :confused:

On power off / power on, the RC errored with a BLROD. I tried recompiling frc_gyro for the 18F8520, and it worked fine.

My thought is that maybe the 8.3V display is one manifestation of the sequence that is supposed to trigger a BLROD, halt the user processor, and kill all the PWMs, and it is simply displaying the 8.3V without doing the rest? I am guessing that user program data corruption (similar to what would happen trying to run an 8722 file on an 8520 chip) is fooling the master processor into this "half-disabled" mode.

Unfortunately, without a 2006 controller at my disposal, this is all speculation. :)
I just found it interesting that it is possible to trigger the same 8.3V bug on the 2004 / 2005 RC as well.

kaszeta
06-03-2006, 21:05
I'm using 2.40 and I haven't seen any of the problems stated here..

And neither did we, with almost 100 hours of autonomous mode testing in our clubhouse. At BAE we were batting 50%.


I'm going to check tonight to see if we have a spare 2005/2004 RC that we can bring, but as of the current FIRST rules, they are not allowed.

Under the circumstances, you can probably get a ruling on the field. They originally gave us trouble last year about subbing a 2004 RC for our 2005 when we broke a terminal off of it, but they eventually calmed down.

kaszeta
06-03-2006, 21:10
OK, I might have figured this out:
Has anyone noticed if the red "Code Error" LED on the OI is steadily on during the 8.3-volt bug?

For us, the Code Error light WAS NOT on. The Radio light showed an error, and the Team LEDs quit blinking, but the Code Error light was the first thing we and Corey from IFI (who did a good job peering at dead bots on the field trying to debug them) checked.


I was doing some experimenting with the 2005 RC upgraded to v12 of the master firmware. Taking the advice of another person in this thread, I tried downloading frc_gyro.hex to the controller, at which point I was greeted with the 8.3V display and a solid red Code Error light. The interesting thing about this was that the Program State LED on the RC was steadily orange and the RC was ready to accept programming? :confused:

Odd. I haven't been able to replicate this on a 2005 RC, but I'll try upgrading to v12 firmware.


My thought is that maybe the 8.3V display is one manifestation of the sequence that is supposed to trigger a BLROD, halt the user processor, and kill all the PWMs, and it is simply displaying the 8.3V without doing the rest?

I'll note that you can see the 8.3V problem with code that otherwise appears to work correctly.

When I last talked with IFI, their guy was fairly convinced it was something in the ADC interrupt handler (which has a lot of code for an interrupt handler).

Kevin Watson
06-03-2006, 22:16
When I last talked with IFI, their guy was fairly convinced it was something in the ADC interrupt handler (which has a lot of code for an interrupt handler)

Um, I'll eat the cardboard box that housed the last dozen Krispy Kremes that Dave Lavery polished-off if this is found to be true :).

-Kevin

Mark McLeod
06-03-2006, 23:37
It happens to us without the ADC code.

dlavery
06-03-2006, 23:47
Um, I'll eat the cardboard box that housed the last dozen Krispy Kremes that Dave Lavery polished-off if this is found to be true :).

-Kevin

Kevin -
Just as a precaution, I really hope you are hungry. We experienced the "8.2V problem" over the weekend at the VCU regional. Ricky Torrance from IFI was there, and we reviewed the problem with him. We were not able to confirm it in the time we had to work the problem, but overwhelming the interrupt handler was one of the two plausible explanations we were able to identify (the other was a memory overrun into a specific unprotected memory space that affected the ADC lines).
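One inexpensive way to probe the memory-overrun theory from the user side is to bracket a suspect buffer with guard bytes and check them periodically from the main loop. The sketch below is a debugging aid under stated assumptions, not a fix: the sentinel value, sizes, and function names are arbitrary, and it only catches writes that happen to land in the guard zones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BYTES 4
#define PAYLOAD_BYTES 16
#define CANARY_BYTE 0xA5u  /* arbitrary sentinel value (an assumption) */

/* One contiguous region: [guard | payload | guard]. Keeping the guards in
   the same array means a simulated overrun stays within defined C. */
static uint8_t region[GUARD_BYTES + PAYLOAD_BYTES + GUARD_BYTES];
static uint8_t *const payload = &region[GUARD_BYTES];

void canary_init(void)
{
    memset(region, 0, sizeof region);
    memset(region, CANARY_BYTE, GUARD_BYTES);
    memset(&region[GUARD_BYTES + PAYLOAD_BYTES], CANARY_BYTE, GUARD_BYTES);
}

/* Returns true if both guard zones still hold the sentinel value. */
bool canary_intact(void)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (region[i] != CANARY_BYTE)
            return false;
        if (region[GUARD_BYTES + PAYLOAD_BYTES + i] != CANARY_BYTE)
            return false;
    }
    return true;
}
```

Calling `canary_intact()` once per loop and lighting a spare OI LED when it fails would show whether anything is writing past that buffer.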

At the time, we only had that one known case of the problem to give to IFI as an example, which limited the options for investigation of the problem. As of this afternoon, IFI is aware that the problem may be larger, and is looking into it. We have pointed them to this thread for more information.

-dave

Gdeaver
06-03-2006, 23:49
Question: are all of the teams that are having problems using the most recent IFI loader? Wasn't the latest loader upgrade meant to allow full use of the new PIC's increased memory? Could this be something the teams having problems have in common? Wouldn't a fully charged backup battery show between 8.2 and 8.4 volts? When the error occurs, what happens if the backup battery is unplugged? How does IFI manage the switchover from the 12 volt battery to the backup? Could a dropout from those shooters starting up hang the processor?

Kevin Watson
07-03-2006, 00:17
Kevin -
Just as a precaution, I really hope you are hungry. We experienced the "8.2V problem" over the weekend at the VCU regional. Ricky Torrance from IFI was there, and we reviewed the problem with him. We were not able to confirm it in the time we had to work the problem, but issues with overwhelming the interrupt handler was one of the two plausible explanations we were able to identify (the other was a memory overrun situation into a specific unprotected memory space that affected the ADC lines).

At the time, we only had that one known case of the problem to give to IFI as an example, which limited the options for investigation of the problem. As of this afternoon, IFI is aware that the problem may be larger, and is looking in to it. We have pointed them to this thread for more information.

-dave

Dave,

Well, I'm pretty confident it has nothing to do with my code for several reasons. The first, and most important, is that Mark has duplicated the problem using IFI's own default code.

-Kevin

dlavery
07-03-2006, 00:30
Dave,

Well, I'm pretty confident it has nothing to do with my code for several reasons. The first, and most important, is that Mark has duplicated the problem using IFI's own default code.

-Kevin

I would not be surprised if that turns out to be the case. If so, then I will gladly eat a dozen Krispy Kreme donuts (notice I didn't say anything about the box...). Either way, I understand that both Mark and Tony Norman will be at the Arizona regional with us this weekend. If the problem is not solved by then, we can hold them down and sit on their heads until they fix it... :)

-dave

ericand
07-03-2006, 03:57
When we have seen this problem it has been immediately after a power up,
and is not cleared by using the reset button, but can be cleared by powering down and bringing it up again.

What is different between the data segment initialization when the reset button is pushed and the initialization when the system is powered up?

It almost looks like the code in _do_cinit is not setting the initialized memory
up correctly sometimes.
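To make the `_do_cinit` question concrete: startup code copies the initial values of initialized static variables from program memory into RAM before main() runs. The sketch below models that idea as a table-driven copy in portable C. `struct init_entry` and `do_cinit_sketch` are hypothetical and do not reproduce C18's actual startup code; the zero-fill branch is included only because many C runtimes also clear uninitialized data, which is toolchain-specific. If such a copy were skipped on some reset path, variables would keep whatever RAM already held, and after a warm reset RAM usually still holds its previous contents, whereas after power-up it does not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per RAM section: copy len bytes from a program-memory image,
   or zero-fill the section when rom_src is NULL. */
struct init_entry {
    const uint8_t *rom_src;
    uint8_t *ram_dst;
    size_t len;
};

/* Hypothetical model of a startup initialization loop. */
void do_cinit_sketch(const struct init_entry *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].rom_src != NULL)
            memcpy(table[i].ram_dst, table[i].rom_src, table[i].len);
        else
            memset(table[i].ram_dst, 0, table[i].len);
    }
}

/* Example data: the "flash" image of initial values and the RAM
   variables those values belong in. */
static const uint8_t rom_image[4] = {1, 2, 3, 4};
static uint8_t ram_vars[4];
static uint8_t zeroed_vars[3];
```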

kaszeta
07-03-2006, 08:27
Dave,

Well, I'm pretty confident it has nothing to do with my code for several reasons. The first, and most important, is that Mark has duplicated the problem using IFI's own default code.

As have I, and demoed it to Corey at BAE, hence my skepticism that the interrupt handler is the problem. My money is on something being wrong in the master CPU, since bad user code shouldn't be able to cause some of the problems I've seen.

chris31
07-03-2006, 08:29
Um, I'll eat the cardboard box that housed the last dozen Krispy Kremes that Dave Lavery polished-off if this is found to be true :).

-Kevin

We are not even using the ADC code and we get the problem. So, lucky for you, it looks like you won't have to eat a cardboard box.

MikeDubreuil
07-03-2006, 10:59
Could someone who has experienced the problem, and has the hardware, upgrade to the latest 2.44 version of the C18 compiler, compile some user code, and try to replicate the problem with the newly compiled code? I find it highly suspicious that Kevin Watson does as much development as he does on the IFI boards and hasn't experienced the problem.

kaszeta
07-03-2006, 11:08
Can someone who has experienced the problem and have the hardware upgrade to the latest 2.44 version of the C18 compiler. Compile some user code and with the newly compiled code try to replicate the problem.

I'm going to check that either tonight or thursday when I'm working on our test bot (I'll slip in the 2006 controller).

I find it highly suspicious that Kevin Watson does the amount of development he does on the IFI boards and he hasn't experienced the problem.

I don't, since we didn't notice the 8.2V and "failing bot" issues (and didn't have a reason to look for them) during testing (we were running motors and test chassis systems by the end of week 2), and spent just about 100 hours running variations of autonomous plays without a problem. Something is subtly different about the competition environment that makes everything worse.

Eldarion
07-03-2006, 13:24
I don't, since we didn't notice the 8.2V and "failing bot" issues (and didn't have a reason to look for them) during testing (we were running motors and test chassis systems by the end of week 2), and spent just about 100 hours running variations of autonomous plays without a problem. Something is subtly different about the competition environment that makes everything worse.
There is an encrypted serial link between the OI and the Arena Controller; could it be bogging down the OI in certain (rare) circumstances?

Kevin Watson
07-03-2006, 13:47
Can someone who has experienced the problem and have the hardware upgrade to the latest 2.44 version of the C18 compiler. Compile some user code and with the newly compiled code try to replicate the problem.

Someone has already stated that they can invoke the "8.2 mode" using the pre-compiled gyro code from my website. As this code was compiled with 2.44, I don't think 2.44 is the cure.

I find it highly suspicious that Kevin Watson does the amount of development he does on the IFI boards and he hasn't experienced the problem.

My guess is that most teams aren't having a problem, and that something really low-level and sinister is the root of the problem. I'm also using a hand-built early beta version of the robot controller, which may be a contributing factor.

-Kevin

kaszeta
07-03-2006, 16:09
IFI appears to be working on this, and has sent me an updated linker script which I am going to test to see if this corrects the issue. I'll let you know how testing turns out.

Kevin Watson
07-03-2006, 16:46
IFI appears to be working on this, and has sent me an updated linker script which I am going to test to see if this corrects the issue. I'll let you know how testing turns out.

Can those folks who are having problems please try building your code with the attached 18f8722beta.lkr link script and testing your code. If you still have problems, then try the attached 18f8722_2.44.lkr script and test again. Please report back here with the outcome.

-Kevin

ScottM
07-03-2006, 17:28
We were running our practice robot tonight and ran into a very similar problem. In the middle of running the robot manually, the inputs stopped working and the OI displayed 8.6v. After cycling power on the robot, it worked fine. Could this be another manifestation of this issue? We are using the 2005 controller and easyC.

chris31
07-03-2006, 20:02
I don't have our 2006 controller or any parts with me, but would anyone who had this issue please post what they find after using this new linker script?

kaszeta
07-03-2006, 20:07
Can those folks who are having problems please try building your code with the attached 18f8722beta.lkr link script and testing your code. If you still have problems, then try the attached 18f8722_2.44.lkr script and test again. Please report back here with the outcome.

-Kevin

I breadboarded our 2006 RC, and now I'm in the frustrating situation of not being able to duplicate the error, even using the exact same .hex files as in the pits at BAE. Since I can't get the error to show up, I can't tell whether 18f8722beta.lkr is doing anything. Has anyone found a highly reliable way to duplicate this? (Very frustrating, since it was showing the error reliably in the pit.)

As for 18f8722_2.44.lkr, if you try to use it as provided, you get a "Error - section '_entry_scn' type is non-overlay and absolute but occurs in more than one input file" from the linker, which requires that you comment out the first "FILE" line in the .lkr script. After that, IFI Loader won't upload it, giving an "invalid address : 0x20 (Correct Range: 0x800-0x7fff)" error.
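Before uploading, a .hex file can be checked for records outside the range the loader complains about. Below is a small Intel HEX record checker: it verifies each record's checksum, tracks extended linear address (type 04) records, and flags data (type 00) records that fall outside a caller-supplied range. The 0x800-0x7FFF bounds in the usage are copied from the loader's error message; how the IFI Loader interprets those addresses internally isn't documented in this thread, so treat them as an assumption, and `check_hex_record` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum hex_check { HEX_OK, HEX_MALFORMED, HEX_BAD_CHECKSUM, HEX_OUT_OF_RANGE };

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Checks one record (":LLAAAATT<data>CC"). *upper holds the upper 16
   address bits from the last type-04 record and is updated in place.
   Data records must lie entirely within [min_addr, max_addr]. */
enum hex_check check_hex_record(const char *line, uint32_t *upper,
                                uint32_t min_addr, uint32_t max_addr)
{
    size_t n = strlen(line);
    if (n < 11 || line[0] != ':')
        return HEX_MALFORMED;
    int count = hex_byte(line + 1);
    if (count < 0 || n < (size_t)(11 + 2 * count))
        return HEX_MALFORMED;

    /* bytes[0]=count, [1..2]=address, [3]=type, data, then checksum */
    int bytes[4 + 255 + 1];
    unsigned sum = 0;
    for (int i = 0; i < count + 5; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0)
            return HEX_MALFORMED;
        bytes[i] = b;
        sum += (unsigned)b;
    }
    if ((sum & 0xFFu) != 0)
        return HEX_BAD_CHECKSUM;  /* all bytes must sum to 0 mod 256 */

    uint32_t offset = ((uint32_t)bytes[1] << 8) | (uint32_t)bytes[2];
    int type = bytes[3];
    if (type == 0x04 && count == 2) {
        *upper = ((uint32_t)bytes[4] << 8) | (uint32_t)bytes[5];
    } else if (type == 0x00 && count > 0) {
        uint32_t start = (*upper << 16) | offset;
        uint32_t end = start + (uint32_t)count - 1u;
        if (start < min_addr || end > max_addr)
            return HEX_OUT_OF_RANGE;
    }
    return HEX_OK;
}
```

For example, `:0400200000000000DC` (four bytes at 0x0020) is flagged as out of range, while the same data at 0x0800 passes.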

Kevin Watson
07-03-2006, 21:35
As for 18f8722_2.44.lkr, if you try to use it as provided, you get a "Error - section '_entry_scn' type is non-overlay and absolute but occurs in more than one input file" from the linker, which requires that you comment out the first "FILE" line in the .lkr script. After that, IFI Loader won't upload it, giving an "invalid address : 0x20 (Correct Range: 0x800-0x7fff)" error.

Okay, I see the problem with the file. If you can, try the 18f8722beta.lkr script.

-Kevin

Rickertsen2
07-03-2006, 21:46
Well, programmers, there is an upshot to this. For once there is a programming problem that might not be our fault!

What leads people to think that this is a problem with the linker script? It seems to me like there is a good chance this could be a master proc problem.

It's interesting to note that 8.2/8.3 volts is very near 127 when you convert it using that little formula. Is it possible that the RC and OI are somehow losing synchronization and reading the wrong parts of the packets? Maybe some structure or array is being manipulated incorrectly?
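Here is a minimal C sketch of the off-by-one idea. The packet layout, the offsets and the volts-per-count scale below are all hypothetical, because IFI's real packet format and conversion formula are not given in this thread. The scale is chosen only so that a raw 127, the neutral value for joysticks and PWMs, reads as about 8.2 V.

If the decoder's idea of where a frame starts slips by one byte, it reads the battery field from a neighbouring operator-input byte. The display then sits near 8.2 V while that input is centred, and it moves when that input moves.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: this is not IFI's format. */
enum {
    AUX_OFFSET  = 3, /* assumed: an operator-input byte such as p3_aux */
    BATT_OFFSET = 4  /* assumed: the battery byte shown on the OI */
};

/* Assumed scale, chosen so that a raw 127 (neutral) reads as ~8.2 V. */
#define VOLTS_PER_COUNT (8.2 / 127.0)

/* Decode the battery field of a frame that the decoder believes starts
 * at buf[frame_start]. If frame_start is off by one, the neighbouring
 * byte is read instead and no error is reported. */
double decode_battery(const uint8_t *buf, int frame_start)
{
    return buf[frame_start + BATT_OFFSET] * VOLTS_PER_COUNT;
}

/* The operator-input field from the same frame, for comparison. */
uint8_t decode_aux(const uint8_t *buf, int frame_start)
{
    return buf[frame_start + AUX_OFFSET];
}
```

For example, take the hypothetical buffer `{0, 0, 0, 0, 127, 194, 0, 0}` with the frame really starting at index 1. `decode_battery(buf, 1)` gives about 12.5 V, while `decode_battery(buf, 0)` gives about 8.2 V.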

Make sure you are using the newest version of IFI loader. I had some bizarre similar problems from using an old version. I have also had weird errors from USB->serial converters. Rule these things out first.

kaszeta
07-03-2006, 21:56
Okay, I see the problem with the file. If you can, try the 18f8722beta.lkr script.

Script seems to work fine (produces hex files that seem to work), but since I currently haven't been able to duplicate the bug then I can't tell if it cures it.

kaszeta
07-03-2006, 22:00
Make sure you are using the newest version of IFI loader. I had some bizarre similar problems from using an old version. I have also had weird errors from USB->serial converters. Rule these things out first.

I haven't been able to recreate the problem using the older IFI loader (but all the code I've used so far, even when I was still using trig functions, still fit in bank 0).

And our laptop is still old enough to have a serial port...

But yes, both of these are worth ruling out.

devicenull
07-03-2006, 22:20
There is an encrypted serial link between the OI and the Arena Controller; could it be bogging down the OI in certain (rare) circumstances?

I highly doubt it's encrypted... why would it be? Teams gain nothing from somehow monitoring it. And how many teams would actually have a chance to? It requires access to the field when it's fully set up and running.

That said, the fact that the master processor is now communicating with the radio and arena controller may have something to do with it, but if that were the case it would have shown up last year.

Does anyone know if the master processor was upgraded too? Or is it still an 8520? If it's the 8722, why haven't we seen another release of the firmware after the new IFI libraries were released?

I somehow doubt that the user processor is at fault here, so the linker script is worthless.

Eldarion
08-03-2006, 00:48
Does anyone know if the master processor was upgraded too? Or is it still an 8520? If it's the 8722, why haven't we seen another release of the firmware after the new IFI libraries were released?
It is still an 8520, and I was wondering the same thing about the firmware.
I somehow doubt that the user processor is at fault here, so the linker script is worthless.
That was my thought too, especially since I was able to get the same "bug" to appear on the 2004 RC. I am thinking it might occur when the user processor "confuses" the master processor by sending bad data because of interrupts interfering with PutData(). Just a guess. :)

EDIT:
DeviceNull, I did some snooping and I think you may be right about the encryption, although there's no way to be absolutely sure. There sure are a lot of unsubstantiated rumors about that AC to RC link floating around! ;)

eugenebrooks
08-03-2006, 00:51
Can those folks who are having problems please try building your code with the attached 18f8722beta.lkr link script and testing your code. If you still have problems, then try the attached 18f8722_2.44.lkr script and test again. Please report back here with the outcome.

-Kevin

We can't check for the bug with the beta linker script on our robot controller,
as our robot is in the crate heading from Portland to San Jose.
I did try building the hex file with the beta linker script, however, and
note that there are a large number of differences in the hex file relative
to the one generated before copying the beta linker script into place.
If IFI fixed a bug in the linker script with their change, our code was
stepping on this bug in lots of places.

We will learn the score, for sure, during the practice day at the
San Jose regional...

Eugene

Eldarion
08-03-2006, 00:53
Anyone yet found a highly reliable way to duplicate this? (very frustrating, since it was showing the error reliably in the pit)
Try uploading Kevin's frc_gyro.hex as many times as necessary to cause the error. :) (See previous post; I have also confirmed that this causes a similar "bug" on the 2004 RC)

Was the robot tethered when you had the bug, or was it on the field with the radio?

kaszeta
08-03-2006, 08:11
Try uploading Kevin's frc_gyro.hex as many times as necessary to cause the error. :) (See previous post; I have also confirmed that this causes a similar "bug" on the 2004 RC)

I tried it a bunch of times last night without triggering it.

However, (1) I only tried it with the tether (the radio modem pair I scrounged up at the clubhouse appears not to be working), and (2) I don't have a backup battery handy, but I did vary the voltage on the backup battery lugs by (a) hooking a +5 V signal from the PWM outputs to it, (b) hooking up a 1.5 V battery, and (c) hooking up a 9 V charger to the pins.

I'll try it again today with an actual battery and see if massive repeated uploads of frc_gyro.hex do it.

Was the robot tethered when you had the bug, or was it on the field with the radio?

Both, and oddly, I saw the error about half of the time.

EricS-Team180
08-03-2006, 09:34
.....from page 2.....
...As you move the joystick 4 wheel, the OI battery voltage changes. The p3_wheel is mapped to the backup battery voltage (as shown from the data in the dashboard). the p1_x axis now maps to the joystick #3 switches. ...

Correction - Rather than the p3_wheel & p4_wheel, it is actually the p3_aux and p4_aux.

This doesn't directly relate to these microprocessors, but...
This reminds me of something I once encountered with PowerPCs running the VxWorks OS.
It had to do with how you saved the data registers when a function was
interrupted by a function with higher priority.
You could request that the interrupted function's registers be saved as an "integer save" or a "floating point save". If you used an "integer save" and you had any floating point registers in use, your saved data was
corrupted... a very low-level and weird bug.

So...
How are the registers saved on the user processor when it is interrupted by the master processor for data I/O? A pointer hack, an overflow or a corrupted save could certainly be changing the apparent mapping of the I/O data. And it could be reproducible on any of the PIC-based FRC controllers (I believe Eldarion reproduced the problem on a 2004 RC).

...just my 2 cents

ericand
08-03-2006, 10:51
Can those folks who are having problems please try building your code with the attached 18f8722beta.lkr link script and testing your code. If you still have problems, then try the attached 18f8722_2.44.lkr script and test again. Please report back here with the outcome.

-Kevin

Our problems seem to be helped by changing the linker script. The symptoms
indicate that txdata and/or rxdata are being corrupted. In our robot,
the .map file indicates that txdata is at 0xf00 (gpr15 right above the stack).

Normally I would suspect a stack overflow into the txdata, but the PIC
addressing modes can't allow the stack to escape its bank.

On the theory that there is something strange about the memory in gpr15,
we changed our linker script to disallow data in gpr15. This seemed to fix
things, but we were out of time so we don't have enough testing to be 100%
sure. My fear is that we have just changed the symptom, but not eliminated
the problem.
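To check whether a variable from the .map file lands in gpr15, the bank arithmetic can be sketched as below. PIC18 data memory is divided into 256-byte banks, so the bank number is the address divided by 0x100.

The 0xF60 start of the special function registers is an assumption taken from the usual 18F8722 memory map. Check it against the datasheet and your own linker script.

```c
#include <stdint.h>

/* PIC18 data memory is split into 256-byte banks. */
#define BANK_SIZE 0x100u
/* Assumed 18F8722 layout: special function registers begin at 0xF60,
 * so the general-purpose part of bank 15 (gpr15) is 0xF00-0xF5F. */
#define SFR_START 0xF60u

/* Bank number of a data-memory address (0-15 for a 4 KB part). */
unsigned bank_of(uint16_t addr)
{
    return addr / BANK_SIZE;
}

/* 1 if addr is general-purpose RAM in bank 15, 0 otherwise. */
int in_gpr15(uint16_t addr)
{
    return bank_of(addr) == 15u && addr < SFR_START;
}

/* Bytes between addr and the first SFR; only meaningful for addr < SFR_START. */
unsigned bytes_below_sfrs(uint16_t addr)
{
    return SFR_START - addr;
}
```

`txdata` at 0xf00, as reported above, is in bank 15. It sits 0x60 (96) bytes below the first SFR.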

Mark McLeod
08-03-2006, 11:04
If gpr15 is suspect (or really 14) we can nail a diagnostic data array into that space and test periodically to see if an overwrite of that bank occurs. I was thinking of using databanks before and after to isolate our user code from "outside" corruption.
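Here is a sketch of that diagnostic in portable C. Fill a guard array with a known, position-dependent pattern at startup, then check it periodically (say, once per slow loop). The array size and pattern are hypothetical.

On the real controller the array would also need to sit at a fixed address in the suspect bank. With MPLAB C18 that is normally done with a `#pragma udata` section. That pragma is left out here so the code compiles anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard size and pattern. */
#define GUARD_LEN     16
#define GUARD_PATTERN 0xA5u

/* On the target this would be placed in the suspect bank. */
uint8_t guard[GUARD_LEN];

/* Fill the guard with a pattern that differs per byte, so a shifted
 * copy of the guard is also detected, not just a plain overwrite. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_LEN; i++)
        guard[i] = (uint8_t)(GUARD_PATTERN ^ i);
}

/* Returns the index of the first corrupted byte, or -1 if intact. */
int guard_check(void)
{
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (guard[i] != (uint8_t)(GUARD_PATTERN ^ i))
            return (int)i;
    return -1;
}
```

If `guard_check()` ever returns something other than -1, printing the index along with a loop count would help narrow things down. A hit, when your own code never touches that array, suggests something outside your code is writing to that bank.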

ericand
09-03-2006, 13:41
Check out IFI web site for an update. They are saying
to use the linker change (protect gpr15) and the new
libraries that resolve the interrupt issue.

http://www.ifirobotics.com/docs/memory_problem_8722.pdf

Kevin Watson
09-03-2006, 14:39
Check out IFI web site for an update. They are saying
to use the linker change (protect gpr15) and the new
libraries that resolve the interrupt issue.

http://www.ifirobotics.com/docs/memory_problem_8722.pdf

Please let IFI or myself know if this does not fix the problem.

-Kevin

Rickertsen2
12-03-2006, 13:30
I'm not sure I correctly understand this issue. Is it a problem with the 18f8722? Why does not using the gpr15 area fix this?

ericand
12-03-2006, 15:15
What we are hearing from IFI is that the problem is at least partly associated with
the oscillator that controls timing in the IFI controller. It seems that it can drift, and that
the drift is related to temperature.

The data that is getting corrupted is the inter-processor communication data which tends to be stored in gpr15.

I too am curious about why moving those data structures (by protecting gpr15) seems
to solve the problem and not just move it somewhere else. It may have something to
do with the fact that gpr15 is located adjacent to the locations the registers are memory mapped to.

I'm hoping that we will have more clues about what works and what doesn't over the course of the next few regionals, but I feel sorry for teams that are affected by this
problem. Our robot was impacted by this in about 1/3 of our matches in Portland.

eugenebrooks
12-03-2006, 16:11
Different data segments might have slightly different address/data
timing margins and moving the data to a different segment could be
resolving the timing issue, if that is what it is. Let's hope that the
patch resolves the problem. We were also at Portland and got hit
quite severely by this bug on Thursday afternoon. After trying a
loaner RC and seeing the same problem we gutted all of our feedback
code to obtain a robot that ran reliably. This severely impacted our
performance by taking all of our camera-directed shooting offline
and forced us to resort to a defensive strategy. If the linker patch
works around the problem we will have a very different robot.

Given the temperature sensitivity perhaps I should bring one of my
leg warmers to strap to the robot controller...

Eugene





Eldarion
12-03-2006, 17:06
What we are hearing from IFI is that the problem is at least partly associated with
the oscillator that controls timing in IFI controller. It seems that it can drift, and that
the drift is related to temperature.
So they are using an RC oscillator on the processor? That would be odd.
If it's a crystal oscillator, there's no way that should be happening whatsoever! :confused:

On a side note, there are over 100 posts in this thread! :yikes:
This must be a big problem! :D

Rickertsen2
12-03-2006, 17:48
So they are using an RC oscillator on the processor? That would be odd.
If it's a crystal oscillator, there's no way that should be happening whatsoever! :confused:

On a side note, there are over 100 posts in this thread! :yikes:
This must be a big problem! :D

The 04 RC uses a crystal. I have not taken apart a new RC. If they use an RC oscillator it is indeed a BIG problem.

Gdeaver
12-03-2006, 17:56
Crystals are subject to temperature drift, and so are voltage regulators. If I remember correctly, don't both processors share a common clock? Maybe this worked for the old processor but not the new one this year. Also, the PIC does have a watchdog capability; maybe it needs to be used.
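For reference, a watchdog works roughly as modelled below. The main loop must "kick" the timer within a timeout; otherwise the watchdog fires and resets the processor.

This is a software model with a hypothetical timeout. On the PIC the hardware watchdog is cleared with the CLRWDT instruction. This thread does not say whether user code on the IFI controller is allowed to enable it.

A watchdog only catches a stalled loop. It would not catch the shifted-input symptoms described elsewhere in this thread, where the code keeps running on bad data.

```c
#include <stdbool.h>

/* Hypothetical timeout, in timer ticks. */
#define WDT_TIMEOUT_TICKS 4

static int  wdt_remaining = WDT_TIMEOUT_TICKS;
static bool wdt_fired = false;

/* Called by the main loop each time it completes normally. */
void wdt_kick(void)
{
    wdt_remaining = WDT_TIMEOUT_TICKS;
}

/* Called from a periodic timer tick; fires if the loop stops kicking. */
void wdt_tick(void)
{
    if (wdt_fired)
        return;
    if (--wdt_remaining <= 0)
        wdt_fired = true; /* real hardware would reset the processor here */
}

bool wdt_has_fired(void)
{
    return wdt_fired;
}
```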

rangersteve
12-03-2006, 18:07
WOW. We had this same problem. We got it fixed at the competition after we lost 2 matches. I am not a programmer, so I don't know what the problem was, but it supposedly came out in an update. I will try to get one of our programmers to tell me what or where. It is a simple update that took them about 5 minutes. I wish I had more details. It is a very fixable problem. It shifts the inputs by one or something. I am really surprised nobody has posted a fix here yet.

kaszeta
12-03-2006, 18:44
The 04 RC uses a crystal. I have not taken apart a new RC. If they use an RC oscillator it is indeed a BIG problem.

The '05 uses a crystal, and I'm sure the '06 does too. Crystals still have temperature issues, they just aren't as bad.
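Some rough arithmetic shows how much "not as bad" is. The drift figures here are illustrative assumptions, not measurements of the IFI controller. A crystal is commonly specified in the tens of parts per million, while an RC oscillator can drift by a percent or more. A 40 MHz clock is also assumed.

An asynchronous serial link typically tolerates a baud-rate mismatch of a few percent. Crystal drift alone is therefore hundreds of times too small to matter there.

```c
/* Frequency offset, in Hz, for a clock of f_hz drifting by ppm
 * parts per million. */
double freq_error_hz(double f_hz, double ppm)
{
    return f_hz * ppm / 1e6;
}

/* The same drift expressed as a percentage: 1% = 10,000 ppm. */
double percent_error(double ppm)
{
    return ppm / 1e4;
}
```

With these numbers, 50 ppm at 40 MHz is a 2 kHz offset, or 0.005%. A 2% RC-oscillator drift would be 20,000 ppm.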

chris31
12-03-2006, 18:55
WOW. We had this same problem. We got it fixed at the competition after we lost 2 matches. I am not a programmer, so I don't know what the problem was, but it supposedly came out in an update. I will try to get one of our programmers to tell me what or where. It is a simple update that took them about 5 minutes. I wish I had more details. It is a very fixable problem. It shifts the inputs by one or something. I am really surprised nobody has posted a fix here yet.

Please post what you know, but I doubt the fix is as simple as you seem to think it is.

On a side note, I just found out our RC is back from competition, so hopefully I can test the temperature stuff more.

EDIT: WOW, 105 posts in this thread.

Rickertsen2
12-03-2006, 19:29
The '05 uses a crystal, and I'm sure the '06 does too. Crystals still have temperature issues, they just aren't as bad.


NOWHERE NEAR as bad.

eugenebrooks
12-03-2006, 20:50
Come on, guys: someone used "RC" for "Robot Controller"
in posts above, and then someone else read it as an RC (resistor-capacitor) circuit for
timing. Let's get back on topic...

It's not the crystal... It is likely the R and the C in the CMOS
wiring...

eugenebrooks
12-03-2006, 20:52
WOW. We had this same problem. We got it fixed at the competition after we lost 2 matches. I am not a programmer, so I don't know what the problem was, but it supposedly came out in an update. I will try to get one of our programmers to tell me what or where. It is a simple update that took them about 5 minutes. I wish I had more details. It is a very fixable problem. It shifts the inputs by one or something. I am really surprised nobody has posted a fix here yet.

If you had the 8.2[3] battery voltage problem, and you applied the
high priority interrupt patch, and the linker patch, and it fixed the problem,
WE REALLY WANT TO KNOW!

Eugene

devicenull
12-03-2006, 21:15
If you had the 8.2[3] battery voltage problem, and you applied the
high priority interrupt patch, and the linker patch, and it fixed the problem,
WE REALLY WANT TO KNOW!

Eugene

We've never had this problem. I patched the linker script Thursday, when I found out there was a patch. I don't know if it fixed problems that would have occurred, or if we never would have had them.

I can't see any reason why you shouldn't use both patches. What more can you do? Patch both and hope for the best.

rangersteve
12-03-2006, 23:20
I am positive that we had the 8.3 volt battery problem. I don't know what was fixed, but I do know that it was fixed by a moderately experienced programmer in 5 minutes. I also know it had something to do with switching ports and making our robot shiver but not work. The person who fixed it (Brandon Heller; Tim Emerson also knows about it) is a regular on this site, and I am very surprised he hasn't read this thread. I don't know where he got the fix or anything; I heard something about a team update and a new version of something. I wish I could be of more help. I will try to get hold of somebody who knows what they are talking about.

Kevin Sevcik
12-03-2006, 23:35
Ok all, just back from GLR. I was in terror of the 8.2V bug thursday night and saw the patch in here. I was going to rebuild and dump code to make sure we wouldn't get hit, but figured the odds were low that it'd happen in the first match, and we had a lot of robot work to do. So, of course, I see the dreaded 8.2V on the OI when we're out on the field. Tried to cycle the power, but the match was starting, so that sucked. Rebuilt the code and dumped it, and didn't have the problem for the rest of the regional.

However, this is such an intermittent problem, that I'd sort of like to have a statistically significant sample of people this has worked for.

rangersteve
13-03-2006, 15:42
I talked to one of our programmers at school today and I sent an email to a mentor, so one of them should be on here soon.

devicenull
13-03-2006, 18:11
I talked to one of our programmers at school today and I sent an email to a mentor, so one of them should be on here soon.

IFI has posted the patch already. It's on their web site. You don't have to know any programming to do it.

tanstaafl
13-03-2006, 22:56
OK, here we go: the answer to this problem is actually pretty simple. Here's the link to the file, in case anyone can't find it.

http://www.ifirobotics.com/docs/memory_problem_8722.pdf

Note that this is labeled as only a temporary fix; hopefully they'll be able to find a permanent fix soon.

Metalcrafters
15-03-2006, 19:21
I am curious about this subject. We had some troubles in AZ, seemed like our bot was glitching. Very sporadic, we were never able to trace it.
What were the symptoms of this voltage bug?

Thanks in advance

Kevin Sevcik
15-03-2006, 20:21
Typical symptoms are your robot just plain not working. It tends to hit as soon as you power the bot on and won't ever rectify itself once it happens, at least until you completely power the bot down. When it happened to our bot, we just ran back and forth and the joysticks wouldn't do anything.

Mark McLeod
16-03-2006, 18:36
On our controller the problem never occurred on startup although most of the others I've personally seen occurred immediately at startup.

It hit us after the robot had been on a while. The symptoms will differ with your individual control setup. In one instance ours suddenly began attempting to rapid-fire our poof balls all on its own. The more common case was that the gunner would suddenly find himself controlling the left drive train, or, when idling, the robot suddenly began driving away slowly. You will see mechanisms moving without orders and controls that won't control.

No recurrence for us in competition with the IFI library update and blocking out bank 15. But the controller doesn't run very long in matches.

I did see the problem in a couple of other robots at competition, but they didn't have the newest patches in.

eugenebrooks
17-03-2006, 00:56
We patched our code with the interrupt related patch, and the linker script patch, and although we did not see 8.2 displayed on the controller for the battery voltage our code refused to load and run on the 2006 controller. Retrofitted to the 2005 controller our custom code loaded and ran without any problems.

Depending on your use of interrupts and timers, you may or may not see problems continue after patching your libraries and linker script. Several teams at the San Jose regional remained dysfunctional after patching.

The only sure cure seems to be a 2005 controller upgraded to V12 master code. The upgrade is required and checked by the technical inspectors. If you are having problems with the 2006 controller, I would like to suggest that you use a fix-it window to retrofit your custom code to the 2005 controller, squeezing it to fit if required, so that it is ready to go if you need it. Don't forget to have a copy of the V12 master code to load into your controller.

Have fun, and don't forget that it's all about fun and nothing else...

Eugene

devicenull
17-03-2006, 13:19
The only sure cure seems to be a 2005 controller upgraded to V12 master code. The upgrade is required and checked by the technical inspectors. If you are having problems with the 2006 controller, I would like to suggest that you use a fix-it window to retrofit your custom code to the 2005 controller, squeezing it to fit if required, so that it is ready to go if you need it. Don't forget to have a copy of the V12 master code to load into your controller.

Nope. The 2006 RC is required this year.

<R60> You must operate your robot with the wireless, programmable Innovation First 2006 Robot Control
System.

Post on QA about it too: http://forums.usfirst.org/showthread.php?t=788

chrisinmd
17-03-2006, 20:00
Yeah, we got bit by the 'bug' in like our 4th seeding match today at Chesapeake. Bot started going crazy all over the field, running into the walls and such. Took it back to the pits, all the controls were messed up, then a mentor walked over to the controls, hit the select button on the OI, and said, "Well, you only have 8.2 volts here." Ahh!!! Problem solved. Got the update on a flash drive from the IFI guy; our programmer had it fixed in 5 minutes.

Everything else went well and we are now 5 and 1; the match we lost was when our bot was possessed!

Make sure to get the update!

Good luck to all,
-Chris

yoyodyne
18-03-2006, 00:01
I am happy to report that so far (8 matches today) Team 116 has not had problems at the Peachtree regional. We are using the same library update as we were at VCU and in addition using the new linker command file with gpr15 PROTECTED. I still don't understand how user processor memory issues would impact the master processor to OI comms. We have not implemented the large code switches mostly because we don't want to slow the code down.

Greg

Rickertsen2
20-03-2006, 21:49
This killed one of our alliance partners in the elimination rounds and almost killed another.

ericand
21-03-2006, 19:01
This killed one of our alliance partners in the elimination rounds and almost killed another.

More importantly, did the suggested linker and library changes help, or were they already using them?

At this point, posts saying "yes it happened to me (or someone we know)" are not worth much without knowing if they have tried the fix or not.

If anyone is experiencing a problem (and they are using the fix) please let IFI know, and let the Chief Delphi community know ASAP!

Right now, all indications that I have seen are that the linker script and the library replacements fix the two hardware problems in the robot controller. If anyone has reason to doubt that the problems are fixed, or if anyone has experienced any other controller hardware problems, please report the details here.

Thanks!

kaszeta
02-04-2006, 12:16
Well, with the revised linker script, Team 95's robot functioned perfectly at Palmetto. Competition is a lot more entertaining when your bot actually does something (we got second place with Teams 16 and 1676).

Deepfelt thanks to everyone here and at IFI for their help in finding this problem and turning our season around.

ericand
03-04-2006, 15:21
Well, with the revised linker script, Team 95's robot functioned perfectly at Palmetto. Competition is a lot more entertaining when your bot actually does something (we got second place with Teams 16 and 1676).

Deepfelt thanks to everyone here and at IFI for their help in finding this problem and turning our season around.

I concur, it was very nice seeing our robot run in Las Vegas and being confident that any strange behavior was due to things within our control, and not hardware failure in the RC.

ericand
09-01-2007, 21:54
Does anyone know if the hardware problem that caused this has been fixed in the 2007 controller, or do we still need the linker script change that eliminates the upper section of memory?

Kevin Watson
09-01-2007, 22:04
Does anyone know if the hardware problem that caused this has been fixed in the 2007 controller, or do we still need the linker script change that eliminates the upper section of memory?

No, it hasn't. This year's controller uses the same revision of the 18F8722 as last year.

-Kevin

ericand
09-01-2007, 22:36
My understanding was that the problem was drift in the oscillator that coordinated the communication between the master and user processors.

Kevin Watson
09-01-2007, 22:51
My understanding was that the problem was drift in the oscillator that coordinated the communication between the master and user processors.

Come to think of it, I was discussing this year's controller with IFI staff and this was mentioned, but I don't recall if they made changes to address this bug. The other 18F8722-related bugs are still there, though. You might give 'em a call to find out.

-Kevin

amateurrobotguy
11-01-2007, 21:50
Yeah, I will download and patch just to be sure. Oh, the ref thought we were having the 8.2 volt bug, but it turned out to be the circuit breaker tripping :) Are there any plans to fix this bug permanently? Did Microsoft have something to do with the controller? This seems like their handiwork ;)

JBotAlan
11-01-2007, 23:14
I saw this bug tonight on last year's controller. I am not using any fixes yet, because I didn't think that this was such a huge issue.

Mine went into autonomous mode. All by itself. Even though the competition jumper was connected and set to disable. This is a *serious safety issue*. More than one light was amber on the RC; I think it was the bottom two. I can't remember for sure because I saw it, went "what the..?" and hit reset a few times. For about 2 minutes the controller was strange, not obeying joystick commands, shifting (or trying to, there was no air in the pneumatics), and running auton intermittently (YEA! Another intermittent error!:) ). I re-programmed the controller more than once, with slightly different codes (I added a printf to see if it really was auton running; I couldn't believe it).

I will try the linker script. I already had new libraries compiled in, so those aren't the cure. I hate that I have to use a "beta-ish" fix, but if that's all there is, that's what I'll use. This is the first time in the whole year and a half we've had that controller that I've seen that error. May it never come back!

Hopefully IFI gets it straightened out quick!

JBot

Joe Ross
19-02-2007, 12:35
I saw the 8.3 Battery Voltage Bug last night with the following setup:

2004 controller upgraded with the 8722 processor (the upgrade was done last year).
Beta 14 master firmware.
2005 radios.
2007 OI
the current linker script and frc_library.lib from the 2007 default code (which I verified were the same as the updated ones from last year)


I originally saw the problem about a week ago. After investigating, I realized we were not using the new frc_library.lib. After putting that in, the problem went away until last night, when we saw the problem 3 or 4 times.

Terry Sherman
19-02-2007, 21:46
We've seen the 8.3/8.2 Battery Voltage Bug yesterday and today with the following setup:

2007 controller
Beta 14 master firmware.
2007 radios.
2007 OI

We have the PROTECTED memory section defined in the linker command file.
We have the latest FRC 8722 Library file

Have any other teams seen the 8.2v bug resurface? Any ideas?

Thanks!
-Terry

chris31
19-02-2007, 21:56
We've seen the 8.3/8.2 Battery Voltage Bug yesterday and today with the following setup:

2007 controller
Beta 14 master firmware.
2007 radios.
2007 OI

We have the PROTECTED memory section defined in the linker command file.
We have the latest FRC 8722 Library file

Have any other teams seen the 8.2v bug resurface? Any ideas?

Thanks!
-Terry

It hurt first-week regional teams last year. Hopefully, if it is occurring again, we can figure it out fast, especially since we ship our RC tomorrow.

Andrew Blair
19-02-2007, 22:03
We've had it on our brain for the entire build season. Adding protected to the problematic memory section and replacing the library files fixes the whole problem.

eugenebrooks
20-02-2007, 03:46
We've seen the 8.3/8.2 Battery Voltage Bug yesterday and today with the following setup:

2007 controller
Beta 14 master firmware.
2007 radios.
2007 OI

We have the PROTECTED memory section defined in the linker command file.
We have the latest FRC 8722 Library file

Have any other teams seen the 8.2v bug resurface? Any ideas?

Thanks!
-Terry

Are you using any interrupts?
If so, what interrupts are you using and for what purposes.

<<< Let me be a little more transparent. All of the teams that I talked to last
year who were seeing this problem, after applying all the patches, were using
timer 3. I am interested in knowing if you are using timer 3. The patches
move things around in memory, and any movement of variables in memory
affected this problem, without really fixing it. We never saw the problem when
we backed away from interrupt use, although our interrupt based code was well
wrung out on prior year's controllers and never showed a problem. Knowing that
the chip was the same, and that the patches did not fix it for us last year, we
avoided any additional interrupt use in our code this year. One workaround worth
trying, if you are not using Kevin's camera code, is to use his latest code
that changes how the high-numbered PWMs are handled. >>>

Eugene

Mike Betts
20-02-2007, 06:23
Joe and Terry,

If you have not done so already, please get this info to IFI as soon as possible.

Regards,

Mike

Dave K.
20-02-2007, 06:50
I read through the errata on the 8722 last year, and again recently, and couldn't see how the changes to the linker's ability to use a region of memory related to any of the published errata.

Did I miss something?

kaszeta
20-02-2007, 07:57
Seeing that we got killed by this bug at BAE last year, it's on our pre-match checklist to verify the battery voltage on the OI when the bot is placed.

That said, we haven't seen this one this year in practice yet, or at the scrimmage. But there's a lot of code out there that doesn't have the linker fix in it, so I'd expect a certain amount of repeat of this problem.

Matt Krass
20-02-2007, 19:14
Seeing that we got killed by this bug at BAE last year, it's on our pre-match checklist to verify the battery voltage on the OI when the bot is placed.

That said, we haven't seen this one this year in practice yet, or at the scrimmage. But there's a lot of code out there that doesn't have the linker fix in it, so I'd expect a certain amount of repeat of this problem.

Furthermore it seems like there's a lot of code with the linker fix that's still being affected. I'm starting to think we need a processor rollback, with maybe memory extensions (I'm not sure if the PIC supports external Flash/SRAM latches though) to compensate.

Joe Ross
20-02-2007, 22:34
Joe and Terry,

If you have not done so already, please get this info to IFI as soon as possible.

Regards,

Mike

I did speak with Tom Watson yesterday. He said I was the first to report it since the new linker script and library files were released. Hopefully Terry has talked to them by now as well.

I did more testing today. I was able to reproduce it with all 2007 components.

We're using a modified version of Kevin's ADC code at 3200 Hz. We're using his camera code, but without a camera physically installed. The problem will occur consistently if we're using printf to print 20 or so characters per slow loop. I didn't see it when we were not outputting any data.

Last week, I verified that we could output 40 characters without slowing down the slow loop. However, something may have changed since then, so I need to reverify this.
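A back-of-the-envelope check of the 40-character figure follows. It assumes, without confirmation from this thread, that the program port runs at 115200 baud with 8N1 framing (10 bits on the wire per character) and that the slow loop period is about 26.2 ms. Change those inputs if your setup differs. This is a sketch of the arithmetic, not a measurement.

```c
/* Rough serial-output budget per slow loop (sketch).
 * Assumptions: 115200 baud, 8N1 framing (1 start + 8 data + 1 stop
 * = 10 bits per character), and a ~26.2 ms slow loop period. */

/* Characters the UART can physically send during one loop period. */
unsigned long chars_per_loop(unsigned long baud, unsigned int bits_per_char,
                             unsigned long loop_period_us)
{
    unsigned long chars_per_second = baud / bits_per_char;
    /* Multiply before dividing to keep integer precision. */
    return (chars_per_second * loop_period_us) / 1000000UL;
}

/* Percentage of that per-loop capacity used by a given printf load. */
unsigned int percent_of_budget(unsigned long chars_printed,
                               unsigned long capacity)
{
    return (unsigned int)((chars_printed * 100UL) / capacity);
}
```

Under these assumptions, the link carries roughly 301 characters per loop, so 40 characters uses about 13% of it. That suggests raw line throughput alone shouldn't stall the loop. Buffering and interrupt interactions are separate questions.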

I suspect that something in our modified ADC code is stomping on something it shouldn't, and that it isn't a widespread issue.

I do have a test setup that I can reproduce the issue on, so I will continue trying to isolate the problem, and hopefully it is just our code.

I still wonder in the back of my mind whether protecting gpr15 doesn't fix the root cause, but only the most common symptom.

kaszeta
21-02-2007, 11:22
I did more testing today. I was able to reproduce it with all 2007 components.


Obviously, we can't test with 2007 components. I've been concerned about this, so before we shipped I tested with both my 2006 RC and this year's 2007 RC.

Our code has 8 ADC channels running Kevin's ADC code at 800 Hz and 16 samples per update (so that we could get 12-bit precision, although in testing I think we can work just fine with the default 200 Hz and 4 samples per update), as well as three shaft encoders, all running at up to around 1000-2000 pulses/sec. We also generate a *lot* of debug printfs, but the student programmers will probably comment out most of those in the pits. We've got Kevin's serial drivers in there as well, but no camera. I haven't been able to reproduce anything yet with either controller using the fixed library and linker scripts. The 2007 RC has the beta update for the master controller code, but the 2006 RC has not been updated.
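The interrupt load and the "16 samples for 12 bits" claim can be worked out as follows. The sketch rests on two assumptions, not confirmed from Kevin's source:

* Each interrupt converts one channel.
* The configured sample rate applies per channel. Joe's later "1600 × 6" remark is consistent with this.

The precision part uses standard oversampling and decimation. Summing 4^n samples of a noisy 10-bit reading and shifting right by n yields n extra bits, so 16 samples give 2 extra bits (12 bits total).

```c
/* Interrupt-load and oversampling arithmetic (sketch; per-channel
 * sample rate and one conversion per interrupt are assumptions). */

/* Timer interrupts per second for the ADC alone. */
unsigned long adc_interrupts_per_second(unsigned int channels,
                                        unsigned long sample_rate_hz)
{
    return (unsigned long)channels * sample_rate_hz;
}

/* How often each channel publishes a new averaged value. */
unsigned long adc_updates_per_second(unsigned long sample_rate_hz,
                                     unsigned int samples_per_update)
{
    return sample_rate_hz / samples_per_update;
}

/* Extra resolution from oversampling: every factor of 4 adds one bit. */
unsigned int oversampling_extra_bits(unsigned int samples_per_update)
{
    unsigned int bits = 0;
    while (samples_per_update >= 4) {
        samples_per_update /= 4;
        bits++;
    }
    return bits;
}

/* Decimate: sum the raw 10-bit samples, then shift right by the extra
 * bit count, giving a (10 + n)-bit result (e.g. 16 samples -> 12 bits). */
unsigned int decimate(const unsigned int *samples, unsigned int count)
{
    unsigned long sum = 0;
    unsigned int i;
    for (i = 0; i < count; i++)
        sum += samples[i];
    return (unsigned int)(sum >> oversampling_extra_bits(count));
}
```

With the numbers above, this gives 6400 ADC interrupts per second and 50 updates per channel per second. The three encoders can add up to about 6000 more interrupts per second. The extra bits are only meaningful if the signal carries some noise to act as dither.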

I'll try to recreate your setup and torture-test my 2006 RC if I have a chance.


We're using a modified version of Kevin's ADC code at 3200 Hz.


Why that fast, if you don't mind my asking? And what did you modify?


I still wonder in the back of my mind whether protecting gpr15 doesn't fix the root cause, but only the most common symptom.

I'm pretty sure this just cures one symptom.

eugenebrooks
21-02-2007, 14:36
We're using a modified version of Kevin's ADC code at 3200hz. We're using his camera code, but without a camera physically installed. The problem will occur consistently if we're using printf to print 20 or so characters per slow loop. I didn't see it when we were not outputting any data.


For what it is worth, when we had this problem last year we were using a timer to time the rotational interval of the ball shooting wheel. Our code for this was very compact and carefully combed over for potential race conditions. Small changes to printf statements, or adding static sentinel variables to the code, could make the 8.2 volt bug come and go. In particular, the memory movement caused by the sentinel variables could cause hard code errors. The same code ran without a hitch on the prior year's RC.
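Eugene doesn't say which race conditions they checked for, but one classic one applies to an ISR that times a wheel. On an 8-bit PIC, a 16-bit variable is read as two separate bytes, so the ISR can update it between those reads and the main loop sees a "torn" mix of old and new bytes. Below is a generic sketch of one common fix: read until two consecutive reads agree. The variable and function names are hypothetical. The other usual fix is to mask the relevant interrupt briefly around the read.

```c
/* Tear-free read of a multi-byte value shared with an ISR (sketch).
 * Names are hypothetical; on the real controller the update would
 * happen inside the timer/interrupt handler. */

/* Written by the ISR, read by the main loop. volatile forces a real
 * memory read every time instead of a cached copy. */
volatile unsigned int wheel_period_ticks = 0;

/* Called from the ISR when a wheel revolution completes. */
void wheel_isr_update(unsigned int ticks)
{
    wheel_period_ticks = ticks;
}

/* Main-loop accessor: if an interrupt lands mid-read, the two reads
 * differ and we simply try again. Assumes updates are infrequent
 * compared with the time a read takes. */
unsigned int read_wheel_period(void)
{
    unsigned int first, second;
    do {
        first = wheel_period_ticks;
        second = wheel_period_ticks;
    } while (first != second);
    return first;
}
```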

Given all this trouble, we did not use any custom interrupt/timer code this year. Stepping outside what is commonly used by all the teams seems to get you into serious trouble with the new PIC chip. In my humble opinion, FIRST should allow teams to fall back to an older controller when the 8.2 bug occurs. It just isn't right to lead students through a sophisticated control system development only to have the 8.2 bug take you out at the competition. We are supposed to be inspiring the students, not depressing them.

Eugene

Joe Ross
22-02-2007, 18:10
We saw the 8.3/8.2 Battery Voltage Bug yesterday and again today with the following setup:

2007 controller
Beta 14 master firmware.
2007 radios.
2007 OI

We have the PROTECTED memory section defined in the linker command file.
We have the latest FRC 8722 Library file

Have any other teams seen the 8.2 V bug resurface? Any ideas?

Thanks!
-Terry

Terry, when I talked to Tom Watson at IFI this morning, he again said that I was the only person to report the problem this year. Can you please call IFI so that they are more aware of the problem.

For what it is worth, when we had this problem last year we were using a timer to time the rotational interval of the ball shooting wheel. Our code for this was very compact and carefully combed over for potential race conditions. Small changes to printf statements, or adding static sentinel variables to the code, could make the 8.2 volt bug come and go. In particular, the memory movement caused by the sentinel variables could cause hard code errors. The same code ran without a hitch on the prior year's RC.

Given all this trouble, we did not use any custom interrupt/timer code this year. Stepping outside what is commonly used by all the teams seems to get you into serious trouble with the new PIC chip. In my humble opinion, FIRST should allow teams to fall back to an older controller when the 8.2 bug occurs. It just isn't right to lead students through a sophisticated control system development only to have the 8.2 bug take you out at the competition. We are supposed to be inspiring the students, not depressing them.

Eugene

That's exactly the type of behavior we are seeing. I disabled all printfs, and it ran overnight without a code error.

I would support allowing the older controllers, however that wouldn't help us without a major code redesign, because we're already over both the code and RAM limit for the older processor.



Our code has 8 ADC channels running Kevin's ADC code at 800 Hz and 16 samples per update (so that we could get 12-bit precision, although in testing I think we can work just fine with the default 200 Hz and 4 samples per update), as well as three shaft encoders, all running at up to around 1000-2000 pulses/sec. We also generate a *lot* of debug printfs, but the student programmers will probably comment out most of those in the pits. We've got Kevin's serial drivers in there as well, but no camera. I haven't been able to reproduce anything yet with either controller using the fixed library and linker scripts. The 2007 RC has the beta update for the master controller code, but the 2006 RC has not been updated.

I'll try to recreate your setup and torture-test my 2006 RC if I have a chance.


We wanted to run the gyro at 1600 Hz and 32 samples per update (we've gotten very good results with that in the past). Last year, that was the only sensor we used, so Kevin's ADC code was fine. This year, however, we're using 5 other analog sensors, so it wasn't practical to use his code with a sample rate of 1600 Hz × 6 channels. Our modifications use the same basic framework, but allow different channels to use different sample rates and samples-per-update values. The interrupt routine takes longer to execute and uses more stack space, but I've proven that the problem isn't throughput or stack space.
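Here is a minimal sketch of the per-channel-rate idea Joe describes. It is hypothetical and not Joe's actual code:

* A base timer tick runs at a fixed rate.
* Each channel samples every `divider` ticks.
* Each channel publishes a sum after `samples_per_update` samples.

`read_adc()` is a stand-in that returns fixed values. The sketch also glosses over the fact that the PIC's ADC converts one channel at a time, so a real ISR would have to sequence conversions across interrupts.

```c
/* Hypothetical per-channel ADC scheduler (sketch only). */

#define NUM_CHANNELS 2

typedef struct {
    unsigned int divider;            /* sample every N base ticks */
    unsigned int samples_per_update; /* samples summed per result */
    unsigned int tick_count;         /* ticks since last sample */
    unsigned int sample_count;       /* samples in current accumulator */
    unsigned long accumulator;       /* running sum of raw samples */
    unsigned long result;            /* last published sum (main loop reads) */
    unsigned int updates;            /* number of results published */
} adc_channel;

/* Stand-in for a hardware conversion: a fixed value per channel. */
static unsigned int read_adc(unsigned int channel)
{
    return 100u + channel;
}

/* Call once per base timer interrupt. */
void adc_tick(adc_channel *ch, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; i++) {
        adc_channel *c = &ch[i];
        if (++c->tick_count < c->divider)
            continue;                 /* not this channel's turn yet */
        c->tick_count = 0;
        c->accumulator += read_adc(i);
        if (++c->sample_count >= c->samples_per_update) {
            c->result = c->accumulator;   /* publish, then start over */
            c->accumulator = 0;
            c->sample_count = 0;
            c->updates++;
        }
    }
}
```

Consider a 1600 Hz tick. A gyro channel with divider 1 and 32 samples per update gives 50 results per second. An IR channel with divider 16 and 4 samples per update gives 25 results per second.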

kaszeta
23-02-2007, 09:14
We wanted to run the gyro at 1600 Hz and 32 samples per update (we've gotten very good results with that in the past)

I'm surprised that you need both that update rate and precision, that's all.

But I feel your pain, last year we used Kevin's ADC code for a gyro and a single pot (and the pot wasn't used after autonomous). This year we've got the gyro, IRs, and a pot.

I'm interested in your "different channels at different update rates", since we tried the same thing (we really don't need our IR sensors updating as fast as the gyros), and it was indeed very difficult to get it to not RLOD. Difficult enough that we decided it wasn't worth the risk.

Joe Ross
23-02-2007, 10:01
I'm interested in your "different channels at different update rates", since we tried the same thing (we really don't need our IR sensors updating as fast as the gyros), and it was indeed very difficult to get it to not RLOD. Difficult enough that we decided it wasn't worth the risk.

I just dropped Kevin's original code back in, and still had the problem. That makes me feel better that it isn't our new ADC code, but leaves me stumped.

ericand
23-02-2007, 20:51
When we were debugging this problem last year, we found that it was heat related. The problem would show up more frequently when the system was cold and less when it was warm.

See: http://www.chiefdelphi.com/forums/showthread.php?p=466992#post466992

WesleyC
19-03-2007, 15:54
We (Team #1825, JCH Robotics) have had this same problem, and let me tell you it absolutely killed our robot. Our design probably wasn't as strong as it could have been, but it was solid--however, starting Thursday and increasing in intensity through Saturday, we kept running into the following errors:

Our two interrupt-driven optical shaft encoders failed to update, causing our encoder-based arm and wrist positioning system to drive to its extremes without ever updating the value from the encoder. Thank goodness we had limit switches that were hardcoded to stop all motor operation, or even the foam-core fiberglass/resin compound that composed our arm would have failed.
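One way to express the limit-switch protection described above is sketched here. It is a generic variant, not team 1825's code. It blocks motion into a pressed limit but still lets the arm back off the stop. Two conventions are assumptions: the IFI-style 8-bit PWM value with 127 as neutral, and "above neutral means up".

```c
/* Limit-switch interlock (sketch). Assumes 127 = neutral and that
 * values above neutral drive the arm up; swap the tests if not. */

#define PWM_NEUTRAL 127u

/* Returns the PWM command to send: motion toward a pressed limit is
 * replaced with neutral, motion away from it passes through. */
unsigned char limit_gate(unsigned char pwm_cmd, int upper_limit_pressed,
                         int lower_limit_pressed)
{
    if (pwm_cmd > PWM_NEUTRAL && upper_limit_pressed)
        return PWM_NEUTRAL;   /* driving up into the upper stop */
    if (pwm_cmd < PWM_NEUTRAL && lower_limit_pressed)
        return PWM_NEUTRAL;   /* driving down into the lower stop */
    return pwm_cmd;
}
```

Because the check runs on the final output rather than on the encoder value, it keeps working even when the encoder count stops updating.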

Incremented variables designed to serve as timing loops failed, making our autonomous mode a signal failure, since our "release ringer" code was never triggered. Without this error, our autonomous mode would have been one of the most successful at the arena--the robot placed ringers perfectly in position several times (until our camera failed Saturday) but the variable that triggered the arm to release was overridden and never reached the expected value.
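The sentinel idea Eugene mentioned earlier can be turned into a run-time check that at least detects this kind of overwrite. The sketch below is hypothetical: the struct, field names, and guard value are illustrative. It wraps critical variables in guard words. Struct members are laid out in declaration order, so a stray write that runs across neighbouring memory is likely, though not guaranteed, to hit a guard. A write that lands only on the counter itself would not be caught.

```c
/* Guard words around critical state (sketch; names hypothetical). */

#define GUARD_VALUE 0xA55Au

typedef struct {
    unsigned int guard_front;
    unsigned int auto_loop_count;     /* hypothetical timing counter */
    unsigned char release_triggered;  /* hypothetical "release" flag */
    unsigned int guard_back;
} guarded_state;

void guarded_init(guarded_state *s)
{
    s->guard_front = GUARD_VALUE;
    s->auto_loop_count = 0;
    s->release_triggered = 0;
    s->guard_back = GUARD_VALUE;
}

/* Nonzero if both guards are intact; call once per loop and report
 * (or fail safe) when it returns 0. */
int guarded_ok(const guarded_state *s)
{
    return s->guard_front == GUARD_VALUE && s->guard_back == GUARD_VALUE;
}
```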

The camera, when we fired the robot up on Saturday, locked up entirely, despite the fact that it worked fine Friday and no changes had been made since.

Our OI displayed 8.3 volts consistently.

An "Unknown User Violation" displayed on the dashboard occasionally instead of the voltage readings, with no apparent relation to anything done in the code. All attempts to solve or debug this failed. PLEASE, would it be possible to get more detailed error messages than "You have an error!"?

On tether, the robot would work just fine--though other symptoms such as the 8.3 volt output on the OI were still present--however, when hooked up to the competition interface at the arena, the robot failed miserably, as apparently several variables were overwritten, causing the robot to act as if buttons had been pressed when they weren't.



In testing at home, the robot appeared to work perfectly, responding excellently to everything we did and even placing several ringers in autonomous. Based on this, we hadn't tried any fixes or patches (why fix what isn't broken?). However, when we got to the arena, we started noticing these strange errors. We attempted to fix them, came here looking for help, found several bits of advice, and followed them all, but to no avail; the robot's performance only degraded as time went on.

Fortunately, our drive train was still operational, and even though it only had two small CIM motors, it excelled at defensive maneuvers thanks to its traction and center of gravity. This allowed us to reach 8th in the seeding rounds, but by the time the finals matches rolled around, the robot was in such a state that it could barely drive, and its arm was entirely useless.

Fixes we tried:

Updating library files (no success; we were using the latest release)
Updating linker files (made the robot worse if anything; certainly no improvement noted)
Rebuilding the code from scratch (a real pain, and a last-ditch effort to make SOMETHING on the robot work, but we might as well have saved ourselves the effort)
Involving the IFI staff at the location (they were extremely helpful, but nothing they suggested made a difference--which puzzled even them)
Asking other teams for help (the Bomb Squad mentor/programmer and a few others graciously suggested a few fixes, but they were as ineffective as the other methods tried)
...
Attempting to punch the robot (hey, it works on my PC! Unfortunately, the RC was set back in the robot far enough that I couldn't reach it from where I was sitting...)

Even though we're now out for the season, I still want to get this robot working for demonstration purposes. What else can I try?

Bharat Nain
19-03-2007, 15:59
Call up IFI and tell them your problem. They might offer to look at your code and maybe even your processor. Maybe something else is also wrong with the processor you are using.

Joe Ross
19-03-2007, 16:09
We were able to work around it by not using printf or Kevin's serial code. We successfully used IFI's blocking serial libraries without triggering the bug. I do not believe that Kevin's serial code is the entire problem, as we tried reproducing the problem with just his code several times. Rather, we've decided that it is some weird interaction between his code and our code.
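For context on "blocking" versus interrupt-driven serial output, here is a generic sketch of the circular transmit buffer an interrupt-driven driver typically uses. It is not Kevin's implementation, and all names are hypothetical. The main loop queues bytes and returns immediately. The transmit ISR drains the buffer, and a full buffer forces the caller to drop or wait. A blocking driver instead waits for each byte to leave the UART.

```c
/* Interrupt-driven TX ring buffer (generic sketch, hypothetical names). */

#define TX_BUF_SIZE 16u   /* must be a power of two for the masking */

typedef struct {
    unsigned char data[TX_BUF_SIZE];
    volatile unsigned char head;  /* next slot to write (main loop) */
    volatile unsigned char tail;  /* next slot to send (TX ISR) */
} tx_ring;

/* Main-loop side: queue one byte; returns 0 if the buffer is full.
 * One slot is kept empty so "full" and "empty" are distinguishable. */
int tx_put(tx_ring *r, unsigned char byte)
{
    unsigned char next = (unsigned char)((r->head + 1u) & (TX_BUF_SIZE - 1u));
    if (next == r->tail)
        return 0;
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* ISR side: fetch the next byte to transmit; returns 0 if empty. */
int tx_get(tx_ring *r, unsigned char *byte)
{
    if (r->tail == r->head)
        return 0;
    *byte = r->data[r->tail];
    r->tail = (unsigned char)((r->tail + 1u) & (TX_BUF_SIZE - 1u));
    return 1;
}
```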


We replaced all our printfs with DEBUG statements like Kevin's camera code so we could easily disable all printfs before doing anything important.
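Here is a minimal sketch of that pattern. The flag and macro names are hypothetical and not copied from Kevin's camera code. The double-parenthesis form passes the whole printf argument list through a single macro parameter, so it works even on compilers without C99 variadic macros. Commenting out one `#define` removes every debug print at compile time.

```c
#include <stdio.h>

/* Comment out this line to compile out every debug print. */
#define ENABLE_DEBUG

#ifdef ENABLE_DEBUG
/* DEBUG(("fmt", args)) expands to printf ("fmt", args). */
#define DEBUG(args) printf args
#else
/* Expands to nothing, leaving an empty statement behind. */
#define DEBUG(args)
#endif

/* Example call site (hypothetical). */
void report_gyro(int rate)
{
    DEBUG(("gyro rate: %d\r\n", rate));
}
```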

Astronouth7303
20-03-2007, 17:33
I find it scary that the linker is allocating variables such that problems that bad are appearing, and you're using code that could do that.

All variables are statically allocated.

Pointers and arrays are used only a few times in the code.

Issues that bad would suggest that something is very, very wrong at a fundamental level. Try it on another RC.

Tottanka
16-02-2008, 07:15
We are having the same problem that you guys speak of in this thread, and we can't understand from this thread what exactly the solution is...
Can anyone please clarify to us what exactly we should do?
We are using Kevin's most updated code...

Kevin Watson
16-02-2008, 12:38
We are having the same problem that you guys speak of in this thread, and we can't understand from this thread what exactly the solution is...
Can anyone please clarify to us what exactly we should do?
We are using Kevin's most updated code...

Whoa, really? Can you zip up your entire build directory and e-mail it to me?

-Kevin

Tottanka
16-02-2008, 12:49
I'm sorry for the confusion, everything is ok now.
Thanks a lot though =]

Kevin Watson
16-02-2008, 13:03
I'm sorry for the confusion, everything is ok now.
Thanks a lot though =]

What do you mean everything is okay now? Given the pain the 8.2 bug caused teams in the past, you need to let me know if you actually had this problem, so that I can try to understand it and hopefully provide a fix.

-Kevin

Tom Line
16-02-2008, 13:41
We experienced this bug earlier this year. We changed the size of our autonomous code and it went away.

Kevin Watson
16-02-2008, 14:27
We experienced this bug earlier this year. We changed the size of our autonomous code and it went away.

Is this with my code? If you happen to have an archive copy of the malfunctioning code, can you e-mail it to me? Has anyone else seen the 8.2 bug this year (yes, that includes you, over there in the corner)? If you have, please post here.

-Kevin

Tom Line
16-02-2008, 16:24
We are using your gyro + encoder code integrated into last year's IFI default code.

We didn't remove any functional code when we did our cleanup; we deleted some old procedures that we were not even calling. We were not close to the memory limits. That's the best information I can give you, as we only keep the last 4 days of code; the rest gets deleted.

We're using MPLAB 7.20.

Madison
18-02-2008, 01:50
We experienced this bug, I think, twice for the first time tonight. We'll be driving the robot around normally and, without warning and for no obvious reason, it will stop in place and cease responding to all inputs. The OI shows 8.3V and the Code Error light illuminates.

Since I'm not a programmer, I can't say too much about how we have things set up. We're coding manually within EasyC and using two Chicklets for control, if that makes any difference.

I've pointed our programming mentor to this thread, as well as to the document provided by IFI about changing the information in the linker file. I'll update this with more information from him as I come across it.

Mr. Freeman
18-02-2008, 03:43
Is this with my code? If you happen to have an archive copy of the malfunctioning code, can you e-mail it to me? Has anyone else seen the 8.2 bug this year (yes, that includes you, over there in the corner)? If you have, please post here.

-Kevin

We have gotten the 8.2 bug on multiple occasions. Sometimes we'll turn on the robot and it'll go freaking crazy (right track at full speed and the arm driven all the way up; fortunately it seems to obey the limit switch and not overdrive it). We'll hit the disable switch and read the OI, and lo and behold, it reads 8.2.
After a complete power cycle (killing the main breaker and hitting the reset button to power off the RC), it works just fine, although sometimes we have to do it twice.

I believe this code is the 2007 default plus all of the changes we've made. We did not get this bug in any previous years and we don't know what made this happen this year.

Kevin Watson
18-02-2008, 12:36
I believe this code is the 2007 default plus all of the changes we've made. We did not get this bug in any previous years and we don't know what made this happen this year.

Before this happened, had you completed all the steps mentioned in this document (http://www.ifirobotics.com/docs/memory_problem_8722.pdf)?

-Kevin

Tom Line
21-02-2008, 16:52
Help!

We now figured out why the 8.2V bug went away last time - it's because we stopped using our practice bot + controller and switched to our new one.

Today we went back to the practice bot, loaded the code that works perfectly well on our competition bot, and it immediately went into the 8.2V bug.

I've deleted and commented out huge chunks of code in an attempt to change the compiled memory footprint; however, it does not seem to make a difference. Nothing we do with this code seems to correct the problem.

The issue immediately goes away upon loading the IFI default code. Can someone take a look at this and see if there's anything that jumps out at you as to the cause of the issue? Thank you!

We updated the master code to check if that fixed the issue and it did not.

Tom Line
21-02-2008, 16:58
Just for fun, we got our 2006 board out and used that - no 8.2 error. So we get 8.2 on the 2007 board, but not the '08 or the '06.

That makes me lean toward a silicon issue.

Kevin, if you would like our code, I can email it to you.

Kevin Watson
21-02-2008, 17:40
Kevin, if you would like our code, I can email it to you.

Yes, please.

-Kevin

Kevin Watson
21-02-2008, 18:28
Help!

We now figured out why the 8.2V bug went away last time - it's because we stopped using our practice bot + controller and switched to our new one.

Today we went back to the practice bot, loaded the code that works perfectly well on our competition bot, and it immediately went into the 8.2V bug.

I've deleted and commented out huge chunks of code in an attempt to change the compiled memory footprint; however, it does not seem to make a difference. Nothing we do with this code seems to correct the problem.

The issue immediately goes away upon loading the IFI default code. Can someone take a look at this and see if there's anything that jumps out at you as to the cause of the issue? Thank you!

We updated the master code to check if that fixed the issue and it did not.

Got your code and it looks like you didn't make the changes to the linker script discussed here:

http://www.ifirobotics.com/docs/memory_problem_8722.pdf

Also make sure you're using these libraries:

http://www.ifirobotics.com/docs/legacy/revised-frc-libraries_2-24-06.zip

Let me know if you're still having a problem after you make these changes.

-Kevin

TomZ
21-02-2008, 18:36
Got your code and it looks like you didn't make the changes to the linker script discussed here:

http://www.ifirobotics.com/docs/memory_problem_8722.pdf

Also make sure you're using these libraries:

http://www.ifirobotics.com/docs/legacy/revised-frc-libraries_2-24-06.zip

Let me know if you're still having a problem after you make these changes.

-Kevin

Thanks for the help, we fixed the two errors in the code and now it works fine.

Many Thanks From Team 1718
The Fighting Pi

Tom Line
21-02-2008, 20:05
Thank you, Kevin. I'm a bit embarrassed :o . I was told these were done... Mr. Z and I are going to have some words when I get back...:mad: