AnalogModule->GetAverageValue() gives wrong results if polled too fast

I have found that if I try to read from Analog Module 1 while the battery reading jumper is on it, I occasionally get a value of 1628, even if the signal line is grounded.

This only happens if the battery jumper is in place, and if a Wait(.005) is omitted from the read loop.

I have produced a minimal Wind River project to demonstrate this. Here is the main routine; the zipped project file is attached:

class AnalogModuleBugTest : public SimpleRobot{
    Timer timer;
    AnalogModule *aMod;

public:
    AnalogModuleBugTest(void):
        timer()
    {
        aMod = AnalogModule::GetInstance(1);
        GetWatchdog().SetExpiration(0.1);
        timer.Start();
    }

    void Autonomous(void){
        double deltaT = 0;
        float sampleRate = 0;
        INT32 avgValue;
        bool error = false;
        timer.Reset();
        sampleRate = aMod->GetSampleRate();
        while (IsAutonomous())
        {
            GetWatchdog().Feed();

            deltaT = timer.Get();
            if (deltaT > .02){
                avgValue = aMod->GetAverageValue(6);
                if (avgValue > 500){
                    error = true; //<- put breakpoint here <<<-should never get here!
                }
                timer.Reset();
            }
            //Wait(.005);  // (.005 seems OK) (.0005 has error) (.001 has error)
        }
    }

… balance of code not relevant.

Here is the complete description, which is also included in the zip file (the line numbers will make more sense in the IDE):

This applies to Analog Module 1 only.

Although channel 6 is used in this example I have duplicated the
results for all other channels, except 8 which is battery.

Setup:
(1) Place a jumper on Channel 6 of AnalogModule 1 (slot 1) between
ground and signal, so the voltage and avgValue should be 0, + noise.
Typical value is -4.

(2) Battery jumper is IN place on the breakout board.

(3) Breakpoint is set on line 33 “error = true”.

(4) Line 37 “Wait(.005)” is commented out.

(5) Program is run as Debug Kernel Task - with option to attach spawned
tasks checked.

Result:
Breakpoint is reached within 3-4 minutes, often sooner, with
avgValue near 1630. If the program is resumed from the breakpoint,
the breakpoint is reached again within a similar timeframe.

Comments:
(1) With Wait(.005) at line 37 active the breakpoint was not reached
in a 2 hour span.

(2) With Wait(.0005) the breakpoint was reached as usual.

(3) With battery jumper OUT the breakpoint was not reached in several
hours. (Program was restarted after removing jumper).

(4) With battery jumper OUT - and the breakpoint not having been
reached for 20 minutes - the jumper was put IN while the program was
running. The breakpoint was reached within 2 minutes.

(5) Same results if using a gyro or accelerometer on channel 6 (or other
channel, including accumulator channels), whether using a live signal
or jumper to ground.

(6) SampleRate is 50000 as is, slightly higher ~51000 if a gyro is used due
to the oversampling/sample rate conversion.

(7) Module 2 (slot 2) seems not to be affected, possibly because placing
the battery jumper on that module has no effect.

Based on these tests it appears that there is a bug when the battery
jumper is in place on module 1.

Regards,
David

AnalogModuleBug.zip (7.85 KB)



Changing the battery jumper only changes what is wired to
an analog input (8 if I remember correctly) and should not
change whether spurious values occur.

I posted a message about spurious values picked up from
an analog input some time ago, which was clearly a race
condition in the system somewhere. Perhaps your timer
use is clearing the race condition somehow.

When we upgraded to the required updates for the cRIO,
the FPGA and WindRiver, our spurious values went away.
We were reading the analog input directly, not using the
averaging functions.
Are you running the software configuration required by
the team update 12?

Software/Firmware        Revision
LabVIEW for FRC          Update 3.0a and newer
cRIO FPGA Image          FRC_2009_v11.Zip and newer
WPI Robotics Library     3.0.1718 and newer
Driver Station           2009-02-010a3 and newer

Again, our problems with stray analog input values went away
when we installed this software configuration. (The LabVIEW version
is not relevant when using C++.)

Sorry, I don’t have time to read your post in detail,
I have to leave town on a trip…

Eugene

Eugene,

Thanks for the response. I had seen your January post and I didn’t know if we had the same problem or not.

All our software/firmware is the latest. After running another test I have determined that the “spurious” reading is actually the battery voltage. More accurately it is the voltage that is on the battery jumper pin.

For example, the DS reports my battery voltage as 14.15, and my “spurious” reading is 1635. Your reading was 1435 so I’d bet your battery voltage was (1435/1635) x 14.15 = 12.42 Volts. (I’m using a power supply, maybe you were on a battery).

As a test I put 8.55 Volts on the pin and my spurious reading dropped to 987, the exact ratio expected.
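
The ratio check above can be written out as a small calculation. This is only a sketch: the helper names are made up for illustration (they are not WPILib functions), and it assumes the raw count scales linearly with the voltage on the battery jumper pin, using the 1635 counts at 14.15 V reported above as the reference point.

```cpp
#include <cassert>

// Hypothetical helper (not part of WPILib): estimate the voltage on the
// battery jumper pin from a spurious raw count. Assumes the count is
// proportional to the pin voltage; the default reference point is the
// 1635 counts observed at 14.15 V in this thread.
double EstimatePinVoltage(int spuriousCount,
                          int refCount = 1635,
                          double refVolts = 14.15)
{
    assert(refCount != 0);
    return static_cast<double>(spuriousCount) / refCount * refVolts;
}

// Inverse of the above under the same assumption: the spurious count
// expected for a given voltage on the pin.
double ExpectedCount(double pinVolts,
                     int refCount = 1635,
                     double refVolts = 14.15)
{
    assert(refVolts != 0.0);
    return pinVolts / refVolts * refCount;
}
```

With these defaults, EstimatePinVoltage(1435) comes out at roughly 12.42 V, and ExpectedCount(8.55) at roughly 988, which is within a count of the 987 observed.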

The reason the error didn’t show up with the battery pin out was that I was testing for a “high” spurious value, and with the pin out I was probably getting a spurious value near 0.

David

I’m at my son’s place for the evening now,
and will be driving back home in the morning.
I won’t be able to check this thread again until tomorrow
evening.

We had seen the spurious value I had reported on the
previous thread when using the analog input on the same
module as the one measuring the battery voltage. Your
observation does seem to indicate that it was the battery
voltage that was being reported when the spurious value
came through. We also saw spurious values near zero
in other testing, for instance when we tried other channels and
swapped the interface card; in those cases the jumper
for the battery voltage was likely not hooked up.
That would explain the behavior. We never saw a zero spurious
value when the ones associated with the battery voltage were
coming through, so it appears to be confusion with regard to
the analog input used for the battery voltage.

We got no help, so we tossed in the towel on using
the analog input for anything in a feedback loop.

We were detecting spurious values by recording and reporting
a high and low limit for an input that was set up to read
a fixed voltage with a pot. One would pop through every
few seconds or so.
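
A detector of this kind might look roughly like the following. It is a minimal sketch, not the code we actually ran: the struct name, the Exceeds() check, and the idea of passing in an allowed spread are all assumptions made for illustration.

```cpp
#include <climits>

// Hypothetical helper: remembers the lowest and highest raw values seen
// so far on an input that is supposed to sit at a fixed voltage.
struct RangeTracker {
    int low = INT_MAX;   // lowest value seen so far
    int high = INT_MIN;  // highest value seen so far

    // Record one new reading.
    void Update(int value) {
        if (value < low)  low = value;
        if (value > high) high = value;
    }

    // True once the spread between high and low is larger than what
    // ordinary noise should produce. Returns false before any reading.
    bool Exceeds(int maxSpread) const {
        return high >= low && (high - low) > maxSpread;
    }
};
```

Calling Update() once per loop and reporting low and high now and then is enough to show whether a stray value has come through, even if nobody is watching at the moment it happens.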

This behavior went away when we upgraded, but it appears that
the problem is still lurking. We ran a system that showed
the glitch every few seconds or so for about 10 minutes and did
not see the problem occur after our upgrade specified in
team update 12. Our cRIO is now in the crate with
the robot, so we can’t do any follow-up testing until we get to
the San Jose regional. We had tried both side cars, both bumper
cards, both plugins, and both slots.

We decided not to do anything with analog inputs when we saw
the problem during the build, but have since spent some time developing
a feedback circuit for the San Jose regional as we thought that the
problem was gone. It is not good that the problem is still lurking.

I believe that the spurious values come through one at a time,
with good values on either side, but we will have to check that
carefully in San Jose. One strategy for dealing with the
problem on suitably filtered analog inputs would be to lag the
value used by one cycle, look for unreasonable jumps, and
interpolate when they occur. Another strategy is to design your analog
system to avoid the value produced by the battery voltage, so you
know when a glitch has put it out of the normal range.
We will have to try this in San Jose if we
see the problem happening with our planned feedback loop.
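
The first strategy could be sketched as below. This is untested on a robot and rests on the assumption stated above, that a glitch is a single bad sample with good values on either side. The class name and the maxJump threshold are hypothetical; a real threshold would depend on how fast the filtered signal can legitimately move in one cycle.

```cpp
#include <cstdlib>

// Hypothetical one-cycle-lag glitch filter. Each output is the sample
// from the previous cycle; if that sample jumped away from its
// predecessor AND the newest sample jumps back, it is treated as a
// glitch and replaced by the average of its two neighbours.
class LaggedGlitchFilter {
public:
    explicit LaggedGlitchFilter(int maxJump)
        : m_maxJump(maxJump), m_count(0), m_prev(0), m_pending(0) {}

    // Feed one new sample. Returns false on the very first call (nothing
    // to output yet); afterwards returns true and writes the lagged,
    // possibly interpolated, value to 'out'.
    bool Push(int sample, int &out) {
        if (m_count == 0) {
            m_pending = sample;
            m_count = 1;
            return false;
        }
        int value = m_pending;
        if (m_count >= 2) {
            bool jumpIn  = std::abs(m_pending - m_prev) > m_maxJump;
            bool jumpOut = std::abs(sample - m_pending) > m_maxJump;
            if (jumpIn && jumpOut) {
                value = (m_prev + sample) / 2;  // interpolate over the glitch
            }
        }
        m_prev = value;
        m_pending = sample;
        m_count = 2;
        out = value;
        return true;
    }

private:
    int m_maxJump;  // largest believable change between two samples
    int m_count;    // 0 = empty, 1 = one sample held, 2 = steady state
    int m_prev;     // last value handed to the caller
    int m_pending;  // newest sample, held back for one cycle
};
```

A genuine step change passes through unchanged, because the sample after the step does not jump back. The cost is one cycle of delay in the feedback loop, and two glitches in a row would not be caught.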

If you see any more clues or a way to resolve
the problem please post the data.

Eugene


Many A/D converters multiplex their input lines (that is, there is one A/D converter for up to 8 analog inputs). When these switch from one line to another, it takes time for the charge to dissipate on the A/D input.

The Wait(.005) may be a ‘quick fix’, as the system may be able to cycle inputs faster than the A/D can dissipate a large charge (in this case ~12 V).

Is there a reason that you need to poll the inputs that quickly?


Eugene,

On my system I had no spurious data during hours-long tests as long as I had a Wait(.005) in each of the Continuous loops. Shorter Wait() times, e.g. 0.0045 did not work.

So the fix may be that easy, and most teams may have Wait(.005) or longer in place already. I did initially - it was only while experimenting with shorter wait times on our 2nd cRIO that I noticed the problem.

I had cross-posted to http://decibel.ni.com/content/thread/2325?tstart=0 so there is more information there, but it is mostly intended to give a running start to the NI engineer who says he’s going to look into it on Monday.

Good luck in San Jose.

Regards,
David

Daniel,

Thank you for your input. The thread topic is misleading at this point: the issue isn’t the polling rate. Mine was actually set at 50 Hz, since I didn’t read the value until deltaT > .02.

The issue is the requirement to stop the task with a Wait() of a certain duration - or risk getting intermittent bad data of a very specific type.

I had thoughts similar to yours, so I ran additional tests last night (see link above) which ruled out that explanation. Since zero values can be reported spuriously as well (when the battery jumper is off), there is no “high” voltage on channel 8 to dissipate.

Regards,
David

This conversation needs to be had on the usfirst.org forum, so that the experts can take a look.

As a side note, powering the PD from a power supply is an unsupported mode of operation. It is possible to make it work, but is in no way recommended or supported.

In our prior instance of the problem, we were checking one analog input in the periodic loop of the C++ program, using the prior version of the software. We think that the problem went away with team update 12, which we verified with a 10-minute test. We can’t be sure about this now, since our cRIO is in the crate at San Jose.

Eugene

PS: We have always run our cRIO using a battery with no other power source attached.

Eric,

No need to move the post at this point. NI support has picked up the issue on the cross-post I listed above. - Thx.

Eugene,

NI has confirmed that the problem still exists (see the latest on the cross-post above) - so you will need to be sure you have your Wait()'s in place.

However, it appears to be limited to Wind River. They could not duplicate the problem under similar LabVIEW code.

Regards,
David

We were never polling the analog input at a high rate. Is NI really sure that the semaphore works? If the problem is caused by an asynchronous collision between two threads, one reading the battery voltage and the other being the one we use to control the robot, how about moving the battery voltage check to the user code and then using a shared variable, so that the DriverStation.cpp code can fetch the battery voltage after the user code stores it? Does that fix the problem for you?
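
What I have in mind is something like the sketch below. The class name is made up, and it uses standard C++11 std::atomic purely for illustration; on the cRIO the same idea would have to be built from whatever the VxWorks toolchain provides. The point is only that the user thread is the single reader of the analog module, and the driver station thread fetches the stored number without touching the hardware.

```cpp
#include <atomic>

// Hypothetical shared variable for the battery voltage. The user code
// reads the battery channel itself and calls Store(); the driver
// station code calls Load() and never reads the analog module.
class SharedBatteryVoltage {
public:
    SharedBatteryVoltage() : m_volts(0.0f) {}

    // Called from the user control loop after it has read the battery channel.
    void Store(float volts) { m_volts.store(volts); }

    // Called from the driver station thread when it builds its status packet.
    float Load() const { return m_volts.load(); }

private:
    std::atomic<float> m_volts;  // last voltage stored by the user code
};
```

With only one thread ever reading the analog inputs, there is no second reader left to collide with, whether or not the semaphore works.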

Eugene

Eugene,

Just to clarify: the thread topic is misleading at this point. You can poll at a leisurely 50 Hz, but if you don’t stop your thread for .005 sec between data reads you will get the error.

It’s getting late here in Philadelphia (2:30 am), so I can’t check out the complete modification you suggest tonight, but I did replace the GetBatteryVoltage() call with a 0 in the code below and I do seem to be running error free without any Wait() at all.

I send plenty of custom data to the dashboard, so it should be no problem to read it myself from channel 8, then pack it up and send it along with the rest of my data to the dashboard. I’ll have to move the indicator inside the case statement - I’ll try it in the morning.

Here’s some code from DriverStation.cpp:

void DriverStation::Run()
{
    while (true)
    {
        SetData();            // <-- this reads the battery voltage, see below
        Wait(kUpdatePeriod);  // <-- every .02 sec, see the comment from the header file
    }
}

// No info on setStatusData: header only (FRCComm.h)
void DriverStation::SetData()
{
    setStatusData(GetBatteryVoltage(), m_digitalOut, m_userStatus,
        USER_STATUS_DATA_SIZE, WAIT_FOREVER);
}

Here’s an interesting comment from DriverStation.h, but I’m not sure how relevant it is - since each Read() in GetAverageVoltage() does use a semaphore.

///< TODO: Get rid of this and use the semaphore signaling
static const float kUpdatePeriod = 0.02;

Time to sleep,
David

From the comments now appearing in the NI threads pointed to above, and the threads that they point to, it appears that many of the semaphores intended to protect analog inputs, relays, solenoids, etc…, are toothless.
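
To illustrate what a working lock buys you, here is a toy model. It is not the FPGA interface or the WPILib code, and the exact mechanism of the bug is not confirmed in this thread; FakeMuxAdc and its members are invented for the example. It assumes a read is a two-step sequence (select a channel, then fetch the result) on shared registers, and uses a standard C++ mutex to make the two steps one indivisible operation.

```cpp
#include <cassert>
#include <mutex>

// Toy model of a multiplexed converter: every reader shares a single
// channel-select register. Hypothetical; for illustration only.
class FakeMuxAdc {
public:
    FakeMuxAdc() : m_selected(0) {
        for (int i = 0; i < kChannels; ++i) m_counts[i] = 0;
    }

    // Test setup only: define what each channel "measures".
    void SetInput(int channel, int counts) {
        assert(channel >= 0 && channel < kChannels);
        m_counts[channel] = counts;
    }

    // Select-then-fetch, held under one lock. Without the lock another
    // thread could change m_selected between the two steps, and this
    // caller would get that thread's channel (for example the battery).
    int Read(int channel) {
        assert(channel >= 0 && channel < kChannels);
        std::lock_guard<std::mutex> lock(m_mutex);
        m_selected = channel;         // step 1: write the select register
        return m_counts[m_selected];  // step 2: fetch the selected channel
    }

private:
    static const int kChannels = 8;
    std::mutex m_mutex;       // guards m_selected for the whole read
    int m_selected;           // shared channel-select register
    int m_counts[kChannels];  // simulated conversion results
};
```

If the lock is missing or ineffective, a reader of one channel can occasionally come back with the value of whichever channel another thread selected in between, which is consistent with the battery voltage showing up on other inputs.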

David’s use of wait might have been raising the priority of the thread about to access the analog input enough that it was not getting preempted by another thread that was also accessing the analog input, so perhaps fixing the semaphores will cure all of the gremlins.

With luck, now that this has been recognized as a serious problem at the root of the glitchy behavior reported in various quarters, it will be ironed out with a software update before the first regional, which happens in two days.

Eugene

For those who might be having problems with analog inputs when
using WPILib (C++) on the cRIO, an update was posted at
http://first.wpi.edu/FRC/frcupdates.html
that is reputed to address this problem, amongst other things.
There is a caution to keep a copy of the prior update around in
case there are unexpected issues with the new update.

The toothless semaphores meant to protect the
registers used to access the analog inputs led to the
battery voltage showing up when other analog inputs were read.
If you are using analog inputs to control things you should have
a look at this update of WPILib.

Eugene