How to detect missing CanBus or attached device to prevent crash

#4

I believe we used the getFirmwareVersion() function on the TalonSRX to tell if a device was present. I think it returned zero if the device wasn’t connected. This was in 2017, though, so I think there have been some significant changes since then. I’m sure there are a few functions you could find for most devices that depend on the presence of a device (getTemperature() would probably work as well for the talons).

It may be a bit risky, but if you want to try and ensure you have a chance to fix the issue pre-match, you can purposefully crash your code. I wasn’t able to find the specific exception that we threw, but you might be able to find something that isn’t automatically handled by WPILib. Alternatively, you could potentially just call System.exit() (I suppose just exit() in C++) if the talons aren’t present, and I think the code will restart automatically until the issue is fixed. Another option might be to hang the code with a while loop until you find that all devices are connected. I don’t know enough about how WPILib works to know if this will actually cause a visible issue in the driver station, but it might work.
If somebody does try this, you should probably add in some way to bypass this easily, as it could leave a robot sitting entirely dead for a match when it doesn’t need to be. I’m not sure that I’d recommend this approach because of that.

0 Likes

#5

I am going to try to surround some of the code with try catch block, but I have discovered so far that the command I am using to construct the motor controller actually does not appear to throw an exception. If it does not throw an exception, it is difficult to catch it. Instead, I think the code eventually times out trying to speak with the controller and then it returns the error to the robot.

So, in review, I am going to try to use try/catch, but I think some of the other approaches above may work, or rather we should look for a specific test that would return an error code that could be used to determine if the rest of the code should be initialized.

I will let everyone know how I make out.

md

0 Likes

#6

We use Talon SRXs, not Sparks, but we can operate with errors being thrown from missing devices with no observable issues. I’m not sure what we’re doing differently, if anything, me just mechanical man.

Our 2018 code is available here: https://github.com/first95/FRC2018 which may be of some use to someone who knows more than I. @golf_cart_john?

1 Like

#7

Well it does not appear that try/catch will work, because the command does not throw an exception. Rather we appear to be dealing with a subsequent “crash” caused by the condtion.

On a stackoverflow thread I found, a post states…


entry 120

"
Someone should add that one cannot catch “crashes” in C++ code. Those don’t throw exceptions, but do anything they like. When you see a program crashing because of say a null-pointer dereference, it’s doing undefined behavior. There is no std::null_pointer_exception. Trying to catch exceptions won’t help there."

And in the case of the CANSparkMax constructor, it appears to return void, and not a CANError code that I could test. Perhaps Rev Robotics could jump into the thread for insight.

So, I tried setting the Can timeout for the controller, and hoping that it would return a canerror that I could test for

sparkMaxLeft.reset(new rev::CANSparkMax(7,rev::CANSparkMaxLowLevel::MotorType::kBrushless));

if (sparkMaxLeft->SetCANTimeout(3000)) {

cout << “****************************help” << endl;

}

But…

.error: could not convert ‘((DriveSubsystem*)this)->DriveSubsystem::sparkMaxLeft.std::shared_ptrrev::CANSparkMax::.std::__shared_ptr<_Tp, _Lp>::operator-><rev::CANSparkMax, (__gnu_cxx::_Lock_policy)2u>()->rev::CANSparkMax::SetCANTimeout(3000)’ from ‘rev::CANError’ to ‘bool’
if ( sparkMaxLeft->SetCANTimeout(3000)) {
~~~~~~~~~~~~~~~~~~~~~~~^~

How then do we make use of the CANError type for a valid test?

md

0 Likes

#8

“And in the case of the CANSparkMax constructor, it appears to return void…” We use Java not C++ but I think you should be able test for void being returned and take whatever action you want at that point.

0 Likes

#9

“We use Talon SRXs, not Sparks, but we can operate with errors being thrown from missing devices with no observable issues.” It depends on whether the CAN devices are wired in “series” or “parallel” (not the right terms but you get the idea). Ours were all wired in series so if one went down they all went down, so we had no drive motors.

0 Likes

#10

So were ours in 2018. The code/robot was robust against removing a motor controller and reconnecting the remaining controllers into the daisy-chain despite throwing errors.

0 Likes

#11

This is not quite correct. The yellow and green wires that go into talons are pass-throughs. They are connected directly together inside the device. Having a talon fail should not necessarily kill the can bus, and it should not kill the devices downstream of it on the network.

1 Like

#12

We run Talon SRXs and Victor SPXs, so I was considering solving this problem by doing what was discussed here:

Unsure whether this would work for SparkMaxes as well…

0 Likes

#13

I think there is are some misconceptions on this thread. Because a few of our products were mentioned, here are some responses.

Note: The info below applies to Talon SRX, Victor SPX, CANifier, Pigeon-IMU, PCM, and PDP

Unpowering a CAN-bus Talon SRX or Victor SPX will not cause your robot code to crash, and it will not break every CAN devices on the chain.
In fact this can be a useful diagnostic…
https://phoenix-documentation.readthedocs.io/en/latest/ch08_BringUpCAN.html?highlight=driver%20station#approach-2-easier

Creating a CTRE Talon/Victor device in your code that doesn’t exist on the bus will not crash your code. Never has. Never will.

Phoenix will produce messages in your DS reminding you that a device is missing on the bus.
https://phoenix-documentation.readthedocs.io/en/latest/ch20_FAQ.html#driver-station-says-the-firmware-could-not-be-retrieved-and-to-check-the-firmware-and-id

If you disconnect/sever your CAN bus (cut the wires), then you will lose the devices downstream of the severed point. So take the time to tug test your connection points, and scrutinize your wiring.
Some general tips here…
https://phoenix-documentation.readthedocs.io/en/latest/ch08_BringUpCAN.html

I’m not sure which inertial device you were using, but I can report our Pigeon-IMU does not do this.

This is not necessary for our devices. It’s possible you had a user-level bug in your application. A common one is creating a null reference, and then trying to use it.

0 Likes

#14

To recreate your issue, I deployed code with a SPARK MAX defined, but the CAN bus unplugged. I can confirm the same set of error messages, the longer version from the console in vscode:

Unknown error status
Error at WritePacket [CAN.cpp:70]: Unknown error status
at frc::CAN::WritePacket(unsigned char const*, int, int)
at rev::CANSparkMaxLowLevel::GetFirmwareVersion(bool&)
at rev::CANSparkMaxLowLevel::CANSparkMaxLowLevel(int, rev::CANSparkMaxLowLevel::MotorType)
at rev::CANSparkMax::CANSparkMax(int, rev::CANSparkMaxLowLevel::MotorType)
at int frc::StartRobot<Robot>()
at main
at __libc_start_main
Unable to retrieve SPARK MAX firmware version please verify the deviceID field matches the configured CAN ID of the controller, and that the controller is connected to the CAN Bus.

However, this does not cause the code to crash, and I can continue with the rest of the robot code as normal.

Can you confirm what the sparkMaxLeft.reset() function does?

0 Likes

#15

Thanks Will

To answer your question

Can you confirm what the sparkMaxLeft.reset() function does?

The code structure uses shared_ptrs so in DriveSubsystem the following is in the .cpp file

sparkMaxLeft.reset(new rev::CANSparkMax(7,rev::CANSparkMaxLowLevel::MotorType::kBrushless));

and in DriveSubsystem.h

std::shared_ptr<rev::CANSparkMax> sparkMaxLeft;

The .reset, constructs the CANSparkMax motor object and “attaches” it to the shared pointer reference.

It is the structure of the code that is generated from RobotBuilder. Using this approach accesses the motor objects methods are done using a form simular to…

sparkMaxLeft>SetIdleMode(rev::CANSparkMax::IdleMode::kCoast);
…for example .

You may be correct that the code is not crashing as a result of this. It could be my use of the shared_ptr that is adding complexity. I will try to validate this today.

And once I have some more info I will post what I find.

This all said, what is the best approach to test for errors using the CANError class? I wanted to test it as a simple boolean check , (if false all is ok, otherwise something went wrong, but the type is not convertible to boolean.

md

0 Likes

#16

Thanks for that, and it’s good to know that the CAN devices won’t crash the robot if they’re not found. That’ll help solve a problem I thought I had today.
I’ll just note that the gyro that crashed our code I mentioned earlier was in the SPI port.

0 Likes

#17

CANError is a enum class, the easiest way will be to check if it is equal to rev::CanError::kOk

if( sparkMaxLeft->SetCANTimeout(3000) == rev::CANError::kOk) { }

With the firmware update we posted I would recommend leaving the CAN timeout lower however.

0 Likes

#18

Thanks everyone.
I can confirm that a missing can bus, or motor controllers in fact DO NOT crash the code.

Good to know, and also good to know how to actually test the CANError class.
I was just picking on the CANTimeout method as an example to get at the result.

again many thanks, and I hope this thread was informative to all.

md

0 Likes

#19

The SparkMAX do not show up in Phoenix tuner. Our CAN bus scanner can detect SparkMAXes (but don’t detect SRX or SPX with new firmware):

package org.usfirst.frc3620.misc;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.*;

import edu.wpi.first.hal.can.CANJNI;

public class CANDeviceFinder {
    ArrayList<String> deviceList = new ArrayList<String>();

    private boolean pdpIsPresent = false;
    private Set<Integer> srxs = new TreeSet<>();
    private Set<Integer> spxs = new TreeSet<>();
    private Set<Integer> pcms = new TreeSet<>();
    private Set<Integer> maxs = new TreeSet<>();

    public CANDeviceFinder() {
        super();
        find();
    }

    public boolean isPDPPresent() {
        return pdpIsPresent;
    }

    public boolean isSRXPresent(int i) {
        return srxs.contains(i);
    }

    public boolean isSPXPresent(int i) {
        return spxs.contains(i);
    }

    public boolean isPCMPresent(int i) {
        return pcms.contains(i);
    }

    public boolean isMAXPresent(int i) {
        return maxs.contains(i);
    }
    
    public boolean isDevicePresent(String s) {
    	return deviceList.contains(s);
    }

    /**
     * 
     * @return ArrayList of strings holding the names of devices we've found.
     */
    public List<String> getDeviceList() {
        return deviceList;
    }
    
    /**
     * polls for received framing to determine if a device is present. This is
     * meant to be used once initially (and not periodically) since this steals
     * cached messages from the robot API.
     */
    public void find() {
        deviceList.clear();
        pdpIsPresent = false;
        maxs.clear();
        spxs.clear();
        srxs.clear();
        pcms.clear();

        /* get timestamp0 for each device */
        long pdp0_timeStamp0; // only look for PDP at '0'
        long[] pcm_timeStamp0 = new long[63];
        long[] srx_timeStamp0 = new long[63];
        long[] spx_timeStamp0 = new long[63];
        long[] max_timeStamp0 = new long[63];

        pdp0_timeStamp0 = checkMessage(0x08041400, 0);
        for (int i = 0; i < 63; ++i) {
            pcm_timeStamp0[i] = checkMessage(0x09041400, i);
            srx_timeStamp0[i] = checkMessage(0x02041400, i);
            spx_timeStamp0[i] = checkMessage(0x01041400, i);
            max_timeStamp0[i] = checkMessage(0x02051800, i);
        }

        /* wait 200ms */
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        /* get timestamp1 for each device */
        long pdp0_timeStamp1; // only look for PDP at '0'
        long[] pcm_timeStamp1 = new long[63];
        long[] srx_timeStamp1 = new long[63];
        long[] spx_timeStamp1 = new long[63];
        long[] max_timeStamp1 = new long[63];

        pdp0_timeStamp1 = checkMessage(0x08041400, 0);
        for (int i = 0; i < 63; ++i) {
            pcm_timeStamp1[i] = checkMessage(0x09041400, i);
            srx_timeStamp1[i] = checkMessage(0x02041400, i);
            spx_timeStamp1[i] = checkMessage(0x01041400, i);
            max_timeStamp1[i] = checkMessage(0x02051800, i);
        }

        /*
         * compare, if timestamp0 is good and timestamp1 is good, and they are
         * different, device is healthy
         */
        if (pdp0_timeStamp0 >= 0 && pdp0_timeStamp1 >= 0
                && pdp0_timeStamp0 != pdp0_timeStamp1) {
            deviceList.add("PDP 0");
            pdpIsPresent = true;
        }

        for (int i = 0; i < 63; ++i) {
            if (pcm_timeStamp0[i] >= 0 && pcm_timeStamp1[i] >= 0
                    && pcm_timeStamp0[i] != pcm_timeStamp1[i]) {
                deviceList.add("PCM " + i);
                pcms.add(i);
            }
        }
        for (int i = 0; i < 63; ++i) {
            if (srx_timeStamp0[i] >= 0 && srx_timeStamp1[i] >= 0
                    && srx_timeStamp0[i] != srx_timeStamp1[i]) {
                deviceList.add("SRX " + i);
                srxs.add(i);
            }
        }
        for (int i = 0; i < 63; ++i) {
            if (spx_timeStamp0[i] >= 0 && spx_timeStamp1[i] >= 0
                    && spx_timeStamp0[i] != spx_timeStamp1[i]) {
                deviceList.add("SPX " + i);
                spxs.add(i);
            }
        }
        for (int i = 0; i < 63; ++i) {
            if (max_timeStamp0[i] >= 0 && max_timeStamp1[i] >= 0
                    && max_timeStamp0[i] != max_timeStamp1[i]) {
                deviceList.add("MAX " + i);
                maxs.add(i);
            }
        }
    }

    private ByteBuffer targetID = ByteBuffer.allocateDirect(4);
    private ByteBuffer timeStamp = ByteBuffer.allocateDirect(4);

    /** helper routine to get last received message for a given ID */
    private long checkMessage(int fullId, int deviceID) {
        try {
            targetID.clear();
            targetID.order(ByteOrder.LITTLE_ENDIAN);
            targetID.asIntBuffer().put(0, fullId | deviceID);

            timeStamp.clear();
            timeStamp.order(ByteOrder.LITTLE_ENDIAN);
            timeStamp.asIntBuffer().put(0, 0x00000000);

            CANJNI.FRCNetCommCANSessionMuxReceiveMessage(
                    targetID.asIntBuffer(), 0x1fffffff, timeStamp);

            long retval = timeStamp.getInt();
            retval &= 0xFFFFFFFF; /* undo sign-extension */
            return retval;
        } catch (Exception e) {
            return -1;
        }
    }
}
1 Like

#20

@Will_Toth our experience is different. We had a shorted CAN bus today, and the SparkMAX threw an exception at startup, immobilizing our robot. Perhaps we have different version of the API?

Unhandled exception: edu.wpi.first.hal.util.UncleanStatusException: Code: -35007. Unknown error status
Error at com.revrobotics.CANSparkMaxLowLevel.getFirmwareVersion(CANSparkMaxLowLevel.java:283): Unhandled exception: edu.wpi.first.hal.util.UncleanStatusException: Code: -35007. Unknown error status
at edu.wpi.first.hal.CANAPIJNI.writeCANPacket(Native Method)
Robots should not quit, but yours did!
at edu.wpi.first.wpilibj.CAN.writePacket(CAN.java:77)
at com.revrobotics.CANSparkMaxLowLevel.getFirmwareVersion(CANSparkMaxLowLevel.java:283)
at com.revrobotics.CANSparkMaxLowLevel.<init>(CANSparkMaxLowLevel.java:239)
at com.revrobotics.CANSparkMax.<init>(CANSparkMax.java:146)
at org.usfirst.frc3620.robot.RobotMap.init(RobotMap.java:87)
at org.usfirst.frc3620.robot.Robot.robotInit(Robot.java:53)
at edu.wpi.first.wpilibj.TimedRobot.startCompetition(TimedRobot.java:63)
at edu.wpi.first.wpilibj.RobotBase.startRobot(RobotBase.java:263)
at org.usfirst.frc3620.robot.Main.main(Main.java:20)
0 Likes

#21

At the lowest level the Java CAN API throws this error, we’ve wrapped these errors and send text to the driver station in the latest API (beta v1.1.5) instead of throwing an exception. This will be the case moving forward.

0 Likes

#22

cool. Do I need do anything explicit to be picking up the new API, or is that an “auto-update”?

0 Likes

#23

@ozrien Omar: our old code for sniffing the CAN bus does not seem to be working this year for SRX and SPX. Is the new firmware using different API #s, so that sniffing for 0x02041400+id (SRX) and 0x01041400+id (SPX) needs to be changed to reflect the new API?

I intend to bang up against the Phonix Tuner HTTP endpoint eventually, but fixing the existing code [for the time being] would save a bit of effort in our post-7-lost-build-days period…

0 Likes