2017 971 Software Release

We’ve finally finished cleaning up our software from this year, and are ready to release it. Thanks to all the 971 team members who contributed this year. I’m definitely proud of this year’s bot and the software that was written to make it move. I think I can safely say that this year’s bot continues the trend of being even more complicated than the previous year’s bot.

http://frc971.org/content/2017-software

We are always happy to answer questions about what we do or why we do it, and also love hearing the interesting places that it gets used.

So obviously you guys use a myriad of zeroing techniques and sensors. This year there were even more types than other years, such as the Hall Effect + Incremental encoder, etc.

I was wondering what sort of decision making process led to that choice of sensors (absolute encoder vs potentiometer) with respect to both the technical goals of the combination of sensors and the mechanical properties of the subsystem.

Continued from the 2017 971 CAD thread…

I’m assuming the kernel process scheduling still introduces jitter and the IPC that shuttles the data from the measurement thread to the control loops introduces a non-negligible delay for 971.

I haven’t figured out where the time delta used in the control loops is actually recorded. It might come from RawQueue when they make a new message. It looks like the PWM detector thread and the PhasedLoop instance in wpilib_interface.cc are used to make the measurements occur at an accurate rate.

Disclaimer: not on 971, so everything that follows is my understanding from reading the code.

The time delta is taken as the difference in sent_time values (this value is set on ScopedMessagePtr::Send()) between two position messages. This means that it records the difference in time between calls to Send() on the superstructure message in wpilib_interface.cc. Their IPC code should be pretty consistent as far as timing goes - it’s just locked queues stored in shm IIRC - but the thread reading sensors is presumably subject to some amount of jitter that 971 deemed unacceptable to assume constant. I’m not sure exactly where that jitter would be coming from, though; it might just be that the scheduler’s resolution isn’t fine-grained enough for 971.
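A minimal sketch of that idea, assuming a hypothetical `PositionMessage` with a `sent_time` stamp and an encoder reading (these names are illustrative, not 971's actual message types): the velocity estimate divides by the measured time between sends instead of a fixed nominal period.

```python
from dataclasses import dataclass


@dataclass
class PositionMessage:
    """Hypothetical stand-in for a position message stamped at send time."""
    sent_time: float  # seconds
    encoder: float    # radians


def velocities(messages, nominal_dt=None):
    """Estimate velocity between consecutive messages.

    Uses the measured difference in sent_time unless nominal_dt is given,
    in which case the period is assumed constant.
    """
    result = []
    for prev, cur in zip(messages, messages[1:]):
        dt = nominal_dt if nominal_dt is not None else cur.sent_time - prev.sent_time
        result.append((cur.encoder - prev.encoder) / dt)
    return result


# Illustrative data: a wheel turning at a constant 350 rad/s, with the
# third sample taken 150 us late relative to a 5.05 ms nominal period.
true_velocity = 350.0
times = [0.0, 0.00505, 0.00505 + 0.00520, 3 * 0.00505]
messages = [PositionMessage(t, true_velocity * t) for t in times]

measured = velocities(messages)                     # uses sent_time deltas
assumed = velocities(messages, nominal_dt=0.00505)  # assumes a constant period
```

With the measured delta every estimate stays at the true velocity; with the assumed period the late sample shows up as a velocity spike followed by a dip.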

It is mainly a packaging problem. It would have been much much easier to keep with the Pot + encoder + index pulse from a software point of view. It probably cost a week or two of software development time to support all of them. That being said, we are ready for different methods of zeroing in the future now.

The intake’s first reduction was a Versaplanetary gearbox, and that comes with an integrated encoder. It was easiest to just use the integrated encoder (encoder + PWM to signal absolute position in a rotation) with a pot (absolute position overall) to do absolute zeroing. Of all the zeroing methods we’ve done before, that one is my favorite. You turn the robot on and it’s zeroed. That’s pretty nice.
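As a rough sketch of how a pot and an absolute-within-one-rotation encoder can be combined (the function and parameter names are hypothetical, and this is not taken from 971's zeroing code): the pot picks which revolution the mechanism is in, and the absolute encoder supplies the precise position within that revolution. It assumes the pot is accurate to better than half a revolution of the encoder.

```python
def absolute_position(pot_position, encoder_fraction, distance_per_revolution):
    """Combine a coarse pot reading with a precise within-revolution reading.

    pot_position: coarse absolute position from the potentiometer.
    encoder_fraction: absolute encoder reading as a fraction of one revolution, 0 to 1.
    distance_per_revolution: mechanism travel per encoder revolution.
    """
    within_revolution = encoder_fraction * distance_per_revolution
    # Pick the whole number of revolutions that best agrees with the pot.
    revolutions = round((pot_position - within_revolution) / distance_per_revolution)
    return revolutions * distance_per_revolution + within_revolution


# Illustrative numbers: true position 1.3 rad, 0.5 rad of travel per encoder
# revolution, and a pot that reads slightly low.
estimate = absolute_position(pot_position=1.27, encoder_fraction=0.6,
                             distance_per_revolution=0.5)
```

The estimate takes its precision from the encoder, so the pot noise drops out as long as it never pushes the rounding into the neighboring revolution.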

The hood was going to originally be a pot + encoder + index pulse, but when we looked at packaging that all, it got just annoying enough. Since we already had to eat the complexity of supporting multiple zeroing types, adding a third one wasn’t too bad. It’s actually pretty easy to implement, so we just did it.

There’s no way to get an absolute encoder on the turret and indexer, so we had to throw 2 hall effect sensors on it and have fun.

Overall, it was good to push the software team to write new code, but it was annoying to add the interface layers to support the different zeroing methods generically with our profiled subsystem class.

Yes, the sampling noise tripled the noise in our shooter kalman filter. 10 rad/sec is the length of the goal (out of 350 ish). So, that’s ~3%. We need to be an order of magnitude better than that in terms of response. So, we need ~1 rad/sec of error when shooting. I think the shooter loop is one of the hardest pure control loops we’ve ever done. The results speak for themselves. The shot is incredibly consistent, and a big chunk of that is the software. The mechanical system is very nice as well, but we saw significant reduction in spread by controlling the loop better.

I’ve wanted a hybrid kalman filter for a while, and Adam took on the challenge. He took my fuzzy request and the paper or two that I sent him and made it work. We verified a couple of datapoints against Matlab to double check it, and it all matches. Pretty exciting! I’m looking forward to using it in the future. We built it into our state space infrastructure so we can use it for any loop we want.

That’s correct.

There is actually a lot of jitter in the scheduler. We’ve gotten it down to ~150 uS of jitter. That’s good, but not amazing. At a 5.05 ms period, that’s ~3%, which is outside our error budget (see above). For most loops, 3% is perfectly fine for timing. Hence the reason that we’ve never had to do this before.
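The arithmetic behind those numbers, as a small worked sketch. The 350 rad/sec figure is the approximate shooter speed quoted earlier in the thread; treating the timing error as a proportional velocity error is my reading of the argument, not something stated outright.

```python
jitter_s = 150e-6   # measured scheduler jitter
period_s = 5.05e-3  # control loop period

# Fraction of a loop period that the sample time can be off by.
timing_error = jitter_s / period_s

# If the period is assumed constant, a velocity computed as
# delta_position / period is off by roughly the same fraction.
flywheel_velocity = 350.0  # rad/sec, approximate
apparent_velocity_error = timing_error * flywheel_velocity

print(f"timing error: {timing_error:.1%}")
print(f"apparent velocity error: {apparent_velocity_error:.1f} rad/sec")
```

That lands at about 3% and about 10 rad/sec, against a target of roughly 1 rad/sec.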

I see you found our PWM detector code :)

We ran into nonlinearities in the PWM code. The way that Joe described it to me before was that the FPGA will allow a write to go into effect when changing the compare register that the FPGA is using doesn’t affect the output. So, if you are 1.5 ms into a 2 ms pulse and you try to change it to a 1 ms pulse, you get delayed until the next cycle. If you are 1.0 ms into a 1.5 ms pulse and you try to change it to a 2 ms pulse, that will go into effect immediately. The end result is that increasing the length of a pulse has a higher chance of working than decreasing the length of a pulse. That’s a nonlinearity.
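A toy model of that behavior, built only from the two examples above (this is an assumption about the rule, not a description of the actual FPGA logic): a write changes the pulse in progress only if the output is high under both the old and the new width at the moment of the write.

```python
def applied_pulse_width(elapsed_ms, current_width_ms, requested_width_ms):
    """Width of the pulse in progress after a write at elapsed_ms into it.

    Assumed rule: the write takes effect now only if the output is high
    under both the old and new widths. Otherwise the pulse in progress
    keeps its width and the request waits for the next cycle.
    """
    high_with_old = elapsed_ms < current_width_ms
    high_with_new = elapsed_ms < requested_width_ms
    if high_with_old and high_with_new:
        return requested_width_ms
    return current_width_ms


def fraction_applied(current_width_ms, requested_width_ms, steps=1000):
    """Fraction of write times during the high part of the pulse that take effect."""
    applied = 0
    for i in range(steps):
        elapsed_ms = current_width_ms * i / steps
        width = applied_pulse_width(elapsed_ms, current_width_ms, requested_width_ms)
        if width == requested_width_ms:
            applied += 1
    return applied / steps


lengthen = fraction_applied(1.0, 2.0)  # 1 ms pulse, request 2 ms
shorten = fraction_applied(2.0, 1.0)   # 2 ms pulse, request 1 ms
```

Under this model, a write that lands while the pulse is high always succeeds when lengthening, but succeeds only part of the time when shortening, which is the asymmetry described above.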

We had to do a bunch of system ID on the shooter to get it to stabilize faster. When we ran a step input in, and then added a 49 hz sine wave on top of it, we saw very odd results. LTI (linear, time invariant) system theory says we should have gotten a sine wave out after the system converged to steady state. The result that we saw (red line) is that we got a 2 hz ish saw tooth wave which increased the average velocity. That was making it hard to push our loop to a 30 ms recovery time. The upper two lines are both the commanded voltage and raw PWM signal. If you were to zoom in, you’d see an aliased 49 hz signal as expected.

We started very carefully debugging why this was true. One of the experiments we did which was illuminating was to send a pattern out. Every 5 ms, we would write the next value in the pattern. We hooked up a DIO to toggle every time we started the pattern over. We hooked these two inputs up to an oscilloscope and captured a number of cycles. We then computed in python the time difference between the rising and falling edge of the PWM signal and plotted that (along with the DIO). The end result is what you see in the following plot.
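The edge-timing step could look something like the sketch below. It is not the script that was actually used. The capture here is synthetic, and the 3.3 V level, 1.65 V threshold, 10 us sample spacing and 5 ms spacing between pulses are all assumptions for illustration.

```python
def pulse_widths(times, levels, threshold=1.65):
    """Return the high time of each pulse in a sampled waveform.

    times: sample timestamps in seconds.
    levels: sampled voltages, same length as times.
    threshold: voltage above which the signal counts as high (assumed).
    """
    widths = []
    rise_time = None
    for i in range(1, len(times)):
        was_high = levels[i - 1] >= threshold
        is_high = levels[i] >= threshold
        if is_high and not was_high:
            rise_time = times[i]                   # rising edge
        elif was_high and not is_high and rise_time is not None:
            widths.append(times[i] - rise_time)    # falling edge
            rise_time = None
    return widths


# Synthetic capture: one pulse every 5 ms, with widths following a pattern.
sample_us = 10
pattern_us = [1000, 1500, 2000]
times, levels = [], []
for n in range(len(pattern_us) * 5000 // sample_us):
    t_us = n * sample_us
    width_us = pattern_us[t_us // 5000]
    times.append(t_us * 1e-6)
    levels.append(3.3 if 100 <= t_us % 5000 < 100 + width_us else 0.0)

widths_ms = [round(w * 1e3, 3) for w in pulse_widths(times, levels)]
```

Plotting `widths_ms` against pulse index would make dropped values in the pattern stand out the way the plot described here does.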

This plot shows that for most of the cycles, we can measure all the PWM values. But, sometimes, we see the lower PWM values being dropped. This is consistent with the hypothesis that the PWM generation was only lengthening pulses, and the control loop was scheduled on top of the middle of the PWM pulse.

Our fix this year was to run an extra PWM port back into a digital input and interrupt when it rose. That let us synchronize our control loops with the PWM pulse, both removing delay and adding determinism. We were able to measure when the PWM pulse happens to within 5 uS.

Why do you need ~30 ms recovery time for ~20 shots / second?

Each shot takes ~20 - ~30 ms to go through the shooter. It’s impossible to spin back up while a ball is in there. It’s been awhile since I looked at the actual response plots, so I could be a bit off on numbers. It might be closer to 50 ms for the shot and ~10 - ~15 balls/sec, which would still give a very short recovery time compared to the 200 ms voltage -> flywheel velocity open loop response time constant. The end result is that when balls are coming in back to back, there is very little recovery time, and it’s on the order of a small number of control loop cycles.
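A quick worked version of that budget, using the approximate figures from this post and the 5.05 ms loop period mentioned elsewhere in the thread (the function name is just for illustration):

```python
def recovery_budget(shots_per_second, time_in_shooter_ms, loop_period_ms=5.05):
    """Return (recovery time in ms, recovery time in control loop cycles)."""
    time_between_shots_ms = 1000.0 / shots_per_second
    recovery_ms = time_between_shots_ms - time_in_shooter_ms
    return recovery_ms, recovery_ms / loop_period_ms


fast_ms, fast_cycles = recovery_budget(shots_per_second=20, time_in_shooter_ms=20)
slow_ms, slow_cycles = recovery_budget(shots_per_second=10, time_in_shooter_ms=50)

print(f"20 shots/sec, 20 ms in shooter: {fast_ms:.0f} ms, {fast_cycles:.1f} cycles")
print(f"10 shots/sec, 50 ms in shooter: {slow_ms:.0f} ms, {slow_cycles:.1f} cycles")
```

Either way the wheel has only a handful of loop cycles to recover, which is short next to a 200 ms open loop time constant.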

This is some seriously awesome code.

One question - why polytopes instead of constrained optimization? It seems like your cost function is some weighted L2 norm, and your constraints are linear and convex, so quadratic programming would work great, and has tons of open source, very fast solvers.

The 150 us of jitter is surprisingly high. If you haven’t already, I would highly recommend PREEMPT_RT FULL.

With SCHED_FIFO, mlocked memory, and using a timerfd on a similar ARM platform, maximum jitter on a 1 kHz loop is well under 10 us, even with high cpu usage.

Also - how do you guys deal with teaching the controls and math concepts to high school students? Does your school teach linear algebra or differential equations? What do you do with people who are interested, but haven’t taken calculus yet?

One more question! How do you determine your gearbox/system design so that your crossover frequency is ~10 times your 5 ms control loops?

Our problems aren’t really big enough to require full constrained optimization. That’s somewhere on the TODO list in the future, but it’s not holding us back enough. Personally, I’m also a stickler for deterministic runtimes, and we can get them with polytopes and simpler state space control systems without doing nearly as much work.

I’m not saying that there aren’t valid use cases for constrained optimization, and I would really like to do it. It’s been a couple years since I’ve looked for solvers. Do you have any suggestions? MPCs are somewhere down on my TODO list.

We’ve locked all the memory in use into RAM with mlock, switched the thread in question over to SCHED_FIFO and are running it at a priority of 40. It’s above most other things on the system, but not all of the NI modules. This is the stock roboRIO kernel. We are running with a custom PI safe mutex for all locks.
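For anyone who wants to try the same setup, here is a minimal Linux-only sketch of those two steps in Python (971's code is C++, so this is only an illustration). `os.sched_setscheduler` and libc's `mlockall` are real calls. The `MCL_*` values are the usual Linux ones and may differ on some architectures, and both steps normally need root or the matching capabilities, so the function reports failure instead of raising.

```python
import ctypes
import os

# Usual Linux values from <sys/mman.h>; assumed, check your platform.
MCL_CURRENT = 1
MCL_FUTURE = 2


def make_realtime(priority=40):
    """Try to switch this process to SCHED_FIFO and lock its memory into RAM.

    Returns True if both steps succeed, False if the platform or the
    process privileges do not allow it.
    """
    try:
        # Fixed-priority, run-until-blocked scheduling for the calling process.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        # Lock current and future pages so page faults cannot stall the loop.
        libc = ctypes.CDLL(None, use_errno=True)
        return libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
    except (OSError, AttributeError):
        return False


is_realtime = make_realtime()
print("running with SCHED_FIFO and locked memory:", is_realtime)
```

This does nothing about priority inversion in locks, which is what the PI-safe mutex mentioned above is for.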

We haven’t dug into the source of the latency since it seems to be reasonably bounded. The new roboRIO kernel has event tracking enabled! This makes it easier to debug.

I did trace down a priority inversion with the SPI IRQ thread which was causing us to miss gyro readings. That one was fun.

(FYI, the roboRIO kernel has the patch which splits the timer soft IRQ thread out from the main softirq thread reverted. That’s probably one source of increased jitter and definitely a source of priority inversions. I don’t have measurements which directly show that.)

One of our students had actually taken linear algebra and had some differential equation background, which wasn’t there when I went to MVHS. He was the one who implemented the hybrid kalman filter. We’ve had a couple exceptional students over the years who have done some pretty cool stuff.

In the past, I’ve spent the summer teaching a class about control systems to the students. I assume that the students have taken some physics. Really, our systems aren’t that complicated (second order system), and we are seeing fewer and fewer new system types. You can get surprisingly far with pattern matching and an understanding of the basic concepts. There are really only so many ways to hook a mass up to a motor…

I’ve taught more advanced controls in the past by going straight to statespace discrete time controls and deriving all the math from there. You can work through all the basic concepts in a couple days in a simplified manner which is still real, but drops things like the Z transform and laplace transform. It also introduces some magical library calls to handle some of the hairy math. That gives the students a working knowledge that is technically enough to implement a loop, but not enough to completely debug it if something goes seriously wrong. I then have them try to implement a flywheel controller and deploy it to a robot.

I’m generally pretty happy if I have one student who has put in the time each year to be the controls guy. There’s really a lot of code outside the core controls code that needs to be done. The controls code is also cookie cutter enough that you don’t really need much linear algebra to get something close enough that I can give feedback and help. A basic understanding of controls is also really helpful for helping the rest of the students think about what could be non-deterministic or nonlinear in their code outside the controller.

This year, our best controls student worked on getting the hybrid kalman filter working while I adapted some of the older loops to work for the new bot. I think he learned a huge amount that way, and it definitely pushed him to solve a tough partially unbounded problem which helped him grow. I think that distribution of labor worked fine, though it was different than previous years.

We’ve got a really young group this next year. I think our lead software guy is going to be a sophomore. It’s going to be interesting to see how this goes. I’ll probably be doing more controls this coming year than in previous years, bringing the students along for the ride and pulling them deeper into the control systems as they mature. Then again, they tend to surprise me each year, so we’ll see.

Controls is definitely a part of 971. I enjoy playing with control systems, so there tends to be a piece of our system each year which has some pretty advanced control systems (for FIRST). Personally, I’m fine with there being some controls on our bots that is beyond the students, as long as they can be part of good sized pieces of it. We target ~50% student and ~50% mentor for a lot of things. I think exposing students to what is possible helps them understand how much you can actually do with controls and encourages them to study it in college if they really enjoy it. I’ve had students on multiple different teams ask questions over the years and dig in. A number of the students each year on 971 come away with a better appreciation for what is possible. That also starts to drive discussions about how control systems work, which helps drag them in to learn more. To me, that’s a different type of success that is important as well.

It’s tough to both push the limits and spend lots of time teaching. We don’t get that balance right every year. Sometimes it takes a year where the pendulum swings too far one way to help kick people into gear to help fix it.

We don’t. If knowing the crossover point matters (it did for the flywheel), we run a bunch of sine waves through the system and generate a bode plot. That bode plot was how we found the PWM nonlinearity.
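One point of such a bode plot can be computed as sketched below. The plant is a simulated first-order system with the 200 ms time constant mentioned earlier, standing in for the real flywheel. The function names and the correlation method are my choices, not necessarily how 971 processed their data.

```python
import cmath
import math


def simulate_first_order(inputs, dt, time_constant):
    """Discrete first-order lag: y[k+1] = a*y[k] + (1-a)*u[k]."""
    a = math.exp(-dt / time_constant)
    y = 0.0
    outputs = []
    for u in inputs:
        outputs.append(y)
        y = a * y + (1.0 - a) * u
    return outputs


def measure_response(frequency_hz, dt, time_constant, settle_s=10.0, periods=10):
    """Drive the plant with a sine wave and return (gain, phase in degrees).

    The gain and phase come from correlating input and output with a
    complex exponential over a whole number of periods after settling.
    Assumes 1 / (frequency_hz * dt) is a whole number of samples.
    """
    omega = 2.0 * math.pi * frequency_hz
    samples_per_period = round(1.0 / (frequency_hz * dt))
    settle = round(settle_s / dt)
    total = settle + periods * samples_per_period
    inputs = [math.sin(omega * k * dt) for k in range(total)]
    outputs = simulate_first_order(inputs, dt, time_constant)
    u_sum = 0j
    y_sum = 0j
    for k in range(settle, total):
        basis = cmath.exp(-1j * omega * k * dt)
        u_sum += inputs[k] * basis
        y_sum += outputs[k] * basis
    response = y_sum / u_sum
    return abs(response), math.degrees(cmath.phase(response))


gain, phase_deg = measure_response(frequency_hz=2.0, dt=0.005, time_constant=0.2)
```

Repeating this over a range of frequencies gives the gain and phase curves. On real hardware, a nonlinearity like the PWM one shows up as a response that stops matching what a linear model predicts.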

I still think it would have been easier with constrained optimization (the solvers are already written!), though the solution you guys used is pretty slick. As for deterministic runtimes, I’ve found that modern solvers do a good job of knowing when a problem is poorly conditioned, non-convex, or infeasible, and I’ve never had issues with just setting a maximum number of iterations. Honestly, I had way more timing issues getting my problem to fit in cache than I’ve had with the non-determinism of the solvers.

The best solver in my opinion is qpOASES. It’s a bit of a pain to get it to build with high performance linear algebra libraries instead of the built in ones, but honestly, for the size of problems you’d encounter in FRC, I really doubt you’d ever run into performance issues. Solving a problem cold with 12 variables and 20 constraints takes around 1 ms on an embedded ARM platform (with the low-performance libraries). It’s been used in quadruped research robots.

There’s also CVXgen, which isn’t open source like qpOASES (you have to email the guy to get an education license), but allows you to describe your problem in a higher level language ahead of time, and generates a solver for that problem. I had some really strange behavior with variable solve times, which I eventually attributed to cache thrashing, as moving to a slower cpu with more cache gave me 10x performance gain. SpaceX uses CVXgen on their rockets, so it can’t be too bad, though there isn’t much detail about what it’s used for online.

Finally, there’s a CVX add-on for MATLAB that isn’t very fast, but makes prototyping convex optimization really easy.

That said, I doubt MPC would have too much of a benefit if your model is only second order.

That makes sense - I thought you guys were building your kernel. I bet they took out this patch because by default the timer soft irq thread gets bumped up pretty high, which can cause the ethernet softirq to be starved. In NI’s target customer’s industrial setting with various proprietary “real-time ethernet” protocols, this causes issues.

It’s really awesome that you’re able to introduce students to highly practical controls - it’s probably a better experience than most college level controls courses.

Thanks! I’ve heard good things about CVXgen from our controls guys at work. (solving vehicle dynamics problems at the CARS lab at Stanford). I hadn’t considered the pain of fitting into cache. Thanks for the hint.

MPCs do really well with delays. In FRC, there’s a 1 cycle delay on your output. (signal generation -> PWM output -> talon read). When you are trying to push 20+ hz of bandwidth or better, that starts to really matter. Our flywheel loop could have used an MPC this year.

My background with MPCs was a number of years ago. We used mpt-toolbox and that style of MPC where you precompute everything.

Longer term, I’d like to get an ILQR or ELQR controller running the drivetrain for point stabilization. There’s a prototype implementation of it in //y2014/control_loops/python:extended_lqr.

So you mentioned that you would’ve been content sticking with pot + incremental + index for position sensors. Why didn’t you guys just start with absolute encoders? That way, you don’t have to zero since both optical and magnetic encoders will maintain position tracking during poweroff.

The other option is pot + absolute encoder of some sort, or pot + incremental + absolute. We need more than 1 revolution on our sensor to hit the precision requirements. For arm positions, we need to hit 0.005 radians or better. 0.005 radians is 1/16" on a 1’ arm. To hit that, we need to measure 10x better than that. For a range of motion of ~3 radians, that’s 6000 counts, which is more precision than we can get off an absolute sensor.
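The numbers in that paragraph work out as follows. The bit count at the end is my addition, included only to put the count requirement in more familiar terms.

```python
import math

arm_length_in = 12.0            # 1 ft arm
position_tolerance_rad = 0.005  # required positioning accuracy

# Arc length at the tip of the arm for the allowed angular error.
tip_error_in = arm_length_in * position_tolerance_rad

# Measure 10x finer than the tolerance we need to hold.
measurement_resolution_rad = position_tolerance_rad / 10.0

range_of_motion_rad = 3.0
required_counts = round(range_of_motion_rad / measurement_resolution_rad)
required_bits = math.ceil(math.log2(required_counts))

print(f"tip error: {tip_error_in:.3f} in (1/16 in is {1 / 16:.4f} in)")
print(f"counts over the range of motion: {required_counts}")
print(f"bits needed if one sensor revolution spans the range: {required_bits}")
```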

With the index pulse, we set up DMA to capture the encoder value at the index pulse. That means we are within 1 count every time we zero.

With a CTRE absolute encoder, we average the PWM duty cycle while not moving for a couple seconds, and seem to be able to get close to that. The CTRE encoder didn’t exist when we started.

The main driver before the season would have been laziness. It’s a lot of work to make a widely used software interface generic enough to accept arbitrary sensor combinations. That probably cost a couple long days of engineering to make it work.

Does anybody know if there is documentation on FRC 971’s AOS and how it works? Specifically, I wanted to know how it compares to ROS in areas like pub/sub communications.

There’s not a ton of documentation as far as I know, because AOS is an in-house solution, but I’ll try to explain somewhat briefly:

When two ROS nodes want to communicate, they actually send messages (in a text-based format) through sockets. Even if two nodes are on the same host, data is sent through a socket. This makes for a very flexible system - a node doesn’t have to care where the data it receives was sent from, and it is very easy to have nodes communicate, even if they are written in different languages. This power does come at a cost, though: sending data through a socket is a big no-no in a real-time system. In a real-time system, your code must meet a set of strict timing requirements (e.g. a tick of the main update loop must execute in under 5ms, no matter what). If the code fails to hit these requirements, bad things™ might happen - for example, the control loops on the robot go unstable and your robot breaks.

In 971’s AOS code, things are handled very differently. Messages are “sent” by placing them in a buffer in shared memory, which is exactly what it sounds like - a special portion of the system’s memory that can be read and written to by more than one process (program) on a system. This makes for a system that’s a little less flexible than ROS: it can’t really send messages across a network (in its current form), and instead of a defined text-based standard, messages are stored in binary representations, so you can definitely have some nasty undefined behavior if you run two “nodes” which are each expecting different versions of a particular message type.
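To make the idea concrete, here is a small sketch of passing a fixed-layout binary message through a memory region. It is not AOS's queue implementation. The message layout is hypothetical, an anonymous `mmap` stands in for the named shared memory region, and there is no locking or notification.

```python
import mmap
import struct

# Hypothetical fixed layout: sent_time (double), encoder (double),
# index_count (unsigned 32-bit), little-endian with no padding.
POSITION_FORMAT = "<ddI"
POSITION_SIZE = struct.calcsize(POSITION_FORMAT)


def write_message(region, sent_time, encoder, index_count):
    """Pack the fields into their binary form at the start of the region."""
    region[:POSITION_SIZE] = struct.pack(POSITION_FORMAT, sent_time, encoder, index_count)


def read_message(region):
    """Read the fields back; only valid if the reader assumes the same layout."""
    return struct.unpack(POSITION_FORMAT, region[:POSITION_SIZE])


# Anonymous mapping; on Unix a forked child process would see the same bytes.
region = mmap.mmap(-1, mmap.PAGESIZE)
write_message(region, sent_time=1.25, encoder=0.5, index_count=3)
message = read_message(region)
region.close()
```

Nothing is serialized to text or copied through a socket, and no memory is allocated per message beyond what Python itself does here. That is the property a real-time system cares about.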

As an aside, 971 wrote a really cool custom compiler (transpiler, technically, but even more technically there’s no difference) to generate C++ serialization/de-serialization code for their message descriptor format (“queues”, .q). It’s pretty similar to Google’s protobuf in a lot of ways - just with better real-time guarantees (turns out any dynamic memory allocation also breaks real-time guarantees, so you have to be very careful about how you use memory).

TL;DR: 971’s framework is specifically designed for their needs - that is, the need to maintain incredibly precise real-time guarantees - and is a little less flexible than ROS because of that restriction.

Kyle pretty much nailed it. I just got back from a long business trip.

Almost. We actually compute a hash of the message definition with the compiler and use that as part of the key to look up the queue in shmem. Incompatible messages will result in the two ends not being connected rather than corruption.
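A sketch of that lookup scheme, with a dictionary standing in for shared memory. The SHA-256 hash, the key format and the message definition strings are all assumptions for illustration. The real hash function and `.q` syntax are not shown in this thread.

```python
import hashlib


def queue_key(queue_name, message_definition):
    """Build a lookup key from the queue name and a hash of its definition."""
    digest = hashlib.sha256(message_definition.encode("utf-8")).hexdigest()
    return f"{queue_name}:{digest[:16]}"


class QueueRegistry:
    """Stand-in for the shared memory lookup table."""

    def __init__(self):
        self._queues = {}

    def fetch(self, queue_name, message_definition):
        """Return the queue for this name and definition, creating it if needed."""
        return self._queues.setdefault(queue_key(queue_name, message_definition), [])


# Hypothetical definitions, not real .q syntax.
DEFINITION_V1 = "message Position { double encoder; }"
DEFINITION_V2 = "message Position { double encoder; double pot; }"

registry = QueueRegistry()
sender = registry.fetch("position", DEFINITION_V1)
matching_reader = registry.fetch("position", DEFINITION_V1)
stale_reader = registry.fetch("position", DEFINITION_V2)

sender.append((0.5,))
```

The matching reader sees the message, while the reader built against the other definition gets a separate, empty queue instead of misinterpreting the bytes.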

It doesn’t deal with the data evolving very well over time. I keep wanting to revisit that and at least use the protobuf parser instead of a custom tool, or one of the other open-source tools. That’ll let us have less poorly documented code. Protobuf now has arenas, so it can support hard RT systems. FlatBuffers now exists, along with Cap’n Proto and others. When we started this in 2012, the software world was in a different place.

ROS2 is trying to handle the hard RT requirements to stay relevant. Turns out they matter in robots :) It’s not fully ready though, and I have no metrics on how well it achieves this.