Machine Learning or Computer Vision for Object Detection

One area my team wants to improve this year is adding object recognition to our robot. In this past game, being able to autonomously align our intake to balls would have helped tremendously.

I’ve also noticed teams starting to use machine learning solutions to detect objects, and I’m curious to hear people’s experiences with them.

Given the fuel cell’s distinct color and shape, it feels like a perfect fit for a simple computer vision algorithm in OpenCV. However, something like a hatch panel would be harder to come up with an algorithm for.

I guess my question is: have you worked with either of these technologies, and do you feel one offers a better opportunity than the other? Given the chip shortage, I’m not sure I could even get a Google Coral to plug into the Raspberry Pi.


Why do you think this would have helped tremendously? Most of the problems I saw teams having picking up balls weren’t so much about aligning to them properly as about having an intake that didn’t support intaking at speed or against walls or other field elements. You don’t have a team number listed in your profile, so I couldn’t look at any match video to see what issues your team might have had. If you truly are having issues lining up to the ball, you might find that some additional driver practice would go a long way toward helping that.

Assuming you still want to solve this problem with vision (I think this is a good solution for situations like auto and spots where driver vision is obstructed), I would do this with PhotonVision’s new colored-shape feature.


Documentation on this is coming soon (before kickoff). Basically, this is another pipeline type that adds circle, triangle, quadrilateral, and polygon detection.


This is specific to our robot’s (in my opinion, flawed) design. We had an intake through the frame of our robot, so we didn’t just pull balls in over the bumper; we had to line up with a lot of accuracy. It’s a design flaw that I am trying to compensate for in software.

Yeah of course driver practice would help. That was not my question.


Machine learning seems like overkill for visually identifying a bright yellow ball.

For starters, you would need to assemble or synthesize a large number (!) of annotated training images. As I understand it, a single training cycle on a decent data set can take several hours or even days, which means your testing iterations are much longer.

By contrast, you can tune a vision pipeline to perform basic computational operations in a matter of minutes. It is really easy to ask a computer to find pixels of a specific color. It is very much less direct to ask the computer to look at thousands of images of the thing you want it to recognize in the hopes that it will realize that a region of pixels of a specific color correlates with getting a cookie from the trainer for correctly guessing that it is a game piece centered at particular coordinates in the video frame.

I would never say “don’t try it.” You will learn a lot, and you might even defy conventional wisdom with a creative new solution. But machine learning is definitely the long road to turn your intake toward a fuel cell, probably by several orders of magnitude.


Agreed on the overkill part. The only counterpoint is that in future games the object might get more complicated. I’m thinking about this more for growth.


Specific to machine learning, a few resources:

WPILib is supporting Axon this coming year, but I have not yet tried it.

Team 900 has been playing in this space for years and has some cool results. However, a lot of what they do is focused on pushing the boundary of technology, with longer term goals. That’s not necessarily the same as “most robot improvement for least input effort”. They’re far from the only ones, just folks I thought of offhand.

Still, the results do end up looking pretty sweet.

I have little personal experience with the machine learning approach unfortunately.

Taking one step back, aligned with what Veg was getting at:

When the object of interest is of known size, relatively uniform color, and contrasts well with the background, an approach requiring the algorithm to learn from iteration can be inefficient - the parameters of detection are already known, so they do not need to be learned. From this perspective - if you haven’t yet, try looking into some of the more “traditional” techniques of filtering for yellow objects on the field, and using solvePnP to get real-world coordinates of the object. From there, path planning can be used to properly align the robot to the object.

As Jason mentioned, PhotonVision can do the pose identification part. I’m sure Limelight won’t be far behind.

Taking one more step back, to where Jason was poking:

If your true goal is “most points for least effort”, solving the “intake inconsistent” issue in hardware is definitely preferable. This would free up your time to improve other aspects of the robot with software.

Not saying this is easy. Convincing the humans with the relevant skills to spend time improving it is rarely trivial. A software fix may be the best answer in your particular situation.

But be careful - “fix hardware defect with software” can be a slippery slope for many reasons.


This is so cool. Wow


That looks awesome. It is really exciting to see facilities like this made available for students to put to use. Definitely gonna play with this…

I do question some of the up-front assumptions made on that page, though:

“For example, a neural network could detect the location of a FRC robot within an image, which is an unreasonable challenge for a HSV-filtering algorithm.”

If I understand this claim, I actually disagree. How could you acquire enough training images of the actual FRC robots that would be on the field with you in time to train your net? The net would undoubtedly learn to see the bumpers and ignore the confusing jumble of robot parts above it.

I could train a HSV filter to see red/blue bumpers in about 10 minutes. Maybe 5.

I am excited to see this technology made more accessible. My only concern is that its promise/motivation is being oversold a bit.


Worth noting that all of our labeled data is available publicly. We are happy to answer any questions people have.


I’m sure you have this documented somewhere but what was the data labeling process like?

I guess my biggest hesitation is that I need footage of the field… and it sounds like it takes time to go frame by frame and mark objects? It’s almost like, for functionality based on this, you have to plan for beyond week 1.

At the moment, it’s very manual, with a couple of students who wanted to work on labeling data (at least for 2020’s game). The best way I can describe it is to ask “Do you have any students that just enjoy sorting bolts?” which I know sounds absurd but some people just enjoy a kind of repetitive task at times so we enabled them to do something more important to us than sorting bolts. It’s mostly just drawing bounding boxes and choosing a label. I think it’s in the paper on page 9/10. I should add that it isn’t just the students doing this - we have mentors who label data too.

We’ve actually got some potential help for the future to improve this process and make it more about data validation than data labeling. Nothing I can really share yet but there are some clear improvements to be made with our process.

There is also work that FTC is doing (Go Danny Go!) that I’m hoping bleeds back into FRC and enables much greater collaboration between teams for this kind of data… and Axon has me very excited too. It’s a wonderful time for machine learning in robotics competitions.

A lot of that data is actually coming from the images provided by Dave Lavery/NASA and WPI/wpilib folks and I’m really grateful for the effort they put into taking those pictures and making them available to everyone.

Yep. It’s just like the robot hardware in that sense. You can’t stop iterating and there is constant improvement to be made - sometimes it is the process and sometimes it is the data but it’s still constant iterating and refinement.


I imagine a useful future state might be an open collaboration site where teams could upload unlabeled images, label them through a very-efficient web UI, then download a subset of their choice of labeled images for training. Teams can choose to (or not to) trust inputs from other teams, perhaps even leaving reviews/feedback of different packs of data, or correcting inaccuracies as found…

Additionally, I see the basic flow for effectively deploying this as doing as much pre-training in your shop and other field-like environments first, running that model all day Thursday during camera calibration and practice matches, recording all the raw footage and inferences, and then having a team of people pull a late night providing “yup/nope” feedback to the model and retraining overnight for Friday.

Having the tools and computing hardware to make this happen on a timeline is pretty key to effective rollout.


Makes sense. That’s the number one deterrent for me… just having the time to accurately label all those samples. You clearly need a lot of samples to be effective, and there’s no obvious way to automate that.


Someone who has long since graduated tried this; they paid people on Fiverr or a similar service to label robots from TBA.

Surprisingly, the pipeline worked pretty decently:


That’s fun.

If you could get a working solution like the video above, the possibilities are pretty cool. If there is a difficult station to line up to, using that to do it automatically would be great.

Although FIRST has a knack for putting blue/white/red tape everywhere. I wonder if that, combined with CV, is the better route.

So I was looking through the WPILib stuff for next year, saw the Axon section, and now I have a question…

How well will these machine learning models transfer between environments? My biggest fear: we get an amazingly tuned model at our shop, then we go to competition and the lighting is way different, or there are differences between the capture camera and the robot camera, or the video from the robot is jerky. I don’t want to be panic-retagging and retraining all our new video data on the tuning day before competition, hoping it all works there.


That depends entirely on your training dataset. If you include game piece images under various lighting conditions, it’ll be robust to that.
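One cheap way to build in that lighting variety is brightness/contrast augmentation at training time - a minimal numpy sketch, with illustrative (untuned) jitter ranges:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable

def jitter_lighting(bgr):
    """Return a copy with a random gain (contrast) and bias (brightness)."""
    gain = rng.uniform(0.6, 1.4)    # illustrative range, not tuned
    bias = rng.uniform(-40.0, 40.0)
    out = bgr.astype(np.float32) * gain + bias
    return np.clip(out, 0, 255).astype(np.uint8)

# A flat gray "frame" stands in for a training image; the bounding-box labels
# stay valid because only pixel values change, not geometry.
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
augmented = [jitter_lighting(frame) for _ in range(5)]
print([int(a.mean()) for a in augmented])
```

Each augmented copy keeps its original labels, so one labeled shop image can stand in for several lighting conditions the model might meet at the venue.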


This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.