Team 4004 Detecting Power Cells with Deep Learning


So far our results with deep learning have been promising. We trained this network on roughly 300 images. I’m hoping that increasing that number will help cut down on false positives, which this network currently struggles with.


What’s your network architecture?

We are currently using NVIDIA DIGITS to train our network, so we are using DetectNet (the only object detection network DIGITS supports).


Looks like I need to hurry up and get my kids to build your robot…


Nice. Why did you go with a machine learning approach instead of just a computer vision approach? For example, why wouldn’t color range / contour selection work? e.g. https://www.pyimagesearch.com/2015/09/14/ball-tracking-with-opencv/

How much time does it take to process one image and on what resolution?
Also what camera and co-processor are you using?

Maybe normal HSV filtering and a contour search, as @TimPoulsen suggested, would be better.
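
For reference, here is a minimal sketch of the HSV-filter-plus-contour approach being suggested (not what Team 4004 is actually running). The HSV bounds and minimum radius are placeholder values that would need tuning for real power cells, your camera, and field lighting.

```python
import cv2
import numpy as np

# Placeholder HSV range for a yellow power cell; tune for your camera and lighting.
LOWER_YELLOW = np.array([20, 100, 100])
UPPER_YELLOW = np.array([35, 255, 255])

cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Threshold in HSV space, then clean up the mask.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_YELLOW, UPPER_YELLOW)
    mask = cv2.erode(mask, None, iterations=2)
    mask = cv2.dilate(mask, None, iterations=2)

    # Find contours and keep the largest one as the ball candidate.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        largest = max(contours, key=cv2.contourArea)
        (x, y), radius = cv2.minEnclosingCircle(largest)
        if radius > 10:  # ignore tiny blobs
            cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)

    cv2.imshow("power cells", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```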

@hamac2003 can give the details for the reasoning behind a lot of the theory, but this started as a project for him last year when he was interested in learning about deep learning for object recognition. I chose to accept his quest and purchase him the needed equipment.
From there it has simply been about growing his and other students’ interest in deep learning and its use in our program.
In short: when you’ve got a stupid smart kid who continues to surpass what our program can offer him, we have to keep finding harder things for him to work on!


I’ve yet to measure this precisely, but it was running at around 15-20 fps last I checked, so roughly 50-67 ms per image at 640x480 resolution. I ran this test using a Microsoft LifeCam, but I want to try other options (like the PS Eye), as the LifeCam doesn’t appear to have an absolute exposure mode, and I’ve heard it is limited to 30 fps max (*citation needed). As for the co-processor, we are using an NVIDIA Jetson TX2.
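
For anyone who wants to measure this themselves, here’s a rough sketch of how you might time per-frame processing with OpenCV. process_frame is just a placeholder for whatever detection pipeline you’re running (DetectNet, HSV filter, etc.).

```python
import time
import cv2

def process_frame(frame):
    # Placeholder for the actual detection pipeline.
    return frame

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

times = []
for _ in range(200):
    ok, frame = cap.read()
    if not ok:
        break
    start = time.perf_counter()
    process_frame(frame)
    times.append(time.perf_counter() - start)

cap.release()
if times:
    avg = sum(times) / len(times)
    print(f"avg {avg * 1000:.1f} ms/frame, ~{1 / avg:.1f} fps")
```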

I definitely agree that an HSV filter would be the most robust/computationally inexpensive solution for tracking power cells. The main reason we chose to use Deep Learning was for the educational experience (and because teaching computers to learn on their own is freaking awesome).

Additionally, if we run Deep Learning on our competition robot this year, we will be more prepared come 2021, when there may be a task Deep Learning is uniquely suited to accomplish (e.g. detecting hatch panels or other oddly shaped game pieces). Once you know how to effectively use Deep Learning for object detection, it opens up a whole new world of possibilities, because you can track basically anything with a camera (cough cough automatic defense mode that drives at robots with opposite colored bumpers cough cough).

We’ll see how it pans out, but I’d also wager that (depending on how often they replace power cells at competitions) if you used an HSV filter, you may have to re-tune it as the balls become covered in more and more marks from shooters. But as I said, we are doing this more “because we can and it’s super cool” rather than “because it’s 100% the most efficient solution to this particular challenge.”


Wait, TimPoulsen — you’re saying there’s a way to do computer vision that isn’t machine learning? :astonished:

EDIT: @hamac2003 replied before I did with an excellent explanation. Thanks!


This is a pretty straightforward task for traditional computer vision, and I have a feeling that such an approach will be more robust as the practice area/field changes. It’ll also be easier to debug if something goes wonky. With traditional CV, you can step through the pipeline and see what went wrong (a lighting change, perspective distortion, too much occlusion, etc.), while an NN just tells you “I see 5000 power cells” and leaves you with little to no indication why.

That said, this is totally cool and I would be happy to be proven wrong. Keep up the good work!

Hi, I wrote WPILib’s Machine Learning tool. See it here https://docs.wpilib.org/en/latest/docs/software/examples-tutorials/machine-learning/index.html

With a Raspberry Pi 4 you can hit 30 fps with very high accuracy. We also have a public dataset of 500 images. The tool uses a MobileNet V2, and the Pi takes up far less real estate on the robot than a TX2.
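
For anyone curious what MobileNet V2 SSD inference looks like on a Pi, here’s a rough sketch using the TensorFlow Lite runtime. This is not the WPILib tool itself (follow the link above for that); the model file name, the quantized uint8 input, and the output tensor ordering are assumptions that depend on how the model was exported.

```python
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Hypothetical model file name; use whatever your training run exports.
interpreter = Interpreter(model_path="power_cell_mobilenet_v2.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
_, height, width, _ = input_details[0]["shape"]

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

# Resize to the network input size and add a batch dimension.
# Assumes a quantized (uint8) model; a float model would also need normalization.
resized = cv2.resize(frame, (width, height))
input_data = np.expand_dims(resized, axis=0).astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Typical SSD post-processing outputs: boxes, classes, scores, count.
# The exact ordering depends on the exported model.
boxes = interpreter.get_tensor(output_details[0]["index"])[0]
scores = interpreter.get_tensor(output_details[2]["index"])[0]
for box, score in zip(boxes, scores):
    if score > 0.5:
        print("power cell at (normalized box)", box, "score", score)
```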

Just want to add that we also have 4,000 unlabeled images from a real field that teams can use to improve their networks.


Great to hear there are more resources out there, thanks!

The Lifecam does have an absolute exposure mode; however, it is not amazing and can be very finicky at times.

To enable it, you’ll need to do v4l2-ctl --device /dev/video0 --set-ctrl=exposure_auto=1 and then v4l2-ctl --device /dev/video0 --set-ctrl=exposure_absolute=(some value), where /dev/video0 is the video device of your camera. One note about the absolute exposure is that it may go from super dark to completely blown out with no explanation. For example, I’ve found that this happens when going from an exposure of 10 to 11. It could be different for your camera, but be warned.


The absolute exposure setting of the LifeCam HD3000 as exposed by Linux has a very wide range but only certain specific values are valid. The CameraServer library has a quirk to support this in a more user-friendly way, so if you use CameraServer to set exposure it will do the right thing. If you’re using v4l2-ctl, the exact valid values that CameraServer uses can be found here: https://github.com/wpilibsuite/allwpilib/blob/master/cscore/src/main/native/linux/UsbCameraImpl.cpp#L109
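
For anyone going the CameraServer route from Python, here’s a minimal sketch using the robotpy-cscore bindings; the device index and exposure value are placeholders for whatever your setup needs.

```python
import numpy as np
from cscore import UsbCamera, CvSink

# Device index 0 is an assumption; check which /dev/video* your LifeCam is.
camera = UsbCamera("lifecam", 0)
camera.setResolution(640, 480)

# cscore knows about the LifeCam's quirk and maps this to one of the
# camera's valid absolute exposure values.
camera.setExposureManual(10)

# Grab frames directly for a vision pipeline.
sink = CvSink("grabber")
sink.setSource(camera)
frame = np.zeros((480, 640, 3), dtype=np.uint8)
timestamp, frame = sink.grabFrame(frame)
```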


Good to know! On another note, is there a USB camera anyone knows of that gets more than 30 fps? I struck out on the PS Eye, since in order to use it with the Jetson I’d have to build my own kernel (something I’m not giddy about jumping into). I’ve looked at using a Logitech webcam, but it appears as though they are all limited to around 30 fps as well.

This one does 60 fps: https://www.amazon.com/gp/product/B00KA7WSSU
This one does up to 180 fps (and has a global shutter!): https://www.amazon.com/gp/product/B07RB4GYHZ
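
Side note: even with a 60 fps capable camera, you often have to explicitly request the mode, since many UVC cameras only deliver their higher frame rates in MJPG. A quick OpenCV sketch (device index and values are placeholders):

```python
import cv2

cap = cv2.VideoCapture(0)
# Many UVC cameras only hit their higher frame rates in MJPG mode.
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 60)
print("reported fps:", cap.get(cv2.CAP_PROP_FPS))
```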

Sold.

Traditional vision would work (and has been proven to work); however, with all of the noise on the field (yellow robots, lots of balls, and lots of moving obstacles in the camera’s view), machine learning will most likely be more precise, as it uses shape and other features in addition to color.


Where can we get access to these images?

I’m getting images from a PlayStation Eye camera I had laying around, using a Jetson Nano with a recent JetPack. The camera’s USB ID is 1415:2000, and my GStreamer pipeline for Python OpenCV is “v4l2src device=/dev/video1 ! videoconvert ! appsink”. Your camera might be /dev/video0, and mileage may otherwise vary. My CSI camera doesn’t work since I rebuilt OpenCV to get CUDA support, and I don’t know why.
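
For anyone else wiring that up, here’s a minimal sketch of feeding that pipeline to OpenCV’s GStreamer backend. This assumes your OpenCV build has GStreamer support, and the device path may differ on your system.

```python
import cv2

# Same pipeline as above; adjust /dev/video1 to match your camera.
pipeline = "v4l2src device=/dev/video1 ! videoconvert ! appsink"
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

if not cap.isOpened():
    raise RuntimeError("Failed to open pipeline; check the device path and OpenCV's GStreamer support")

ok, frame = cap.read()
if ok:
    print("Got a frame:", frame.shape)
cap.release()
```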