Technically I’ve gotten ML-based game piece detection working, years ago, on a Pi 3B, but it was 5 seconds per frame, so not useful on the robot. But I had no idea how to build models properly (still don’t, but I was clueless then) and was mostly doing it because we had a student who was off-the-freaking-chain smart in everything, and in doing this he got to play with some GCP VMs with monster GPUs at the time. Fun!
Anyhoo, the PhotonVision docs hit upon what the main problem is early on: About Object Detection - PhotonVision Docs
They build their models to run on the NPU in the Orange Pi 5 series, a chip optimized to do the math behind AI/ML quickly and efficiently. It’s like a GPU, except that where a GPU usually has hundreds or thousands of cores and dedicated memory, the NPU is a single core sharing the main RAM. But the NPU can do linear algebra wicked fast compared to a CPU, much like a GPU.
Which makes an NPU really nice for running model inference at runtime, especially on a robot, since NPUs are energy efficient.
But, in answer to “If we can’t use PhotonVision for this task, how can we send data from the Raspberry Pi to NetworkTables?”: what you can do is put the pi4 on the robot network, use the pynetworktables library to publish data to NetworkTables, and drop a Python script onto the pi4 that systemd starts up as a daemon of sorts, letting it handle processing the camera. You can run ML inference on the pi4 CPU if you have a proper model. A rough sketch of the publishing side is below.
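To make that concrete, here’s a minimal sketch of the NetworkTables side, assuming pynetworktables is installed and using a made-up team number (1234) and made-up table/entry names; the actual camera capture and inference is stubbed out:

```python
#!/usr/bin/env python3
# Minimal sketch: publish vision results from a Pi to NetworkTables.
# Assumes `pip install pynetworktables` and that 10.12.34.2 is your
# roboRIO's address (team 1234); adjust for your own team number.
import time
from networktables import NetworkTables

NetworkTables.initialize(server="10.12.34.2")
table = NetworkTables.getTable("PiVision")  # hypothetical table name

def detect_game_piece():
    # Placeholder for your real camera capture + inference code.
    # Return (found, x_offset) or whatever your robot code expects.
    return False, 0.0

while True:
    found, x = detect_game_piece()
    table.putBoolean("hasTarget", found)
    table.putNumber("targetX", x)
    time.sleep(0.02)  # ~50 Hz cap; in practice you're limited by inference speed
```

A simple systemd service file pointing at that script (with something like Restart=on-failure) is enough to have it come up on boot and restart if it crashes.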
You could probably convert the rknn format to onnx. I don’t recall doing this, but I do recall trying to go the other way and getting frustrated. Though I was also trying to quantize the model at the same time, and it wasn’t robot related.
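If you do end up with an ONNX model, running it on the pi4’s CPU is straightforward with onnxruntime. A rough sketch, assuming a YOLO-ish model with a 640x640 input (the file name and shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Load the model on the CPU execution provider (the only realistic option on a Pi 4).
session = ort.InferenceSession("gamepiece.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Placeholder input: a real pipeline would resize/normalize a camera frame here.
frame = np.zeros((1, 3, 640, 640), dtype=np.float32)

outputs = session.run(None, {input_name: frame})
# The layout of 'outputs' depends entirely on the model you exported; decode accordingly.
```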
But that also brings me to quantizing the model. If you really want to have a go at ML on a pi4, you’ll want to go this route. A model is a series of weights, which are really just matrices of numbers. And those numbers are usually 32-bit or 16-bit floating point numbers. And NPUs and GPUs love floating point numbers! Wee hoo! Though they’re even faster with integers.
CPUs aren’t as fond of floats. It’s hilarious how many early PCs just didn’t do float math in hardware. And the ARM chips in the pi’s aren’t strangers to leaving some math operations, like division, off the chip. Instead they optimize for common operations, like integer math and multiplication.
Anyhoo, quantization is when you take those fp32 or fp16 values and scale them into 0-255 integer values (roughly, each float v becomes round(v / scale) + zero_point, with the scale and zero point chosen per tensor). You lose some resolution in the inference of what is or isn’t an object, but it goes faster. And you rebuild the model so it uses those new weights, and the CPU has a much easier time with it all. But it still isn’t great. There’s off-the-shelf tooling for this; see the sketch below.
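For ONNX models, onnxruntime ships quantization tooling that does the rescaling and model rebuild for you. A quick sketch of dynamic (weight-only) quantization to uint8, with placeholder file names:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Rewrites the fp32 weights as uint8 plus a per-tensor scale/zero-point,
# which is exactly the 0-255 scaling described above.
quantize_dynamic(
    model_input="gamepiece.onnx",        # placeholder fp32 model
    model_output="gamepiece_int8.onnx",  # quantized output
    weight_type=QuantType.QUInt8,
)
```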
All that said, I think you’d be better off with every other option suggested. What I outlined above are things I’ve learned from mistakes.