Deep Reinforcement Learning for FRC Proof of Concept

*Not my project, just an ML mentor for the student.

Some gifs:

The goal here was simply to cross the auto line:

Wow, that’s fascinating. I’ve been wanting to move towards deep learning for robot actions. If I may ask, what structure are you using? I’ve been messing with deep Q-learning and that is what has been giving me success.

The student is using Unity, and the training uses the PPO algorithm, outlined in

We’re doing a bunch of proof-of-concept tests to see how viable an RL-based FRC robot would be before ramping up the training resources.

Some proof-of-concept tests:

- Estimate (X, Y) location based on lidar data
- Go to a random (X, Y) based on lidar data
- Go to a cube based on lidar data
- Have scripted robots moving around to make the lidar data noisy for the above tests
- Sparse vs. dense rewards
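For the sparse-vs-dense comparison, the two reward styles might look something like this (a minimal sketch; the distance-based shaping is my assumption for illustration, not the project's actual reward function):

```python
def sparse_reward(dist_to_target, reached_eps=0.1):
    # Sparse: pay out only on success. Simple to specify, but the
    # agent gets no signal until it stumbles onto the target.
    return 1.0 if dist_to_target < reached_eps else 0.0

def dense_reward(dist_to_target, prev_dist):
    # Dense: reward progress toward the target on every step, which
    # gives a signal to follow but can bias the learned behavior.
    return prev_dist - dist_to_target
```

The tradeoff is the usual one: sparse rewards describe the task exactly but make exploration hard, while dense rewards speed up learning at the risk of rewarding the wrong shortcut.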

Those experiments seem amazing!!

Can you post more information? What lidar? How are the results looking?

The “lidar” is purely simulated.

It is doing really well (much better than expected) on all the tests so far. Right now the simulation has 2 other robots in it to simulate alliance partners, which adds a significant amount of noise to the lidar data.

The model being used is a fully connected feed-forward neural network. It receives the heading and the lidar data as input, and outputs drive power and turn. There are a little over 1.49 million weights to be tuned in this network.
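As a rough sketch of what such a network's forward pass looks like (the beam count and hidden-layer widths here are illustrative guesses, not the student's actual architecture):

```python
import numpy as np

# Assumed sizes -- the post only specifies the inputs (heading plus
# lidar ranges) and the outputs (drive power, turn).
LIDAR_BEAMS = 360
HIDDEN = 512

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # He initialization for ReLU layers.
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out)

W1, b1 = init_layer(1 + LIDAR_BEAMS, HIDDEN)  # heading + lidar ranges in
W2, b2 = init_layer(HIDDEN, HIDDEN)
W3, b3 = init_layer(HIDDEN, 2)                # drive power, turn out

def policy(heading, lidar):
    x = np.concatenate(([heading], lidar))
    h = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layers
    h = np.maximum(0.0, h @ W2 + b2)
    return np.tanh(h @ W3 + b3)               # squash outputs to (-1, 1)

action = policy(0.25, np.ones(LIDAR_BEAMS))   # -> array of [drive, turn]
```

With these made-up widths the weight count comes out in the same rough ballpark (hundreds of thousands of parameters); the real network's layer sizes would determine the exact 1.49M figure.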

The model update algorithm learns the policy directly, as opposed to off-policy value methods such as deep Q-learning, which allows for continuous outputs.
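To illustrate the discrete-vs-continuous distinction (the numbers are hypothetical, and the Gaussian-mean parameterization is the common PPO setup rather than necessarily this project's exact one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep Q-learning: the action is an index into a fixed discrete set,
# chosen by argmax over per-action value estimates.
q_values = np.array([0.1, 0.7, 0.2])       # one value per discrete action
discrete_action = int(np.argmax(q_values))  # -> 1

# Policy-gradient (PPO-style): the network emits the mean of a
# Gaussian, and a continuous action is sampled around it.
mean = np.array([0.4, -0.1])               # drive power, turn
std = 0.2
continuous_action = rng.normal(mean, std)  # real-valued 2-vector
```

This is why continuous drive/turn commands fall out naturally from a policy-gradient method, while a Q-learning agent would have to discretize them into bins first.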

Proximal Policy Optimization (PPO) was developed by OpenAI. Like other standard policy gradient methods, data is gathered by interacting with the environment and then typically used to compute the gradient and update the parameters with SGD. PPO additionally uses a “clipped surrogate objective” based on trust region policy optimization. This allows for great sample efficiency as well as stability (i.e., the ability to escape high-dimensional saddle points).
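The clipped surrogate objective itself is compact: for a single sample with probability ratio r = pi_new(a|s) / pi_old(a|s) and advantage estimate A, PPO maximizes min(r·A, clip(r, 1−ε, 1+ε)·A). A minimal sketch:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    # ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated A(s, a).
    # Clipping removes the incentive to move the policy more than eps
    # away from the data-collecting policy in a single update.
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# With a positive advantage, a ratio of 1.5 is clipped back to 1.2:
clipped_surrogate(1.5, 2.0)   # -> 2.4, not 3.0
```

Taking the minimum makes the objective a pessimistic bound: the policy gains nothing from large ratio changes, which is what keeps updates stable.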

Deep reinforcement learning is also the approach many professional researchers are taking to StarCraft II AI.

Edit: Live stream of it learning:

Project Update:

Hi, I’m new to CD but just got on to browse and saw this.
In case anyone has further questions, I am the student developing this project, and when new developments come up I’ll be sure to update a thread on CD since there may be interest.

This is just a small update. I’ve been testing out some different action-space types, and early tests have started to give promising results:

Although these tests used unrealistically perfect input sensor data, I am now in the process of increasing its realism and trying to get it to run on the least amount of input data possible. I’m hoping, if all goes well, to run an encoder/IMU/ultrasonic sensor test on a real bot later in September.