Has any team implemented a Q-learning algorithm instead of a PID? Just curious. I've done this a few times at the lab I work at as practice with new algorithms.

I don't know much about Q-learning, but from what I can tell it's for picking discrete moves rather than from a continuous range of possibilities? If that's the case, it would be a horrible replacement for a PID or a neural network.

You're right that most Q-learning algorithms involve searching for the best sequence of moves from a finite set, but there's plenty of work on Q-learning with continuous states and actions. The best example I can think of is robot navigation using Q-learning.
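For anyone who hasn't seen it, the discrete version people usually mean is just a lookup table plus one update rule. Here's a minimal sketch (my own illustration, not any team's code; the state/action counts and learning parameters are made up):

```python
import numpy as np

# Tabular Q-learning sketch: states and actions are discretized into bins,
# and Q maps (state, action) -> estimated long-term value.
N_STATES, N_ACTIONS = 10, 3
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def choose_action(state, rng):
    # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Standard Q-learning update:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```

The continuous-state/action work mentioned above replaces the table with a function approximator, but the update rule is the same idea.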

Even with a discrete set of steps, it still works (I know firsthand). What essentially happens is that the initial state is constantly being updated (as fast as the sensor(s) can update it, at least). What I did was put a delay on the input: 5 Hz. I gave the learning algorithm 100 ms to find the best action(s), another 100 ms to execute it, then repeat.
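That 5 Hz loop might look something like this. This is only a sketch of the timing structure described above; `read_sensor`, `pick_best_action`, and `apply_action` are placeholders for whatever robot I/O and learner you actually have:

```python
import time

def control_loop(read_sensor, pick_best_action, apply_action, cycles=5):
    # ~100 ms budget to pick an action, ~100 ms to let it run -> ~5 Hz total.
    for _ in range(cycles):
        state = read_sensor()              # latest sensor reading
        t0 = time.monotonic()
        action = pick_best_action(state)   # learner gets up to 100 ms
        remaining = 0.1 - (time.monotonic() - t0)
        if remaining > 0:
            time.sleep(remaining)          # pad out the decision window
        apply_action(action)
        time.sleep(0.1)                    # let the action run for ~100 ms
```

The fixed decision window keeps the loop rate predictable even if the learner finishes early, which matters more on a robot than squeezing out extra iterations.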