A Vision Program that teaches itself the game

I am taking an AI/ML course online, and I am wondering: would it be FEASIBLE (I know it is possible) to create a vision system that learns the game by itself?

While it seems quite hard, instead of writing a different program every year, it might be possible to write one program that could learn to play each year’s game.

Such a program would need to be taught the game and what it needs to do. However, it would also need to learn how to operate its own robot.

The last question I have is: has this ever been done? It seems extremely hard, so it would be pointless to reinvent the wheel.

What would be the best environment to build something like this in? Currently, I am learning with Octave, however, OpenCV seems to have a lot of useful components, including the ML module.

I’m not a programmer, but no.

The reason is that while a single program may learn the game every year, it has to adapt to the different robots that are built. Some things stay the same and other things change; sometimes pneumatics are an advantage and sometimes not, for example. So the program will need to be changed to fit the robot every year, regardless of whether the game changes are minor or major.

Now add to that the fact that no robot in FRC history has ever been fully autonomous beyond automode or a “drive straight and don’t stop”, and the odds are VERY against you actually pulling it off this side of grad school.

I would not go into this sort of project with any expectation of success. I’ve fiddled a bit with general game-playing programs; in fact, I wrote one for a science fair when I was in high school. It was successful, but the success criterion was that the results were statistically better than random moves. That’s a much lower bar than I would feel safe with for something controlling a 120 lb. mobile robot.

I know a couple of years ago state of the art game-playing systems could choke unexpectedly on even relatively simple board games. Unless things have improved by leaps and bounds in the last couple of years I wouldn’t even want to be in the same room as a robot controlled by one of these things.

Possibly interesting:

http://games.stanford.edu/index.php/ggp-competition-aaai-14

I know that it would be required to at least program in all the I/O, etc. However, I believe the best robot would be one that gets better at the game through experience. First match: roaming in circles, not knowing what to do. Last match: game pro. It will beat any robot that tries to win!

I want to get my vision program for next year rolled out with a bit of ML. That way, it would be able to learn how to do better the next time. That is how the famous checkers-playing computer got so good at checkers.

Simply put, human brains are still much better at some things than computers… so your answer is no within the context of FRC. For a game as complex as an FRC game, do not expect a fully autonomous robot control system to be able to outperform the combination of a human brain(s) + control assist.

Computers tend to excel in games that are extremely well defined with few variables. For chess/checkers/backgammon, you may only have a handful of possible moves to a few handfuls of spaces. A basic player is capable of looking at those moves and determining which is “best” right now. An expert player or computer iterates that forward, analyzing several layers deep. If I do this, my opponent’s options change from set X to set Y, which gives me another set of options, etc. You can essentially play the game out for each of the possible moves, and look at which of your current moves has the best outcome.
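To make that look-ahead concrete, here is a minimal minimax sketch in C++ on a toy game (Nim: players alternately take 1-3 stones, and whoever takes the last stone wins). The game, depth handling, and scoring are stand-ins for whatever you would actually model:

#include <algorithm>
#include <iostream>

// Returns +1 if the player to move can force a win with `stones` left,
// -1 otherwise. `ourTurn` tracks whose move it is as we look ahead.
int minimax(int stones, bool ourTurn) {
    if (stones == 0)                  // previous player took the last stone,
        return ourTurn ? -1 : +1;     // so whoever moves now already lost
    int best = ourTurn ? -1 : +1;
    for (int take = 1; take <= std::min(3, stones); ++take) {
        int v = minimax(stones - take, !ourTurn);   // play this move out
        best = ourTurn ? std::max(best, v)          // we pick our best move,
                       : std::min(best, v);         // the opponent picks theirs
    }
    return best;
}

int main() {
    // Play out every option from 10 stones and report the outcome.
    std::cout << (minimax(10, true) > 0 ? "win" : "lose") << "\n";
}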

If you are interested in this topic, which has intrinsic value (even if I wouldn’t recommend applying it at the level you propose), I’d recommend writing a few game solver applications first. Start with a puzzle solver (like Sudoku) where you are essentially writing an algorithm to find the single “right” answer.
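For instance, a bare-bones backtracking Sudoku solver fits in a page. This sketch stores the grid as a 9x9 array with 0 for empty cells:

#include <iostream>

// True if value v can legally go at row r, column c.
bool allowed(int g[9][9], int r, int c, int v) {
    for (int i = 0; i < 9; ++i)
        if (g[r][i] == v || g[i][c] == v) return false;   // row/column clash
    int br = r / 3 * 3, bc = c / 3 * 3;                   // top-left of box
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            if (g[br + i][bc + j] == v) return false;     // 3x3 box clash
    return true;
}

bool solve(int g[9][9], int cell = 0) {
    if (cell == 81) return true;                  // all cells filled: solved
    int r = cell / 9, c = cell % 9;
    if (g[r][c] != 0) return solve(g, cell + 1);  // skip given clues
    for (int v = 1; v <= 9; ++v)
        if (allowed(g, r, c, v)) {
            g[r][c] = v;                          // try a value...
            if (solve(g, cell + 1)) return true;
            g[r][c] = 0;                          // ...and backtrack if stuck
        }
    return false;
}

int main() {
    int g[9][9] = {};   // all-empty puzzle; load real clues here instead
    if (solve(g))
        for (int r = 0; r < 9; ++r, std::cout << "\n")
            for (int c = 0; c < 9; ++c) std::cout << g[r][c] << " ";
}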

Approaching a new game with a mindset “like a computer” could also be fun. Just start describing your action table as you play out the game. If I’m located at mid-field and my opponent is between me and the goal, what are my options? What are his options? Is he faster than me? Does he have more traction/weight than me? Is he taller or shorter? Generally, if you are not capable of explaining all these things in words, adding the complexity of a computer will not help you. However, the process of describing them might lead to good strategies, whether implemented by a computer or a human driver.

-Steven

OK, I’m on my phone. Let’s see how this goes. It’s storming at work, so I have time.

Machine learning for vision is rather common, but the approach you want to take isn’t feasible due to your lack of training examples. The rule of thumb is that you need at least 50 training examples to start learning from. You simply won’t have enough training examples to get a result worth the effort, or any noticeable result at all.

Moving on. You can use machine learning to calculate distance from characteristics in the image. You have to have training examples, though. So you’d go out and record contour characteristics such as height, width, area, and center.x and .y; then you manually input the measured distance. You do this from as many points as you can bear to. Then you run a gradient descent algorithm (regression) or apply the normal equation. You can scale your data if you don’t think it’s linear, such as by taking the natural log of contour height. For this example you are dealing with 6 dimensions, so it is impossible to visualise; you just have to guess what scaling is needed. Then you apply the squared error function, (predicted - actual)^2, also called your residual. You want this to be as close to zero as possible. This can also be applied to game pieces.
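Here is one way that workflow could look in C++ (to match the code later in the thread). To keep it self-contained it uses a single feature (the log of contour height, scaled as suggested above) and synthetic data; in practice you would log all five features along with hand-measured distances:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Synthetic training set standing in for logged measurements.
    std::vector<double> x, y;
    for (int h = 20; h <= 200; h += 10) {
        x.push_back(std::log((double)h));                  // scaled feature
        y.push_back(250.0 - 50.0 * std::log((double)h));   // "measured" distance
    }
    int n = x.size();

    // Mean-normalise the feature so gradient descent converges quickly.
    double mean = 0, sd = 0;
    for (double v : x) mean += v / n;
    for (double v : x) sd += (v - mean) * (v - mean) / n;
    sd = std::sqrt(sd);
    for (double& v : x) v = (v - mean) / sd;

    double t0 = 0, t1 = 0, alpha = 0.1;    // parameters and learning rate
    for (int iter = 0; iter < 2000; ++iter) {
        double g0 = 0, g1 = 0;
        for (int i = 0; i < n; ++i) {
            double err = (t0 + t1 * x[i]) - y[i];   // predicted - actual
            g0 += err / n;                          // gradient, intercept
            g1 += err * x[i] / n;                   // gradient, slope
        }
        t0 -= alpha * g0;           // step down the squared-error cost
        t1 -= alpha * g1;
    }
    std::printf("distance ~= %.1f + %.1f * (normalised log height)\n", t0, t1);
}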

Another application is shooting pieces. You have a chart of inputs such as motor speed, angle, and distance, and the output is a 1 or 0: making the basket or missing. You now have a 3D plot. There exists a line, or multiple virtually identical lines, in 3D space that guarantees making all your shots (given your robot is 100% consistent).
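One standard way to learn that boundary is logistic regression: fit p(make) as a sigmoid of the shot parameters, and the decision boundary it learns is exactly such a surface. A sketch with synthetic data (a shot “makes” whenever wheel speed is sufficient for the distance, a deliberate simplification so the boundary is linear); real rows would come from logged practice shots:

#include <cmath>
#include <cstdio>
#include <vector>

struct Shot { double speed, dist, made; };   // dist pre-scaled to [0, 1]

int main() {
    std::vector<Shot> shots;
    for (double s = 0.3; s <= 1.0; s += 0.05)
        for (double d = 5; d <= 20; d += 1)
            shots.push_back({s, d / 20.0, s * 25.0 >= d ? 1.0 : 0.0});

    double w0 = 0, w1 = 0, w2 = 0, alpha = 0.5;   // weights, learning rate
    for (int iter = 0; iter < 50000; ++iter) {
        double g0 = 0, g1 = 0, g2 = 0;
        for (const Shot& s : shots) {
            double z = w0 + w1 * s.speed + w2 * s.dist;
            double p = 1.0 / (1.0 + std::exp(-z));   // sigmoid: p(make)
            double err = p - s.made;
            g0 += err; g1 += err * s.speed; g2 += err * s.dist;
        }
        double n = shots.size();
        w0 -= alpha * g0 / n; w1 -= alpha * g1 / n; w2 -= alpha * g2 / n;
    }
    // Predicted make probability for a 15 ft shot at 70% wheel speed.
    double z = w0 + w1 * 0.7 + w2 * (15.0 / 20.0);
    std::printf("p(make) = %.2f\n", 1.0 / (1.0 + std::exp(-z)));
}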

Another type of AI is path planning. If you have a depth map of all the objects in front of you, then you can apply A* path planning to get to a certain location on the field, given you have a means of knowing where you are on the field (cough, cough: encoders on undriven wheels, or a vision pose calculation).
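The missing step between the depth map and the planner is projecting near pixels onto a floor grid. A rough sketch follows; the grid size, cell size, and field of view are all assumptions, and a real version would also reject floor pixels using the camera’s height and tilt:

#include <opencv2/opencv.hpp>
#include <cmath>

const int GRID_W = 64, GRID_H = 32;       // planner grid, in cells
const double CELL_M = 0.25;               // metres per cell (assumed)
const double HFOV = 57.0 * CV_PI / 180.0; // Kinect-ish horizontal FOV

// depthM: CV_32F Mat of forward distances in metres, one per pixel.
// Returns a GRID_H x GRID_W 8-bit grid where 255 marks an obstacle.
cv::Mat occupancyGrid(const cv::Mat& depthM) {
    cv::Mat grid = cv::Mat::zeros(GRID_H, GRID_W, CV_8U);
    for (int u = 0; u < depthM.cols; ++u) {
        // Bearing of this pixel column relative to the camera axis.
        double bearing = (u - depthM.cols / 2.0) / depthM.cols * HFOV;
        for (int v = 0; v < depthM.rows; ++v) {
            double z = depthM.at<float>(v, u);        // forward distance
            if (z <= 0.4 || z > 8.0) continue;        // discard junk readings
            double x = z * std::tan(bearing);         // sideways offset
            int gc = (int)(x / CELL_M) + GRID_W / 2;  // grid column
            int gr = (int)(z / CELL_M);               // grid row (forward)
            if (gr >= 0 && gr < GRID_H && gc >= 0 && gc < GRID_W)
                grid.at<uchar>(gr, gc) = 255;         // mark cell blocked
        }
    }
    return grid;
}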

I might have forgotten some things. Feel free to ask questions.

Disclaimer: all these calculations can be done virtually instantly using Octave or MATLAB. A* is a bit more intensive; it is an iterative algorithm, to my understanding.

This is possible. A couple of years ago I made a Pong game that taught itself how to move the paddle to block the ball. It taught itself with a neural network; the fitness was based on how long it could play without losing.

^^ That seems like something I want in next year’s program. I would like to have a tablet PC for the driver station, with the robot constantly generating a map of the field. If you clicked a location on the field on the tablet, the robot would automatically navigate there with high accuracy.

However, for that to be possible, the program would need to know where all the obstacles are. How do you suggest getting the exact positions of other robots and field elements? Should I have a Kinect (or a couple) outputting the distance to all the field elements?

This gives me another question. What does the Kinect distance map look like? How do you get the distance measurement from a single pixel?

In addition to locating other robots and field elements, you also need to know the position of your own robot, e.g. with simultaneous localization and mapping.

You will not be able to know the exact positions of everything, and thankfully you do not need to. It’s standard practice to go further around obstacles than strictly necessary to allow for inaccuracies. This would happen when you’re deciding on the nodes to feed into A*, or whatever other algorithm you want to use.

For example:
If you use a Voronoi diagram, you’ll get nodes that are maximally far from obstacles.

And before using a visibility graph, you’d typically expand all of the obstacles by some constant distance.
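On an occupancy-grid image, that expansion is a one-call morphological dilation in OpenCV. A sketch, with the robot’s radius measured in grid cells:

#include <opencv2/opencv.hpp>

// Grow every obstacle blob by the robot's radius so the planner can
// treat the robot itself as a single point.
cv::Mat inflateObstacles(const cv::Mat& obstacleMask, int robotRadiusCells) {
    cv::Mat kernel = cv::getStructuringElement(
        cv::MORPH_ELLIPSE,
        cv::Size(2 * robotRadiusCells + 1, 2 * robotRadiusCells + 1));
    cv::Mat inflated;
    cv::dilate(obstacleMask, inflated, kernel);
    return inflated;
}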

The Kinect’s distance map is pretty simple: you get a 2-d array with a distance for every pixel.

A lot of the stuff I do revolves around object avoidance and detection, along with some simple and complex search-pattern algorithms, and I have to say a Kinect would not be my first choice for the first bit. I got the chance to use a lidar this season, and boy was it nice.

This is the sort of stuff I work with on my NASA Centennial Challenge sample return team, though I would also say that even the best software developers in the challenge couldn’t design an autonomous robot to combat the decision making of humans at this point in time.

This is a cool concept, and I definitely don’t want to discourage you from exploring AI to the fullest; however, I want to ensure that your expectations are realistic.

First, in FRC, assuming the structure of the game does not change drastically going into next year, there is no way you will know where everything on the field is. There are other robotics competitions where this is feasible, but FRC is not one of them. You can incorporate some awesome sensors into the robot and give it a ton of information; however, it will never be able to process all of the relevant information as accurately and quickly as a human can.

Second, really think about why you would want to do this. Does it offer a competitive advantage? Do you want to do it because it will look cool? What would be cooler: a well-driven robot with some automated features to assist the driver that performs very well, or a robot that learned how to play the game itself but functions poorly compared to human-operated robots built by teams with less programming expertise? Which is more inspirational to students (that is the goal in the end, right)?

AI has its place; I spent a lot of time in college studying AI and robotics. I then got out into the real world and realized that, as cool as AI is, it usually isn’t the right solution. For robotics, typically your first question is whether this can be done faster, better, or safer with a human controlling the robot. Then, in order of preference, we go through:
1.) How do we control the environment?
2.) How do we react to the uncontrollable aspects of the environment?
3.) How do we improve our reactions?

It isn’t until you get to 3 that AI comes up. Even then, in most applications it is easier to “teach” by directly giving directions on how to improve, rather than letting the robot learn on its own.

I love AI and there are some great competitions out there where it is key to winning. FRC is not one of them. The top AI labs in the world could not write an FRC legal AI that could beat student drivers in any FRC game (other than perhaps 2001 (? the year where 71 grabbed both goals and then shuffled) ).

My advice would be that, instead of choosing an algorithm now and looking for an application, you learn all you can now. Then, when the game is released, look for the tasks a computer CAN do better than a human and automate those features. Some examples I can think of are aiming a shooter (2006) and adjusting the height of an arm (2005, 2007, 2011, and others). If precise positioning on the field is important, then maybe that is what you need to automate; however, I don’t think that trying to generate a full field map is the best idea. Instead, let the driver handle gross movement and then allow him to automate the fine adjustment based on vision (or other) sensor feedback.

I am thinking about the problem for A* a bit more universally. Say the field was divided into pixels, maybe 64 px long and 32 px wide. The depth sensor would find the obstacles and mark the pixels they fall under as places to avoid. The field would be described in an extremely detailed configuration file. Some pixels, like field elements and walls, would be dead zones: navigate away from them. Anything else would be something that can move, be it a game piece or another robot; the algorithm could treat these as low priority, meaning ram into them only if there is no way around, or if the way around is too far or impractical. The robot would then have the knowledge to navigate around the field with very high accuracy, higher than a human player could achieve.

After the map is generated, the cRIO will be sent a signal to turn, and it turns for as long as it is told to. This gets the robot aligned to start. Next, the cRIO will be told to move forward. These turn and forward commands will be sent constantly, keeping the robot headed in the right direction.
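A minimal A* over a grid like that could look like the sketch below. The cell codes (0 = free, 1 = movable obstacle that can be rammed at a cost, 2 = dead zone) and the ram penalty are my reading of the description above, and a real version would also record parent cells so the path itself can be recovered:

#include <cmath>
#include <queue>
#include <vector>

const int W = 64, H = 32;
struct Node { double f; int idx; };
bool operator<(Node a, Node b) { return a.f > b.f; }  // min-heap on f

// Cheapest cost from (sr, sc) to (gr, gc) over a row-major H*W grid,
// or -1 if the goal is unreachable.
double aStar(const std::vector<int>& grid, int sr, int sc, int gr, int gc) {
    auto h = [&](int r, int c) { return std::hypot(r - gr, c - gc); };
    std::vector<double> cost(W * H, 1e18);
    std::priority_queue<Node> open;
    cost[sr * W + sc] = 0;
    open.push({h(sr, sc), sr * W + sc});
    while (!open.empty()) {
        Node n = open.top(); open.pop();
        int r = n.idx / W, c = n.idx % W;
        if (r == gr && c == gc) return cost[n.idx];   // reached the goal
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                if (dr == 0 && dc == 0) continue;
                int nr = r + dr, nc = c + dc;
                if (nr < 0 || nr >= H || nc < 0 || nc >= W) continue;
                int cell = grid[nr * W + nc];
                if (cell == 2) continue;              // dead zone: never enter
                double step = std::hypot(dr, dc)      // move cost, plus a big
                            + (cell == 1 ? 20.0 : 0); // penalty for ramming
                if (cost[n.idx] + step < cost[nr * W + nc]) {
                    cost[nr * W + nc] = cost[n.idx] + step;
                    open.push({cost[nr * W + nc] + h(nr, nc), nr * W + nc});
                }
            }
    }
    return -1;
}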

From what everyone is saying, it seems as though a vision program that teaches itself the game would be highly impractical (and impossible for someone at my level).

However, there are a couple AI algorithms that would be lovely, like a robot that uses A* to navigate to a location automatically, or some Machine Learning algorithm to perfect the robot’s shots.

Also, JamesBrown, I want to try AI for a few reasons. I want to try something challenging that, if I perform well enough, will pay off in the end. AI is cool, and I know many places where some ML/AI would be just awesome and would increase the reliability of many systems. Bragging rights and showing off are only a very small percentage of my interest.

I don’t know if you picked this up in my long post, but the method I proposed was undriven wheels with encoders, or doing a pose calculation on a vision target. If you’re really wondering what camera pose is, see: Camera Calibration and 3D Reconstruction — OpenCV 2.4.13.7 documentation

My team last summer got a bird’s-eye view of the objects in front of a Kinect working: http://www.chiefdelphi.com/media/photos/39138 The next step was to implement A* path planning, but we never got it to work (it is still on our to-do list). (The objects in view are soccer balls; that is why they are all the same size in the top view.)

On a side note: SLAM is so cool. For anyone interested: Microsoft Research – Emerging Technology, Computer, and Software Research

Yash, check the Dropbox (PM me your email if you want to be included in the Dropbox; it has… 23 sample vision programs, ranging from our 2012-2014 code to game piece detection for 2013 and 2014 to depth programming. I passed the torch of computer vision to a student who uses GitHub, so don’t be surprised if it gets switched over): TopDepthTest is the program that the image I linked to is from. The Kinect depth map encodes distance as a pixel value (colour), for those of you who aren’t aware.

Here is the code to calculate distance from the intensity of a pixel:

// The depth map stores a scaled 8-bit value per pixel; *4 maps it back
// toward the raw Kinect range, and the tan() formula converts that to cm.
Scalar intensity = depth_mat2.at<uchar>(center);
double distance = 0.1236 * tan(intensity[0] * 4 / 2842.5 + 1.1863) * 100;

center is the center of a contour (an object of interest that passed all of our previous tests); it has an x and a y component.

The Kinect is rather intensive. We ran 3 cameras this year and analysed every aspect of the game we possibly could with vision, and we got 8 fps on an ODROID. You’d most certainly need multiple on-board computers to handle multiple Kinects, but that may not be necessary if you only plan to move forward and you don’t have omnidirectional drive capabilities.

I’m waiting for the Cheesy Poofs to release their amazing autonomous code so I can apply it to autonomous path planning (instead of their pre-drawn paths).

There are other alternatives to the Kinect; I personally prefer the ASUS Xtion. It is smaller, faster, and lighter.

I am not going to add to the reasons why you shouldn’t do it on the scale you are seeking (because I think the idea and concept are awesome), but I will say one thing. I have worked with AI pathfinding algorithms to a decent extent as a game programmer; I was freelancing and did work implementing specific AI algorithms and various game mechanics.

You have limited memory on an embedded system like the RoboRIO. Of course the RoboRIO is a massive step up, but I am talking about 2 GB of RAM vs. 256 MB of RAM. A* is, in its most basic form, Dijkstra’s algorithm informed by a heuristic: where Dijkstra expands purely by accumulated cost, A* also estimates the remaining cost to the goal and can weight each movement. Depending on your method, you will usually get an O((V+E)log(V)) or even O(V^2) algorithm. Pathfinding is an expensive task, and even if the field were a perfect size where a resolution of 64 px by 32 px worked, you could end up with an extremely large fringe if enough obstacles exist. In certain scenarios this could eat a good chunk of an autonomous period, and if proper threading isn’t implemented, it could cripple your teleoperated period if you have to wait too long for the calculations to finish on a dynamically changing field of non-standard robots.

Also, this could work for shooting, but if the game calls for a much different scoring system then your AI and learning may be crippled even further by complexity… and you don’t want a friendly memory error popping up and killing your robot for that round.

It’s an awesome idea and you should definitely follow through, but probably not immediately on a 120 lb. robot. Experiment first with game algorithms and get used to implementing them in an efficient, workable way; then move to the robot, where efficiency will really matter. I can’t speak for how efficient you will need to be (again, game developer here), but I really like your concept of pixels. Just be wary of how much time it takes and of the maintainability of your code.

Aside from the many technical limitations, there is one glaring barrier to such a learning system. Vision systems play very specific roles in each game and in each robot. They typically track geometric, retroreflective targets, but the vision systems my team has created have had no say in the robot’s logic; they effectively turn the camera from an image sensor into a target sensor, streaming data about where the targets are back to the robot. For a vision system to learn the game, it must learn not only what the targets look like, but also what data the robot’s central control needs: whether it wants “is the target there?” data like this year’s hot goals, or “at what angle is the target?” as in Rebound Rumble and Ultimate Ascent. Any learning system requires feedback to adapt, and when it has to learn so many different things, designing that feedback system would be at least as complex as making a new vision system, and certainly more error-prone.

I don’t quite understand what the big deal is. A 64x32 grid is only 2048 nodes; I’d expect that you could have an order of magnitude more before you ran into speed problems. I also don’t think you’d have memory issues. If you assume that you have 256 MB of memory, half of which is already used, and 2048 nodes, then you’d get 64 KB per node (128 MB / 2048). That seems like plenty.

The two tasks you just described are not in themselves difficult to achieve through a vision program (one example method is called cascade training); the real problem is how the robot would act on the results. This task would be a no-brainer for Yash; in fact, he has already done it for the 2014 game, if I remember correctly. This only looks at one aspect of the game, though. The robot also has to know what is in front of it, find game pieces, know whether it has game pieces, and go where it needs to go to score or pass. We did most of this in our code this year with 3 cameras, and we were lucky to get 10 fps. It would take months at least for there to be enough generations of the learning algorithm to produce any noticeable result.
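For reference, once a cascade has been trained (e.g. with OpenCV’s opencv_traincascade tool), running it takes only a few lines. “gamepiece.xml” here is a hypothetical trained cascade file:

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::CascadeClassifier detector("gamepiece.xml");
    if (detector.empty()) return 1;            // cascade failed to load
    cv::VideoCapture cam(0);
    cv::Mat frame, gray;
    while (cam.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Rect> hits;
        detector.detectMultiScale(gray, hits); // find trained objects
        for (const cv::Rect& r : hits)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
        cv::imshow("detections", frame);
        if (cv::waitKey(1) == 27) break;       // Esc to quit
    }
    return 0;
}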

Isn’t there a simulation for each year’s game? In my mind, that would be a perfect place to start.

I actually wanted to treat this like a game; that is the reason why I thought of creating a field grid. Are you saying that 2 GB of RAM won’t be enough? The program will have access to 1 GB in the worst-case scenario, and the data collection using OpenCV will use well under 16 MB of RAM.

If you are saying that A* is too inefficient, what do you suggest I try instead? If anything, I could have 3 computers: a vision processor, an AI processor, and the cRIO.

Also, 64 by 32 px was just a crude example. By testing the performance of the system, I could tell whether I need to reduce the resolution. Otherwise, I could treat every cell as simply passable or not.

My buddy programmer and I would like to use an NVIDIA Jetson dev board. Should we use that for AI or for vision processing? We can use an ODROID for the other task!

I have already figured out how to use OpenCV effectively and optimize it for very high performance. I can use a configuration file to make the same setup track multiple target types, and I understand how to use OpenCV to get accurate target data even if the target is tilted!

It depends. Just from a quick skim of their specifications, I would personally use the Jetson dev board for vision and the ODROID for AI if you are going to separate it that way, but I would need to look at it more.

A* is likely your best bet. Pathfinding algorithms are known for being either time-consuming (if memory-restricted) or memory-consuming (if you want speed), and you are right to look at this as if it were a video game AI. A* is commonly used because it is fast while being reasonably smart. I would recommend choosing from the three basics: brush up on your knowledge of Dijkstra, A*, and best-first search. Each has trade-offs; most simply, you either get slow with the best path, or fast with a good-enough path. If you have the ability to multi-thread across several CPUs, you could possibly get away with a multi-threaded Dijkstra approach that quickly searches through the fringe and determines the true shortest path. But sticking with A* is probably your best bet.

If you separate it into 3 computers and each process has access to its own dedicated memory, then you could pull it off in terms of processing power; 1 GB should be well enough, I would think. I am still concerned, though, with how you plan on it being useful beyond an awesome project. On the field, I still think it will be hard to make it sufficiently adaptive to a dynamically changing field (though not impossible), and too slow to calculate the best path in a short time frame, though I suppose that also depends on what you consider the best path. I think it’s awesome and I honestly support the idea (because I don’t have access to the same materials on my team :stuck_out_tongue: ); I’m just trying to gauge where your head is at.

Also, I agree that if you follow through, you will definitely need to constantly tweak (or dynamically update) the resolution of the graph you are analyzing.

I also have questions, such as how you tested your optimizations and how the data is being collected.

AI is so hard to discuss, since it all depends on your goals and on how it needs to adapt to its current scenario.