The first thing I recommend is to understand the concept.
Typically what you are going to do is teach the vision system an image that it can track. You'll use Vision Assistant for that. Then extract a pixel value, and apply that pixel value to the drive motors.
This is an AWESOME tutorial at NI about Vision Assistant:
https://decibel.ni.com/content/docs/DOC-14726
The basic concept is to extract the "X" value, which is typically left and right on the robot in relation to the camera. Once you have that value, you determine an "error" value: how far the object is from the center pixel of the frame. This error value is used to steer your robot so the camera keeps the object in the middle of its FOV (field of view). For example, say 80 pixels is the middle of the vision frame. If your camera is mounted on your robot and it sees the object, where is that object in the frame? If it's located at 80 pixels, then there is 0 error. If it's located at 75 pixels, then there is a 5 pixel error. To correct that, you need to find some scalar that will take 5 pixels and turn it into a drive motor value. This will begin to turn the robot base towards the object and bring the error back to 0, with the object at 80 pixels again. Generally a simple "P" (proportional) algorithm works well for getting the concepts down; then work your way up from that.
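If it helps to see the "P" idea written out, here is a rough sketch in Python. It is only an illustration of the math, not LabVIEW code and not anything from the NI tutorial. The names (`FRAME_CENTER_X`, `KP`, `turn_command`) and the gain of 0.02 are made up for the example, and the sign convention (positive means turn toward smaller pixel values) is an assumption that depends on how your camera and drive are set up.

```python
# Minimal proportional ("P") steering sketch.
# All names and numbers here are illustrative assumptions.

FRAME_CENTER_X = 80   # pixels; the example frame center used above
KP = 0.02             # assumed scalar: motor command per pixel of error
MAX_TURN = 1.0        # assumed motor command range is -1.0 .. 1.0


def pixel_error(target_x, center_x=FRAME_CENTER_X):
    """How far the tracked object is from the frame center, in pixels."""
    return center_x - target_x


def turn_command(target_x, kp=KP, center_x=FRAME_CENTER_X, max_turn=MAX_TURN):
    """Scale the pixel error into a turn value and clamp it to the motor range."""
    command = kp * pixel_error(target_x, center_x)
    return max(-max_turn, min(max_turn, command))


if __name__ == "__main__":
    for x in (80, 75, 120):
        print(x, pixel_error(x), turn_command(x))
```

With the object at 80 pixels the error and the command are both zero. At 75 pixels the error is 5 and the command is a small turn. You would tune the scalar on the real robot: too small and it turns sluggishly, too large and it overshoots and oscillates around the target.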
I did an old, old concept write-up way back when, but the NI document above is current.
http://dl.dropbox.com/u/31492126/cmu.pdf