Unfortunately I do not have code in Python for the following algorithm…
This is a presentation I gave at various academic competitions; it covers the 2013 challenge. Ignore the math at the beginning, it is irrelevant here.
Basically, the steps are as follows (there's a rough Python/OpenCV sketch of the whole pipeline after the list):
- acquire the image
- apply a binary threshold (every pixel becomes either 255, white, or 0, black)
- Reduce noise using morphological functions such as erode and dilate, and optionally a Gaussian blur.
- Convert your binary image with reduced noise to a different type of data, a contour (the C++ function is findContours). (This is my all-time favorite algorithm.) This organizes your white pixels into a hierarchy of contours. The algorithm was developed in the 1980s.
- Run tests on your contours to make sure you are left with what you want. Some common ones: an area test (get rid of contours that are too small or too big) and a concavity test.
- You can also do things like bound a rectangle around your contour, which in itself isn't very cool, but it provides something that will come in handy this year: a ratio. Take the area of your contour / area of the bounding rectangle. The contours this year are L's, so the ratio might be something like .6, but if a false positive gets through all your tests (say a reflective piece of metal on a robot), it will return a number close to one.
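Since the original question was about Python, here is a rough sketch of those steps using OpenCV's Python bindings. I haven't run this against actual field images, so treat the threshold value, kernel size, and area/ratio cutoffs as placeholders you would tune:

import cv2

MIN_AREA, MAX_AREA = 200.0, 10000.0  # placeholder bounds, tune for your camera

frame = cv2.imread("frame.png")  # acquire the image (or grab from cv2.VideoCapture)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)  # pixels become 0 or 255

# Reduce noise with erode then dilate (an "opening"); a Gaussian blur
# before thresholding also works.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
binary = cv2.erode(binary, kernel)
binary = cv2.dilate(binary, kernel)

# Organize the white pixels into a hierarchy of contours.
# (OpenCV 3.x returns three values here; 2.x and 4.x return two.)
contours, hierarchy = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)

keepers = []
for c in contours:
    area = cv2.contourArea(c)
    if not (MIN_AREA < area < MAX_AREA):  # area test
        continue
    x, y, w, h = cv2.boundingRect(c)
    ratio = area / float(w * h)  # contour area / bounding rect area
    if ratio < 0.8:  # an L fills ~.6 of its rect; a solid blob is near 1
        keepers.append((c, (x, y, w, h)))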
If anyone has any questions, shoot me a PM or comment a question. I'm on IVs right now so I have nothing but time.
Edit: 2073 developed in Python last year and used OpenCV. Here is their code:
Edit 2: Spoiler: To differentiate between left and right vision tape, take the moment of the contour and test it against the center of the bounding rectangle.
cv::Moments m = cv::moments(contour);
double cx = m.m10 / m.m00;  // x of the contour's centroid
if (cx < rectCenter.x)
    side = RIGHT;
else
    side = LEFT;
Then you can match lefts to rights to group them to "create" a yellow tote. How I would match them: take a left contour, double the height of its bounding rect, and if the center of a right contour is height*2 pixels away or closer from the center of the left, they are a pair; match them, and save them off to a struct or class for yellow totes. A sketch of that matcher is below.
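A rough Python version of that matcher (untested; it assumes lefts and rights are lists of (cx, cy, h) tuples, i.e. rect center and rect height, built from the left/right test above):

import math

def match_totes(lefts, rights):
    # Pair each left contour with a right contour whose center is within
    # 2 * the left's rect height; save the pair off as one "yellow tote".
    totes = []
    for (lx, ly, lh) in lefts:
        for (rx, ry, rh) in rights:
            if math.hypot(rx - lx, ry - ly) <= 2 * lh:
                totes.append({"cx": (lx + rx) / 2.0,  # tote center
                              "cy": (ly + ry) / 2.0})
                break  # first match wins; you could take the nearest instead
    return totes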
If you really want to get fancy, and if your camera is mounted lower than the height of the totes, you can check whether any of the totes in your array are stacked by simply comparing their center.x values. You have to check every yellow tote against every other yellow tote. If a tote is stacked, push it to one array; if it isn't, push it to another. This might be needed in case you try to grab the closest yellow tote but it has a tote on top of it. Something like the sketch below.
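A sketch of that stacking check (the x tolerance is a placeholder; remember the OpenCV origin is the top left, so a smaller cy means higher in the image):

def split_stacked(totes, x_tol=15):
    # Totes with another tote on top of them go in one list, clear totes
    # in the other. Compares every tote against every other tote.
    buried, clear = [], []
    for i, t in enumerate(totes):
        has_tote_on_top = any(j != i and
                              abs(u["cx"] - t["cx"]) < x_tol and
                              u["cy"] < t["cy"]
                              for j, u in enumerate(totes))
        (buried if has_tote_on_top else clear).append(t)
    return buried, clear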
The last problem is a geometric one: how do you tell if you're aligned with the box and looking at its center? The latter is easy: is yellow_tote.center.x equal to the center of the screen? (The origin in OpenCV is the top left; for this calculation I would remap the origin to the bottom center of the screen, but that's just me.) You can set up a proportion: image_width / fov_x = (target center's x offset from the screen center) / (unknown x rotation to the center of the target).
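Rearranged, that proportion gives you an angle directly. A sketch with placeholder camera numbers (640 px wide, 60 degree horizontal FOV); strictly the relationship is an arctangent, but the linear proportion is close near the center of the image:

def x_rotation_to_target(cx, image_width=640, fov_x=60.0):
    # Degrees the camera must rotate (about the vertical axis) so the
    # target's center lines up with the center of the screen.
    pixel_offset = cx - image_width / 2.0
    return pixel_offset * fov_x / image_width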
But that simply lines you up with the center; you could still be looking at the box when it is at an angle to the camera. I haven't thought about this problem much, but you could "cheat" and simply check the areas of the left and right tape: if they are about the same (area_left / area_right ~= 1), you know you are lined up. I haven't thought of a method to explicitly return the degree of offset the tote is to the camera.
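That cheat is a one-liner; the tolerance here is a guess you would tune:

def is_square_to_tote(left_area, right_area, tol=0.15):
    # If the left and right tape have roughly equal areas, the camera is
    # facing the tote head-on rather than at an angle.
    return abs(left_area / float(right_area) - 1.0) < tol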