Around the start of this year's build season, I contacted Spectrum 3847 about using OpenCV on an onboard Raspberry Pi for vision processing. They replied describing a problem where FFmpeg and OpenCV together caused significant lag. It's probably the same issue you found above.
Well, a few months later, here I am. I've created a fully working Node.js-based OpenCV processing server. I'm using the Axis VAPIX Video Streaming API (which is just MJPG over HTTP), and it's giving me 25 processed target images (640x480) per second. I wrote node-vapix to interface with the camera, and I'm using node-opencv for the OpenCV bindings. A rough sketch of the frame-grabbing side is below.
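For anyone curious, here's a minimal sketch of what grabbing frames looks like at the protocol level. The /axis-cgi/mjpg/video.cgi endpoint is the standard VAPIX MJPG CGI, but the camera address, the resolution parameter, and the onFrame handler are placeholders I made up for illustration, and the naive SOI/EOI scan stands in for real multipart parsing (a robust client like node-vapix would track the multipart boundary and Content-Length headers instead):

```js
// Sketch: pull individual JPEG frames out of a VAPIX MJPG-over-HTTP stream.
// The camera IP and resolution below are placeholders.
const http = require('http');

const SOI = Buffer.from([0xff, 0xd8]); // JPEG start-of-image marker
const EOI = Buffer.from([0xff, 0xd9]); // JPEG end-of-image marker

const url = 'http://192.168.0.90/axis-cgi/mjpg/video.cgi?resolution=640x480';

http.get(url, (res) => {
  let pending = Buffer.alloc(0);

  res.on('data', (chunk) => {
    pending = Buffer.concat([pending, chunk]);

    // Naive scan: a complete JPEG runs from SOI to EOI. This ignores
    // the multipart part headers, which is fine for a sketch but not robust.
    const start = pending.indexOf(SOI);
    const end = pending.indexOf(EOI, start + 2);
    if (start !== -1 && end !== -1) {
      const frame = pending.slice(start, end + 2);
      pending = pending.slice(end + 2);
      onFrame(frame); // hand the raw JPEG off to OpenCV (see below)
    }
  });
});
```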
This entire time, I never even had to introduce FFmpeg into my project. So I'm asking: why use OpenCV's VideoCapture at all instead of feeding it images directly? A sketch of the decode step follows.
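To make the "feeding it images directly" part concrete, here's roughly what the processing side can look like. This leans on node-opencv's readImage accepting a JPEG buffer, per its README examples; the grayscale/Canny steps and the output path are arbitrary placeholders, not what oculus.js actually does:

```js
// Sketch: decode a JPEG buffer straight into an OpenCV matrix,
// with no VideoCapture (and therefore no FFmpeg) in the pipeline.
const cv = require('opencv');

function onFrame(jpegBuffer) {
  cv.readImage(jpegBuffer, (err, mat) => {
    if (err) return console.error(err);

    // Placeholder processing: grayscale + Canny edge detection.
    mat.convertGrayscale();
    mat.canny(50, 150);
    mat.save('latest-frame.png'); // or stream the result back out
  });
}
```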
oculus.js:
https://github.com/team178/oculus.js
node-vapix:
https://github.com/gluxon/node-vapix
node-opencv:
https://github.com/peterbraden/node-opencv