Quote:
Originally Posted by euhlmann
I was digging through the source and found this
https://github.com/opencv/opencv/blo...pp#L1536-L1690
So now I'm curious: does the performance boost come from not using CV_NEON in your OpenCV library build, or because NEON intrinsics are significantly slower than using plain assembly?
Ideally intrinsics will get pretty close to hand-written ASM, but older versions of GCC had some significant problems generating good code from them. Newer versions have improved, though they can still be caught out by weird code. Most of the optimization work on GCC's side landed somewhere in the 4.8 to 4.9 range, so if the library was built with an older compiler, that could be the issue.
An objdump would tell for sure, though, if anyone's up to a challenge.
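For anyone who wants to try, here is a minimal sketch of the kind of test case you could disassemble. It is not taken from OpenCV: the function name `add_sat_u8` and the kernel itself are made up for illustration, and it assumes a toolchain that ships `arm_neon.h`. On non-NEON targets it falls back to the scalar loop so it still builds.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical kernel (not from OpenCV): saturating add of two uint8 buffers.
void add_sat_u8(const std::uint8_t* a, const std::uint8_t* b,
                std::uint8_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // NEON path: process 16 bytes per iteration with a saturating add.
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb));
    }
#endif
    // Scalar tail, and the whole loop when NEON is not available.
    for (; i < n; ++i) {
        unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

Build it with something like `g++ -O2 -c add_sat.cpp` (32-bit ARM targets usually also need `-mfpu=neon`), then run `objdump -d add_sat.o`. If the intrinsics compiled well, the inner loop should be a couple of vector loads, one saturating add, and a store. A lot of extra moves or stack spills around them would point at the compiler.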