I may be wrong, but I get the feeling that the issue stems from using OpenMP within a method that is itself called from a child thread of the program (and not from its main thread). Normally, OpenMP would be used from the main thread (the whole point of OpenMP being to spare you from dealing with threads yourself). Maybe it works under Windows and not under Linux...
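To test that hypothesis outside the viewer, a minimal sketch along these lines could be built on both platforms. It is not viewer code: `parallel_sum` and `sum_from_child_thread` are made-up names, and the summation merely stands in for the real mesh optimization. As far as I understand it, a thread that encounters a parallel region simply becomes the master of a new team, but how well that mixes with natively created threads may depend on the OpenMP runtime in use.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the mesh optimization: sums the values with an
// OpenMP reduction. Without -fopenmp (GCC/Clang) or /openmp (MSVC) the
// pragma is ignored and the loop simply runs serially.
std::int64_t parallel_sum(const std::vector<int>& values) {
    std::int64_t total = 0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

// Runs parallel_sum() from a child thread instead of the main thread,
// which is the situation suspected of causing the problem.
std::int64_t sum_from_child_thread(const std::vector<int>& values) {
    std::int64_t result = 0;
    std::thread worker([&] { result = parallel_sum(values); });
    worker.join();  // result is only read after the worker has finished
    return result;
}
```

If `sum_from_child_thread()` misbehaves under Linux while a direct call to `parallel_sum()` from the main thread does not, that would point at the threading context rather than at the optimization code itself.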
Also, regarding this particular mesh optimizing method, I don't think the result you got while benchmarking it is very relevant (at least on multi-core CPUs). Why? Because you apparently (if I understood correctly what you wrote) measured the CPU load, not the relative time slice it takes in the main thread (which would be close to zero, amounting only to mutex lock overhead and such, since this method is called from the mesh repository thread).
On a single-core CPU, your "5%" would definitely be a relevant measure and would deserve optimization, but on a multi-core CPU, and considering this method is called in a child thread, it does not really impact the main thread loop run time (i.e. the rendering itself, the object list and UI refreshes, plus all the sundry synchronization tasks between threads).
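The difference between overall CPU load and main thread time can be illustrated with a rough sketch like the one below, which times each thread's own work separately. All names here (`burn`, `ThreadTimes`, `time_threads_separately`) are hypothetical, and `burn()` is only a placeholder for real work such as a frame's rendering or a mesh optimization pass.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Milliseconds of wall-clock time elapsed since 'start'.
static double elapsed_ms(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

// Placeholder for CPU-heavy work; 'volatile' keeps the loop from being
// optimized away. Returns the sum of 0 .. iterations-1.
static unsigned long burn(unsigned long iterations) {
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        acc = acc + i;
    }
    return acc;
}

struct ThreadTimes {
    double worker_ms;  // wall time spent by the child thread's work
    double main_ms;    // wall time spent by the main thread's work
};

// Runs some work in a child thread and some in the calling thread at the
// same time, timing each one separately.
ThreadTimes time_threads_separately(unsigned long worker_iters,
                                    unsigned long main_iters) {
    ThreadTimes t{0.0, 0.0};
    std::thread worker([&] {
        const auto start = Clock::now();
        burn(worker_iters);
        t.worker_ms = elapsed_ms(start);
    });
    const auto start = Clock::now();
    burn(main_iters);
    t.main_ms = elapsed_ms(start);
    worker.join();  // worker_ms is only read after the join
    return t;
}
```

On a multi-core machine I would expect `main_ms` to stay roughly the same however large `worker_iters` gets, even though the total CPU load goes up; on a single core the two would compete for the same time slices.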
Granted, spreading the mesh optimization over more cores would definitely lower the rezzing time of meshes (so your work is not at all useless), even if only by a small amount (relative to the download time of said meshes).
I do not know how the VS profiler works, but if possible, it would be worth running it only on the main viewer thread to highlight the hot spots that do count for the user.