Cool VL Viewer forum http://sldev.free.fr/forum/
Some more AVX/SSE2 for llface.cpp http://sldev.free.fr/forum/viewtopic.php?f=10&t=2149
Page 1 of 1
Author: kathrine [ 2021-03-23 01:12:36 ]
Post subject: Some more AVX/SSE2 for llface.cpp
Hi Henri,

The Visual Studio profiler showed that llface.cpp was still a little hot, so I added some more AVX2/SSE2 code. It is mostly identical to the previous one, just for the planar projection case.

Kathrine

P.S. I also looked at ll_memcpy_nonaliased_aligned_16() in llmemory.h and wondered whether it is still any faster than the VS2017 memcpy() or the one in recent glibc, especially when AVX2 is available with the stream instructions. It would probably be worth throwing it out, replacing it with memcpy(), and seeing what happens. It did not look worse in the profiler on Windows.
Author: Henri Beauchamp [ 2021-03-23 13:40:57 ]
Post subject: Re: Some more AVX/SSE2 for llface.cpp
Author: kathrine [ 2021-03-23 18:24:15 ]
Post subject: Re: Some more AVX/SSE2 for llface.cpp
Sounds good. Your framerates are just 10x as high as mine (AMD Vega 56, Ryzen 2700X, ..., crappy OpenGL on Windows; I never get above 90 fps, with EEP and shadows disabled), so you can probably see the difference for memcpy() better than I can. I might give it a try with an AVX2 version using non-cached stream ops, but my first attempt wasn't really worth it. According to my profile, it was just 0.2% of CPU time anyway, even with memcpy().

Kathrine
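For anyone following along, the "non-cached stream ops" idea can be sketched roughly as below. This is only an illustration and not the viewer's ll_memcpy_nonaliased_aligned_16() code: the helper names copy_stream_16() and copy_plain() are made up for this example, and it uses 128-bit SSE2 stores so it builds on any x86-64 compiler without extra flags. A 256-bit variant would presumably use `_mm256_stream_si256` with 32-byte alignment instead.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the viewer sources).
// Assumes: dst and src are 16-byte aligned, do not overlap,
// and bytes is a multiple of 16.
inline void copy_stream_16(void* dst, const void* src, std::size_t bytes)
{
    __m128i* d = static_cast<__m128i*>(dst);
    const __m128i* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0, n = bytes / 16; i < n; ++i)
    {
        // Aligned load, then a non-temporal store that bypasses the cache.
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    }
    // Order the streamed stores before any later loads or stores.
    _mm_sfence();
}

// Baseline to profile against: let the C library pick its own strategy.
inline void copy_plain(void* dst, const void* src, std::size_t bytes)
{
    std::memcpy(dst, src, bytes);
}
```

Which of the two wins depends on the buffer size and the CPU. Streaming stores generally only pay off for large buffers that are not read back soon, so profiling both on the actual workload is the only way to tell.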
All times are UTC