Cool VL Viewer forum http://sldev.free.fr/forum/
AVX2 optimizations http://sldev.free.fr/forum/viewtopic.php?f=10&t=2538
Author: kathrine [ 2024-07-21 13:04:15 ]
Post subject: AVX2 optimizations
Hello Henri, here are some minor AVX2 / SSE optimizations for llface.cpp and llvoavatar.cpp. This code showed up in the profiler, so maybe it helps a little when shadows are enabled. I did not look at the magic necessary for SSE2NEON support, so some ifdefs might need to be changed. Kathrine
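The ifdef remark matters because SSE2NEON maps SSE intrinsics onto ARM NEON and provides no AVX/AVX2 equivalents, so an AVX2-only code path (such as the gather intrinsics mentioned later in this thread) needs a compile-time guard plus a fallback. A minimal sketch, assuming the compilers' standard `__AVX2__` macro (the viewer's own build switches may differ) and a hypothetical `gather8()` helper that is not taken from the patch:

```cpp
#include <cassert>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: out[i] = table[idx[i]] for i in [0, 8).
// Gather intrinsics only exist from AVX2 on (there is no SSE version for
// SSE2NEON to translate), so they sit behind a compile-time guard.
inline void gather8(const float* table, const int* idx, float* out)
{
    assert(table && idx && out);
#if defined(__AVX2__)
    __m256i vidx = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx));
    // Scale 4: each index counts 4-byte floats from the table base.
    __m256 values = _mm256_i32gather_ps(table, vidx, 4);
    // Unaligned store, so out needs no particular alignment.
    _mm256_storeu_ps(out, values);
#else
    // Portable fallback (non-AVX2 x86 builds, or ARM builds using SSE2NEON).
    for (int i = 0; i < 8; ++i)
    {
        out[i] = table[idx[i]];
    }
#endif
}
```

Built with -mavx2 (GCC/Clang) or /arch:AVX2 (MSVC), the gather path is compiled in; otherwise, including ARM builds, the scalar loop is used and yields the same results.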
Author: Henri Beauchamp [ 2024-07-21 14:06:48 ]
Post subject: Re: AVX2/SSE optimizations
EDIT: contribution accepted for the next release. My only changes are the removal of the useless intrinsics header inclusions in llvoavatar.cpp (this is already taken care of by llmemory.h, which is always included in all newview/ modules via llmath.h, itself included by llviewerprecompiledheaders.h), and the comment style (C++-style comments should be used everywhere in the viewer sources, the only exception being the license/copyright block, which uses a C-style comment).
Author: Henri Beauchamp [ 2024-07-21 14:23:26 ]
Post subject: Re: AVX2/SSE optimizations
Oops! Just got a crash with this patch... I need to investigate!
Author: Henri Beauchamp [ 2024-07-21 14:27:05 ]
Post subject: Re: AVX2/SSE optimizations
Yup, a systematic crash on login while wearing a rigged mesh body, as soon as the latter rezzes...
Author: kathrine [ 2024-07-23 17:09:22 ]
Post subject: Re: AVX2 optimizations
Ok, we figured out that the patch was actually not SSE, as all the _gather_ intrinsics are AVX2-only. In addition, there was a crash due to unaligned access on Linux when trying to do the _mm256_store_ps, while an unaligned storeu worked. I just wonder if LLVM on Linux has the same quirk as the one found in the gcc optimization tables (https://www.phoronix.com/news/GCC-Zen-U ... Load-Store), where the optimizer forced misaligned accesses because its instruction cost data was wrong. That could explain why it worked on Windows and crashed hard on Linux, if the Linux compiler adds some misalignment offsets as an "optimization". Actually, aligned and unaligned stores have the same cost, unless the store crosses a cache line.
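To illustrate the alignment point: `_mm256_store_ps` requires a 32-byte aligned destination and faults otherwise, whereas `_mm256_storeu_ps` accepts any address. The sketch below uses hypothetical helpers (`isAligned32()`, `crossesCacheLine()`, `store8()`, `store8Aligned()`, none taken from the patch) and assumes 64-byte cache lines, as on current x86 CPUs; a scalar fallback is used when AVX is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// True when p is 32-byte aligned, as _mm256_store_ps requires.
inline bool isAligned32(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// True when a store of `bytes` bytes starting at p spans two cache lines
// (64-byte line size assumed).
inline bool crossesCacheLine(const void* p, std::size_t bytes = 32)
{
    std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(p) % 64;
    return offset + bytes > 64;
}

// Copies 8 floats with an unaligned store: safe for any destination.
inline void store8(float* dst, const float* src)
{
#if defined(__AVX__)
    _mm256_storeu_ps(dst, _mm256_loadu_ps(src));
#else
    for (int i = 0; i < 8; ++i)
    {
        dst[i] = src[i];
    }
#endif
}

// Copies 8 floats with an aligned store: the caller must guarantee a
// 32-byte aligned dst (e.g. alignas(32) storage). The assert is compiled
// out in release builds, where a misaligned dst can fault on the AVX path.
inline void store8Aligned(float* dst, const float* src)
{
    assert(isAligned32(dst));
#if defined(__AVX__)
    _mm256_store_ps(dst, _mm256_loadu_ps(src));
#else
    for (int i = 0; i < 8; ++i)
    {
        dst[i] = src[i];
    }
#endif
}
```

With a 64-byte aligned `buf`, the address `buf + 1` (4 bytes in) is invalid for the aligned store yet does not split a cache line, while `buf + 9` (36 bytes in) does; per the cache-line remark, only the latter case should make `storeu` slower.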
Author: Henri Beauchamp [ 2024-07-23 17:19:41 ]
Post subject: Re: AVX2 optimizations
Author: Henri Beauchamp [ 2024-07-27 09:18:32 ]
Post subject: Re: AVX2 optimizations
I implemented the AVX2 part of this optimization in today's release.
All times are UTC
Powered by phpBB® Forum Software © phpBB Group https://www.phpbb.com/ |