Cool VL Viewer forum

AVX2 optimizations 

Joined: 2011-10-07 10:39:20
Posts: 196
Hello Henri,

some minor AVX2 / SSE optimizations for llface.cpp and llvoavatar.cpp.

It showed up in the profiler, so maybe it helps a little when shadows are enabled.

I did not look at the magic necessary for SSE2NEON support, so might need some ifdefs changed.

Kathrine


Attachments:
avx2_patch.diff.gz [1.13 KiB]
Downloaded 144 times


Last edited by kathrine on 2024-07-23 17:05:17, edited 1 time in total.

2024-07-21 13:04:15

Joined: 2009-03-17 18:42:51
Posts: 5772
kathrine wrote:
some minor AVX2 / SSE optimizations for llface.cpp and llvoavatar.cpp.

It showed up in the profiler, so maybe it helps a little when shadows are enabled.
Many thanks ! :)

kathrine wrote:
I did not look at the magic necessary for SSE2NEON support, so might need some ifdefs changed.
As with anything "magic", it does not require any specific change: every module including llmath.h (which itself includes llmemory.h, where the "magic" happens), i.e. any newview/ module, will "auto-magically" translate SSE2 intrinsics into NEON ones. AVX2 is not supported by sse2neon.h, for now, so it won't be compiled in.

EDIT: contribution accepted for next release. My only changes are the removal of the useless intrinsics headers inclusion in llvoavatar.cpp (this is already taken care of in llmemory.h, which is always included for all newview/ modules via llmath.h, itself included by llviewerprecompiledheaders.h), and the comments style (C++ style should be used everywhere in the viewer sources, with the only exception for the license/copyright block using C-style comment).
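
For readers following along, here is a minimal sketch of how an AVX2 path can be kept out of builds that do not target it. It is not the code from the patch or from llmemory.h: scale_floats() and SKETCH_SIMD_LEVEL are hypothetical names, and the guards assume the compiler-predefined __AVX2__ and __SSE2__ macros (the viewer sources may well use their own).

```cpp
#include <cstddef>

// Assumption: GCC and clang define __AVX2__ / __SSE2__ when the matching -m
// options are active; MSVC defines __AVX2__ with /arch:AVX2 and always has
// SSE2 on x64 (_M_X64).
#if defined(__AVX2__)
# include <immintrin.h>
# define SKETCH_SIMD_LEVEL 2
#elif defined(__SSE2__) || defined(_M_X64)
# include <emmintrin.h>
# define SKETCH_SIMD_LEVEL 1
#else
# define SKETCH_SIMD_LEVEL 0
#endif

// Hypothetical helper: dst[i] = src[i] * s for count floats.
void scale_floats(float* dst, const float* src, float s, std::size_t count)
{
    std::size_t i = 0;
#if SKETCH_SIMD_LEVEL == 2
    // 8 floats per iteration; only compiled when AVX2 is targeted.
    const __m256 vs8 = _mm256_set1_ps(s);
    for ( ; i + 8 <= count; i += 8)
    {
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), vs8));
    }
#endif
#if SKETCH_SIMD_LEVEL >= 1
    // 4 floats per iteration; plain SSE intrinsics.
    const __m128 vs4 = _mm_set1_ps(s);
    for ( ; i + 4 <= count; i += 4)
    {
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vs4));
    }
#endif
    // Scalar tail, and the only path when no SIMD level is available.
    for ( ; i < count; ++i)
    {
        dst[i] = src[i] * s;
    }
}
```

On a target without AVX2 the 256-bit loop simply disappears at preprocessing time, which is the behaviour described above for the sse2neon.h case.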


2024-07-21 14:06:48

Joined: 2009-03-17 18:42:51
Posts: 5772
Ooops !

Just got a crash with this patch...

I need to investigate !

Code:
0   com.secondlife.indra.viewer   0x2159767 LLAppViewerLinux::handleSyncCrashTrace() + 1079
1   com.secondlife.indra.viewer   0x276a558 default_unix_signal_handler(int, siginfo_t*, void*) + 136
2   unknown   0x369d0 /lib64/libc.so.6(+0x369d0) [0x5555462649d0]
3   com.secondlife.indra.viewer   0x2018624 LLVOAvatar::initRiggedMatrixCache(LLMeshSkinInfo const*, unsigned int&) + 692
4   com.secondlife.indra.viewer   0x12420c2 LLRenderPass::uploadMatrixPalette(LLVOAvatar*, LLMeshSkinInfo*) + 66
5   com.secondlife.indra.viewer   0x12408a5 LLRenderPass::pushRiggedMaskBatches(unsigned int, unsigned int, bool, bool) + 261
6   com.secondlife.indra.viewer   0x12467db LLDrawPoolAlphaMask::render(int) + 347
7   com.secondlife.indra.viewer   0x1a6fa45 LLPipeline::renderGeom(LLCamera&) + 3189
8   com.secondlife.indra.viewer   0x1d5b398 display(bool, float, int, bool) + 8984
9   com.secondlife.indra.viewer   0x11b5364 LLAppViewer::mainLoop() + 5220
10  com.secondlife.indra.viewer   0x21592aa main + 1626
11  unknown   0x23737 /lib64/libc.so.6(+0x23737) [0x555546251737]
12  unknown   0x5555462517f5 /lib64/libc.so.6(__libc_start_main+0x85) [0x5555462517f5]
13  com.secondlife.indra.viewer   0x992e21 _start + 33


2024-07-21 14:23:26

Joined: 2009-03-17 18:42:51
Posts: 5772
Yup, systematic crash on login, wearing a rigged mesh body, and as soon as the latter rezzes...


2024-07-21 14:27:05

Joined: 2011-10-07 10:39:20
Posts: 196
Ok, we figured out the patch was actually not SSE, as all the _gather_ intrinsics are AVX2 only.

In addition, there was a crash due to unaligned access on Linux when trying to do the _mm256_store_ps, while an unaligned storeu worked.

I just wonder whether LLVM on Linux has the same quirk as the one found in the gcc optimization tables (https://www.phoronix.com/news/GCC-Zen-U ... Load-Store), where the optimizer forced misaligned accesses because its instruction cost data was wrong. That could explain why it worked on Windows and crashed hard on Linux, if the Linux compiler adds some misalignment offsets as an "optimization". Actually, the aligned and unaligned stores have the same cost, unless the store crosses a cache line.
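
To illustrate the store issue, a minimal sketch (not the patch itself) under the assumption that the destination pointer can land on any address. fill8() and is_aligned32() are hypothetical helpers; the AVX path is only compiled when __AVX__ is defined.

```cpp
#include <cstdint>
#if defined(__AVX__)
# include <immintrin.h>
#endif

// Hypothetical helper: writes 8 copies of value at dst, whatever its alignment.
void fill8(float* dst, float value)
{
#if defined(__AVX__)
    // _mm256_store_ps() requires dst to sit on a 32-byte boundary and faults
    // when it does not; _mm256_storeu_ps() has no such requirement.
    _mm256_storeu_ps(dst, _mm256_set1_ps(value));
#else
    for (int i = 0; i < 8; ++i)
    {
        dst[i] = value;
    }
#endif
}

// True when p is on a 32-byte boundary, i.e. when an aligned 256-bit store
// to p would be legal.
bool is_aligned32(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 31u) == 0;
}
```

Calling fill8(buf + 1, ...) on a 32-byte aligned float array is safe with the storeu variant, while the same call with _mm256_store_ps would hit a misaligned address.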


2024-07-23 17:09:22

Joined: 2009-03-17 18:42:51
Posts: 5772
kathrine wrote:
Ok, we figured out the patch was actually not SSE, as all the _gather_ intrinsics are AVX2 only.
Could you come up with a proper SSE2 version? By default, unless users build the viewer themselves with the AVX optimizations on (or with the --tune option for the Linux build script), only SSE2 is used in the official builds.
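
As a rough idea of what such a fallback could look like: SSE2 has no gather instruction, so the usual substitute is four scalar loads packed into one register. This is only a sketch under that assumption, not the code that ended up in the viewer; gather4() is a hypothetical helper.

```cpp
#include <cstdint>
#if defined(__AVX2__)
# include <immintrin.h>
#elif defined(__SSE2__) || defined(_M_X64)
# include <emmintrin.h>
#endif

// Hypothetical helper: out[i] = table[idx[i]] for i in 0..3.
void gather4(float* out, const float* table, const std::int32_t* idx)
{
#if defined(__AVX2__)
    // Hardware gather; the last argument is the byte size of one element.
    const __m128i vidx =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(idx));
    _mm_storeu_ps(out, _mm_i32gather_ps(table, vidx, 4));
#elif defined(__SSE2__) || defined(_M_X64)
    // Emulated gather: _mm_set_ps() takes its arguments from the highest
    // lane down to lane 0, hence the reversed order.
    _mm_storeu_ps(out, _mm_set_ps(table[idx[3]], table[idx[2]],
                                  table[idx[1]], table[idx[0]]));
#else
    for (int i = 0; i < 4; ++i)
    {
        out[i] = table[idx[i]];
    }
#endif
}
```

Whether the emulated version is any faster than plain scalar code depends on what is done with the packed register afterwards, so it would need profiling.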

kathrine wrote:
In addition, there was a crash due to unaligned access on Linux when trying to do the _mm256_store_ps, while an unaligned storeu worked.
Yes, I could test AVX builds with _mm256_storeu_ps() and it works fine then.

kathrine wrote:
I just wonder, if LLVM on Linux has the same quirk as the one found in the gcc
The crash actually happened with clang builds (I have been using clang for devel builds for a while now, because clang builds much faster than gcc, so I lose less time waiting for the build to complete), so clang and gcc seem to behave exactly the same way in this respect under Linux. Maybe clang for MSVC would also behave the same.

kathrine wrote:
optimization tables (https://www.phoronix.com/news/GCC-Zen-U ... Load-Store), where the optimizer forced misaligned accesses because its instruction cost data was wrong. That could explain why it worked on Windows and crashed hard on Linux, if the Linux compiler adds some misalignment offsets as an "optimization". Actually, the aligned and unaligned stores have the same cost, unless the store crosses a cache line.
You should not rely at all on the automatic alignment chosen by C/C++ compilers: depending on the compiler version and "brand", you can get differently aligned variables, so you must always explicitly align() them in the source code wherever alignment matters.
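
A minimal sketch of explicit alignment, using the standard C++11 alignas specifier (an assumption: the viewer sources may use their own alignment macros instead). Float8 and set_all() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX__)
# include <immintrin.h>
#endif

// Hypothetical type: alignas(32) puts every local, static or member instance
// on a 32-byte boundary, whatever the compiler would have picked by default.
struct alignas(32) Float8
{
    float v[8];
};

void set_all(Float8& dst, float value)
{
    // Cheap debug-build check that the alignment promise holds.
    assert((reinterpret_cast<std::uintptr_t>(dst.v) & 31u) == 0);
#if defined(__AVX__)
    // The aligned store is legal here because the type guarantees alignment.
    _mm256_store_ps(dst.v, _mm256_set1_ps(value));
#else
    for (float& f : dst.v)
    {
        f = value;
    }
#endif
}
```

One caveat: heap allocations made with plain new only honour such over-alignment from C++17 onwards, so older code needs an aligned allocator for these types.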


2024-07-23 17:19:41

Joined: 2009-03-17 18:42:51
Posts: 5772
I implemented the AVX2 part of this optimization in today's release. :)


2024-07-27 09:18:32