Cool VL Viewer forum




CPU thread affinity setting 

Joined: 2009-03-17 18:42:51
Posts: 4751
In today's v1.28.2.40 release, the Cool VL Viewer accepts (under Linux only, for now) a CPU thread affinity setting. By default, it is off, and the viewer behaves like all other viewers, letting the OS scheduler take care of dispatching each of its threads to the CPU cores (virtual ones, for a CPU with SMT). However, the OS is not aware of what each of the threads spawned by the viewer is used for, nor of their relative importance for the viewer's performance...

Linux (and Windows, but I have not yet implemented it for the latter) allows you to specify which core(s) a given thread may use. By binding a thread to a fixed core, you can avoid excessive cache invalidations (which happen when a thread is migrated from one core to another), and when assigning it to more than one core, you may choose cores that share the same caches (e.g. the same L3 cache, as in each of Zen's CCXs). You may also choose the best core (highest turbo frequency or lowest temperature) for the most important thread (the main thread, in the viewer's case). Finally, by preventing other threads from using that core (or those cores), you may improve frame rate "smoothness" (fewer "hiccups" in rezzing-intensive situations such as when moving around), since no other thread will compete with the main thread on the same core(s).

I therefore implemented a way to assign given cores to the viewer's main thread, with all other threads spawned via its LLThread mechanism being automatically assigned to all the other cores (i.e. to the complementary set of cores).
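To make the "complementary set" idea concrete, here is a small sketch of how the cores left for child threads can be derived from the main thread's mask. The function name child_thread_cores is hypothetical and this is not the viewer's actual code; it only models the behaviour described above.

```python
from __future__ import annotations


def child_thread_cores(main_mask: int, num_cpus: int) -> list[int]:
    """Return the virtual cores NOT reserved for the main thread.

    Hypothetical helper modelling "the complementary set of cores";
    it is not taken from the Cool VL Viewer sources.
    """
    return [cpu for cpu in range(num_cpus) if not (main_mask >> cpu) & 1]


# Main thread on cores 2 and 3 (mask 12) of an 8-core CPU:
print(child_thread_cores(12, 8))  # [0, 1, 4, 5, 6, 7]
# Cores numbered 32 and up can never be in a 32-bit mask, so they always go to children:
print(child_thread_cores(0b1111, 36)[-4:])  # [32, 33, 34, 35]
```

Note that a mask of 0 means "no affinity at all" in the viewer, so this helper is only meaningful for non-zero masks.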

There is just one drawback: because the standard/default mechanism for spawning threads (pthread under Linux, i.e. the one not using LLThread) is not aware of which core(s) it should assign to child threads, the latter will be bound to the same core(s) as the parent thread. This is an issue for implicit multi-threading happening, for example, in the OpenGL driver (NVIDIA's and Mesa's drivers can do that under Linux). It means that you cannot assign just one core to the main thread without losing performance; in the viewer's case, two cores must therefore be assigned to the main thread. Also, since SMT virtual cores are not true cores, on SMT-enabled CPUs two full (physical) cores must be assigned to the main thread...

So, how can you experiment with this and find out whether it could improve your SL experience with the Cool VL Viewer?...

There is a new "MainThreadCPUAffinity" debug setting, which holds an unsigned 32-bit number representing a bit mask of the (virtual) cores to assign to the main thread (which means that only the first 32 cores can be assigned to it); the child threads are automatically bound to all the other available cores (including those beyond the 32nd, if any). This setting defaults to 0, meaning that no affinity is set (and the OS scheduler then does as it sees fit).
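If you would rather not do the bit arithmetic by hand, the following sketch converts between a list of virtual core numbers and a value suitable for "MainThreadCPUAffinity", and rejects anything that would not fit in the 32-bit setting. The helper names are hypothetical; only the setting name and its 32-bit range come from this post.

```python
from __future__ import annotations

MASK_BITS = 32  # "MainThreadCPUAffinity" is an unsigned 32-bit value


def cores_to_mask(cores: list[int]) -> int:
    """Build the bit mask for a list of 0-based virtual core numbers."""
    mask = 0
    for core in cores:
        if not 0 <= core < MASK_BITS:
            raise ValueError(f"core {core} cannot be encoded in a 32-bit mask")
        mask |= 1 << core
    return mask


def mask_to_cores(mask: int) -> list[int]:
    """Decode a mask back into the virtual cores it selects (0 = no affinity)."""
    if not 0 <= mask < 1 << MASK_BITS:
        raise ValueError("mask does not fit in an unsigned 32-bit value")
    return [core for core in range(MASK_BITS) if (mask >> core) & 1]


print(cores_to_mask([2, 3]))  # 12
print(mask_to_cores(240))     # [4, 5, 6, 7]
```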
As pointed out above, in order to avoid having child threads spawned outside the LLThread mechanism starve the main thread, two full cores need to be assigned to the latter. Let's say that you wish to assign cores 2 and 3 (cores being numbered from 0): on a non-SMT CPU, you will then need to set bits 2 and 3 of that bit mask, i.e. set the debug setting value to 2^2+2^3=12. On an SMT-enabled CPU, the bit mask instead uses virtual cores (the first physical core groups virtual cores 0 and 1, the second virtual cores 2 and 3, etc.), so to assign the full physical cores 2 and 3 (virtual cores 4, 5, 6 and 7), you need to set the mask to 2^(2*2)+2^(2*2+1)+2^(2*3)+2^(2*3+1)=240.
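The arithmetic above can be sketched as a small function that takes physical core numbers and the number of threads per core. It bakes in the numbering described in this post (sibling virtual cores numbered consecutively), which is an assumption: on many Linux systems SMT siblings are numbered differently (e.g. virtual cores 0 and 8 being siblings on an 8-core CPU), so check the CORE column of `lscpu -e`, or /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, before relying on it. The function name is hypothetical.

```python
from __future__ import annotations


def physical_cores_to_mask(physical_cores: list[int], threads_per_core: int = 1) -> int:
    """Mask covering every virtual core of the given physical cores.

    ASSUMPTION (numbering as described in the post): physical core p owns
    virtual cores p*threads_per_core .. p*threads_per_core + threads_per_core - 1.
    """
    mask = 0
    for p in physical_cores:
        for t in range(threads_per_core):
            mask |= 1 << (p * threads_per_core + t)
    if mask >= 1 << 32:
        raise ValueError("only the first 32 virtual cores fit in the setting")
    return mask


print(physical_cores_to_mask([2, 3]))     # 12  (non-SMT CPU)
print(physical_cores_to_mask([2, 3], 2))  # 240 (SMT CPU, 2 threads per core)
print(physical_cores_to_mask([3, 4], 2))  # 960 (physical cores 3 and 4, SMT)
```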
Once the setting value has been modified, you need to restart the viewer so that it takes effect.

Have fun experimenting and let me know how it fares for you... :D

PS: for obvious reasons, the affinity setting is not applied when your CPU has fewer than 4 available (physical) cores.
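For those wondering how many physical cores their system actually exposes, here is a hypothetical sketch (not the viewer's own check) that counts them under Linux by grouping each CPU's thread_siblings_list from sysfs: each distinct sibling group is one physical core. The 4-core threshold comes from the PS above; the function names are made up for illustration.

```python
from __future__ import annotations

import glob


def parse_cpulist(text: str) -> frozenset[int]:
    """Parse a Linux cpulist string such as "0,8" or "0-1" into CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def count_physical_cores(sibling_lists: list[str]) -> int:
    """Each distinct group of SMT siblings is one physical core."""
    return len({parse_cpulist(s) for s in sibling_lists})


def affinity_allowed(sibling_lists: list[str], min_physical: int = 4) -> bool:
    """Mirror of the rule in the PS: no affinity below 4 physical cores."""
    return count_physical_cores(sibling_lists) >= min_physical


def read_sibling_lists() -> list[str]:
    """Linux only: read every online CPU's thread_siblings_list from sysfs."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    result = []
    for path in glob.glob(pattern):
        with open(path) as f:
            result.append(f.read())
    return result


# 8-core SMT CPU whose siblings are numbered i and i+8 (16 virtual cores):
smt_8_core = [f"{i},{i + 8}" for i in range(8)] * 2
print(count_physical_cores(smt_8_core))                 # 8
print(affinity_allowed(["0-1", "0-1", "2-3", "2-3"]))   # False
```

On a real Linux machine, `affinity_allowed(read_sibling_lists())` would apply the same rule to the actual topology.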


2021-09-18 09:06:54

Joined: 2016-06-19 21:33:37
Posts: 237
Location: San Francisco bay area, CA, USA
I have taken the opportunity to experiment with this new debug setting. For my AMD Ryzen 7 3700X 8-Core Processor (16 virtual cores), I have opted for a mask of 960 (physical cores 3 & 4, zero-indexed). I used the System Monitor to watch CPU activity with little else but the viewer running, and it's easy to see that the activity is constrained. Within SL, I have noticed a slight frame rate improvement, especially after a scene has rendered and is largely 'idle', though I don't have specific metrics. I appreciate the new feature. Thank you for the continued improvements.


2021-09-19 21:07:08