Cool VL Viewer forum

View unanswered posts | View active topics It is currently 2024-03-28 08:49:35



Reply to topic  [ 13 posts ]  Go to page 1, 2  Next
Try a threadpool for llimageworker 
Author Message

Joined: 2011-10-07 10:39:20
Posts: 181
Reply with quote
Hi Henri,

another little patch, not quite polished, but maybe have a look if it is worth it.

It adds a 4 thread pool to the image decode worker thread to increase texture throughput a little bit.

Greetings Kathrine


Attachments:
threadpool.patch.gz [2.66 KiB]
Downloaded 209 times
2021-03-14 21:57:07
Profile

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
Nice code, thanks ! :)

A couple changes needed: isBusy() must also return true when STATUS_QUEUED, and LLImageBase::sRawImageCount (and likely sGlobalRawMemory as well) needs to be turned into an LLAtomicS32.

Even after these changes, I'm still seeing two glitches that will need to be resolved however:
  • In heavy texture fetching situations (e.g. flying over main land with a 512m draw distance) some textures get stuck at the decode step ("DEC" in texture console).
  • At viewer launch, the login panel lower blue strip image gets decoded way too slowly (it stays white for a second or so).

But aside from these glitches, the textures decoding throughput does increase nicely. With some more work, it could be a great addition to the viewer !


2021-03-15 08:57:15
Profile WWW

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
I think I found (part of) the issue. Your sendToPool() code is racy:
Code:
bool LLImageDecodeThread::sendToPool(ImageRequest* req)
{
   for (auto it : mThreadPool)
   {
      if (!it->isBusy())
      {
         it->setRequest(req);
         return true;
      }
   }
   return false;
 }
The problem here being that you test for isBusy() without locking the thread data and assume it will not change when you call setRequest(), and ignore the return result from the latter which is false when isBusy() is true. It means the texture fetcher receives a "decoding queued successfully" event while it may actually have failed to get queued.

The proper code is:
Code:
bool LLImageDecodeThread::sendToPool(ImageRequest* req)
{
   for (auto it : mThreadPool)
   {
      if (it->setRequest(req))
      {
         return true;
      }
   }
   return false;
 }

This solves the startup login screen issue, at least, but I am still seeing a few (but significantly less) "stuck DEC" in the textures console...


2021-03-15 10:37:43
Profile WWW

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
I found the cause for the last problem: isBusy() should return true for all states above STATUS_QUEUED, i.e. change it to read:
Code:
      LL_INLINE bool isBusy()
      {
         return !mCurrentRequest ||
               mCurrentRequest->getStatus() >= STATUS_QUEUED;
      }
Once this is done, everything works fine.

However I see much lower spikes in CPU usage; not a big surprise since instead of taking shortcuts (and risking to see the texture fetcher not being able to recover the decoded data for the previous decoded texture before a new one is assigned to the decode thread), we properly wait for the decode thread to expire.

With a quick benchmark (on mainland, 512m draw distance, at a fixed position and camera FOV and with pre-cached textures, meshes, etc), I see a 15-20 seconds difference in rezzing time after login for a total of 1'45" with the threaded image decoder (and 7 threads, for an octo-core CPU) against 2'00" with the mono-threaded decoder (please note that the accuracy of this benchmark is of only +/- 5s). Not a dramatic increase in performances, but still worth it.

For a better effect, the textures cache file reads/writes would need to be multi-threaded as well (i.e. multithreading LLLFSThread in the same way; note that you'll need to enable the ThreadedFilesystem setting as well): your next challenge ?... :D


EDIT: scratch that !
isBusy() should return false for anything else than STATUS_QUEUED or STATUS_INPROGRESS, meaning the stuck DEC state is still unsolved (but also that we can expect a much higher throughput once this will be fixed).


2021-03-15 11:22:37
Profile WWW

Joined: 2011-10-07 10:39:20
Posts: 181
Reply with quote
Hi Henri,

i sprinkled a bunch of std::atomic<> things over the code and that seems to not get stuck on decode.

Might be possible to port to LLAtomic but didn't look yet,

Kathrine


Attachments:
llimageworker.cpp [7.56 KiB]
Downloaded 209 times
llimageworker.h [4.73 KiB]
Downloaded 188 times
2021-03-16 01:33:49
Profile

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
kathrine wrote:
i sprinkled a bunch of std::atomic<> things over the code and that seems to not get stuck on decode.
Yep, working just fine and even beautifully now !

Congratulations for a very nice (and very useful) work and contribution !

I'm currently refining it (round-robin threads allocation to avoid always hammering the first threads with requests while they are busy, more stats, adjustment of the number of sub-threads to allocate depending on the number of CPU cores), but it shows a great improvement in rezzing already: "Yava-podding" at 100% speed with a 256m draw distance is becoming an enjoyable experience (only a few blurry textures, compared to a lot more before) !

Provided all my testings go fine, this contribution will be integrated to the next release !

Quote:
Might be possible to port to LLAtomic but didn't look yet
Not needed: I just changed LLAtomic to use std::atomic instead of boost::atomic, so that the viewer does not have both implementations mixed in. Easier than adding the missing (wrapper) methods to LLAtomic, and in the line of the "elvenification" of the code (i.e. using C++11 features and getting rid of old boost equivalents).


2021-03-16 10:25:01
Profile WWW

Joined: 2016-06-19 21:33:37
Posts: 337
Location: Columbus area, OH, USA
Reply with quote
I have been playing around with kathrine's patch and integrated the suggestions from Henri, in this thread. I even managed, I think, to get sRawImageCount and sGlobalRawMemory switched to LLAtomicS32. My C knowledge is quite old so who knows if I actually did it properly. Given my hardware, I may not see as much benefit at you guys. It does seem to help a bit especially since I keep DD fairly small and use speed rezzing. Thanks for the great work guys. I look forward to the improvement being integrated.


2021-03-16 12:33:57
Profile

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
ZaneZimer wrote:
I have been playing around with kathrine's patch and integrated the suggestions from Henri, in this thread. I even managed, I think, to get sRawImageCount and sGlobalRawMemory switched to LLAtomicS32. My C knowledge is quite old so who knows if I actually did it properly.
You'd better wait till next Saturday... It's easy to mess up the code and end up not actually running the sub-threads of the decoder.


2021-03-16 14:16:09
Profile WWW

Joined: 2016-06-19 21:33:37
Posts: 337
Location: Columbus area, OH, USA
Reply with quote
Henri Beauchamp wrote:
You'd better wait till next Saturday... It's easy to mess up the code and end up not actually running the sub-threads of the decoder.
That is my intent once I saw your message. I'll stop tinkering. :-)


2021-03-16 14:57:45
Profile

Joined: 2009-03-17 18:42:51
Posts: 5523
Reply with quote
Feature implemented in today's release !

I added a few refinements to it (round-robin threads affectation, configurable number of image decode threads in the pool, including auto-sizing of the pool based on available threads concurrency, and a couple useful stats), unleashed the power of the queue thread by removing the silly ms_sleep(1) (1ms is a very large delay for modern CPUs !) calls in LL's LLQueuedThread code and replacing them with yield() calls, and I modified the texture fetcher algorithm by adding "boosting" features to it. The latter features work as follow (and allow to take benefit from the new threaded decoding):

  • A first boosting mechanism intervenes after login and far TPs, and lasts by default for 30 seconds (see the new "Duration of texture fetch boost after TP:" setting in the "Preferences" floater -> "Cool features" tab -> "Miscellaneous" sub-tab).

  • The second (off by default since it incurs frame rate hiccups in texture-heavy rezzing situations) intervenes to boost the number of updates per frame of the textures LOD while your avatar (or more exactly the camera) moves (or turns) around: the faster the move (or turn), the higher the boosting factor; this allows to take into account much quicker the change in the importance to the camera of textures and rez them with an appropriate LOD, reducing greatly the number of blurry or grey textures. The new setting toggle is in "Advanced" -> "Rendering" -> "Textures" -> "Boost textures fetches with speed". Note that to reduce FPS hiccups and as long as you are not too tight on memory, you may also disable the "Scale down fetched textures" setting in that same menu.


2021-03-20 11:23:06
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 13 posts ]  Go to page 1, 2  Next

Who is online

Users browsing this forum: No registered users and 16 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software.