Cool VL Viewer forum
http://sldev.free.fr/forum/

Try a threadpool for llimageworker
http://sldev.free.fr/forum/viewtopic.php?f=10&t=2141
Page 1 of 2

Author:  kathrine [ 2021-03-14 21:57:07 ]
Post subject:  Try a threadpool for llimageworker

Hi Henri,

another little patch, not quite polished, but maybe have a look if it is worth it.

It adds a 4 thread pool to the image decode worker thread to increase texture throughput a little bit.

Greetings Kathrine

Attachments:
threadpool.patch.gz [2.66 KiB]
Downloaded 214 times

Author:  Henri Beauchamp [ 2021-03-15 08:57:15 ]
Post subject:  Re: Try a threadpool for llimageworker

Nice code, thanks ! :)

A couple changes needed: isBusy() must also return true when STATUS_QUEUED, and LLImageBase::sRawImageCount (and likely sGlobalRawMemory as well) needs to be turned into an LLAtomicS32.

Even after these changes, I'm still seeing two glitches that will need to be resolved however:
  • In heavy texture fetching situations (e.g. flying over main land with a 512m draw distance) some textures get stuck at the decode step ("DEC" in texture console).
  • At viewer launch, the login panel lower blue strip image gets decoded way too slowly (it stays white for a second or so).

But aside from these glitches, the textures decoding throughput does increase nicely. With some more work, it could be a great addition to the viewer !

Author:  Henri Beauchamp [ 2021-03-15 10:37:43 ]
Post subject:  Re: Try a threadpool for llimageworker

I think I found (part of) the issue. Your sendToPool() code is racy:
Code:
bool LLImageDecodeThread::sendToPool(ImageRequest* req)
{
   for (auto it : mThreadPool)
   {
      if (!it->isBusy())
      {
         it->setRequest(req);
         return true;
      }
   }
   return false;
 }
The problem here being that you test for isBusy() without locking the thread data and assume it will not change when you call setRequest(), and ignore the return result from the latter which is false when isBusy() is true. It means the texture fetcher receives a "decoding queued successfully" event while it may actually have failed to get queued.

The proper code is:
Code:
bool LLImageDecodeThread::sendToPool(ImageRequest* req)
{
   for (auto it : mThreadPool)
   {
      if (it->setRequest(req))
      {
         return true;
      }
   }
   return false;
 }

This solves the startup login screen issue, at least, but I am still seeing a few (but significantly less) "stuck DEC" in the textures console...

Author:  Henri Beauchamp [ 2021-03-15 11:22:37 ]
Post subject:  Re: Try a threadpool for llimageworker

I found the cause for the last problem: isBusy() should return true for all states above STATUS_QUEUED, i.e. change it to read:
Code:
      LL_INLINE bool isBusy()
      {
         return !mCurrentRequest ||
               mCurrentRequest->getStatus() >= STATUS_QUEUED;
      }
Once this is done, everything works fine.

However I see much lower spikes in CPU usage; not a big surprise since instead of taking shortcuts (and risking to see the texture fetcher not being able to recover the decoded data for the previous decoded texture before a new one is assigned to the decode thread), we properly wait for the decode thread to expire.

With a quick benchmark (on mainland, 512m draw distance, at a fixed position and camera FOV and with pre-cached textures, meshes, etc), I see a 15-20 seconds difference in rezzing time after login for a total of 1'45" with the threaded image decoder (and 7 threads, for an octo-core CPU) against 2'00" with the mono-threaded decoder (please note that the accuracy of this benchmark is of only +/- 5s). Not a dramatic increase in performances, but still worth it.

For a better effect, the textures cache file reads/writes would need to be multi-threaded as well (i.e. multithreading LLLFSThread in the same way; note that you'll need to enable the ThreadedFilesystem setting as well): your next challenge ?... :D


EDIT: scratch that !
isBusy() should return false for anything else than STATUS_QUEUED or STATUS_INPROGRESS, meaning the stuck DEC state is still unsolved (but also that we can expect a much higher throughput once this will be fixed).

Author:  kathrine [ 2021-03-16 01:33:49 ]
Post subject:  Re: Try a threadpool for llimageworker

Hi Henri,

i sprinkled a bunch of std::atomic<> things over the code and that seems to not get stuck on decode.

Might be possible to port to LLAtomic but didn't look yet,

Kathrine

Attachments:
llimageworker.cpp [7.56 KiB]
Downloaded 216 times
llimageworker.h [4.73 KiB]
Downloaded 195 times

Author:  Henri Beauchamp [ 2021-03-16 10:25:01 ]
Post subject:  Re: Try a threadpool for llimageworker

kathrine wrote:
i sprinkled a bunch of std::atomic<> things over the code and that seems to not get stuck on decode.
Yep, working just fine and even beautifully now !

Congratulations for a very nice (and very useful) work and contribution !

I'm currently refining it (round-robin threads allocation to avoid always hammering the first threads with requests while they are busy, more stats, adjustment of the number of sub-threads to allocate depending on the number of CPU cores), but it shows a great improvement in rezzing already: "Yava-podding" at 100% speed with a 256m draw distance is becoming an enjoyable experience (only a few blurry textures, compared to a lot more before) !

Provided all my testings go fine, this contribution will be integrated to the next release !

Quote:
Might be possible to port to LLAtomic but didn't look yet
Not needed: I just changed LLAtomic to use std::atomic instead of boost::atomic, so that the viewer does not have both implementations mixed in. Easier than adding the missing (wrapper) methods to LLAtomic, and in the line of the "elvenification" of the code (i.e. using C++11 features and getting rid of old boost equivalents).

Author:  ZaneZimer [ 2021-03-16 12:33:57 ]
Post subject:  Re: Try a threadpool for llimageworker

I have been playing around with kathrine's patch and integrated the suggestions from Henri, in this thread. I even managed, I think, to get sRawImageCount and sGlobalRawMemory switched to LLAtomicS32. My C knowledge is quite old so who knows if I actually did it properly. Given my hardware, I may not see as much benefit at you guys. It does seem to help a bit especially since I keep DD fairly small and use speed rezzing. Thanks for the great work guys. I look forward to the improvement being integrated.

Author:  Henri Beauchamp [ 2021-03-16 14:16:09 ]
Post subject:  Re: Try a threadpool for llimageworker

ZaneZimer wrote:
I have been playing around with kathrine's patch and integrated the suggestions from Henri, in this thread. I even managed, I think, to get sRawImageCount and sGlobalRawMemory switched to LLAtomicS32. My C knowledge is quite old so who knows if I actually did it properly.
You'd better wait till next Saturday... It's easy to mess up the code and end up not actually running the sub-threads of the decoder.

Author:  ZaneZimer [ 2021-03-16 14:57:45 ]
Post subject:  Re: Try a threadpool for llimageworker

Henri Beauchamp wrote:
You'd better wait till next Saturday... It's easy to mess up the code and end up not actually running the sub-threads of the decoder.
That is my intent once I saw your message. I'll stop tinkering. :-)

Author:  Henri Beauchamp [ 2021-03-20 11:23:06 ]
Post subject:  Re: Try a threadpool for llimageworker

Feature implemented in today's release !

I added a few refinements to it (round-robin threads affectation, configurable number of image decode threads in the pool, including auto-sizing of the pool based on available threads concurrency, and a couple useful stats), unleashed the power of the queue thread by removing the silly ms_sleep(1) (1ms is a very large delay for modern CPUs !) calls in LL's LLQueuedThread code and replacing them with yield() calls, and I modified the texture fetcher algorithm by adding "boosting" features to it. The latter features work as follow (and allow to take benefit from the new threaded decoding):

  • A first boosting mechanism intervenes after login and far TPs, and lasts by default for 30 seconds (see the new "Duration of texture fetch boost after TP:" setting in the "Preferences" floater -> "Cool features" tab -> "Miscellaneous" sub-tab).

  • The second (off by default since it incurs frame rate hiccups in texture-heavy rezzing situations) intervenes to boost the number of updates per frame of the textures LOD while your avatar (or more exactly the camera) moves (or turns) around: the faster the move (or turn), the higher the boosting factor; this allows to take into account much quicker the change in the importance to the camera of textures and rez them with an appropriate LOD, reducing greatly the number of blurry or grey textures. The new setting toggle is in "Advanced" -> "Rendering" -> "Textures" -> "Boost textures fetches with speed". Note that to reduce FPS hiccups and as long as you are not too tight on memory, you may also disable the "Scale down fetched textures" setting in that same menu.

Page 1 of 2 All times are UTC
Powered by phpBB® Forum Software © phpBB Group
https://www.phpbb.com/