Thank you! Added to the next release.

Replacing it with memcpy() seems to make the code slightly slower here (glibc v2.31, with AVX2 optimizations enabled): 720-730 fps versus 730-740 fps in my skybox. Besides, I must ensure the code runs well when not optimized for AVX (the official builds are SSE2-only), *and* not everyone has a glibc with proper optimizations, *and* glibc's AVX2 code paths could well handicap Intel users whose CPUs (e.g. Coffee Lake, when the appropriate BIOS override for the "AVX offset" is not set) lower their operating frequency as soon as a single AVX instruction gets executed. Until it is proven that LL's ll_memcpy_nonaliased_aligned_16() is slower than memcpy(), I'll keep it.
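For readers unfamiliar with this kind of routine, here is a minimal sketch of an SSE2-only copy for aligned, non-overlapping buffers. It is *not* LL's actual ll_memcpy_nonaliased_aligned_16() code. The name copy_aligned16_sse2() and the 4x unrolling are illustrative assumptions. The sketch also assumes an x86/x86-64 target with SSE2, 16-byte-aligned source and destination, and a size that is a multiple of 16 bytes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SSE2-only copy (not LL's implementation).
 * Preconditions: dst and src are 16-byte aligned, do not overlap,
 * and bytes is a multiple of 16. */
static void copy_aligned16_sse2(void *restrict dst, const void *restrict src,
                                size_t bytes)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((bytes & 15) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t n = bytes / 16; /* number of 16-byte vectors */
    size_t i = 0;

    /* Main loop: copy 64 bytes (four SSE2 registers) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_store_si128(d + i, a);
        _mm_store_si128(d + i + 1, b);
        _mm_store_si128(d + i + 2, c);
        _mm_store_si128(d + i + 3, e);
    }
    /* Tail: copy the remaining 0-3 vectors one at a time. */
    for (; i < n; ++i)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
}
```

Because it only uses SSE2 instructions, such a routine never touches the AVX units. That is the trade-off argued for above. Whether it actually beats memcpy() depends on the glibc version and the CPU, so it has to be benchmarked on the target machines.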