Age | Commit message (Collapse) | Author |
The 32-bit memset-sse2.S assumes the cache size is a multiple of 128
bytes and fails when it is not. For example, a processor can
have 24576 KB L3 cache and 20 cores. That is 2516582 bytes per core. Half
of it is 1258291, which isn't helpful for vector instructions. This
patch rounds cache sizes to a multiple of 256 bytes and adds "raw" cache
sizes.
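
The rounding described in this message can be illustrated with a minimal C sketch. It assumes the rounding is done downward by masking off the low 8 bits; the function name `round_cache_size` is hypothetical and is not taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper, not the patch's code: round a "raw" cache size
   down to a multiple of 256 bytes by clearing the low 8 bits.  */
static long
round_cache_size (long raw)
{
  assert (raw >= 0);   /* cache sizes are never negative */
  return raw & ~255L;
}

/* Nonzero when SIZE is a multiple of 128 bytes, the property the
   message says the 32-bit memset-sse2.S relies on.  */
static int
is_multiple_of_128 (long size)
{
  return (size & 127L) == 0;
}
```

With the half-share figure quoted above, `round_cache_size (1258291)` gives 1258240, which is a multiple of both 256 and 128, while the raw value stays available to callers that want the unrounded size.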
|
While at it, beef up the test suite for strnlen and add performance
tests for it, too.
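
As an illustration of what a strnlen check can look like, here is a small sketch that compares the library `strnlen` against a byte-at-a-time reference over a range of alignments, string lengths and limits. It is not the test added by this commit; the names `simple_strnlen` and `check_strnlen`, the buffer size and the loop bounds are all assumptions chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time reference implementation used as the oracle.  */
static size_t
simple_strnlen (const char *s, size_t maxlen)
{
  size_t i = 0;
  while (i < maxlen && s[i] != '\0')
    i++;
  return i;
}

/* Compare strnlen with the reference for 16 start alignments, string
   lengths 0..63 and limits from 0 to a few bytes past the terminator.
   Returns the number of mismatches found.  */
static int
check_strnlen (void)
{
  char buf[128];
  int failures = 0;

  for (size_t align = 0; align < 16; align++)
    for (size_t len = 0; len < 64; len++)
      {
        memset (buf, 'a', sizeof buf);
        buf[align + len] = '\0';   /* align + len is at most 78 */

        for (size_t maxlen = 0; maxlen <= len + 8; maxlen++)
          if (strnlen (buf + align, maxlen)
              != simple_strnlen (buf + align, maxlen))
            failures++;
      }
  return failures;
}
```

Varying the start offset matters for vectorized implementations, which typically handle unaligned heads and tails on separate code paths from the aligned main loop.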
|
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
|
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.