Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
This patch also fixes the following bug:
For misaligned blocks larger than > 144 Bytes, memset branches into
the integer code path depending on the value of misalignment even if
the startup code chooses the SSE2 code path upfront, when multiarch
is enabled.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
32bit memset-sse2.S assumes cache size is multiple of 128 bytes. If
it isn't true, memset-sse2.S will fail. For example, a processor can
have 24576 KB L3 cache and 20 cores. That is 2516582 byte per core. Half
of it is 1258291, which isn't helpful for vector instructions. This
patch rounds cache sizes to multiple of 256 bytes and adds "raw" cache
sizes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
While at it, beef up the test suite for strnlen and add performance
tests for it, too.
|
|
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
|
|
|