Age | Commit message | Author |
|
Actually describe in the C code what is going on.
|
Some of the AVX-specific code is not giving enough speed-up to
justify the extra code.
|
Move the FMA4 code into its own section. Avoid some of the duplication
of data resulting from the double use of source files.
|
It's better to use __builtin_fma if it works. Use it for gcc 4.6 and
higher. Move the x86-64 dla.h to the correct place.
|
Branch prediction for the 32-bit implementation and a new optimized
64-bit implementation.