Age | Commit message (Collapse) | Author |
|
include vpx_ports/msvc.h to avoid issues with snprintf issues with MSVC.
Change-Id: Ida09cff8ee3b84e09fd61de131f84b32c113fa1a
|
|
VSX versions of the SAD functions of width 8.
SADTest Speed Test (POWER8 Model 2.1)
8x4 C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8 C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16 C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
Change-Id: I843f3b34e103b72deeade4a939193d8b53cee460
|
|
Speed tests are added for the SADTest test suite. These test use the
AbstractBench and print the median run time of SAD operations. Speed
tests are disabled by default.
Change-Id: I5d0957248f9b5b307ae2d757d5f8d4761a1dd712
|
|
Speed change is marginal.
Change-Id: I4d548e9763ce43bd546f19132202f7a8509a32bf
|
|
Change-Id: I42dd3df8c13c0a6d08ce28e27e8917b5d831fc1a
|
|
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
|
|
1. vpx_sadWxH_c
2. vpx_sadWxH_avg_c
3. vpx_sadWxHx3_c
4. vpx_sadWxHx8_c
5. vpx_sadWxHx4d_c
Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a
|
|
Rewrite 64x64.
BUG=webm:1425
Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
|
|
Rewrite 32x32. Use half the accumulator registers.
BUG=webm:1425
Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
|
|
Rewrite 16x16. Use half the accumulator registers.
BUG=webm:1425
Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
|
|
BUG=webm:1425
Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4
|
|
BUG=webm:1425
Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468
|
|
BUG=webm:1425
Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d
|
|
BUG=webm:1425
Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d
|
|
BUG=webm:1425
Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2
|
|
BUG=webm:1425
Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44
|
|
BUG=webm:1425
Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb
|
|
BUG=webm:1425
Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717
|
|
BUG=webm:1425
Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082
|
|
BUG=webm:1425
Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138
|
|
The previous implementation loaded 8 values (discarding half)
BUG=webm:1425
Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158
|
|
Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123
|
|
Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f
|
|
Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2
|
|
Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd
|
|
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
|
|
applied against a x86_64 configure with and without
--enable-vp9-highbitdepth
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
|
|
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
|
|
Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f
|
|
Change-Id: I6f2481509b0aa94338ed6185f80c4a6b65532280
|
|
+ general clean-up
Change-Id: Ib9dca3d1a3b7f0c1bedef2a26c9ff5ae1c289e8a
|
|
BUG=b:29583530
Change-Id: I296a0b81755e3086bc0a40cb126d0200ff03c095
|
|
there are sse2 equivalents which is a reasonable modern baseline
Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
|
|
Replace MMX with SSE2.
Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
|
|
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.
Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
|
|
this helps some toolchains (vs9) resolve the type of the parameter
Change-Id: I4acc8a844d1e55b766f66482bd6d32998174d70f
|
|
Change-Id: I9582a8d74990125b71e8fe620f7f3f2585a30798
|
|
average improvement ~3x-5x
Change-Id: Ie30748cfbedebbd544b7ef4f286055ccb7f60306
|
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
|
|
On Nexus 7 speed -6 saw ~18% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
|
|
On Nexus 7 speed -6 saw ~15% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
|
|
On Nexus 7 speed -6 saw ~30% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
|
|
previously 'bit_depth_', which is later used to calculate 'mask_', would
be left uninitialized in non-high-bitdepth builds
Change-Id: Ia72035f4645baf3bb0f191504f491b934cdf1e0e
|
|
ROUND_POWER_OF_TWO() is defined in vp9 headers currently, avoid it in
non-high-bitdepth code
Change-Id: Ic28b8f95ef7964800475ee8b35be5f9cea9afab6
|
|
Change-Id: If74510370723e497f4f33d988b8b398124edf69b
|
|
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f
(cherry picked from commit b1a6f6b9cb47eafe0ce86eaf0318612806091fe5)
|
|
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
|
|
Removed functions:
* vp9_sad_16x16_mmx
* vp9_sad_8x16_mmx
* vp9_sad_16x8_mmx
* vp9_sad_8x8_mmx
* vp9_sad_4x4_mmx
Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3
|
|
in the function sad32x32x4d and sad64x64x4d the source is aligned to 16 bytes
and not to 32 bytes - the load is now unaligned.
Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c
|