Age | Commit message | Author
BUG=webm:1413
Change-Id: I8d7eeae1bd219eb848c1a86071046a477f7a91af
Change-Id: I4f3052d8748e16b06e9155f8daf22f867dfaa7a3
BUG=webm:1413
Change-Id: Id9038226902b2d793fc6c17ac81bb104c1a18988
Remove comments above #define statements because they get
indented unnecessarily.
https://bugs.llvm.org/show_bug.cgi?id=35930
Add blank lines to prevent comments from being treated as
blocks.
Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
At least the changes that don't conflict with 4.0.1
Change-Id: I9b6a7c14dadc0738cd0f628a10ece90fc7ee89fd
BUG=webm:1413
Change-Id: I14930d0af24370a44ab359de5bba5512eef4e29f
Make 8-bit function testing available in high bitdepth builds.
Change-Id: Ic030c75aa4c6b649c52426abb4bb2122882de0fe
Change-Id: I21ff81df0d6898170a3b80b3b5220f9f3ac7f4e8
Allows them to pass the license check in Chromium.
BUG=chromium:98319
Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8
nasm should infer .text but does not for Windows:
https://bugzilla.nasm.us/show_bug.cgi?id=3392451
Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb
1. Delete the unnecessary zero-initialization step.
2. Optimize the SSE calculation in vpx_varianceWxH.
Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f
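The variance computation referred to above can be sketched in scalar C. This is a minimal sketch, assuming the standard single-pass formulation such routines use (accumulate the sum and sum of squared differences together, then derive variance as SSE - sum^2 / (W*H)); the function name and layout are illustrative, not the libvpx implementation:

```c
#include <stdint.h>

/* Illustrative scalar sketch (hypothetical name, not the libvpx code):
 * one pass accumulates both the sum of differences and the sum of
 * squared differences, so no separate zeroing pass per block is needed.
 * Variance is then SSE - sum^2 / (w * h). */
static uint32_t variance_wxh(const uint8_t *src, int src_stride,
                             const uint8_t *ref, int ref_stride,
                             int w, int h, uint32_t *sse) {
  int64_t sum = 0;
  uint64_t sse_acc = 0;
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      const int diff = src[j] - ref[j];
      sum += diff;                           /* running sum of differences */
      sse_acc += (uint32_t)(diff * diff);    /* running sum of squares */
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (uint32_t)sse_acc;
  return (uint32_t)(sse_acc - (uint64_t)((sum * sum) / (w * h)));
}
```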
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
Change-Id: I638507b360c71489ab0e87bd558d2719ad995333
This caused the following test failures:
1. MMI/VpxSubpelVarianceTest.Ref/6
2. MMI/VpxSubpelVarianceTest.Ref/7
3. MMI/VpxSubpelVarianceTest.ExtremeRef/6
4. MMI/VpxSubpelVarianceTest.ExtremeRef/7
Change-Id: I122ca20089e14ac324edd61295cf8f506e06afc8
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
SSE2 intrinsic vs. AVX2 intrinsic speed gains:
blocksize 16: ~1.33
blocksize 64: ~1.51
blocksize 256: ~3.03
blocksize 1024: ~3.71
Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
Change-Id: I7364a157de39eb7137b599808474b8d46d19d376
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:
movslq %esi,%rax
Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
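A hypothetical illustration of why that instruction disappears: on x86-64, adding a 32-bit `int` stride to a 64-bit pointer forces a sign extension (`movslq`) before the add, whereas a `ptrdiff_t` stride is already 64-bit. The functions below are illustrative only — both compute the same result; only the generated code differs:

```c
#include <stddef.h>
#include <stdint.h>

/* With an 'int' stride, each pointer increment typically needs a
 * movslq to widen the stride to 64 bits before the add. */
static unsigned sum_col_int(const uint8_t *p, int stride, int rows) {
  unsigned s = 0;
  for (int i = 0; i < rows; ++i, p += stride) s += *p;
  return s;
}

/* With a ptrdiff_t stride, the value is pointer-width already, so
 * the sign extension is not emitted. Behavior is identical. */
static unsigned sum_col_ptrdiff(const uint8_t *p, ptrdiff_t stride, int rows) {
  unsigned s = 0;
  for (int i = 0; i < rows; ++i, p += stride) s += *p;
  return s;
}
```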
Changed the intrinsics to perform summation similar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for Haswell:
SSSE3
Horiz/Vert Size Speedup
Horiz x4 ~32%
Horiz x8 ~6%
Vert x8 ~4%
AVX2
Horiz/Vert Size Speedup
Horiz x16 ~16%
Vert x16 ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Use an intermediate buffer before storing to coeffs when
highbitdepth is enabled.
Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b
~10% performance gain. Fixed the cosmetic issues noted in the
previous commit.
Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13
This version is ~1.91x faster than the SSE2 version. When
highbitdepth is enabled, it is ~1.74x faster.
Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
* changes:
Add 4 to 3 scaling SSSE3 optimization
Test extreme inputs in frame scale functions
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
Note this change will trigger a different C version than the SSSE3 path
and generate different scaled output.
Its speed is 2x that of the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
* changes:
Extend 16 wide AVX2 convolve8 code to support averaging.
Add AVX2 version of vpx_convolve8_avg.
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
* changes:
Rename some inline functions in NEON scaling
Generalize 2:1 vp9_scale_and_extend_frame_ssse3()
Also adds vpx_convolve8_avg_horiz_avx2.
Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages with the destination at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but currently only uses SSSE3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
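The averaging step described above can be sketched in scalar C. This is a hypothetical helper, not the AVX2 code; it follows the rounding-average pattern `ROUND_POWER_OF_TWO(dst + filtered, 1)` used by the _avg variants:

```c
#include <stdint.h>

/* Hypothetical scalar sketch of the final averaging step: each pixel
 * produced by the vertical filter is averaged with the pixel already
 * in the destination, with round-to-nearest, i.e.
 * dst = (dst + filtered + 1) >> 1. */
static void avg_row(uint8_t *dst, const uint8_t *filtered, int w) {
  for (int j = 0; j < w; ++j)
    dst[j] = (uint8_t)((dst[j] + filtered[j] + 1) >> 1);
}
```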
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
* changes:
Refactor x86/vpx_subpixel_8t_intrin_ssse3.c
Add vpx_dsp/x86/mem_sse2.h
Add transpose_8bit_{4x4,8x8}() x86 optimization
BUG=webm:1462,766721
Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
Add some SSE2 load and store inline functions.
Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
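A minimal sketch of what such helpers typically look like. The names below are illustrative, not the actual mem_sse2.h API: thin inline wrappers over the unaligned SSE2 load/store intrinsics, so intrinsic code doesn't repeat the `__m128i *` casts at every call site.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Illustrative wrapper: unaligned 128-bit load from any byte address. */
static inline __m128i loadu_128(const void *a) {
  return _mm_loadu_si128((const __m128i *)a);
}

/* Illustrative wrapper: unaligned 128-bit store to any byte address. */
static inline void storeu_128(void *a, const __m128i v) {
  _mm_storeu_si128((__m128i *)a, v);
}
```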
Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878