|
Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
|
|
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
|
|
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.
Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
|
|
eob is a pointer to a uint16_t. Previously the code would store 64 bits,
causing a crash or test failure given the right stack layout.
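A minimal C sketch of the store-width problem. It assumes the end-of-block count sits in a wider (64-bit) value before it is written out. Both helper names are hypothetical and are not libvpx functions. The buggy variant is only safe in a demo whose destination really has 8 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Buggy pattern (hypothetical): copies all 8 bytes of a 64-bit value.
 * If eob points at a single uint16_t, the extra 6 bytes overwrite
 * whatever follows it, e.g. other locals on the stack. */
static void store_eob_64bit(uint16_t *eob, uint64_t value) {
  memcpy(eob, &value, sizeof(value));
}

/* Fixed pattern (hypothetical): narrow explicitly and write exactly
 * sizeof(uint16_t) bytes. */
static void store_eob_16bit(uint16_t *eob, uint64_t value) {
  *eob = (uint16_t)value;
}
```

In SIMD code the same mistake can come from a 64-bit store of a vector register. Extracting the 16-bit lane before storing avoids it.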
Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
|
|
The extra 'e' was causing the chromium license check to flag this file.
BUG=chromium:98319
Change-Id: Ic875ba66370298bf998438d14ff5f7e760293706
|
|
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
|
|
Speedup of AVX2 intrinsics over SSE2 asm:
blocksize 16: ~1.00
blocksize 64: ~1.17
blocksize 256: ~1.67
blocksize 1024: ~1.81
Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
|
|
Note that this change will trigger a different C version on SSSE3 and
generate different scaled output.
It is 2x as fast as the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
|
|
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
|
|
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
|
|
Change-Id: I22622faebfcc36f7a4d1f37e3800ae8ab87c8cd4
|
|
BUG=webm:1450
Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
|
|
Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.
BUG=webm:1419
Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
|
|
Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78
|
|
quiets -Wmissing-prototypes
Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5
|
|
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
|
|
Duplicate of transpose_16bit_8x8()
Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
|
|
Add option in SVC to set the filter type and phase for
the frame level downsampling filters.
For 3 spatial layers, set the downsampling filter type to bilinear
and the phase to 8 for the lowest spatial layer.
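The per-layer rule above can be sketched as follows. The sketch assumes spatial layer 0 is the lowest resolution. It also assumes the other layers keep a default filter with phase 0. The enum, struct, and function names are hypothetical and are not libvpx's SVC API.

```c
/* Hypothetical per-layer downsampling settings (illustrative names). */
typedef enum { DOWNSAMPLE_DEFAULT, DOWNSAMPLE_BILINEAR } DownsampleType;

typedef struct {
  DownsampleType type;
  int phase; /* sub-pixel phase; 0 means no shift */
} LayerDownsampleConfig;

/* With 3 spatial layers, layer 0 (assumed lowest) gets a bilinear filter
 * with phase 8; every other layer keeps the assumed default. */
static void configure_svc_downsampling(LayerDownsampleConfig *cfg,
                                       int num_spatial_layers) {
  for (int sl = 0; sl < num_spatial_layers; ++sl) {
    cfg[sl].type = DOWNSAMPLE_DEFAULT;
    cfg[sl].phase = 0;
  }
  if (num_spatial_layers == 3) {
    cfg[0].type = DOWNSAMPLE_BILINEAR;
    cfg[0].phase = 8;
  }
}
```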
Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
|
|
|
|
Change-Id: I2b8a9253f2c3d1fd85304c2970ebe70213870fe9
|
|
Add 31-bit pairs before unpacking in x86 block error code.
The AVX2 code provides a very minor performance improvement.
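A scalar C sketch of one reading of "add 31-bit pairs before unpacking". It assumes every coeff - dqcoeff difference fits in int16_t, as it would in 16-bit SIMD lanes. Under that assumption each square is at most 2^30, so the sum of two squares is at most 2^31 and still fits in a uint32_t. Only that pair sum is widened to 64 bits. The function name is hypothetical, though the parameter shape loosely follows a block-error routine.

```c
#include <stdint.h>

/* Hypothetical: sum of squared errors and squared coefficients,
 * widening to 64 bits once per pair instead of once per element. */
static int64_t block_error_pairs(const int16_t *coeff, const int16_t *dqcoeff,
                                 intptr_t n, int64_t *ssz) {
  uint64_t err = 0, sq_coeff = 0;
  intptr_t i = 0;
  for (; i + 1 < n; i += 2) {
    const int32_t d0 = coeff[i] - dqcoeff[i];
    const int32_t d1 = coeff[i + 1] - dqcoeff[i + 1];
    /* Each square <= 2^30, so each pair sum <= 2^31 fits in uint32_t. */
    const uint32_t err_pair = (uint32_t)(d0 * d0) + (uint32_t)(d1 * d1);
    const uint32_t sq_pair = (uint32_t)(coeff[i] * coeff[i]) +
                             (uint32_t)(coeff[i + 1] * coeff[i + 1]);
    err += err_pair; /* widen once per pair */
    sq_coeff += sq_pair;
  }
  for (; i < n; ++i) { /* odd tail, if any */
    const int32_t d = coeff[i] - dqcoeff[i];
    err += (uint32_t)(d * d);
    sq_coeff += (uint32_t)(coeff[i] * coeff[i]);
  }
  *ssz = (int64_t)sq_coeff;
  return (int64_t)err;
}
```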
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
|
|
Add 31-bit pairs before unpacking in x86 block error code.
BUG=webm:1210
Change-Id: I5ca8c7f7775585a17fe09d6bbfc25e1f2955eb0a
|
|
There is only one avx2 implementation. Drop '_intrin'
Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
|
|
Be specific about the data type size.
Use convenience macro vp9_zero_array.
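A macro in the spirit of vp9_zero_array might look like the one below. This exact definition is an assumption and is not copied from libvpx. The byte count comes from `sizeof(*(dest))`, so it stays correct if the element type changes.

```c
#include <string.h>

/* Illustrative (assumed) definition: zero n elements of dest, deriving
 * the byte count from dest's element type rather than a literal size. */
#define ZERO_ARRAY(dest, n) memset((dest), 0, (n) * sizeof(*(dest)))
```

A hard-coded `memset(a, 0, 3 * 2)` would silently under-clear if `a` were later widened from int16_t to int32_t. The macro form avoids that.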
Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
|
|
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30x.
Fix the call sites to use the RTCD function.
Delete the sse2 and mips implementations. They were based on a previous
implementation of the filter, which was changed in Dec 2015:
ece4fd5d2247c9512b31a93dd593de567beaf928
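A minimal sketch of approximating division by multiply and shift. It replaces x / d with (x * m) >> 16, where m = ceil(2^16 / d). The helper names, the divisors {3, 4, 6, 9}, and the input range are illustrative assumptions, not the filter's actual constants.

```c
#include <stdint.h>

/* m = ceil(2^16 / d), the fixed-point reciprocal of d. */
static uint32_t recip_mult(uint32_t d) {
  return (65536u + d - 1) / d;
}

/* For d in {3, 4, 6, 9}, m * d - 2^16 is 0 or 2, which makes this equal
 * to x / d for every x in [0, 32767]. Beyond that range it can round up
 * by one, e.g. x = 32768, d = 3 gives 10923 instead of 10922. */
static uint32_t div_mul_shift(uint32_t x, uint32_t m) {
  return (x * m) >> 16;
}
```

The multiply and shift map directly onto SIMD multiply-high instructions, whereas integer division has no packed equivalent on x86. That is one plausible source of a large speedup.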
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
|
|
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
This is intended for source scaling in 1-pass SVC mode with 1:2 downscaling.
It reduces aliasing in the downsampled image.
Keep the phase at 0 (off) for now.
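A simplified 1-D illustration of why the phase matters for 2:1 downsampling. This sketch uses a 2-tap bilinear kernel with 16 sub-pixel phases and 7-bit weights. The real scaler's filters are not reproduced here, and the function name is hypothetical. Phase 0 keeps every other pixel (plain sub-sampling). Phase 8 weights each pair equally, which gives an averaging filter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2:1 downsampler. phase is in [0, 15]; weights sum to 128.
 * dst must hold (src_len + 1) / 2 samples. */
static void downsample_2to1_bilinear(const uint8_t *src, size_t src_len,
                                     int phase, uint8_t *dst) {
  const int w1 = phase * 8; /* weight of the right-hand pixel */
  const int w0 = 128 - w1;  /* weight of the left-hand pixel */
  for (size_t i = 0; 2 * i < src_len; ++i) {
    const size_t x = 2 * i;
    const size_t x1 = (x + 1 < src_len) ? x + 1 : x; /* clamp at the edge */
    dst[i] = (uint8_t)((w0 * src[x] + w1 * src[x1] + 64) >> 7);
  }
}
```

For input {10, 20, 30, 40}, phase 0 yields {10, 30}, and phase 8 yields the rounded pair averages {15, 35}.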
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
|
|
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one, but it was missing some minor improvements that only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it did not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
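A hypothetical scalar illustration of why the overflow case is expensive. The fast path accumulates into 32 bits. Once overflow is detected, the whole block is processed a second time with 64-bit accumulation. The function name and structure are assumptions and do not mirror any particular libvpx routine.

```c
#include <stdint.h>

/* Sum of squares with a 32-bit fast path and a full 64-bit rerun. */
static int64_t sum_squares_with_fallback(const int32_t *d, int n, int *reran) {
  uint32_t acc = 0;
  *reran = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t v = d[i];
    const uint64_t sq = (uint64_t)(v * v);
    if (sq > UINT32_MAX - acc) { /* the 32-bit accumulator would overflow */
      *reran = 1;
      break;
    }
    acc += (uint32_t)sq;
  }
  if (!*reran) return acc;
  /* Slow path: start over and redo every element in 64 bits. */
  int64_t acc64 = 0;
  for (int i = 0; i < n; ++i) acc64 += (int64_t)d[i] * d[i];
  return acc64;
}
```

When overflow is common, as a stress test may make it, the fast pass becomes pure overhead. That is consistent with the slowdown described above.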
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
|
|
|
|
Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
|
|
|
|
|
|
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
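A small C sketch of how scan and iscan relate, assuming iscan is the inverse permutation (iscan[scan[i]] == i). The C-style version walks coefficients in scan order. The SIMD-style version visits them in raster order, which vectorizes easily, and takes the maximum scan position among nonzero coefficients. All names are hypothetical.

```c
#include <stdint.h>

/* Build the inverse scan: iscan[scan[i]] == i. */
static void build_iscan(const int16_t *scan, int16_t *iscan, int n) {
  for (int i = 0; i < n; ++i) iscan[scan[i]] = (int16_t)i;
}

/* C-style: eob is one past the last nonzero coefficient in scan order. */
static int eob_from_scan(const int16_t *coeff, const int16_t *scan, int n) {
  int eob = 0;
  for (int i = 0; i < n; ++i)
    if (coeff[scan[i]] != 0) eob = i + 1;
  return eob;
}

/* SIMD-style: raster order, max of iscan[pos] + 1 over nonzero positions. */
static int eob_from_iscan(const int16_t *coeff, const int16_t *iscan, int n) {
  int eob = 0;
  for (int pos = 0; pos < n; ++pos)
    if (coeff[pos] != 0 && iscan[pos] + 1 > eob) eob = iscan[pos] + 1;
  return eob;
}
```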
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
|
|
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
|
|
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
|
|
Change-Id: Id96a8df33354a7987ce890a3d6798c7375ffa4aa
|
|
The previous implementation confused bits, bytes, and elements. It was using
'32' as the multiplier, but that value was mistakenly adopted because the
32x32 transform embedded the stride.
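A short C sketch of keeping stride units straight. The helper names are hypothetical. Pointer arithmetic on an `int16_t *` already scales by `sizeof(int16_t)`, so a stride used that way must count elements. A byte offset must multiply by the element size exactly once.

```c
#include <stddef.h>
#include <stdint.h>

/* Start of row `row`; stride is measured in elements. */
static const int16_t *row_start(const int16_t *base, int row, ptrdiff_t stride) {
  return base + row * stride;
}

/* The same offset in bytes, e.g. for byte-addressed loads. */
static size_t row_offset_bytes(int row, ptrdiff_t stride) {
  return (size_t)row * (size_t)stride * sizeof(int16_t);
}
```

With int16_t data, a 16-element row spans 32 bytes. The same number can therefore look right in one unit and be wrong in another.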
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
|
|
BUG=webm:1365
Change-Id: Id2ed3ebaaaa6a4b68628c23e08b64ea5f1341761
|
|
Create new helper files specifically for converting tran_low_t types.
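A scalar analogue of such conversion helpers, with hypothetical names. It assumes a high-bitdepth build where tran_low_t is int32_t, and it assumes that loading narrows to int16_t with saturation, like a saturating pack. Storing widens back with sign extension. The actual helpers operate on SIMD vectors.

```c
#include <stdint.h>

typedef int32_t tran_low_t; /* assumed high-bitdepth configuration */

/* Load: narrow to int16_t, saturating out-of-range values. */
static int16_t load_tran_low_scalar(tran_low_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Store: widen int16_t back to tran_low_t (sign-extending). */
static tran_low_t store_tran_low_scalar(int16_t v) {
  return (tran_low_t)v;
}
```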
Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
|
|
Borrow transition functions from fdct.h (née vpx_quantize_b_sse2).
BUG=webm:1304
Change-Id: I9c88c3eec3ff8bb461411d98c26c3c236ea28ef1
|
|
Change-Id: Ifebdc9ef37850508eb4b8e572fd0f6026ab04987
|
|
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
|
|
BUG=b/29583530
Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a
|
|
Development has moved to the nextgenv2 branch, and a snapshot from here
was used to seed aomedia.
BUG=b/29457125
Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
|
|
Fixed cosmetic issues noted in Change 349854.
Change-Id: I1d94070e4066fa920173013c5a36a30dd1cb357d
|
|
This reverts commit be12fefa4b7d224e9f39275a6bb4fab01b8bae3b
and commit 057c1c4034ba5b9bf360c5c1f600ebc6d0718c3a.
Also, the mismatch between the avx version and the
c version has been fixed.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
For an rt encode of 1080p@60fps material, an overall performance
improvement of up to 11% was seen.
Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
|
|
|
|
A function-level timing test shows about a 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
|
|
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
This removes a missing-prototype warning, as the prototype is no longer
included in vp9_rtcd.h. The file can be restored if someone gets around
to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
|
|
downsample_2_to_1_ssse3/upsample_1_to_2_ssse3() are local to this module
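In C, functions that are local to one module are usually marked `static`, which gives them internal linkage. GCC and Clang's -Wmissing-prototypes flags non-static functions defined without a previous prototype. Static helpers therefore avoid that warning and keep the symbols out of the global namespace. A hypothetical example:

```c
/* Internal linkage: visible only in this translation unit, so no header
 * prototype is needed and -Wmissing-prototypes does not fire. */
static int average_round_up(int a, int b) {
  return (a + b + 1) >> 1;
}
```

An exported function would instead need its prototype in a header included by the defining file.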
Change-Id: I78a9de8e1eca475ba1bf137102580c531aa3f7dd
|
|
The denoiser is ~1.5% faster at speeds 6 to 8.
Change-Id: I7b350f3c50cce6773d9c4eded4c0c1b722d0a5fc
|