|
Added comments related to re-arranging the
elements of the SAD vector to find the
minimum.
Change-Id: I58b702d304a6cdd32f04775fba603e39c19a8947
|
|
In the function vp9_diamond_search_sad_avx(), arranged
the cost vector in a specific order. This ensures that,
when more than one candidate motion vector has the
minimum cost, the one with the lowest index is
selected, thus resolving the C vs. AVX mismatch.
STATS_CHANGED
Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
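The tie-breaking rule described above can be modeled in scalar C. This is a minimal sketch, not libvpx code: the function names and the flat `uint32_t` cost array are assumptions for illustration. The first function scans in index order with a strict `<`. The second folds the index into the low bits of a 64-bit key, so that a plain minimum reduction honors the same rule. That is a common trick, assumed here and not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Scan in ascending index order with a strict '<': on a tie the earlier
 * (lower-index) candidate is kept. */
static int min_cost_index(const uint32_t *cost, int n) {
  assert(n > 0);
  int best = 0;
  for (int i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}

/* Same rule via a single minimum: the cost goes in the high 32 bits and
 * the index in the low 32 bits, so equal costs compare by index. */
static int min_cost_index_packed(const uint32_t *cost, int n) {
  assert(n > 0);
  uint64_t best_key = UINT64_MAX;
  for (int i = 0; i < n; ++i) {
    const uint64_t key = ((uint64_t)cost[i] << 32) | (uint32_t)i;
    if (key < best_key) best_key = key;
  }
  return (int)(best_key & 0xFFFFFFFFu);
}
```

With costs `{7, 3, 5, 3}`, both functions return 1, not 3.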
|
|
|
|
Bug: webm:1793
Change-Id: I85608ac7bb6d3a61649ba342c13c3bf6a39a5dea
|
|
|
|
Added a speed feature to skip every other row
in SAD computation during motion search.
                 Instruction Count  BD-Rate Loss (%)
cpu  Resolution  Reduction (%)      avg.psnr  ovr.psnr  ssim
0    LOWRES2     0.958              0.0204    0.0095    0.0275
0    MIDRES2     1.891              -0.0636   0.0032    0.0247
0    HDRES2      2.869              0.0434    0.0345    0.0686
0    Average     1.905              0.0000    0.0157    0.0403
STATS_CHANGED
Change-Id: I1a8692757ed0cbcb2259729b3ecfb0436cdf49ce
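Here is a scalar sketch of the skip-row idea. It assumes the skipped rows are compensated by scaling the partial SAD by 2, which keeps the result comparable with a full-block SAD. `sad()` and `sad_skip_rows()` are hypothetical helpers, not the libvpx functions this commit touched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Full sum of absolute differences over a w x h block. */
static unsigned int sad(const uint8_t *src, int src_stride,
                        const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned int total = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) total += (unsigned int)abs(src[c] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return total;
}

/* Visit only even rows (double the strides, halve the height), then scale
 * by 2 so the estimate is on the same scale as a full SAD. */
static unsigned int sad_skip_rows(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  int w, int h) {
  assert(h % 2 == 0);
  return 2 * sad(src, 2 * src_stride, ref, 2 * ref_stride, w, h / 2);
}
```

The estimate is exact for uniform blocks. It misses differences that occur only on odd rows, which is the accuracy traded for the instruction-count reduction.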
|
|
Avoided repeated calculation of start MV
SAD during full pixel motion search.
                 Instruction Count
cpu  Resolution  Reduction (%)
0    LOWRES2     0.162
0    MIDRES2     0.246
0    HDRES2      0.325
0    Average     0.245
Change-Id: I2b4786901f254ce32ee8ca8a3d56f1c9f112f1d4
|
|
Pack nz_mask with zero. After the result is permuted, this has the effect
of ignoring the upper half of the iscan register, which is only loaded
with 128 bits. Depending on the optimization level and the load used, the
upper half of the ymm register may contain undefined values, which can
produce an incorrect eob. If that eob is large enough, it can cause a crash.
Bug: chromium:1431729
Change-Id: I4ebae9fa39f228bdd29dcc19935f3f07759d75f5
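The following is a simplified scalar model of the bug, using hypothetical names. A 16-lane "register" holds 16-bit iscan values, but only the low 8 lanes were loaded and the high 8 contain leftover data. For illustration, the model assumes the unfixed code duplicated the low half of the mask into the high half. Packing with zero instead keeps the leftover lanes out of the eob. The iscan values are taken as already 1-based.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* 16-bit lanes in a 256-bit ymm register */
#define LOADED 8 /* lanes actually filled by a 128-bit load */

/* Build a per-lane nonzero mask. The high half is zeroed (the fix) or,
 * in this model of the bug, a copy of the low half. */
static void build_nz_mask(const int16_t qcoeff[LOADED], int16_t mask[LANES],
                          int pack_with_zero) {
  for (int i = 0; i < LOADED; ++i) {
    mask[i] = qcoeff[i] != 0 ? -1 : 0;
    mask[LOADED + i] = pack_with_zero ? 0 : mask[i];
  }
}

/* eob = largest (1-based) iscan value among lanes whose mask is set. */
static int16_t masked_eob(const int16_t mask[LANES],
                          const int16_t iscan[LANES]) {
  int16_t eob = 0;
  for (int i = 0; i < LANES; ++i) {
    if (mask[i] && iscan[i] > eob) eob = iscan[i];
  }
  assert(eob >= 0);
  return eob;
}
```

With garbage such as 999 in the upper iscan lanes, only the zero-packed mask yields the correct eob.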
|
|
Both are around 3x faster than the original C version. 8-bit gives a
small 0.5% speed increase, whereas highbd gives ~2.5%.
Change-Id: I71d75ddd2757b19aa201e879fd9fa8f3a25431ad
|
|
While porting this function to NEON, using the SSE4_1 implementation
as a base, I noticed that both were producing files whose checksums
differed from those of the C reference implementation. After
investigating further, I found that this saturating pack was the
culprit. Doing the multiplication on the 32-bit values produces
results that match the C implementation.
Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
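`_mm_packs_epi32` narrows 32-bit lanes to 16 bits with signed saturation, so any intermediate outside [-32768, 32767] is clamped before the multiply. The scalar model below shows the difference using hypothetical helper names and made-up values. It is not the function from this commit, whose name the log does not give.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane behavior of _mm_packs_epi32: clamp a 32-bit value to int16. */
static int16_t sat_pack16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Pack first, then multiply: large values are clamped before the product. */
static int32_t mul_after_pack(int32_t v, int32_t m) {
  assert(m >= 0 && m <= 65535); /* keeps int16 * m within int32 */
  return (int32_t)sat_pack16(v) * m;
}

/* Multiply on the 32-bit values: no clamping (caller keeps v * m in range). */
static int32_t mul_in_32bit(int32_t v, int32_t m) { return v * m; }
```

For small inputs the two agree. For an intermediate such as 40000, packing first yields 32767 * 3 instead of 40000 * 3, which is the kind of divergence that would show up as a checksum mismatch.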
|
|
|
|
Change-Id: I43de579e30f2967b97064063e29676e0af1a498f
|
|
Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e
|
|
In assembly it made sense to iterate using n_coeffs.
In intrinsics it's just as fast to use an index, and
it's easier to read.
Change-Id: I403c959709309dad68123d0a3d0efe183874543d
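The two loop shapes the message compares can be sketched in scalar C. Both compute the same result. The summing body is a hypothetical stand-in for the real quantize loop. The first function mimics a common assembly idiom: point past the end, then count a negative offset up to zero so the loop test is just the sign of the counter. The second is the plain indexed form.

```c
#include <assert.h>
#include <stdint.h>

/* Assembly-style: start at -n_coeffs relative to the end, count up to 0. */
static int32_t sum_abs_neg_index(const int16_t *coeff, intptr_t n_coeffs) {
  assert(n_coeffs >= 0);
  const int16_t *end = coeff + n_coeffs;
  int32_t total = 0;
  for (intptr_t i = -n_coeffs; i < 0; ++i) {
    const int32_t v = end[i];
    total += v < 0 ? -v : v;
  }
  return total;
}

/* Intrinsics-style: a plain forward index over the same elements. */
static int32_t sum_abs_index(const int16_t *coeff, intptr_t n_coeffs) {
  assert(n_coeffs >= 0);
  int32_t total = 0;
  for (intptr_t i = 0; i < n_coeffs; ++i) {
    const int32_t v = coeff[i];
    total += v < 0 ? -v : v;
  }
  return total;
}
```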
|
|
Match the style of vpx_quantize_b_sse2 and prepare to rewrite
the ssse3 version in intrinsics.
The value of the threshold breakout needs to be evaluated before
going further.
Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
|
|
All of the assembly adds 1 to iscan to convert from
a 0-based array to the EOB value.
Add 1 to all iscan values and remove the extra
instructions from the assembly.
Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
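This scalar sketch, with hypothetical function names, shows why storing `iscan + 1` helps. The eob is one more than the largest scan position among nonzero coefficients. If the table already holds 1-based positions, the per-coefficient `+ 1` drops out of the inner loop.

```c
#include <assert.h>
#include <stdint.h>

/* 0-based inverse scan: every candidate needs a +1. */
static int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  assert(n >= 0);
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* Table pre-incremented once (iscan_plus1[i] == iscan[i] + 1): no +1 here. */
static int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1,
                         int n) {
  assert(n >= 0);
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```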
|
|
over *_set1_*(0)
Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2
|
|
Add build fix for _mm256_extract_epi16() being undefined.
Bug: b/237714063
Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
|
|
~4x faster than vp9_highbd_quantize_fp_32x32_c() for full
calculations.
Bug: b/237714063
Change-Id: Iff2182b8e7b1ac79811e33080d1f6cac6679382d
|
|
Up to 5.37x faster than vp9_highbd_quantize_fp_c() for full
calculations.
~1.6% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I584fd1f60a3e02f1ded092de98970725fc66c5b8
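For reference, the per-coefficient arithmetic that these fp quantizers vectorize can be sketched in scalar C. This is a simplified model, not `vp9_highbd_quantize_fp_c()` itself, and it makes several assumptions:

- `quant` is a Q16 reciprocal of the step size.
- One round/quant/dequant value is used for every coefficient, although the real tables distinguish DC from AC.
- The different scaling of the 32x32 variants is not modeled.

The 64-bit intermediate reflects the high-bitdepth case, where `(|coeff| + round) * quant` can exceed 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified fp quantizer; returns eob (last nonzero scan position + 1). */
static int quantize_fp_sketch(const int32_t *coeff, int n, int32_t round,
                              int32_t quant, int32_t dequant,
                              int32_t *qcoeff, int32_t *dqcoeff,
                              const int16_t *iscan) {
  assert(n >= 0 && round >= 0 && quant >= 0);
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    const int32_t c = coeff[i];
    const int64_t abs_c = c < 0 ? -(int64_t)c : c;
    /* 64-bit product, then drop the 16 fractional bits of quant. */
    const int32_t abs_q = (int32_t)(((abs_c + round) * quant) >> 16);
    qcoeff[i] = c < 0 ? -abs_q : abs_q; /* restore the sign */
    dqcoeff[i] = qcoeff[i] * dequant;   /* reconstruction value */
    if (abs_q != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}
```

With a step size of 4 (`quant = 65536 / 4 = 16384`, `dequant = 4`, `round = 0`), the coefficients `{100, -9, 3, 0}` quantize to `{25, -2, 0, 0}`, and the eob is 2.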
|
|
Up to 1.80x faster than vp9_quantize_fp_32x32_ssse3() for full
calculations.
Bug: b/237714063
Change-Id: Ic4ae4724fce7ac85c7a089535b16a999e02f0a10
|
|
No change in performance.
Bug: b/237714063
Change-Id: I8ea42759cc4dc57be6a29c23784997cb90ad4090
|
|
this clears warnings under clang-13 of the form:
vp9/encoder/x86/highbd_temporal_filter_sse4.c|196 col 63| warning:
parameter 'v_pre' set but not used [-Wunused-but-set-parameter]
this is the high-bitdepth version of:
73b8aade8 temporal_filter_sse4: remove unused function params
Change-Id: I9b2c9bf27c16975e4855df6a2c967da4c8c63a3a
|
|
Added datarate unit tests for 4:4:4 and 4:2:2 input,
for spatial and temporal layers.
A fix is needed in vp9_set_literal_size():
sampling_x/y should be passed into update_initial_width();
otherwise sampling_x/y = 1/1 (4:2:0) was forced.
vp9_set_literal_size() is only called by the SVC code and
on dynamic resize.
Fix an issue with the normative optimized scaler:
the SSSE3 and NEON code assumed UV width/height to be
1/2 of Y.
Also fix the assert for the scaled width/height:
if the scaled width/height is odd, it should be
incremented by 1 (made even).
Change-Id: I3a2e40effa53c505f44ef05aaa3132e1b7f57dd5
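This sketch assumes the usual convention that `ss_x`/`ss_y` are 1 for a halved axis and 0 otherwise: 4:2:0 is 1/1, 4:2:2 is 1/0, and 4:4:4 is 0/0. Under that convention, chroma dimensions follow from the subsampling rather than a fixed 1/2. The helpers below are hypothetical illustrations of that rule and of the even-rounding described for the scaled size.

```c
#include <assert.h>

/* Chroma dimension from a luma dimension and a per-axis subsampling
 * shift, rounding up so odd luma sizes keep their last chroma sample. */
static int chroma_dim(int luma_dim, int ss) {
  assert(ss == 0 || ss == 1);
  return (luma_dim + ss) >> ss;
}

/* Round an odd scaled dimension up to the next even value. */
static int make_even(int dim) {
  assert(dim >= 0);
  return dim + (dim & 1);
}
```

For a 352x288 frame in 4:2:2, this gives a 176x288 chroma plane. A hard-coded 1/2 would give 176x144.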
|
|
the issues fixed in this change are related to implicit conversions
between int / unsigned int:
vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion
from type 'int' of value -9 (32-bit, signed) to type 'unsigned int'
changed the value to 4294967287 (32-bit, unsigned)
vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4290576316 (32-bit, unsigned) to type 'int' changed the value to
-4390980 (32-bit, signed)
vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from
type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka
'unsigned short') changed the value to 64512 (16-bit, unsigned)
unsigned is forced for masks, and int is used with Intel intrinsics
Bug: webm:1767
Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee
Fixed: webm:1767
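Clang's implicit-conversion checks, which `-fsanitize=integer` enables, only look at implicit conversions. Where a wrap-around is intended, writing the cast documents that intent. The helpers below are a hypothetical sketch reproducing two of the reported values with explicit, well-defined casts. Signed-to-unsigned conversion is modular in C.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a signed value as a 32-bit mask on purpose. */
static unsigned int to_mask(int v) {
  assert(sizeof(unsigned int) == 4); /* the log above assumes 32-bit int */
  return (unsigned int)v;            /* explicit cast: intentional */
}

/* Keep only the low 16 bits of a signed value on purpose. */
static uint16_t to_u16(int v) { return (uint16_t)v; }
```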
|
|
Change-Id: If8318068a32da52d15c0ba595f80092611f4c847
|
|
this clears warnings under clang-13 of the form:
../vp9/encoder/x86/temporal_filter_sse4.c:275:39: warning: parameter
'u_pre' set but not used [-Wunused-but-set-parameter]
Change-Id: I21519b5b0b9c21b04b174327415e0e73b56bdfda
|
|
Whether a block is skipped is handled by mi->skip. x->skip_block
is kept exclusively to verify that the quantize functions are not
called for skip blocks.
Finishes the cleanup in 13eed991f
Bug: libvpx:1612
Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
|
|
w/gcc-11
v_these_mv_w is always initialized in this block with _mm_add_epi16();
converting this to a _mm_storeu_si32(tmp) call also works, but
introduces more stack usage
|| ../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function
‘vp9_diamond_search_sad_avx’:
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning:
‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
|| 285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
|| | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note:
‘v_these_mv_w’ declared here
|| 149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
|| | ^~~~~~~~~~~~
Change-Id: I1cd2fcb41030db16f51c94f3a70eb8eb2a526401
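The warning comes from reading a lane by casting the vector's address to `const int_mv *`. The log does not show the final change, so the following is only an illustration of one alternative: copy the lane out with `memcpy`. A plain 16-byte array stands in for `__m128i`, and `mv_pair` is a hypothetical stand-in for `int_mv`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for libvpx's int_mv: one motion vector in 32 bits. */
typedef struct {
  int16_t row;
  int16_t col;
} mv_pair;

/* Read the idx-th 32-bit motion vector from a 16-byte vector value.
 * memcpy copies the bytes instead of reinterpreting the vector's address
 * as another pointer type. */
static mv_pair mv_lane(const uint8_t vec[16], int idx) {
  mv_pair mv;
  assert(idx >= 0 && idx < 4);
  assert(sizeof(mv) == 4);
  memcpy(&mv, vec + 4 * idx, sizeof(mv));
  return mv;
}
```

Compilers usually lower such a small fixed-size `memcpy` to a register move, but whether it avoids the extra stack use mentioned above depends on the compiler.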
|
|
this prevents redefinition warnings if a toolchain sets one
BUG=b/117240165
Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
|
|
nzflag is used as a boolean; it doesn't need to be a sized type. int is
enough (and _mm_movemask_epi8 returns one).
fixes:
vp9_quantize_sse2.c:136:16: implicit conversion from type
'int' of value 65535 (32-bit, signed) to type 'int16_t' (aka 'short')
changed the value to -1 (16-bit, signed)
BUG=webm:1649
Change-Id: I0e3f5278af49d84760f3dfb607f28099cf02f21d
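`_mm_movemask_epi8` returns an `int` whose low 16 bits hold one bit per byte, so a fully set mask is 65535. That value does not fit in `int16_t`, and the narrowing conversion is implementation-defined (commonly -1), which is what the sanitizer reported. The scalar model below is a hypothetical helper, not the intrinsic itself. Keeping its result in a plain `int` preserves the value for use as a boolean.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_movemask_epi8: bit i is the top bit of byte i. */
static int movemask16(const uint8_t bytes[16]) {
  int mask = 0;
  for (int i = 0; i < 16; ++i) mask |= ((bytes[i] >> 7) & 1) << i;
  assert(mask >= 0 && mask <= 0xFFFF);
  return mask;
}
```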
|
|
clang 7 integer sanitizer warns on unsigned->signed conversions when
the highest bit is 1.
BUG=webm:1615
Change-Id: I6381efaff9233254b40cb78f7bcf87090e0ad353
|
|
added files that are affected by clang-format version 7.
BUG=b/120815481
Change-Id: I40662ce962e4f4b1fcdf183b700f85cc5c0f9f82
|
|
Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
|
|
- Change some unaligned loads to aligned loads
- Preload filter weights
BUG=webm:1591
Change-Id: I4e5e755e1fa5613d1c14191265bf80b0bfd0b75c
|
|
The SSE4_1 version of temporal filter does not distinguish between bd 10
and bd 12.
Speed up:
Function Level:
| !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
SS_Y | 6.56X | 6.63X
Video Level:
2.5% speed up on basketballpass_240p over 150 frames on speed 1,
bitdepth 10, auto-alt-ref=1
BUG=webm:1591
Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
|
|
bug=webm:1595
Change-Id: I7fbb16444a8526eb9479007772fbf52b09ff8338
|
|
BUG=webm:1591
Change-Id: I34fd7e6cbe6f3d5486a669d0895402fd21de7641
|
|
BUG=webm:1591
Change-Id: I926566ac1bf4bac8cb1ce1c6ded9ba940109283e
|
|
Change-Id: I6503ebc79beaac2947992437ac133f3ac4379019
|
|
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1.
This patch merely adds the function and does not enable it yet.
Speed Up:
| ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591
Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
|
|
Values in [q]coeff1 were not correctly stored. This caused a segfault
in the sse2 libvpx__nightly_optimization jobs.
Broken in:
commit 85032bac388917916f7a149173db8b34e93e8f6e
Author: Johann <johannkoenig@google.com>
Date: Fri Dec 21 00:27:00 2018 +0000
fdct_quant: resolve missing declarations
BUG=webm:1584
Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
|
|
BUG=webm:1584
Change-Id: I43d051c538bf4a6f6210eefa398dc0901ab8d157
|
|
Store outputs using store_tran_low()
BUG=webm:1584
Change-Id: I213abe047e14625c5ef80df7fa6fdc2a31e40fb6
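In libvpx, `tran_low_t` is a 32-bit type in high-bitdepth builds and 16-bit otherwise, so 16-bit SIMD results have to be widened before they are stored. The sketch below is a scalar stand-in for that widening store. The `HIGHBD_SKETCH` switch and `store_tran_low_sketch()` are hypothetical. The real helper works on whole vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch mirroring a high-bitdepth configuration. */
#ifndef HIGHBD_SKETCH
#define HIGHBD_SKETCH 1
#endif

#if HIGHBD_SKETCH
typedef int32_t tran_low_t;
#else
typedef int16_t tran_low_t;
#endif

/* Store 16-bit results into tran_low_t slots. With a 32-bit tran_low_t
 * each value is sign-extended, so both halves of every slot are written. */
static void store_tran_low_sketch(const int16_t *src, tran_low_t *dst, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) dst[i] = (tran_low_t)src[i];
}
```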
|
|
Pave the way for new quantize_OPT.h helper files.
Change-Id: Ice7225612983f5587a9660af3320c7d0c8bb1c2f
|
|
BUG=webm:1444
Change-Id: I6823635eb1a99c3fcca0a8f091878e3ab2fdd2ac
|
|
Change-Id: I5f878e9b6581bcb427ecc29ce490feb68378f8af
|
|
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
|
|
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.
Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
|
|
eob is a pointer to a uint16_t. Previously the code would store 64 bits,
causing a crash or test failure with the right stack layout.
Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
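A 64-bit store through a `uint16_t *`, for example with `_mm_storel_epi64`, writes 6 bytes past the target. This minimal scalar sketch uses a hypothetical helper and keeps only the low 16 bits. In the usage check, the `slots` array models neighbors that a wider store would have clobbered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store only the low 16 bits of a 64-bit lane value into *eob_ptr. */
static void store_eob(uint64_t lane_bits, uint16_t *eob_ptr) {
  assert(eob_ptr != NULL);
  *eob_ptr = (uint16_t)(lane_bits & 0xFFFFu);
}
```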
|