Age | Commit message (Collapse) | Author |
|
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
|
|
mmi."
|
|
|
|
|
|
|
|
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
|
|
Add 1 if negative to get dqcoeff to round towards zero.
10-15% faster than converting to positive before shifting.
Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
|
|
Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2
|
|
Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598
|
|
Simplify eob calculations based on ssse3 implementation.
General clean up and re-scoping.
Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee
|
|
* changes:
quantize: ignore skip_block in arm
quantize: ignore skip_block in x86
quantize fp: ignore skip_block in arm
quantize fp: ignore skip_block in x86
|
|
|
|
with mmi."
|
|
Change-Id: Icfb70687476b2edb25d255793ba325b261d40584
|
|
Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8
|
|
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
|
|
renamed to get32x16var_avx2()
BUG=webm:1404
Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036
|
|
|
|
BUG=webm:1404
Change-Id: I88aceb07f4db4870a06eee21d87296974ce3221a
|
|
|
|
Change-Id: Ia120ad1064d0b6106d9685cf075bdab373eef19e
|
|
135 -> 34
fixes unused function warnings for highbd_idct32_34_4x32_quarter_[12]
Change-Id: I4f50ff6ea514200af93dd59ff94c7f9717409682
|
|
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.
HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.
Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
|
|
fixes mismatch between prototypes and definitions
Change-Id: Ib5e7dfcce244dbb8401815be2cdd183d96792652
|
|
* changes:
Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1}
Update highbd idct x86 optimizations.
Update 32x32 idct sse2 and ssse3 optimizations.
|
|
|
|
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
|
|
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.
Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
|
|
BUG=webm:1412
Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
|
|
BUG=webm:1412
Change-Id: Ia275940af7d7d8637e9a851a9e39d655bfbe4069
|
|
Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70
|
|
|
|
|
|
|
|
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
|
|
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
|
|
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
|
|
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
|
|
in idct x86 code
Change-Id: I5159499a73a5c1b680516f6ca9c3d84f00c35083
|
|
Change-Id: I266e45a3d75a5357c7d6e6f20ab5c6fdbfe4982e
|
|
Change-Id: Ic73e03bab9fdc085146f52094014db4af36ad701
|
|
BUG=webm:1412
Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
|
|
Change-Id: Ided7895eaf41d5bc9d64fe536a17f5a078da68d4
|
|
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
|
|
BUG=webm:1404
Change-Id: Ieb8f85c3811b05df78722cb41eeb1166966ceec4
|
|
|
|
|
|
BUG=webm:1412
Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
|
|
* changes:
Extract inlined 16x16 idct sse2 code into header file
Add transpose_32bit_8x4() sse2 optimization
Update x86 idct optimization
|
|
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
|