Age | Commit message | Author |
|
1. vpx_idct32x32_1024_add_lsx
2. vpx_idct32x32_34_add_lsx
3. vpx_idct32x32_1_add_lsx
Bug: webm:1755
Change-Id: I9c24f75e0d93613754d8e30da7e007b8d1374e60
|
|
This should clean up clang-tidy warnings.
Change-Id: Ifb5a986121b2d0bd71b9ad39a79dd46c63bdb998
|
|
This moves the framework to C++11 and changes *_TEST_CASE* to
_TEST_SUITE.
BUG=webm:1695
Change-Id: I07f2c20850312a9c7e381b38353d2f9f45889cb1
|
|
Since
77fa51003 Replace deprecated scoped_ptr with unique_ptr
C++11 has been required, so <tuple> is safe to use.
Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
|
|
googletest imports tuple into testing to allow for compatibility across
C++ versions where tuple may be in std::tr1 or std. This fixes deprecation
warnings under Visual Studio 2017.
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
|
|
BUG=webm:1412
Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
|
|
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
|
|
BUG=webm:1412
Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
|
|
BUG=webm:1412
Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
|
|
Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898
|
|
BUG=webm:1412
Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec
|
|
+ vpx_dsp/, test/
itxfm -> inv_txfm, ftxfm -> fwd_txfm
Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e
|
|
BUG=webm:1412
Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
|
|
vpx_idct32x32_1024_add_ssse3() is actually an SSE2 function and is faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All changes are
code relocations; there is no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
|
|
It is almost identical to vpx_idct8x8_64_add_sse2(), apart from a small
difference in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
|
|
Promote coeff to signed 64-bit to avoid exceeding integer bounds when
squaring the value.
Change-Id: If77bef6bc0a6a4c39ca3013e5e2ddb426a1c6e1f
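The promotion described in this message can be illustrated with a minimal C++ sketch. The helper name `SumOfSquares` and its signature are hypothetical and are not taken from the libvpx sources; the only point shown is that each coefficient is widened to 64 bits before the multiplication, so the product is computed in `int64_t` rather than `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a libvpx function): sum of squared coefficients.
// Each coefficient is copied into an int64_t *before* it is multiplied, so
// the square is evaluated in 64 bits and cannot exceed the range of int.
int64_t SumOfSquares(const int32_t *coeff, size_t count) {
  assert(coeff != nullptr || count == 0);
  int64_t sum = 0;
  for (size_t i = 0; i < count; ++i) {
    const int64_t c = coeff[i];  // promote first, then square
    sum += c * c;
  }
  return sum;
}
```

Squaring a value such as 65536 as a 32-bit `int` would overflow, while the widened version returns 4294967296 as expected.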
|
|
Make it work in high bit depth.
BUG=webm:1412
Change-Id: Ic5cfd410a69709f01e2924774356a108a349d273
|
|
BUG=webm:1412
Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
|
|
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().
Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
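One way to enumerate every +/-mask combination for a 4x4 block is to treat a 16-bit counter as a set of sign bits. The sketch below is an assumption about how such an enumeration could look, not the code in PartialIDctTest; `FillSignCombination` and `kBlockSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// A 4x4 transform block has 16 coefficients (hypothetical constant name).
constexpr int kBlockSize = 16;

// Writes sign combination number `index` (0 .. 2^16 - 1) into `block`.
// Bit j of `index` selects the sign of coefficient j:
// a set bit gives +mask, a clear bit gives -mask.
void FillSignCombination(uint32_t index, int32_t mask, int32_t *block) {
  assert(index < (1u << kBlockSize));
  for (int j = 0; j < kBlockSize; ++j) {
    block[j] = ((index >> j) & 1u) ? mask : -mask;
  }
}
```

Looping `index` from 0 to 65535 visits all 2^16 sign patterns exactly once, which is small enough to test exhaustively for 4x4 but not for the larger transforms.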
|
|
It makes more sense to call the corresponding partial idct C function
instead of the full idct C function as the reference.
Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
|
|
BUG=webm:1388
Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
|
|
BUG=webm:1388
Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
|
|
BUG=webm:1301
Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
|
|
BUG=webm:1301
Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
|
|
BUG=webm:1301
Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
|
|
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
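The eob-based selection mentioned in this message can be sketched as a simple threshold chain. This is a hedged illustration only: `Select32x32Idct` is a hypothetical function that returns a label instead of calling a real transform, the 135 threshold comes from the message, and the 1 and 34 thresholds are assumed from the numeric suffixes of the function names that appear elsewhere in this log.

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatcher (not the libvpx implementation): returns the name
// of the 32x32 inverse transform variant that would handle a block whose
// end-of-block position is `eob`.
std::string Select32x32Idct(int eob) {
  assert(eob >= 1 && eob <= 1024);
  if (eob == 1) return "idct32x32_1";      // assumed: DC-only block
  if (eob <= 34) return "idct32x32_34";    // assumed from the _34 suffix
  if (eob <= 135) return "idct32x32_135";  // threshold stated in the message
  return "idct32x32_1024";                 // full transform
}
```

The cheaper variants can skip work for coefficients known to be zero, so picking the smallest variant that covers `eob` avoids running the full 1024-coefficient transform on sparse blocks.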
|
|
- vpx_idct8x8_12_add_ssse3
vpx_idct8x8_64_add_ssse3
vpx_idct32x32_34_add_ssse3
vpx_idct32x32_135_add_ssse3
vpx_idct32x32_1024_add_ssse3
- turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
|
|
- In the SSSE3 optimization, 16-bit addition and subtraction would
overflow when the input coefficients are extreme 16-bit signed values.
- Function-level speed becomes slower (unit: ms):
idct8x8_64: 284 -> 294
idct8x8_12: 145 -> 158
BUG=webm:1332
Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
|
|
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on
i7-6700: no obvious user-level performance regression.
- Passed unit tests.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
|
|
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
|
|
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
|
|
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
|
|
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
|
|
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
|
|
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
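The difference between wrapping and saturating 16-bit arithmetic can be modeled per lane in scalar C++. The sketch below is not the NEON code from the change; `SaturatingAdd16` and `SaturatingSub16` are hypothetical scalar helpers that mimic what saturating vector intrinsics such as `vqaddq_s16` and `vqsubq_s16` do for each 16-bit element.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Clamps a 32-bit intermediate result to the int16_t range.
static int16_t ClampToInt16(int32_t value) {
  const int32_t kMax = std::numeric_limits<int16_t>::max();
  const int32_t kMin = std::numeric_limits<int16_t>::min();
  if (value > kMax) return static_cast<int16_t>(kMax);
  if (value < kMin) return static_cast<int16_t>(kMin);
  return static_cast<int16_t>(value);
}

// Hypothetical scalar model of a saturating 16-bit add: the sum is formed
// in 32 bits and then clamped instead of wrapping around.
int16_t SaturatingAdd16(int16_t a, int16_t b) {
  return ClampToInt16(static_cast<int32_t>(a) + static_cast<int32_t>(b));
}

// Hypothetical scalar model of a saturating 16-bit subtract.
int16_t SaturatingSub16(int16_t a, int16_t b) {
  return ClampToInt16(static_cast<int32_t>(a) - static_cast<int32_t>(b));
}
```

With extreme inputs, a saturating add pins the result at 32767 or -32768, whereas a plain 16-bit add would wrap to a value of the opposite sign.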
|
|
When eob is less than or equal to 38 for 16x16 idct, call this function.
Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
|
|
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
|
|
This currently runs 1000 * 1000 = one *million* times which is quite
unnecessary. It's one of the slowest items in Jenkins and takes over an
hour for each of the larger transforms.
Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
|
|
Tested using test/partial_idct_test.cc:DISABLED_Speed
Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements
using the intrinsics:
<function>      <clang asm>  <gcc asm>  <clang intrin>  <gcc intrin>
idct16x16_256   1720ms       1703ms     1546ms          1554ms
idct16x16_10    1320ms       1247ms     518ms           488ms
idct16x16_1     107ms        108ms      64ms            68ms
idct8x8_64      924ms        931ms      866ms           989ms
idct8x8_12      826ms        824ms      519ms           514ms
idct8x8_1       172ms        166ms      110ms           125ms
idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics),
but as a counterexample idct16x16_10 goes from ~1300ms to ~500ms.
On a sample clip, clang improved from 48.5 to 49 fps and gcc stayed roughly
stable.
BUG=webm:1303
Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
|
|
Change-Id: Idf4003ea6f9a2a42a9f26e156bee73697acb7a37
|
|
BUG=webm:1301
Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
|
|
BUG=webm:1301
Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
|
|
Change-Id: Icc0eb9c0ddf2a13ec832877a089450972134e8ec
|
|
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantize with the minimum allowed step sizes instead of the maximum.
This may generate larger inputs.
Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
|
|
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
|
|
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
|
|
Change-Id: I3b5fd3b36cac1fb3a93e27fd8fd0781c91d412ce
|