path: root/test/partial_idct_test.cc
Age | Commit message | Author
2022-04-15 | vp9[loongarch]: Optimize idct32x32_1024/1/34_add | yuanhecai
1. vpx_idct32x32_1024_add_lsx 2. vpx_idct32x32_34_add_lsx 3. vpx_idct32x32_1_add_lsx Bug: webm:1755 Change-Id: I9c24f75e0d93613754d8e30da7e007b8d1374e60
2020-07-27 | NULL -> nullptr in CPP files | Jerome Jiang
This should clean up clangtidy warnings Change-Id: Ifb5a986121b2d0bd71b9ad39a79dd46c63bdb998
2020-06-18 | update googletest to v1.10.0 | James Zern
this moves the framework to c++11 and changes *_TEST_CASE* to _TEST_SUITE BUG=webm:1695 Change-Id: I07f2c20850312a9c7e381b38353d2f9f45889cb1
2018-12-07 | test/*: use std::*tuple | James Zern
since: 77fa51003 Replace deprecated scoped_ptr with unique_ptr c++11 has been required so <tuple> is safe to use Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
2018-03-28 | test: use testing::*tuple instead of std::tr1 | James Zern
googletest imports tuple into testing to allow for compatibility across c++ versions where tuple may be in std::tr1 or std. fixes deprecation warnings under visual studio 2017 Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
2017-08-14 | Add vpx_highbd_idct32x32_{34, 135, 1024}_add_{sse2, sse4_1} | Linfeng Zhang
BUG=webm:1412 Change-Id: I08b562b60fa85fbc2fec1c15c323a3444b44618f
2017-08-07 | Update 32x32 idct sse2 funcs, add partial case 135 | Linfeng Zhang
Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a
2017-08-04 | Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 | Linfeng Zhang
BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca
2017-08-03 | Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function | Linfeng Zhang
BUG=webm:1412 Change-Id: I945f0fb6807b8948747243794dc7352b959221f7
2017-07-27 | Add vpx_idct16x16_38_add_sse2() | Linfeng Zhang
Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898
2017-07-27 | Refactor highbd idct 4x4 and 8x8 x86 functions | Linfeng Zhang
BUG=webm:1412 Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec
2017-07-06 | cosmetics,vp9/: normalize inv/fwd_txfm naming | James Zern
+ vpx_dsp/, test/ itxfm -> inv_txfm, ftxfm -> fwd_txfm Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e
2017-06-23 | Add vpx_highbd_idct4x4_16_add_sse4_1() | Linfeng Zhang
BUG=webm:1412 Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-21 | Clean 32x32 full idct sse2 and ssse3 code | Linfeng Zhang
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are code relocations, no new code. Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-15 | Remove vpx_idct8x8_64_add_ssse3() | Linfeng Zhang
It's almost identical with vpx_idct8x8_64_add_sse2(), except little difference in instructions order. Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-05-24 | partial_idct_test,InitInput: fix rollover in mult | James Zern
promote coeff to signed 64-bit to avoid exceeding integer bounds when squaring the value Change-Id: If77bef6bc0a6a4c39ca3013e5e2ddb426a1c6e1f
2017-05-23 | Update InitInput() in test/partial_idct_test.cc | Linfeng Zhang
Make it work in high bit depth. BUG=webm:1412 Change-Id: Ic5cfd410a69709f01e2924774356a108a349d273
2017-05-22 | Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2 | Linfeng Zhang
BUG=webm:1412 Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-17 | Update partial idct testing code | Linfeng Zhang
Add PartialIDctTest::PrintDiff() to help debugging. In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct. Update PartialIDctTest::InitInput(). Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
2017-05-09 | Update test/partial_idct_test.cc | Linfeng Zhang
Makes more sense to call the corresponding partial idct C function instead of the full idct C function as the reference. Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
2017-05-03 | Update highbd idct functions arguments to use uint16_t dst | Linfeng Zhang
BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 | Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct | Linfeng Zhang
BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-03-17 | Add vpx_highbd_idct32x32_1024_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 | Add vpx_highbd_idct32x32_34_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-16 | Add vpx_highbd_idct32x32_135_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-08 | Add vpx_highbd_idct32x32_135_add_c() | Linfeng Zhang
When eob is less than or equal to 135 for high-bitdepth 32x32 idct, call this function. BUG=webm:1301 Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-02-21 | Following SSSE3 intrinsics functions also work for HBD | Yi Luo
- vpx_idct8x8_12_add_ssse3 vpx_idct8x8_64_add_ssse3 vpx_idct32x32_34_add_ssse3 vpx_idct32x32_135_add_ssse3 vpx_idct32x32_1024_add_ssse3 - turn on unit tests. Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-17 | Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests | Yi Luo
- In SSSE3 optimization, 16-bit addition and subtraction would overflow when input coefficient is 16-bit signed extreme values. - Function-level speed becomes slower (unit ms): idct8x8_64: 284 -> 294 idct8x8_12: 145 -> 158. BUG=webm:1332 Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 | Merge "Add vpx_highbd_idct16x16_10_add_neon()" | James Zern
2017-02-16 | Replace idct32x32_1024_add_ssse3 assembly with intrinsics | Yi Luo
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on i7-6700, no obvious user-level speed performance downgrade. - Passed unit tests. Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 | Add vpx_highbd_idct16x16_10_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-15 | Add vpx_highbd_idct16x16_38_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-14 | Add vpx_highbd_idct16x16_38_add_c() | Linfeng Zhang
When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function. BUG=webm:1301 Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
2017-02-13 | Add vpx_highbd_idct16x16_256_add_neon() | Linfeng Zhang
BUG=webm:1301 Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 | Add vpx_highbd_idct{16x16,32x32}_1_add_neon() | Linfeng Zhang
and update vpx_highbd_idct8x8_1_add_neon() BUG=webm:1301 Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-08 | Add vpx_idct16x16_38_add_neon() | Linfeng Zhang
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth. Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
2017-02-07 | Add vpx_idct16x16_38_add_c() | Linfeng Zhang
When eob is less than or equal to 38 for 16x16 idct, call this function. Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
2017-02-02 | Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT" | Jingning Han
2017-02-02 | Merge "Remove neon assembly for idct 16x16 and 8x8" | Johann Koenig
2017-02-01 | Add SSSE3 intrinsic 8x8 inverse 2D-DCT | Jingning Han
The intrinsic version reduces the average cycles from 183 to 175. Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-01-23 | PartialIDctTest: reduce number of RunQuantCheck iterations | Johann
This currently runs 1000 * 1000 = one *million* times which is quite unnecessary. It's one of the slowest items in Jenkins and takes over an hour for each of the larger transforms. Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
2017-01-19 | Remove neon assembly for idct 16x16 and 8x8 | Johann
Tested using test/partial_idct_test.cc:DISABLED_Speed

Both gcc 4.9 and clang 3.8 from the r13 Android NDK offer improvements using the intrinsics:

<function>     <clang asm>  <gcc asm>  <clang intrin>  <gcc intrin>
idct16x16_256  1720ms       1703ms     1546ms          1554ms
idct16x16_10   1320ms       1247ms     518ms           488ms
idct16x16_1    107ms        108ms      64ms            68ms
idct8x8_64     924ms        931ms      866ms           989ms
idct8x8_12     826ms        824ms      519ms           514ms
idct8x8_1      172ms        166ms      110ms           125ms

idct8x8_64 isn't quite perfect (slight regression with gcc intrinsics) but as a counter example idct16x16_10 goes from ~1300ms to ~500ms

On a sample clip, clang improved from 48.5 to 49fps and gcc stayed roughly stable.

BUG=webm:1303 Change-Id: I9d4fd2b41b46ea6174a887b40a82c8e6e4769ed4
2017-01-09 | Add mips dspr2 partial idct tests | Kaustubh Raste
Change-Id: Idf4003ea6f9a2a42a9f26e156bee73697acb7a37
2016-12-27 | Add high bitdepth 8x8 idct NEON intrinsics | Linfeng Zhang
BUG=webm:1301 Change-Id: I56e3bc3aab9214e2debac93796389a7194991084
2016-12-14 | Clean hbd idct 4x4 neon functions and other | Linfeng Zhang
BUG=webm:1301 Change-Id: I387b7eae716a7df15c691dc6f368b07602df7342
2016-12-13 | Update idct test code to test 8-bit & high bitdepth simultaneously | Linfeng Zhang
Change-Id: Icc0eb9c0ddf2a13ec832877a089450972134e8ec
2016-12-07 | Update TEST_P(PartialIDctTest, RunQuantCheck) | Linfeng Zhang
1. Use correct projections when copying real dct/quant outputs. 2. Remove local random number generator and combine loops. 3. Quantization with minimum allowed step sizes instead of maximum. This may generate larger inputs. Change-Id: I154afc26230c894d564671cff4b8fd5485b69598
2016-11-30 | Add high bitdepth 4x4 idct NEON intrinsics | Linfeng Zhang
Change-Id: I4afc130effa05b8be2e9f982967216b1beb2ce4b
2016-11-22 | Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test | Linfeng Zhang
Change-Id: Icc4ead05506797d12bf134e8790443676fef5c10
2016-11-22 | Add idct speed test. | Linfeng Zhang
Change-Id: I3b5fd3b36cac1fb3a93e27fd8fd0781c91d412ce