path: root/vpx_dsp
Age | Commit message | Author
2017-03-01 | Improve idct32x32_34_add SSSE3 intrinsics performance | Yi Luo
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-02-24 | get_prob(): rationalize int types | James Zern
promote the unsigned int calculation to uint64_t rather than int64_t for type consistency
Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
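A minimal sketch of the type promotion this commit describes, not the actual libvpx code. The name `get_prob_sketch`, the `prob_t` typedef, the scaling by 256, and the clamp to [1, 255] are assumptions made for illustration; only the widening of an unsigned int calculation to uint64_t comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t prob_t; /* hypothetical stand-in for the codec's probability type */

/* Returns round(256 * num / den), clamped to [1, 255] (assumed range). */
static prob_t get_prob_sketch(unsigned int num, unsigned int den) {
  assert(den != 0);
  /* Widen to uint64_t, which is unsigned like the operands, before
   * multiplying, so num * 256 cannot wrap in 32-bit arithmetic. */
  const uint64_t p = ((uint64_t)num * 256 + (den >> 1)) / den;
  if (p > 255) return 255;
  if (p < 1) return 1;
  return (prob_t)p;
}
```

Keeping the intermediate unsigned means no signed/unsigned conversion happens anywhere in the expression, which is presumably the "type consistency" the message refers to.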
2017-02-22 | Merge "Fix segmentation fault caused by denoiser working with spatial SVC." | Jerome Jiang
2017-02-21 | Following SSSE3 intrinsics functions also work for HBD | Yi Luo
- vpx_idct8x8_12_add_ssse3
  vpx_idct8x8_64_add_ssse3
  vpx_idct32x32_34_add_ssse3
  vpx_idct32x32_135_add_ssse3
  vpx_idct32x32_1024_add_ssse3
- turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-21 | Fix segmentation fault caused by denoiser working with spatial SVC. | Jerome Jiang
Re-enable the affected test.
BUG=webm:1374
Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-17 | Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests | Yi Luo
- In SSSE3 optimization, 16-bit addition and subtraction would overflow when input coefficient is 16-bit signed extreme values.
- Function-level speed becomes slower (unit ms):
  idct8x8_64: 284 -> 294
  idct8x8_12: 145 -> 158.
BUG=webm:1332
Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 | Merge "Add vpx_highbd_idct16x16_10_add_neon()" | James Zern
2017-02-16 | Replace idct32x32_1024_add_ssse3 assembly with intrinsics | Yi Luo
- Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on i7-6700, no obvious user-level speed performance downgrade.
- Passed unit tests.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 | Merge "block error avx2: use tran_low_t" | Johann Koenig
2017-02-16 | Add vpx_highbd_idct16x16_10_add_neon() | Linfeng Zhang
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 | Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function" | James Zern
2017-02-16 | Merge "correct bitdepth_conversion_sse2.h header guard" | Johann Koenig
2017-02-16 | correct bitdepth_conversion_sse2.h header guard | Johann
Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
2017-02-16 | Merge "Add idct32x32_135_add SSSE3 intrinsics" | Yi Luo
2017-02-16 | block error avx2: use tran_low_t | Johann
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
2017-02-16 | Add idct32x32_135_add SSSE3 intrinsics | Yi Luo
- Replace the corresponding assembly code.
- No user level speed performance degrade.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
2017-02-16 | quantize_fp highbd ssse3: use tran_low_t for coeff | Johann
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
2017-02-16 | bitdepth conversion: really use num elements | Johann
The previous implementation confused bit/bytes/elements. It was using '32' as the multiplier but that was mistakenly adopted because a 32x32 transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
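The bit/byte/element distinction in this commit can be illustrated with a small sketch. It is not the code the commit changed; `coeff_t` and the helper names are hypothetical, and a 32-bit coefficient type is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t coeff_t; /* hypothetical 32-bit coefficient type */

/* Bytes covered by n elements. */
static size_t elems_to_bytes(size_t n) { return n * sizeof(coeff_t); }

/* Bits covered by n elements. */
static size_t elems_to_bits(size_t n) { return elems_to_bytes(n) * CHAR_BIT; }

/* Indexing a typed pointer counts in elements: the compiler already
 * scales by sizeof(coeff_t), so no extra multiplier belongs here. */
static coeff_t load_at(const coeff_t *base, size_t index) {
  assert(base != NULL);
  return base[index];
}
```

With these assumptions, 8 elements are 32 bytes or 256 bits, so a literal such as 32 can look correct in one unit while being wrong in another.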
2017-02-16 | Fix mips vpx_post_proc_down_and_across_mb_row_msa function | Kaustubh Raste
Added fix to handle non-multiple of 16 cols case for size 16
Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c
2017-02-16 | Merge "Use 'packssdw' for loading tran_low_t values" | Johann Koenig
2017-02-15 | cosmetics,dsp/inv_txfm.c: reorder functions | Linfeng Zhang
Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6
2017-02-15 | Add vpx_highbd_idct16x16_38_add_neon() | Linfeng Zhang
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
2017-02-14 | Add vpx_highbd_idct16x16_38_add_c() | Linfeng Zhang
When eob is less than or equal to 38 for high-bitdepth 16x16 idct, call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
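A sketch of how a caller might choose among the 16x16 inverse-transform variants by eob (end of block, the number of coefficients up to the last nonzero one). Only the "eob <= 38" rule is stated in the commit; the other thresholds are inferred from the `_1_`, `_10_` and `_256_` function names in this log, and the enum and function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical labels for the four 16x16 idct variants named in this log. */
typedef enum { IDCT16_1, IDCT16_10, IDCT16_38, IDCT16_256 } idct16_variant;

/* Pick the cheapest variant that still covers every nonzero coefficient. */
static idct16_variant pick_idct16_variant(int eob) {
  assert(eob >= 1 && eob <= 256);
  if (eob == 1) return IDCT16_1;   /* DC only (assumed threshold) */
  if (eob <= 10) return IDCT16_10; /* assumed threshold */
  if (eob <= 38) return IDCT16_38; /* rule stated in the commit */
  return IDCT16_256;               /* full transform */
}
```

A partial variant can skip arithmetic on coefficients known to be zero, which is the usual reason for adding an intermediate case between 10 and 256.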
2017-02-14 | Use 'packssdw' for loading tran_low_t values | Johann
This matches bitdepth_conversion_sse2.asm and produces substantially better assembly. The old way had lots of 'movzwl' and 'shl' and storing back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
2017-02-14 | Replace 14 with DCT_CONST_BITS in idct NEON functions' shifts | Linfeng Zhang
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
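A sketch of the kind of rounding shift where a named constant replaces a literal 14. The value 14 comes from the commit title; the helper name and `COS_PI_4_Q14` are hypothetical, with 11585 being cos(pi/4) scaled by 2^14 and rounded.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14  /* fixed-point precision, per the commit title */
#define COS_PI_4_Q14 11585 /* hypothetical name: round(cos(pi/4) * 2^14) */

/* Round-to-nearest right shift by DCT_CONST_BITS. Assumes an arithmetic
 * right shift for negative values, as mainstream compilers provide. */
static int32_t dct_round_shift(int64_t x) {
  const int64_t rounded =
      (x + ((int64_t)1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
  assert(rounded >= INT32_MIN && rounded <= INT32_MAX);
  return (int32_t)rounded;
}
```

For example, `dct_round_shift((int64_t)100 * COS_PI_4_Q14)` scales 100 by roughly 0.7071 and yields 71.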
2017-02-14 | apply clang-format | clang-format
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
2017-02-14 | Merge "Replace idct32x32_34_add_ssse3 assembly with intrinsics" | Yi Luo
2017-02-14 | Replace idct32x32_34_add_ssse3 assembly with intrinsics | Yi Luo
- No user-level speed performance change.
- Pass unit tests.
Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
2017-02-14 | Merge "Add vpx_highbd_idct16x16_256_add_neon()" | Linfeng Zhang
2017-02-13 | Add vpx_highbd_idct16x16_256_add_neon() | Linfeng Zhang
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
2017-02-13 | fdct8x8 highbd neon: use tran_low_t for output | Johann
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
2017-02-13 | Add vpx_highbd_idct{16x16,32x32}_1_add_neon() | Linfeng Zhang
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
2017-02-11 | Merge "Add vpx_idct16x16_38_add_neon()" | James Zern
2017-02-08 | Add vpx_idct16x16_38_add_neon() | Linfeng Zhang
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of pass 2. Change to use saturating add/sub for both vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
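A scalar sketch of what saturating 16-bit add/sub means, as opposed to wrapping arithmetic. On NEON the corresponding vector intrinsics are `vqaddq_s16` and `vqsubq_s16`; whether the commit uses exactly those is an assumption, and the helper names below are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate into the int16_t range. */
static int16_t clamp_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Saturating add: compute in 32 bits, then clamp instead of wrapping. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a + (int32_t)b);
}

/* Saturating subtract, same approach. */
static int16_t sat_sub_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a - (int32_t)b);
}
```

With wrapping arithmetic, 32767 + 1 would become -32768 and flip the sign of a large coefficient; saturation pins it at 32767 instead.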
2017-02-08 | Replace idct8x8_12_add_ssse3 assembly code with intrinsics | Yi Luo
- Performance achieves the same as assembly.
- Unit tests pass.
Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
2017-02-07 | Add vpx_idct16x16_38_add_c() | Linfeng Zhang
When eob is less than or equal to 38 for 16x16 idct, call this function.
Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
2017-02-07 | Merge "Update 16x16 8-bit idct NEON intrinsics" | Linfeng Zhang
2017-02-06 | highbd x86: consolidate tran_low_t conversions | Johann
Create new helper files specifically for converting tran_low_t types.
Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
2017-02-02 | Merge "Add SSSE3 intrinsic 8x8 inverse 2D-DCT" | Jingning Han
2017-02-02 | Merge "Add mips msa sum_squares_2d_i16 function" | Kaustubh Raste
2017-02-02 | Merge "Remove neon assembly for idct 16x16 and 8x8" | Johann Koenig
2017-02-02 | Merge changes I43521ad3,I013659f6 | Johann Koenig
* changes:
  satd highbd neon: use tran_low_t for coeff
  satd highbd sse2: use tran_low_t for coeff
2017-02-01 | Update 16x16 8-bit idct NEON intrinsics | Linfeng Zhang
Remove redundant memory accesses.
Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8
2017-02-01 | Add SSSE3 intrinsic 8x8 inverse 2D-DCT | Jingning Han
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
2017-02-01 | Merge changes I374dfc08,I7e15192e,Ica414007 | Johann Koenig
* changes:
  hadamard highbd ssse3: use tran_low_t for coeff
  hadamard highbd neon: use tran_low_t for coeff
  hadamard highbd sse2: use tran_low_t for coeff
2017-02-01 | Merge "deblock: annotate postproc parameters" | Johann Koenig
2017-02-01 | satd highbd neon: use tran_low_t for coeff | Johann
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
2017-02-01 | satd highbd sse2: use tran_low_t for coeff | Johann
BUG=webm:1365
Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
2017-02-01 | hadamard highbd ssse3: use tran_low_t for coeff | Johann
BUG=webm:1365
Change-Id: I374dfc08732932382043905f128e928b08cb4f57
2017-02-01 | hadamard highbd neon: use tran_low_t for coeff | Johann
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279