Age | Commit message | Author
2023-03-02 | [SSE4_1] Fix overflow in highbd temporal_filter | Konstantinos Margaritis
While porting this function to NEON, using the SSE4_1 implementation as a base, I noticed that both produced files whose checksums differed from those of the C reference implementation. After investigating further, I found that this saturating pack was the culprit. Doing the multiplication on the 32-bit values instead produces results that match the C implementation. Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
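The failure mode described above can be reproduced with a scalar model. This is only a sketch, not the libvpx code: `saturate_u16` stands in for an unsigned saturating pack from 32 to 16 bits, the function names are hypothetical, and the sample value and weight are made up for illustration.

```python
def saturate_u16(value: int) -> int:
    """Clamp a 32-bit intermediate to the unsigned 16-bit range, as a saturating pack would."""
    return max(0, min(value, 0xFFFF))


def weighted_after_pack(value: int, weight: int) -> int:
    """Pack to 16 bits first, then multiply (the order that loses information)."""
    return saturate_u16(value) * weight


def weighted_in_32bit(value: int, weight: int) -> int:
    """Multiply the 32-bit value directly, as a plain C reference would."""
    return value * weight


# Hypothetical intermediate that no longer fits in 16 bits.
value, weight = 70000, 3
print(weighted_after_pack(value, weight))  # 196605
print(weighted_in_32bit(value, weight))    # 210000
```

The two orders agree whenever the intermediate fits in 16 bits, which is why a mismatch of this kind may only show up as a checksum difference on some inputs.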
2023-03-01 | Merge "Optimize Neon implementation of high bitdepth MSE functions" into main | James Zern
2023-03-01 | Revert "Implement highbd_d63_predictor using Neon" | James Zern
This reverts commit 7cdf139e3d6237386e0f93bdb0bdc1b459c663bf. The reverted change causes failures in the VP9/ExternalFrameBufferMD5Test and VP9/TestVectorTest.MD5Match tests in both armv7 and aarch64 builds. Change-Id: I7ac4ba0ddc70e7e7860df9f962e6658defe1cdd5
2023-03-01 | Optimize Neon implementation of high bitdepth MSE functions | Salome Thirot
Currently MSE functions just call the variance helpers but don't actually use the computed sum. This patch adds dedicated helpers to perform the computation of sse. Add the corresponding tests as well. Change-Id: I96a8590e3410e84d77f7187344688e02efe03902
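To illustrate why a dedicated helper can be cheaper, here is a minimal scalar sketch. It assumes blocks are given as flat lists of pixel values; the function names are hypothetical and do not correspond to libvpx symbols. A variance-style result needs both the sum and the sum of squared errors (SSE), while an MSE-style result needs only the SSE.

```python
def sum_and_sse(src, ref):
    """Return (sum of differences, sum of squared differences) for two equal-sized blocks."""
    total = 0
    sse = 0
    for s, r in zip(src, ref):
        diff = s - r
        total += diff
        sse += diff * diff
    return total, sse


def variance(src, ref):
    """Variance-style result: uses both the sum and the SSE."""
    total, sse = sum_and_sse(src, ref)
    return sse - (total * total) // len(src)


def sse_only(src, ref):
    """MSE-style result: only the SSE is needed, so the sum is never accumulated."""
    return sum((s - r) ** 2 for s, r in zip(src, ref))
```

Dropping the sum accumulation removes one accumulator and its per-element update from the inner loop.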
2023-03-01 | quantize: use scan_order instead of passing scan/iscan | Johann
This further reduces the arguments for the 32x32 quantize function. This will be applied to the base version as well. Change-Id: I25a162b5248b14af53d9e20c6a7fa2a77028a6d1
2023-03-01 | quantize: simplify highbd 32x32_b args | Johann
Change-Id: I431a41279c4c4193bc70cfe819da6ea7e1d2fba1
2023-02-28 | Merge changes I892fbd2c,Ic59df16c,I7228327b,Ib4a1a2cb into main | James Zern
* changes:
  Implement highbd_d117_predictor using Neon
  Implement highbd_d63_predictor using Neon
  Implement d117_predictor using Neon
  Implement d63_predictor using Neon
2023-02-28 | Merge "quantize: simplify 32x32_b args" into main | James Zern
2023-02-28 | Implement highbd_d117_predictor using Neon | George Steed
Add Neon implementations of the highbd d117 predictor for 4x4, 8x8, 16x16 and 32x32 block sizes. Also update tests to add new corresponding cases. An explanation of the general implementation strategy is given in the 8x8 implementation body, and is mostly identical to the non-highbd version.
Speedups over the C code (higher is better):
Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    1.99
Neoverse N1 |  LLVM 15 |   8x8 |    4.37
Neoverse N1 |  LLVM 15 | 16x16 |    6.81
Neoverse N1 |  LLVM 15 | 32x32 |    6.49
Neoverse N1 |   GCC 12 |   4x4 |    2.49
Neoverse N1 |   GCC 12 |   8x8 |    4.10
Neoverse N1 |   GCC 12 | 16x16 |    5.58
Neoverse N1 |   GCC 12 | 32x32 |    2.16
Neoverse V1 |  LLVM 15 |   4x4 |    1.99
Neoverse V1 |  LLVM 15 |   8x8 |    5.03
Neoverse V1 |  LLVM 15 | 16x16 |    6.61
Neoverse V1 |  LLVM 15 | 32x32 |    6.01
Neoverse V1 |   GCC 12 |   4x4 |    2.09
Neoverse V1 |   GCC 12 |   8x8 |    4.52
Neoverse V1 |   GCC 12 | 16x16 |    4.23
Neoverse V1 |   GCC 12 | 32x32 |    2.70
Change-Id: I892fbd2c17ac527ddc22b91acca907ffc84c5cd2
2023-02-28 | Implement highbd_d63_predictor using Neon | George Steed
Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16 and 32x32 block sizes. Also update tests to add new corresponding cases.
Speedups over the C code (higher is better):
Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    2.43
Neoverse N1 |  LLVM 15 |   8x8 |    4.03
Neoverse N1 |  LLVM 15 | 16x16 |    3.07
Neoverse N1 |  LLVM 15 | 32x32 |    4.11
Neoverse N1 |   GCC 12 |   4x4 |    2.92
Neoverse N1 |   GCC 12 |   8x8 |    7.20
Neoverse N1 |   GCC 12 | 16x16 |    4.43
Neoverse N1 |   GCC 12 | 32x32 |    3.18
Neoverse V1 |  LLVM 15 |   4x4 |    1.99
Neoverse V1 |  LLVM 15 |   8x8 |    3.66
Neoverse V1 |  LLVM 15 | 16x16 |    3.60
Neoverse V1 |  LLVM 15 | 32x32 |    3.29
Neoverse V1 |   GCC 12 |   4x4 |    2.39
Neoverse V1 |   GCC 12 |   8x8 |    4.76
Neoverse V1 |   GCC 12 | 16x16 |    3.29
Neoverse V1 |   GCC 12 | 32x32 |    2.43
Change-Id: Ic59df16ceeb468003754b4374be2f4d9af6589e4
2023-02-28 | Implement d117_predictor using Neon | George Steed
Add Neon implementations of the d117 predictor for 4x4, 8x8, 16x16 and 32x32 block sizes. Also update tests to add new corresponding cases. An explanation of the general implementation strategy is given in the 8x8 implementation body.
Speedups over the C code (higher is better):
Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    1.73
Neoverse N1 |  LLVM 15 |   8x8 |    5.24
Neoverse N1 |  LLVM 15 | 16x16 |    9.77
Neoverse N1 |  LLVM 15 | 32x32 |   14.13
Neoverse N1 |   GCC 12 |   4x4 |    2.04
Neoverse N1 |   GCC 12 |   8x8 |    4.70
Neoverse N1 |   GCC 12 | 16x16 |    8.64
Neoverse N1 |   GCC 12 | 32x32 |    4.57
Neoverse V1 |  LLVM 15 |   4x4 |    1.75
Neoverse V1 |  LLVM 15 |   8x8 |    6.79
Neoverse V1 |  LLVM 15 | 16x16 |    9.16
Neoverse V1 |  LLVM 15 | 32x32 |   14.47
Neoverse V1 |   GCC 12 |   4x4 |    1.75
Neoverse V1 |   GCC 12 |   8x8 |    6.00
Neoverse V1 |   GCC 12 | 16x16 |    7.63
Neoverse V1 |   GCC 12 | 32x32 |    4.32
Change-Id: I7228327b5be27ee7a68deecafa05be0bd2a40ff4
2023-02-28 | Implement d63_predictor using Neon | George Steed
Add Neon implementations of the d63 predictor for 4x4, 8x8, 16x16 and 32x32 block sizes. Also update tests to add new corresponding cases.
Speedups over the C code (higher is better):
Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    2.10
Neoverse N1 |  LLVM 15 |   8x8 |    4.45
Neoverse N1 |  LLVM 15 | 16x16 |    4.74
Neoverse N1 |  LLVM 15 | 32x32 |    2.27
Neoverse N1 |   GCC 12 |   4x4 |    2.46
Neoverse N1 |   GCC 12 |   8x8 |   10.37
Neoverse N1 |   GCC 12 | 16x16 |   11.46
Neoverse N1 |   GCC 12 | 32x32 |    6.57
Neoverse V1 |  LLVM 15 |   4x4 |    2.24
Neoverse V1 |  LLVM 15 |   8x8 |    3.53
Neoverse V1 |  LLVM 15 | 16x16 |    4.44
Neoverse V1 |  LLVM 15 | 32x32 |    2.17
Neoverse V1 |   GCC 12 |   4x4 |    2.25
Neoverse V1 |   GCC 12 |   8x8 |    7.67
Neoverse V1 |   GCC 12 | 16x16 |    8.97
Neoverse V1 |   GCC 12 | 32x32 |    4.77
Change-Id: Ib4a1a2cb5a5c4495ae329529f8847664cbd0dfe0
2023-02-28 | quantize: simplify 32x32_b args | Johann
Now that all the implementations of the 32x32 quantize are in intrinsics we can reference struct members directly. Saves pushing them to the stack. n_coeffs is not used at all for this function. Change-Id: I2104fea3fa20c455087e21b347d6abd7ea1f3e1e
2023-02-28 | Merge "Add Neon implementations of standard bitdepth MSE functions" into main | James Zern
2023-02-28 | Merge "Optimize transpose_neon.h helper functions" into main | James Zern
2023-02-27 | tools_common,VpxInterface: remove unneeded const | James Zern
Change-Id: Ic309aab2ff1750bdbcc36e8aafe05d52930ba694
2023-02-27 | Merge "tools_common,VpxInterface: fix interface fn ptr proto" into main | James Zern
2023-02-27 | Add Neon implementations of standard bitdepth MSE functions | Salome Thirot
Currently only vpx_mse16x16 has a Neon implementation. This patch adds optimized Armv8.0 and Armv8.4 dot-product paths for all block sizes: 8x8, 8x16, 16x8 and 16x16. Add the corresponding tests as well. Change-Id: Ib0357fdcdeb05860385fec89633386e34395e260
2023-02-27 | Optimize transpose_neon.h helper functions | Jonathan Wright
1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64 targets. This produces half as many TRN1/2 instructions compared to the number of MOVs that result from vcombine.
2) Use vpx_vtrnq_[su]64* helpers wherever applicable.
3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.
Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127
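As a rough illustration of item 1, the sketch below models 128-bit vectors as 2-tuples of 64-bit lanes. It is a lane-level model only, with hypothetical function names, and says nothing about instruction counts: it just shows that TRN1/TRN2 on 64-bit lanes select the same lanes as combining the low halves or the high halves of the two inputs.

```python
def trn1_64(a, b):
    """Model of TRN1 on 2x64-bit vectors: lane 0 of each input."""
    return (a[0], b[0])


def trn2_64(a, b):
    """Model of TRN2 on 2x64-bit vectors: lane 1 of each input."""
    return (a[1], b[1])


def combine_low(a, b):
    """Model of combining the low 64-bit halves of a and b."""
    return (a[0], b[0])


def combine_high(a, b):
    """Model of combining the high 64-bit halves of a and b."""
    return (a[1], b[1])


def transpose_2x2_64(row0, row1):
    """Transpose a 2x2 matrix of 64-bit elements using the TRN model."""
    return trn1_64(row0, row1), trn2_64(row0, row1)
```

Because the lane selection is identical, the helpers can switch between the two forms without changing results.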
2023-02-24 | tools_common,VpxInterface: fix interface fn ptr proto | James Zern
Use (void) to indicate an empty parameter list and match the declaration of vpx_codec_vp[89]_[cd]x. This fixes a cfi sanitizer error. Change-Id: I190f432eea4d1765afffd84c7458ec44d863f90c
2023-02-24 | Merge changes I65d86038,If3299fe5,I3ef1ff19 into main | James Zern
* changes:
  Add Neon implementation of high bitdepth 32x32 hadamard transform
  Add Neon implementation of high bitdepth 16x16 hadamard transform
  Add Neon implementation of high bitdepth 8x8 hadamard transform
2023-02-24 | Merge changes Ia64d175a,Ie4ea8f0a into main | James Zern
* changes:
  vp9_loop_filter_alloc: clear -Wshadow warnings
  vp9_adapt_mode_probs: clear -Wshadow warning
2023-02-24 | Add Neon implementation of high bitdepth 32x32 hadamard transform | Salome Thirot
Add Neon implementation of vpx_highbd_hadamard_32x32 as well as the corresponding tests. Change-Id: I65d8603896649de1996b353aa79eee54824b4708
2023-02-24 | Add Neon implementation of high bitdepth 16x16 hadamard transform | Salome Thirot
Add Neon implementation of vpx_highbd_hadamard_16x16 as well as the corresponding tests. Change-Id: If3299fe556351dfe3db994ac171d83a95ea1504b
2023-02-24 | Merge "vp9 rc test: change param type to bool" into main | Jerome Jiang
2023-02-23 | vp9 rc test: change param type to bool | Jerome Jiang
Change-Id: Ib45522e32d9137678da9062830044e9dd87537e5
2023-02-23 | Merge "Disable some intra modes for TX_32X32" into main | Chi Yo Tsai
2023-02-23 | Add Neon implementation of high bitdepth 8x8 hadamard transform | Salome Thirot
Add Neon implementation of vpx_highbd_hadamard_8x8 as well as the corresponding tests. Change-Id: I3ef1ff199d76b6b010591ef15a81b0f36c9ded03
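For readers unfamiliar with the transform, here is a generic, unnormalized 2-D Walsh-Hadamard sketch in natural order. It is an assumption-laden illustration, not a model of vpx_highbd_hadamard_8x8: libvpx's coefficient ordering and scaling may differ, and the function names are hypothetical.

```python
def hadamard_1d(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two length list."""
    out = list(values)
    step = 1
    while step < len(out):
        for start in range(0, len(out), 2 * step):
            for i in range(start, start + step):
                a, b = out[i], out[i + step]
                # Butterfly: sum and difference of the paired elements.
                out[i], out[i + step] = a + b, a - b
        step *= 2
    return out


def hadamard_2d(block):
    """Apply the 1-D transform to every row, then to every column."""
    rows = [hadamard_1d(row) for row in block]
    cols = [hadamard_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

For a constant 8x8 block, all energy ends up in the first coefficient, which is a quick sanity check for any implementation of this kind.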
2023-02-22 | vp9_loop_filter_alloc: clear -Wshadow warnings | James Zern
Bug: webm:1793 Change-Id: Ia64d175aa69dc2ecde2babf64bde04f02b32795b
2023-02-22 | vp9_adapt_mode_probs: clear -Wshadow warning | James Zern
Bug: webm:1793 Change-Id: Ie4ea8f0a3295e6f58dc6f7d5c61d46700c539d40
2023-02-23 | Merge "vp9_block.h: rename diff struct to Diff" into main | James Zern
2023-02-22 | Disable some intra modes for TX_32X32 | chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |   SSIM  | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    |  hdres2 |  +0.036% |  +0.032% | +0.014% | -3.9% |
|    0    | lowres2 |  -0.002% |  -0.011% | +0.020% | -3.6% |
|    0    | midres2 |  +0.045% |  +0.025% | -0.007% | -4.0% |
STATS_CHANGED
Change-Id: I75a927333d26f2a37f0dda57a641b455b845f5b9
2023-02-22 | vpx_subpixel_8t_intrin_avx2: clear -Wshadow warnings | James Zern
no changes to assembly Bug: webm:1793 Change-Id: I6a82290cafee7f4a7909d497ccfdefd5a78fb8ed
2023-02-22 | vp9_block.h: rename diff struct to Diff | James Zern
This matches the style guide and fixes some -Wshadow warnings related to variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory
Bug: webm:1793 Change-Id: I4df1bbc8d079a3174d75f0d35d54c200ffdbb677
2023-02-22 | Merge "Skip redundant iterations in joint motion search" into main | Yunqing Wang
2023-02-22 | Merge "vp9 rc: Make it work for SVC parallel encoding" into main | Jerome Jiang
2023-02-21 | Optimize Neon implementation of high bitdepth variance functions | Salome Thirot
Specialize implementation of high bitdepth variance functions such that we only widen data processing element types when absolutely necessary. Change-Id: If4cc3fea7b5ab0821e3129ebd79ff63706a512bf
2023-02-21 | Skip redundant iterations in joint motion search | Deepa K G
In joint_motion_search, there are four iterations. Even iterations search in the first reference frame and odd iterations search in the second. The last two iterations use the search result of the first two iterations as the start point. If the search result does not change, the last two iterations are not necessary and can be skipped.
cpu-used | Instruction Count Reduction (%)
0        | 1.411
Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
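The early exit can be sketched as follows. This is one reading of the description above, not the libvpx implementation: `search` is a hypothetical callback that refines the motion vector for one reference frame given the other frame's current vector, and motion vectors are plain tuples.

```python
def joint_motion_search(start_mvs, search):
    """Run up to four refinement iterations over two reference frames.

    Returns the final motion vectors and the number of iterations executed.
    """
    mvs = list(start_mvs)
    iterations_run = 0
    changed = False
    for ite in range(4):
        # The last two iterations restart from the results of the first two;
        # if those results equal the starting point, nothing new can be found.
        if ite == 2 and not changed:
            break
        ref = ite % 2  # even iterations: first reference, odd: second
        new_mv = search(ref, mvs[ref], mvs[1 - ref])
        iterations_run += 1
        if new_mv != mvs[ref]:
            changed = True
            mvs[ref] = new_mv
    return mvs, iterations_run
```

With a callback that always returns its input vector, only two iterations run; with one that keeps moving the vector, all four run.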
2023-02-17 | vp9 rc: Make it work for SVC parallel encoding | Jerome Jiang
Add a unit test. Keep track of the spatial layer id and frame type for the case where spatial layers are encoded in parallel by the hardware encoder. ComputeQP() / PostEncodeUpdate() don't need to be called sequentially when there is no inter-layer prediction. Bug: b/257368998 Change-Id: I50beaefcfc205d3f9a9d3dbe11fead5bfdc71489
2023-02-17 | Merge "vp9 rc: Verify QP for all spatial layers" into main | Jerome Jiang
2023-02-16 | vp9 rc: Verify QP for all spatial layers | Jerome Jiang
Change-Id: Ic669c96d25d7c039d370e9acd00dc45e09054552
2023-02-16 | Relax frame recode tolerance on speed 0 to 1 above 480p | chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |   SSIM  | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    |  hdres2 |  -0.028% |  +0.030% | -0.408% | -2.0% |
|    0    | lowres2 |  +0.000% |  +0.000% | +0.000% | +0.0% |
|    0    | midres2 |  -0.138% |  +0.042% | -0.427% | -2.5% |
|---------|---------|----------|----------|---------|-------|
|    1    |  hdres2 |  -0.032% |  +0.018% | -0.342% | -1.1% |
|    1    | lowres2 |  +0.000% |  +0.000% | +0.000% | +0.0% |
|    1    | midres2 |  +0.050% |  +0.060% | -0.257% | -1.6% |
Rate Error:
|         |         |     AVG_RC_ERROR    |     MAX_RC_ERROR    |
|         |         |---------------------|---------------------|
| SPD_SET | TESTSET |   BASE   |   TEST   |   BASE   |   TEST   |
|---------|---------|----------|----------|----------|----------|
|    0    |  hdres2 |  33.044% |  33.065% | 149.903% | 149.903% |
|    0    | midres2 |  59.632% |  59.566% |  79.091% |  79.249% |
|---------|---------|----------|----------|----------|----------|
|    1    |  hdres2 |  33.050% |  33.057% | 151.278% | 151.278% |
|    1    | midres2 |  59.640% |  59.614% |  78.707% |  78.842% |
STATS_CHANGED
Change-Id: I5d09601fede3912d5173717ce9dd070df3a97ec8
2023-02-14 | Enable some more speed features on speed 0 to 2 | chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |   SSIM  | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    |  hdres2 |  +0.034% |  +0.030% | +0.033% | -3.7% |
|    0    | lowres2 |  +0.012% |  +0.017% | +0.044% | -2.1% |
|    0    | midres2 |  +0.030% |  +0.035% | +0.060% | -1.9% |
|---------|---------|----------|----------|---------|-------|
|    1    |  hdres2 |  +0.027% |  +0.036% | +0.030% | -2.7% |
|    1    | lowres2 |  -0.006% |  -0.002% | +0.006% | -1.0% |
|    1    | midres2 |  -0.006% |  -0.012% | -0.010% | -1.0% |
|---------|---------|----------|----------|---------|-------|
|    2    |  hdres2 |  -0.006% |  -0.001% | -0.020% | -2.4% |
|    2    | lowres2 |  -0.010% |  -0.015% | -0.001% | -0.9% |
|    2    | midres2 |  +0.006% |  -0.005% | +0.009% | -1.0% |
STATS_CHANGED
Change-Id: I1431ac07215bb844739a410697387b9aead82792
2023-02-14 | Merge changes Id74a6d9c,I5c31e0e9,Id5a2b2d9,I73182c97,I2f5916d5, ... into main | James Zern
* changes:
  Optimize vpx_highbd_comp_avg_pred_neon
  Add Neon AvgPredTestHBD test suite
  Specialize Neon high bitdepth avg subpel variance by filter value
  Specialize Neon high bitdepth subpel variance by filter value
  Refactor Neon high bitdepth avg subpel variance functions
  Optimize Neon high bitdepth subpel variance functions
2023-02-13 | Optimize vpx_highbd_comp_avg_pred_neon | Salome Thirot
Optimize the implementation of vpx_highbd_comp_avg_pred_neon by making use of the URHADD instruction to compute the average. Change-Id: Id74a6d9c33e89bc548c3c7ecace59af69051b4a7
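URHADD is an unsigned rounding halving add, so a single instruction yields the rounded average of two lanes. A scalar sketch of that arithmetic, with hypothetical function names and blocks represented as flat lists:

```python
def urhadd(a: int, b: int) -> int:
    """Unsigned rounding halving add: (a + b + 1) >> 1."""
    return (a + b + 1) >> 1


def comp_avg_pred(pred, ref):
    """Rounded element-wise average of a prediction block and a reference block."""
    return [urhadd(p, r) for p, r in zip(pred, ref)]
```

Since the halving happens inside the instruction, the intermediate sum does not need a wider element type.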
2023-02-13 | Add Neon AvgPredTestHBD test suite | Salome Thirot
Add test suite for vpx_highbd_comp_avg_pred_neon. Change-Id: I5c31e0e990661ee3b8030bb517829c088fceae4d
2023-02-13 | Specialize Neon high bitdepth avg subpel variance by filter value | Salome Thirot
Use the same specialization as for standard bitdepth. The rationale for the specialization is as follows: The optimal implementation of the bilinear interpolation depends on the filter values being used. For both horizontal and vertical interpolation this can simplify to just taking the source values, or averaging the source and reference values - which can be computed more easily than a bilinear interpolation with arbitrary filter values. This patch introduces tests to select the optimal bilinear interpolation implementation based on the filter values being used. This new specialization is only used for larger block sizes. Change-Id: Id5a2b2d9fac6f878795a6ed9de2bc27d9e62d661
2023-02-13 | Specialize Neon high bitdepth subpel variance by filter value | Salome Thirot
Use the same specialization as for standard bitdepth. The rationale for the specialization is as follows: The optimal implementation of the bilinear interpolation depends on the filter values being used. For both horizontal and vertical interpolation this can simplify to just taking the source values, or averaging the source and reference values - which can be computed more easily than a bilinear interpolation with arbitrary filter values. This patch introduces tests to select the optimal bilinear interpolation implementation based on the filter values being used. This new specialization is only used for larger block sizes. Change-Id: I73182c979255f0332a274f2e5907df7f38c9eeb3
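The idea can be sketched in scalar form. The sketch assumes, purely for illustration, a two-tap filter whose weights are `(8 - offset, offset)` out of 8 with rounding; the weights, the function names, and the dispatch on `offset` are assumptions rather than the libvpx code. Under that assumption, offset 0 reduces to a copy and offset 4 reduces to a rounded average.

```python
def bilinear_general(a, b, offset):
    """Two-tap filter with weights (8 - offset, offset) out of 8, rounded."""
    return [((8 - offset) * x + offset * y + 4) >> 3 for x, y in zip(a, b)]


def bilinear_specialized(a, b, offset):
    """Pick a cheaper path when the filter value allows it."""
    if offset == 0:
        return list(a)  # weights (8, 0): just take the first input
    if offset == 4:
        return [(x + y + 1) >> 1 for x, y in zip(a, b)]  # weights (4, 4): rounded average
    return bilinear_general(a, b, offset)
```

Both paths give identical results for every offset, so the choice can be made once per block based on the filter value.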
2023-02-13 | Refactor Neon high bitdepth avg subpel variance functions | Salome Thirot
Use the same general code style as in the standard bitdepth Neon implementation - merging the computation of vpx_highbd_comp_avg_pred with the second pass of the bilinear filter to avoid storing and loading the block again. Also move vpx_highbd_comp_avg_pred_neon to its own file (like the standard bitdepth implementation) since we're no longer using it for averaging sub-pixel variance. Change-Id: I2f5916d5b397db44b3247b478ef57046797dae6c
2023-02-13 | Optimize Neon high bitdepth subpel variance functions | Salome Thirot
Use the same general code style as in the standard bitdepth Neon implementation. Additionally, do not unnecessarily widen to 32-bit data types when doing bilinear filtering - allowing us to process twice as many elements per instruction. Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f
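Whether widening is avoidable depends on the largest value the filter accumulator can reach. The commit message does not state which filter weights are used, so the sketch below only checks the bound for a given bit depth and weight sum; the function names and the example weight sums are assumptions.

```python
def max_filter_accumulator(bit_depth: int, weight_sum: int) -> int:
    """Largest value of w0 * a + w1 * b + rounding when w0 + w1 == weight_sum."""
    max_pixel = (1 << bit_depth) - 1
    return max_pixel * weight_sum + weight_sum // 2


def fits_in_u16(bit_depth: int, weight_sum: int) -> bool:
    """True if the two-tap filter can be accumulated in unsigned 16-bit lanes."""
    return max_filter_accumulator(bit_depth, weight_sum) <= 0xFFFF


print(fits_in_u16(12, 8))    # True:  4095 * 8 + 4 = 32764
print(fits_in_u16(12, 128))  # False: 4095 * 128 + 64 = 524224
```

Staying in 16-bit lanes means a 128-bit vector holds eight elements instead of four, which matches the "twice as many elements per instruction" claim above.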