path: root/vp9
Age  Commit message  Author
2023-02-21  Skip redundant iterations in joint motion search  Deepa K G
In joint_motion_search, there are four iterations. Even iterations search in the first reference frame and odd iterations search in the second. The last two iterations use the search result of the first two iterations as the start point. If the search result does not change, the last two iterations are not necessary and can be skipped.

             Instruction Count
  cpu-used     Reduction(%)
     0            1.411

Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
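The early exit described in this commit can be sketched as follows. This is only an illustration of the idea, not the libvpx code: `MotionVector`, `RefineFn`, `joint_search_sketch` and the two sample callbacks are hypothetical names, and the skip test assumes "does not change" means both vectors still equal their starting values after the first two iterations.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int row, col; } MotionVector; /* hypothetical type */

/* Hypothetical callback: refines the vector for reference `ref`, starting
 * from `start`, given the current vector of the other reference. */
typedef MotionVector (*RefineFn)(int ref, MotionVector start,
                                 MotionVector other);

static bool mv_equal(MotionVector a, MotionVector b) {
  return a.row == b.row && a.col == b.col;
}

/* Runs up to four iterations and returns how many were executed. */
static int joint_search_sketch(MotionVector mv[2], RefineFn refine) {
  const MotionVector start[2] = { mv[0], mv[1] };
  int ite;
  assert(refine != 0);
  for (ite = 0; ite < 4; ++ite) {
    const int id = ite & 1; /* even: first reference, odd: second */
    /* Iterations 2 and 3 would start from the same point as 0 and 1 if
     * nothing moved, so they cannot find anything new. */
    if (ite == 2 && mv_equal(mv[0], start[0]) && mv_equal(mv[1], start[1]))
      break;
    mv[id] = refine(id, mv[id], mv[!id]);
  }
  return ite;
}

/* Sample callback: the search finds nothing better than the start point. */
static MotionVector refine_noop(int ref, MotionVector start,
                                MotionVector other) {
  (void)ref;
  (void)other;
  return start;
}

/* Sample callback: every search moves the vector down by one row. */
static MotionVector refine_step(int ref, MotionVector start,
                                MotionVector other) {
  (void)ref;
  (void)other;
  start.row += 1;
  return start;
}
```

With `refine_noop` the loop stops after two iterations; with `refine_step` all four run.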
2023-02-16  Relax frame recode tolerance on speed 0 to 1 above 480p  chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | -0.028%  | +0.030%  | -0.408% | -2.0% |
|    0    | lowres2 | +0.000%  | +0.000%  | +0.000% | +0.0% |
|    0    | midres2 | -0.138%  | +0.042%  | -0.427% | -2.5% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | -0.032%  | +0.018%  | -0.342% | -1.1% |
|    1    | lowres2 | +0.000%  | +0.000%  | +0.000% | +0.0% |
|    1    | midres2 | +0.050%  | +0.060%  | -0.257% | -1.6% |

Rate Error:
|         |         |    AVG_RC_ERROR     |    MAX_RC_ERROR     |
|         |         |---------------------|---------------------|
| SPD_SET | TESTSET |   BASE   |   TEST   |   BASE   |   TEST   |
|---------|---------|----------|----------|----------|----------|
|    0    | hdres2  | 33.044%  | 33.065%  | 149.903% | 149.903% |
|    0    | midres2 | 59.632%  | 59.566%  | 79.091%  | 79.249%  |
|---------|---------|----------|----------|----------|----------|
|    1    | hdres2  | 33.050%  | 33.057%  | 151.278% | 151.278% |
|    1    | midres2 | 59.640%  | 59.614%  | 78.707%  | 78.842%  |

STATS_CHANGED

Change-Id: I5d09601fede3912d5173717ce9dd070df3a97ec8
2023-02-14  Enable some more speed features on speed 0 to 2  chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.034%  | +0.030%  | +0.033% | -3.7% |
|    0    | lowres2 | +0.012%  | +0.017%  | +0.044% | -2.1% |
|    0    | midres2 | +0.030%  | +0.035%  | +0.060% | -1.9% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | +0.027%  | +0.036%  | +0.030% | -2.7% |
|    1    | lowres2 | -0.006%  | -0.002%  | +0.006% | -1.0% |
|    1    | midres2 | -0.006%  | -0.012%  | -0.010% | -1.0% |
|---------|---------|----------|----------|---------|-------|
|    2    | hdres2  | -0.006%  | -0.001%  | -0.020% | -2.4% |
|    2    | lowres2 | -0.010%  | -0.015%  | -0.001% | -0.9% |
|    2    | midres2 | +0.006%  | -0.005%  | +0.009% | -1.0% |

STATS_CHANGED

Change-Id: I1431ac07215bb844739a410697387b9aead82792
2023-02-10  Remove CONFIG_CONSISTENT_RECODE flag  chiyotsai
Currently, libvpx does not properly clear and re-initialize the memories when it re-encodes a frame. As a result, out-of-date values are used in the encoding process, and re-encoding a frame with the same parameter will give different outputs.

This commit enables the code under CONFIG_CONSISTENT_RECODE to correct this behavior. This change has minor effect on the coding performance, but it ensures valid values are used in the encoding process. Furthermore, the flag is removed as it is now always turned on.

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | -0.012%  | -0.021%  | -0.030% | +0.1% |
|    0    | lowres2 | +0.029%  | +0.019%  | +0.047% | +0.1% |
|    0    | midres2 | -0.004%  | +0.009%  | +0.026% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | +0.032%  | +0.032%  | -0.000% | -0.0% |
|    1    | lowres2 | -0.005%  | -0.011%  | -0.014% | +0.0% |
|    1    | midres2 | +0.004%  | +0.020%  | +0.027% | +0.2% |
|---------|---------|----------|----------|---------|-------|
|    2    | hdres2  | +0.048%  | +0.056%  | +0.057% | +0.1% |
|    2    | lowres2 | +0.007%  | +0.002%  | -0.016% | -0.0% |
|    2    | midres2 | -0.015%  | -0.008%  | -0.002% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    3    | hdres2  | +0.010%  | +0.014%  | +0.004% | -0.0% |
|    3    | lowres2 | +0.000%  | -0.021%  | -0.001% | +0.0% |
|    3    | midres2 | +0.007%  | -0.038%  | +0.012% | -0.2% |
|---------|---------|----------|----------|---------|-------|
|    4    | hdres2  | +0.107%  | +0.136%  | +0.124% | -0.0% |
|    4    | lowres2 | -0.012%  | -0.024%  | -0.020% | -0.0% |
|    4    | midres2 | +0.055%  | -0.004%  | +0.048% | -0.1% |
|---------|---------|----------|----------|---------|-------|
|    5    | hdres2  | +0.026%  | +0.027%  | +0.020% | -0.0% |
|    5    | lowres2 | +0.009%  | -0.008%  | +0.028% | +0.1% |
|    5    | midres2 | -0.025%  | +0.021%  | -0.020% | -0.1% |

STATS_CHANGED

Change-Id: I3967aee8c8e4d0608a492e07f99ab8de9744ba57
2023-02-09  Merge "Remove onyx_int.h from vp8 rc header" into main  Jerome Jiang
2023-02-09  Remove onyx_int.h from vp8 rc header  Jerome Jiang
Also move the FRAME_TYPE declaration to common.h Bug: webm:1766 Change-Id: Ic3016bd16548a5d2e0ae828a7fd7ad8adda8b8f6
2023-02-08  Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic  chiyotsai
STATS_CHANGED BUG=webm:1789 Change-Id: I74efe28bdf90a179c59fe3d1f5a15d497f57080d
2023-02-08  Merge "Enable some speed features on speed 0" into main  Chi Yo Tsai
2023-02-07  Enable some speed features on speed 0  chiyotsai
Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.069%  | +0.067%  | +0.100% | -8.6% |
|    0    | midres2 | +0.116%  | +0.103%  | +0.062% | -9.6% |
|    0    | lowres2 | +0.276%  | +0.283%  | +0.214% |-11.9% |

STATS_CHANGED

Change-Id: I8b26c0be2312fcd0f8c9e889367682e80ea8de4b
2023-02-07  Merge "Move TPL to a new file" into main  Yunqing Wang
2023-02-06  Move TPL to a new file  Yunqing Wang
This is a refactoring CL. Change-Id: Ic8c1575601d27f14ecd1b1bf0a038e447eaae458
2023-02-06  Merge "Remove duplicated VPX_SCALING declaration" into main  Jerome Jiang
2023-02-06  Remove duplicated VPX_SCALING declaration  Jerome Jiang
Use VPX_SCALING_MODE instead Change-Id: Iab9d29f20838703e00bd9f7641035d8ebd69af53
2023-02-03  Fix uninitialized mesh feature for BEST mode  Yunqing Wang
At BEST encoding mode, the mesh search range wasn't initialized for non FC_GRAPHICS_ANIMATION content type, which actually/mistakenly used speed 0's setting. Fixed it by adding the initialization.

There were 2 ways to fix this. Patchset 1 set to use speed 0's setting for non FC_GRAPHICS_ANIMATION type. This didn't change BEST mode's encoding results much, and only a couple of clips' results were changed.

Borg result for BEST mode:
           avg_psnr:  ovr_psnr:  ssim:    encoding_spdup:
lowres2:    -0.004     -0.003    -0.000       0.030
midres2:    -0.006     -0.009    -0.012       0.033
hdres2:      0.002      0.002     0.004       0.015

Patchset 2 set to use BEST's setting for non FC_GRAPHICS_ANIMATION type. However, the majority of test clips' BDrate got changed up to ~0.5% (gain or loss), and overall it didn't give better performance than patchset 1. So, we chose to use patchset 1.

Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef
2023-02-01  vp9_diamond_search_sad_neon: use DECLARE_ALIGNED  James Zern
rather than the gcc specific __attribute__((aligned())); fixes build targeting ARM64 windows. Bug: webm:1788 Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e
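For context, a compiler-neutral alignment macro of this kind can be sketched as below. The macro name `SKETCH_DECLARE_ALIGNED` and the helper functions are hypothetical and only model the idea of hiding the compiler-specific spelling behind one macro; the exact definition used by libvpx is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: pick the alignment syntax the compiler understands. */
#if defined(_MSC_VER)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#endif

/* Returns 1 when the pointer is a multiple of 16 bytes. */
static int is_aligned_16(const void *p) {
  return ((uintptr_t)p & 15) == 0;
}

/* Declares a 16-byte aligned local buffer and reports its alignment. */
static int aligned_buffer_demo(void) {
  SKETCH_DECLARE_ALIGNED(16, int32_t, offsets[8]) = { 0 };
  return is_aligned_16(offsets);
}
```

Call sites then use the macro instead of writing `__attribute__((aligned()))` directly, so the same source builds with either compiler family.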
2023-01-28  Merge "Add encoder component timing information" into main  Yunqing Wang
2023-01-27  Add encoder component timing information  Yunqing Wang
Change-Id: Iaa5b73a9593ecfd74b6426ed47d2b529ec7ae2b5
2023-01-26  Merge "Fix per frame qp for temporal layers" into main  Jerome Jiang
2023-01-26  Fix per frame qp for temporal layers  Jerome Jiang
Also add tests with fixed temporal layering mode. Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
2023-01-26  Merge "[NEON] Add Highbd FHT 8x8/16x16 functions" into main  James Zern
2023-01-24  [NEON] Add Highbd FHT 8x8/16x16 functions  Konstantinos Margaritis
In total this gives about 9% extra performance for both rt/best profiles. Furthermore, add transpose_s32 16x16 function Change-Id: Ib6f368bbb9af7f03c9ce0deba1664cef77632fe2
2023-01-24  Skip calculating internal stats when frame dropped  Jerome Jiang
Bug: webm:1771 Change-Id: I30cd5b7ec0945b521a1cc03999d39ec6a25f1696
2023-01-20  Merge "Add codec control to set per frame QP" into main  Jerome Jiang
2023-01-19  Add codec control to set per frame QP  Jerome Jiang
Use case is for 1 pass encoding. Forces max_quantizer = min_quantizer and aq-mode = 0. Applicable to spatial layers, where user may set the QP per spatial layer. Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b
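The effect described above (pinning both quantizer bounds to one value and turning adaptive quantization off) can be sketched as follows. `SketchRcConfig` and `set_fixed_frame_qp` are hypothetical names, and the 0..63 range check is an assumption about the external quantizer scale; none of this is the libvpx control itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of rate-control settings. */
typedef struct {
  int min_quantizer;
  int max_quantizer;
  int aq_mode;
} SketchRcConfig;

/* Pins the frame QP. Returns 0 on success, -1 if qp is out of range
 * (the 0..63 range is an assumption made for this sketch). */
static int set_fixed_frame_qp(SketchRcConfig *cfg, int qp) {
  assert(cfg != NULL);
  if (qp < 0 || qp > 63) return -1;
  cfg->min_quantizer = qp; /* min == max leaves rate control no choice */
  cfg->max_quantizer = qp;
  cfg->aq_mode = 0;        /* no per-segment QP adjustments */
  return 0;
}
```

For spatial layers, a caller would keep one such config per layer and apply the function to each with that layer's QP.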
2023-01-13  Fix to segfault for external resize test in vp9  Marco Paniconi
Failure occurs for 1 pass non-realtime mode at speed 0. Due to speed feature rd_ml_partition.var_pruning, which doesn't check for scaled reference in simple_motion_search(). Bug: webm:1768 Change-Id: Iddcb56033bac042faebb5196eed788317590b23f
2023-01-05  Use Neon load/store helper functions consistently  Jonathan Wright
Define all Neon load/store helper functions in mem_neon.h and use them consistently in Neon convolution functions. Change-Id: I57905bc0a3574c77999cf4f4a73442c3420fa2be
2022-12-07  L2E: Add a new interface to control rdmult  Cheng Chen
Allow external model to control frame rdmult. A function is called per frame to get the value of rdmult from the external model. The external rdmult will overwrite libvpx's default rdmult unless a reserved value is selected. A unit test is added to test when the default rdmult value is set. Change-Id: I2f17a036c188de66dc00709beef4bf2ed86a919a
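The override rule in this message (use the external value unless it is the reserved one) can be sketched as below. All names here, including `SKETCH_DEFAULT_RDMULT`, its value, and the callback signature, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reserved value meaning "keep the encoder's own rdmult". */
#define SKETCH_DEFAULT_RDMULT (-1)

/* Hypothetical per-frame callback into the external model. */
typedef int (*GetFrameRdmultFn)(void *model, int frame_index);

/* Returns the rdmult to use for this frame. */
static int select_frame_rdmult(GetFrameRdmultFn get_rdmult, void *model,
                               int frame_index, int internal_rdmult) {
  int external;
  assert(internal_rdmult > 0);
  if (get_rdmult == NULL) return internal_rdmult; /* no external model */
  external = get_rdmult(model, frame_index);
  return external == SKETCH_DEFAULT_RDMULT ? internal_rdmult : external;
}

/* Sample model that always asks for rdmult 200. */
static int model_fixed_200(void *model, int frame_index) {
  (void)model;
  (void)frame_index;
  return 200;
}

/* Sample model that defers to the encoder. */
static int model_defer(void *model, int frame_index) {
  (void)model;
  (void)frame_index;
  return SKETCH_DEFAULT_RDMULT;
}
```

The `model_defer` case corresponds to the situation the commit's unit test is described as covering: the default value is returned and the encoder's own rdmult is kept.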
2022-11-18  Merge "vp9/rate_ctrl_rtc: Improve get cyclic refresh data" into main  Marco Paniconi
2022-11-18  vp9/rate_ctrl_rtc: Improve get cyclic refresh data  Hirokazu Honda
A client of the vp9 rate controller needs to know whether the segmentation is enabled and the size of delta_q. It is also nicer to know the size of map. This CL changes the interface to achieve these. Bug: b:259487065 Test: Build Change-Id: If05854530f97e1430a7b97788910f277ab673a87
2022-11-15  Merge "vp9-svc: Fixes to make SVC work with VBR" into main  Marco Paniconi
2022-11-15  vp9-svc: Fixes to make SVC work with VBR  Marco Paniconi
Prior to this CL SVC with VBR mode was broken. Fixes made here to make VBR rate control work for SVC. Rename is_one_pass_cbr_svc() --> is_one_pass_svc(), as it can be used now for both CBR and VBR. Added rate targeting unittest for (2SL, 3TL). Bug: chromium:1375111 Change-Id: I5a62ffe7fbea29dc5949c88a284768386b1907a9
2022-11-15  Merge "[NEON] Optimize FHT functions, add highbd FHT 4x4" into main  James Zern
2022-11-14  quantize: remove vp9_regular_quantize_b_4x4  Johann
This was just a helper function which called vpx_quantize_b or vpx_highbd_quantize_b. It also checked for skip_block, which was necessary when webm:1439 was filed but does not appear to be necessary now. Removes a quantize variant and makes subsequent cleanups easier. Change-Id: Ibe545eccd19370f07ff26c8e151f290c642efd2a
2022-11-11  [NEON] Optimize FHT functions, add highbd FHT 4x4  Konstantinos Margaritis
Refactor & optimize FHT functions further, use new butterfly functions 4x4 5% faster, 8x8 & 16x16 10% faster than previous versions. Highbd 4x4 FHT version 2.27x faster than C version for --rt. Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94
2022-11-10  vp9-rc: Fix key frame setting in external RC  Marco Paniconi
Bug: b/257368998 Change-Id: I03e35915ac99b50cb6bdf7bce8b8f9ec5aef75b7
2022-11-01  [NEON] Optimize and homogenize Butterfly DCT functions  Konstantinos Margaritis
Provide a set of commonly used Butterfly DCT functions for use in DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which unfortunately are only usable in pass1 of most DCTs, as they do not provide the necessary precision in pass2. This gave a performance gain ranging from 5% to 15% in 16x16 case. Also, for 32x32, the loads were rearranged, along with the butterfly optimizations, this gave 10% gain in 32x32_rd function. This refactoring was necessary to allow easier porting of highbd 32x32 functions -follows this patchset. Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3
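The commit does not say where the precision is lost. As one illustration only, the scalar model below mimics a saturating, rounding, doubling multiply that keeps the high 16 bits (the operation `vqrdmulh_s16` performs per lane) and compares a two-term butterfly that rounds each product separately against one that accumulates in 32 bits and rounds once. The function names are hypothetical, the Q14 constant 11585 is round(2^14 * cos(pi/4)), and the code assumes arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: sat16((2 * a * b + 2^15) >> 16). */
static int16_t rdmulh_s16_model(int16_t a, int16_t b) {
  const int64_t doubled = 2 * ((int64_t)a * b);
  const int64_t high = (doubled + (1 << 15)) >> 16; /* arithmetic shift */
  return high > INT16_MAX ? INT16_MAX : (int16_t)high;
}

/* Widened butterfly: accumulate in 32 bits, round once (Q14 constants). */
static int16_t butterfly_wide(int16_t a, int16_t b, int16_t c0, int16_t c1) {
  const int32_t sum = (int32_t)a * c0 + (int32_t)b * c1;
  return (int16_t)((sum + (1 << 13)) >> 14);
}

/* 16-bit butterfly: each product is rounded before the add. The Q14
 * constants are doubled to Q15 so the doubling multiply yields a * c. */
static int16_t butterfly_fast(int16_t a, int16_t b, int16_t c0, int16_t c1) {
  assert(c0 >= -16384 && c0 <= 16383);
  assert(c1 >= -16384 && c1 <= 16383);
  return (int16_t)(rdmulh_s16_model(a, (int16_t)(2 * c0)) +
                   rdmulh_s16_model(b, (int16_t)(2 * c1)));
}
```

For a = b = 1 and both constants 11585, the widened form gives 1 while the per-product form gives 2, because each 0.707 is rounded up to 1 before the add.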
2022-10-25  Merge changes I36545ff4,Id1aa29da into main  James Zern
* changes:
  vp9_highbd_quantize_fp*_neon: normalize fn param name
  highbd_sad_avx2: normalize function param names
2022-10-25  Merge "quantize: consolidate sse2 conditionals" into main  Johann Koenig
2022-10-24  vp9_highbd_quantize_fp*_neon: normalize fn param name  James Zern
count -> n_coeffs. aligns the name with the rtcd header; clears a clang-tidy warning Change-Id: I36545ff479df92b117c95e494f16002e6990f433
2022-10-17  quantize: consolidate sse2 conditionals  Johann
Change-Id: I43de579e30f2967b97064063e29676e0af1a498f
2022-10-17  vp9 quantize: rewrite ssse3 in intrinsics  Johann
Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e
2022-10-07  Merge "vp9 quantize: change index" into main  Johann Koenig
2022-10-04  Merge "L2E: Rework recode decisions for external max frame size and q" into main  Cheng Chen
2022-10-01  vp9 quantize: change index  Johann
In assembly it made sense to iterate using n_coeffs. In intrinsics it's just as fast to use index and easier to read. Change-Id: I403c959709309dad68123d0a3d0efe183874543d
2022-09-26  Merge "vp9_rd.c quiet -Wstringop-overflow" into main  Scott LaVarnway
2022-09-26  quantize: standardize vp9_quantize_fp_sse2  Johann
Match style for vpx_quantize_b_sse2 and prepare to rewrite ssse3 version in intrinsics. Need to evaluate the value of threshold breakout before going further. Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
2022-09-26  vp9_rd.c quiet -Wstringop-overflow  Scott LaVarnway
../libvpx/vp9/encoder/vp9_rd.c:594:20: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  594 |         t_above[i] = !!*(const uint32_t *)&above[i];
      |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../libvpx/vp9/encoder/vp9_rd.c:572:47: note: at offset [64, 254] into destination object ‘t_above’ of size [0, 16]
  572 |                               ENTROPY_CONTEXT t_above[16],
      |                               ~~~~~~~~~~~~~~~~^~~~~~~~~~~

Change-Id: Ie9ef24e685af417cdd35f6aa7284805e422b6ae2
2022-09-23  quantize: increase iscan by 1  Johann
All of the assembly adds 1 to iscan to convert from a 0 based array to the EOB value. Add 1 to all iscan values and remove the extra instructions from the assembly. Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
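The change can be pictured with two end-of-block (EOB) scans: one that adds 1 on every use of a 0-based table, and one that reads a table whose entries already include the +1. This is a plain C sketch with hypothetical function names, not the assembly the commit touches.

```c
#include <assert.h>
#include <stdint.h>

/* EOB from a 0-based inverse scan table: the +1 is applied per coefficient. */
static int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* Same result when the table already stores iscan + 1: no extra add. */
static int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1,
                         int n) {
  int eob = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

Both return 0 for an all-zero block, since no table entry is consulted in that case.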
2022-09-14  L2E: Rework recode decisions for external max frame size and q  Cheng Chen
Allow to handle external q and external max frame size separately. Rely on libvpx's decision to catch overshoot/undershoot and recode frames. Previously, when external max frame size is set, we didn't handle undershoot cases, and now we fall back to libvpx's decision to recode a frame if overshoot/undershoot is seen. Change-Id: Ic3eee042cfe104b528c5f2c6c82b98dd5d8fa8ca
2022-09-12  CHECK_MEM_ERROR: add an assert for a valid jmp target  James Zern
callers of CHECK_MEM_ERROR() expect failures to not return

tested with:
configure --enable-debug --enable-vp9-postproc --enable-postproc \
  --enable-multi-res-encoding --enable-vp9-temporal-denoising \
  --enable-error-concealment

--enable-internal-stats has unrelated assertion failures currently

Change-Id: Ic12073b1ae80a6f434f14d24f652e64d30f63eea
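The pattern behind this assert can be sketched with `setjmp`/`longjmp`: an allocation check that never returns on failure is only safe if a jump target has been armed first. The type, macro and function names below are hypothetical and simplified; they are not the libvpx definitions.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

typedef struct {
  int has_jmp_target; /* nonzero once setjmp() has armed `jmp` */
  int error_code;
  jmp_buf jmp;
} SketchErrorInfo;

/* Records the error and jumps to the armed target, if there is one. */
static void sketch_raise(SketchErrorInfo *info, int code) {
  info->error_code = code;
  if (info->has_jmp_target) longjmp(info->jmp, 1);
}

/* Assigns expr to lval; on NULL, control leaves via longjmp. The assert
 * catches callers that forgot to arm a jump target. */
#define SKETCH_CHECK_MEM_ERROR(info, lval, expr) \
  do {                                           \
    assert((info)->has_jmp_target);              \
    (lval) = (expr);                             \
    if (!(lval)) sketch_raise((info), 2);        \
  } while (0)

/* Returns 0 on success, or the recorded error code if allocation failed. */
static int allocate_with_guard(SketchErrorInfo *info, size_t bytes,
                               int simulate_failure) {
  void *volatile buf = NULL;
  if (setjmp(info->jmp)) {
    info->has_jmp_target = 0; /* arrived here through longjmp */
    return info->error_code;
  }
  info->has_jmp_target = 1;
  SKETCH_CHECK_MEM_ERROR(info, buf, simulate_failure ? NULL : malloc(bytes));
  info->has_jmp_target = 0;
  free(buf);
  return 0;
}
```

Without the assert, a failed check with no armed target would fall through and let the caller continue with a NULL pointer, which is the situation the commit title guards against.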