path: root/vp8/common/rtcd_defs.pl
Age | Commit message | Author
2023-08-10 | RISC-V: optimize vp8_copy_mem with RVV [riscv64_android_optimization] | Yuuta Liang
Test environment: 8c 1804MHz i5-1140G7

RVV Impl:
% CROSS=riscv64-unknown-linux-gnu- configure --target=riscv64-linux-gcc \
    --enable-debug --enable-gprof && make -j
% time qemu-riscv64 -cpu rv64,v=true,zba=true,vlen=128 -L /path/to/sysroot/ \
    ./vpxenc --codec=vp8 -w 352 -h 288 -o akiyol.vpx ./akiyo_cif.yuv
Pass 1/1 frame 300/300 314977B 8399b/f 251981b/s 92226 ms (3.25 fps)
user 1m30.108s
% gprof -abp ./vpxenc ./gmon.out | grep vp8_copy_mem
1.36 53.09 1.04 1025863 0.00 0.00 vp8_copy_mem16x16_rvv
0.72 59.01 0.55 1641368 0.00 0.00 vp8_copy_mem8x8_rvv
0.05 65.95 0.04 764377 0.00 0.00 vp8_copy_mem8x4_rvv

C Impl:
% CROSS=riscv64-unknown-linux-gnu- configure --target=generic-gnu --enable-debug \
    --enable-gprof && make -j
% time qemu-riscv64 -cpu rv64,v=true,zba=true,vlen=128 -L /path/to/sysroot/ \
    ./vpxenc --codec=vp8 -w 352 -h 288 -o akiyol.vpx ./akiyo_cif.yuv
Pass 1/1 frame 300/300 314977B 8399b/f 251981b/s 98417 ms (3.05 fps)
user 1m36.146s
% gprof -abp ./vpxenc ./gmon.out | grep vp8_copy_mem
0.38 63.96 0.31 vp8_copy_mem8x4_c
0.04 70.61 0.03 204336 0.00 0.00 vp8_copy_mem16x16_c

Signed-off-by: Yuuta Liang <yuuta@yuuta.moe>
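The functions profiled above copy a fixed-size block of pixels row by row between two strided buffers, which is why they vectorize well. A minimal C sketch of that pattern, using a hypothetical generic helper `copy_block` (not the libvpx function or its signature); the 16x16, 8x8 and 8x4 variants correspond to fixed width/height pairs:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical generic helper (not the libvpx API): copy a width x height
 * block of 8-bit pixels between buffers with independent strides.
 * vp8_copy_mem16x16/8x8/8x4 are fixed-size instances of this pattern. */
static void copy_block(const unsigned char *src, int src_stride,
                       unsigned char *dst, int dst_stride,
                       int width, int height) {
  assert(width > 0 && height > 0);
  for (int row = 0; row < height; ++row) {
    /* Each row is contiguous in memory, so one memcpy per row suffices;
     * a SIMD version can replace it with one vector load/store per row. */
    memcpy(dst, src, (size_t)width);
    src += src_stride;
    dst += dst_stride;
  }
}
```

Only the first `width` bytes of each destination row are written, so padding between rows (when `dst_stride > width`) is left untouched.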
2023-07-10 | add example how to use rtcd | Wang Chen
Just use vp8_sixtap_predict as an example; the actual algorithm has not been implemented yet.

Test:
$ CROSS=riscv64-unknown-linux-gnu- ../libvpx/configure --target=riscv64-linux-gcc
$ make

Check that the vp8_sixtap_predict functions have been replaced with those suffixed with "_rvv":
$ riscv64-unknown-linux-gnu-nm ./vp8/decoder/decodeframe.c.o | grep vp8_sixtap_predict16x16
U vp8_sixtap_predict16x16_rvv

Check that the vp8_sixtap_predictMxN_rvv functions are called:
$ qemu-riscv64 -L $SYSROOT_RV64 ./build-test/test_libvpx --gtest_filter="RVV/SixtapPredictTest.TestWithPresetData/*"

You should see log output such as:
"--> vp8_sixtap_predict4x4_rvv"
"FAILED" is expected because the actual algorithm has not been implemented.

Signed-off-by: Wang Chen <wangchen20@iscas.ac.cn>
Co-authored-by: sun min <sunmin89@outlook.com>
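The nm check works because rtcd maps a generic function name onto a suffixed implementation, so object files end up referencing the `_rvv` symbol directly. A minimal sketch of that compile-time mapping, with hypothetical function names and a hypothetical `HAVE_RVV` flag; the real header is generated by the rtcd scripts from rtcd_defs.pl, and on targets with runtime CPU detection it uses function pointers instead:

```c
#include <assert.h>

/* Hypothetical stand-in for a configure-time "have RVV" flag. */
#define HAVE_RVV 1

/* Two implementations of the same (hypothetical) prototype; each one just
 * reports which variant ran so the mapping can be observed. */
static int sixtap_predict16x16_c(void) { return 0; }
static int sixtap_predict16x16_rvv(void) { return 1; }

/* Compile-time mapping: callers write the base name and the preprocessor
 * substitutes the suffixed symbol, so the caller's object file contains an
 * undefined reference to the _rvv function rather than the base name. */
#if HAVE_RVV
#define sixtap_predict16x16 sixtap_predict16x16_rvv
#else
#define sixtap_predict16x16 sixtap_predict16x16_c
#endif
```

A call written as `sixtap_predict16x16()` therefore compiles to a direct call of `sixtap_predict16x16_rvv` when the flag is set.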
2023-03-20 | vp8_sixtap_predict16x16_neon: fix overread | James Zern
Shift the final read from the source by 3 to avoid breaking the assumption that the 6-tap filter needs only 5 pixels outside of the macroblock; this matches the sse2 and ssse3 implementations. It's possible this restriction could be removed if the source buffers are assumed to be padded.

Bug: webm:1795
Change-Id: I4c791e3a214898a503c78f4cedca154c75cdbaef
Fixed: webm:1795
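The "5 pixels outside of the macroblock" follow from the tap span: each output pixel at position x uses source pixels x-2 through x+3, so a row of 16 outputs needs 2 extra pixels on the left and 3 on the right, 21 in total. That matches the 441-byte (21 x 21) buffers in the AddressSanitizer reports of the 2023-03-03 entry. A scalar C sketch of one horizontal pass, using VP8's half-pel taps; the helper names are hypothetical and the real implementations filter in two passes with a full coefficient table:

```c
#include <assert.h>

/* VP8's six-tap filter for the half-pel position: taps apply to pixels
 * x-2 .. x+3 and sum to 128 (7-bit fixed point). */
static const int kHalfPelTaps[6] = { 3, -16, 77, 77, -16, 3 };

/* Number of source pixels a row of `width` outputs needs:
 * 2 to the left plus 3 to the right = 5 outside the block. */
static int sixtap_span(int width) { return width + 5; }

/* Filter one row horizontally. `src` points at the first output position;
 * the loop reads src[-2] .. src[width + 2] and nothing beyond. */
static void sixtap_row(const unsigned char *src, unsigned char *dst,
                       int width, const int taps[6]) {
  assert(width > 0);
  for (int x = 0; x < width; ++x) {
    int sum = 64; /* rounding term for the >> 7 below */
    for (int k = 0; k < 6; ++k) sum += taps[k] * src[x + k - 2];
    int v = sum < 0 ? 0 : sum >> 7; /* clamp negatives before shifting */
    dst[x] = (unsigned char)(v > 255 ? 255 : v);
  }
}
```

A vector implementation that loads 8 bytes at a time near the right edge can step past `src[width + 2]`, which is the kind of overread the fix avoids by shifting the final load back.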
2023-03-03 | disable vp8_sixtap_predict16x16_neon | James Zern
This causes various buffer overflows in the tests:

[ RUN ] NEON/SixtapPredictTest.TestWithPresetData/0
=================================================================
==22346==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000012b4a5b at pc 0x000000df0f60 bp 0xffffcf6e64b0 sp 0xffffcf6e64a8
READ of size 8 at 0x0000012b4a5b thread T0
    #0 0xdf0f5c in vp8_sixtap_predict16x16_neon vp8/common/arm/neon/sixtappredict_neon.c:1507:13
    #1 0x8819e4 in (anonymous namespace)::SixtapPredictTest_TestWithPresetData_Test::TestBody() test/predict_test.cc:293:3
...
0x0000012b4a5b is located 2 bytes to the right of global variable 'kTestData' defined in '../test/predict_test.cc:237:24' (0x12b48a0) of size 441

[ RUN ] NEON/SixtapPredictTest.TestWithRandomData/0
=================================================================
==22338==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff8b5321fb at pc 0x000000df0f60 bp 0xfffff7e0cf30 sp 0xfffff7e0cf28
READ of size 8 at 0xffff8b5321fb thread T0
    #0 0xdf0f5c in vp8_sixtap_predict16x16_neon vp8/common/arm/neon/sixtappredict_neon.c:1507:13
    #1 0x87d4c0 in (anonymous namespace)::PredictTestBase::TestWithRandomData(void (*)(unsigned char*, int, int, int, unsigned char*, int)) test/predict_test.cc:170:9
...
0xffff8b5321fb is located 2 bytes to the right of 441-byte region [0xffff8b532040,0xffff8b5321f9)
allocated by thread T0 here:
    #0 0x5fd4f0 in operator new[](unsigned long) (test_libvpx+0x5fd4f0)
    #1 0x87c2e0 in (anonymous namespace)::PredictTestBase::SetUp() test/predict_test.cc:47:12
    #2 0x87d074 in non-virtual thunk to (anonymous namespace)::PredictTestBase::SetUp() test/predict_test.cc
...

Bug: webm:1795
Change-Id: I32213a381eef91547d00f88acf90f1cf2ec2ea75
2022-05-17 | vp8[loongarch]: Optimize vp8_sixtap_predict4x4 | yuanhecai
1. vp8_sixtap_predict4x4

Bug: webm:1755
Change-Id: If7d844496ef2cfe2252f2ef12bb7cded63ad03dd
2022-05-17 | vp8[loongarch]: Optimize fdct8x4/diamond_search_sad | yuanhecai
1. vp8_short_fdct8x4_lsx
2. vp8_diamond_search_sad_lsx
3. vpx_sad8x8_lsx

Bug: webm:1755
Change-Id: Ic9df84ead2d4fc07ec58e9730d6a12ac2b2d31c1
2022-05-17 | vp8[loongarch]: Optimize vp8 encoding partial function | Hao Chen
1. vp8_short_fdct4x4
2. vp8_regular_quantize_b
3. vp8_block_error
4. vp8_mbblock_error
5. vpx_subtract_block

Bug: webm:1755
Change-Id: I3dbfc7e3937af74090fc53fb4c9664e6cdda29ef
2022-03-31 | vp8[loongarch]: Optimize dequant_idct_add_y/uv_block | yuanhecai
1. vp8_dequant_idct_add_uv_block_lsx
2. vp8_dequant_idct_add_y_block_lsx

Bug: webm:1755
Change-Id: I1f006daaefb2075b422bc72a3f69c5abee776e2e
2022-03-29 | remove sad x3,x8 specializations | Johann
These would compute the sum of absolute differences (sad) for a group of 3 or 8 references. They were used as part of an exhaustive search. vp8 only uses these functions in speed 0 and best quality. For vp9 they are only used with the --enable-non-greedy-mv experiment.

This removes the 3- and 8-at-a-time optimized functions and uses the fallback code, which processes 1 or 4 (vpx_sadMxNx4d) references at a time.

For configure --target=x86_64-linux-gcc --enable-realtime-only:
libvpx.a                before: 3002424  after: 2937622  delta: 64802
after 'strip libvpx.a'  before: 2116998  after: 2073090  delta: 43908

Change-Id: I566d06e027c327b3bede68649dd551bba81a848e
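The remaining fallback evaluates four candidate references per call. A plain-C sketch of that four-at-a-time shape, in the spirit of vpx_sadMxNx4d but with a hypothetical name and a generic signature (block size passed as parameters, unlike the fixed-size libvpx functions):

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences between one source block and four candidate
 * reference blocks. Hypothetical generic signature: width/height are
 * parameters here, and all four references share one stride. */
static void sad_x4d(const unsigned char *src, int src_stride,
                    const unsigned char *const ref[4], int ref_stride,
                    int width, int height, unsigned int sad[4]) {
  assert(width > 0 && height > 0);
  for (int i = 0; i < 4; ++i) {
    unsigned int total = 0;
    for (int r = 0; r < height; ++r)
      for (int c = 0; c < width; ++c)
        total += (unsigned int)abs(src[r * src_stride + c] -
                                   ref[i][r * ref_stride + c]);
    sad[i] = total;
  }
}
```

An optimized version can share the source loads across all four references, which is the main saving over four separate single-reference calls.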
2022-03-17 | vp8[loongarch]: Optimize idct_add, filter_bv/bh | yuanhecai
1. vp8_dc_only_idct_add_lsx
2. vp8_loop_filter_bh_lsx
3. vp8_loop_filter_bv_lsx

Bug: webm:1755
Change-Id: I9b629767e2a4e9db8cbb3ee2369186502dc6eb00
2022-02-08 | vp8[loongarch]: Optimize vp8_loop/sixtap, vpx_dc with LSX. | Lu Wang
1. vp8_loop_filter_mbh, vp8_loop_filter_mbv
2. vp8_sixtap_predict16x16, vp8_sixtap_predict8x8
3. vpx_dc_predictor_16x16, vpx_dc_predictor_8x8

./vpxdec --progress -o YUV_1920X1080.yuv original_1200f/VP8_1920X1080.webm
before: 37.77fps
after : 220.90fps

Bug: webm:1755
Change-Id: I1a3ce16f0c872261d813b6531cfdf25bd59bb774
2019-01-07 | vp8_copy32xn: resolve missing declaration | Johann
BUG=webm:1584
Change-Id: I8279e099fb9595edad858bf7332bf2b40fecae02
2018-10-30 | clang-tidy: fix vp8/encoder parameters | Johann
BUG=webm:1444
Change-Id: I57a305cdab0d62b0745116272fbd5d9257c6e679
2018-10-30 | clang-tidy: fix vp8/common parameters | Johann
Match function definitions to declarations

BUG=webm:1444
Change-Id: Ib96d3b735eaf81cece5406c89cc5156bc2cde462
2018-10-25 | vp8 bilinear: rewrite 4x4 | Johann
~20% faster than the MMX. Removes the last usage of vp8_bilinear_filters_x86_[48].

Change-Id: Iee976fab9655d0020440f26c4403ce50103af913
2018-10-24 | vp8 bilinear: rewrite in intrinsics | Johann
8x8 is 15% faster than the assembly. 8x4 is 200% faster than MMX. Remove MMX version.

Change-Id: I55642ebd276db265911f2c79616177a3a9a7e04f
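The bilinear rewrites above all compute the same two-pass, two-tap filter: a horizontal pass over h + 1 rows followed by a vertical pass between consecutive rows. A plain-C reference sketch, assuming VP8's eighth-pel weight pairs (each summing to 128) and a hypothetical function name and signature; it is a scalar model, not the intrinsics from these commits:

```c
#include <assert.h>

/* Two-tap bilinear weights for eighth-pel offsets 0..7 (7-bit fixed point). */
static const int kBilinear[8][2] = {
  { 128, 0 }, { 112, 16 }, { 96, 32 }, { 80, 48 },
  { 64, 64 }, { 48, 80 },  { 32, 96 }, { 16, 112 }
};

/* Hypothetical reference: predict a w x h block (w, h <= 16). Reads
 * (h + 1) rows and (w + 1) columns of `src`. */
static void bilinear_predict(const unsigned char *src, int src_stride,
                             int xoffset, int yoffset, unsigned char *dst,
                             int dst_stride, int w, int h) {
  assert(w > 0 && w <= 16 && h > 0 && h <= 16);
  assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);
  const int *hf = kBilinear[xoffset];
  const int *vf = kBilinear[yoffset];
  unsigned short tmp[17 * 16]; /* (h + 1) rows of first-pass output */

  /* Pass 1: horizontal filter over h + 1 rows so pass 2 has its extra row. */
  for (int r = 0; r < h + 1; ++r)
    for (int c = 0; c < w; ++c) {
      const unsigned char *p = src + r * src_stride + c;
      tmp[r * w + c] =
          (unsigned short)((p[0] * hf[0] + p[1] * hf[1] + 64) >> 7);
    }

  /* Pass 2: vertical filter between consecutive first-pass rows. */
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c)
      dst[r * dst_stride + c] = (unsigned char)(
          (tmp[r * w + c] * vf[0] + tmp[(r + 1) * w + c] * vf[1] + 64) >> 7);
}
```

Because both weights are non-negative and sum to 128, every intermediate stays within 0..255, so no clamping is needed.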
2017-12-14 | add copyright to rtcd files | Johann
Allows them to pass the license check in chromium.

BUG=chromium:98319
Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8
2017-10-17 | vp8: [loongson] optimize idct with mmi | Shiyou Yin
1. vp8_dequant_idct_add_y_block_mmi
2. vp8_dequant_idct_add_uv_block_mmi

Change-Id: I9987147be2685ac79d4b045d1d56f6709ee1223c
2017-10-12 | vp8: [loongson] optimize dct with mmi | Shiyou Yin
1. vp8_short_fdct4x4_mmi
2. vp8_short_fdct8x4_mmi
3. vp8_short_walsh4x4_mmi

Change-Id: I89a7df25cfd09fae309fac257ad8b6a3dc1c8acb
2017-10-11 | vp8: [loongson] optimize quantize with mmi | Shiyou Yin
1. vp8_fast_quantize_b_mmi
2. vp8_regular_quantize_b_mmi

Change-Id: Ic6e21593075f92c1004acd67184602d2aa5d5646
2017-09-26 | vp8: [loongson] optimize copymem with mmi | Shiyou Yin
1. vp8_copy_mem16x16_mmi
2. vp8_copy_mem8x8_mmi
3. vp8_copy_mem8x4_mmi

Change-Id: I3de29a11fa7402df0e48bbb944440b1e66498a65
2017-09-14 | vp8: [loongson] optimize dequantize with mmi | Shiyou Yin
1. vp8_dequantize_b_mmi
2. vp8_dequant_idct_add_mmi

Change-Id: I505f8afb7a444173392b325906e6a4f420f00709
2017-09-14 | vp8: [loongson] optimize idctllm with mmi | Shiyou Yin
1. vp8_short_idct4x4llm_mmi
2. vp8_short_inv_walsh4x4_mmi
3. vp8_dc_only_idct_add_mmi

Change-Id: I616923681e79d78607a4988608fc39df77b093f4
2017-09-11 | vp8: [loongson] optimize loopfilter with mmi | Shiyou Yin
1. vp8_loop_filter_horizontal_edge_mmi
2. vp8_loop_filter_vertical_edge_mmi
3. vp8_mbloop_filter_horizontal_edge_mmi
4. vp8_mbloop_filter_vertical_edge_mmi
5. vp8_loop_filter_simple_horizontal_edge_mmi
6. vp8_loop_filter_simple_vertical_edge_mmi

Change-Id: Ie34bbff3a16cff64e39a50798afd2b7dac9bcdc3
2017-09-02 | vp8: [loongson] optimize sixtap predict with mmi | Shiyou Yin
1. vp8_sixtap_predict16x16_mmi
2. vp8_sixtap_predict8x8_mmi
3. vp8_sixtap_predict8x4_mmi
4. vp8_sixtap_predict4x4_mmi

Change-Id: I186669d1a1d998a0f3ba3a548e25eee8b52c251b
2016-11-08 | Refine vp8_refining_search_sadx4 targeting | Johann
This uses the same sdx4df pointers as vp8_diamond_search_sadx4 and should therefore target the same optimizations.

See e4ddf9db6a37eee59c079f5ae427643ae3424fcf

Change-Id: Ic298e9b25c34bbe6b7a0799509355b0addb56675
2016-09-29 | vp8: remove mmx functions | Johann
When they have sse2 equivalents.

Change-Id: I158f631a3bcecba57b36093ac10114b1904767a7
2016-09-29 | Rename _xmm functions to _sse2 | Johann
Avoid the extra level of indirection/confusion.

Change-Id: I0555f639d67835df9fb7dac0c75085e9954805f1
2016-09-29 | Remove vp8_clear_system_state | Johann
Use vpx_clear_system_state instead.

Change-Id: Ia3e9122f69a2c690ddd7c7bc54f92ccb9ec18b3e
2016-09-29 | vp8: clean up rtcd | Johann
Remove lines which specify the same name for a function.

Change-Id: I956bd8ce2b81a2a8feab5621d28bd2499c2b4c2d
2016-09-27 | Hook up vp8_diamond_search_sad_sse3 | Johann
The original commit never set any 'specialize' line:
61311e61039c300ae872ccba22304e9e60dc0205

It appears the sadx4 version of the function uses sdx4df calls to speed up the search. There are no sse3 versions of the sdx4df functions, but there are sse2 and msa versions. There is a neon version of vpx_sad16x16x4d but not any of the smaller versions. Perhaps if they existed this function could be expanded to use them.

Change-Id: I936d7d6b1a3ff6dcd5a4d2322272708c47cdec13
2016-09-23 | Un-Revert "Restore vp8_sixtap_predict4x4_neon" | Johann
This restores d9dce2f48eed1368a44c368fa87a506bd89ffec5

Switched to using signed shift-and-narrow. Instead of saturating negative results to 0, it was saturating them to 255.

BUG=webm:817
BUG=webm:1273
Change-Id: I571095336aa4182e3288b17924fcaaece42b0a49
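The 255-instead-of-0 bug comes from narrowing a signed 16-bit filter sum as if it were unsigned: a negative sum reinterpreted as unsigned is a large positive number, so unsigned saturation clamps it to 255. A scalar C model of both behaviours follows; the function names are hypothetical, and the NEON intrinsics named in the comments (vqrshrun_n_s16, vqrshrn_n_u16) are examples of the two kinds of narrowing, since the commit does not say which intrinsics were involved:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed rounding shift-right-and-narrow with unsigned
 * saturation (per-lane behaviour of NEON's vqrshrun_n_s16):
 * negative filter sums clamp to 0. */
static uint8_t narrow_signed(int16_t v, int shift) {
  assert(shift >= 1 && shift <= 8);
  int32_t t = (int32_t)v + (1 << (shift - 1)); /* round to nearest */
  if (t < 0) return 0;                         /* negative -> 0 */
  t >>= shift;
  return (uint8_t)(t > 255 ? 255 : t);
}

/* Scalar model of the buggy variant: the same bits treated as unsigned
 * (like vqrshrn_n_u16), so a negative sum looks like a huge positive
 * value and saturates to 255. */
static uint8_t narrow_unsigned(int16_t v, int shift) {
  assert(shift >= 1 && shift <= 8);
  uint32_t t = (uint32_t)(uint16_t)v + (1u << (shift - 1));
  t >>= shift;
  return (uint8_t)(t > 255 ? 255 : t);
}
```

For non-negative sums the two models agree; they differ only when the filter's negative taps push a sum below zero, e.g. `narrow_signed(-100, 7)` gives 0 while `narrow_unsigned(-100, 7)` gives 255.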
2016-09-16 | Merge "Revert "Restore vp8_sixtap_predict4x4_neon"" | James Zern
2016-09-16 | Revert "Restore vp8_sixtap_predict4x4_neon" | Johann Koenig
This reverts commit d9dce2f48eed1368a44c368fa87a506bd89ffec5.

Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.

Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
2016-09-15 | Restore vp8_bilinear_predict4x4_neon | Johann
This function was removed when clang started introducing alignment hints which caused the 32-bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421

The load has been rendered safe with an implementation ~indiscernible performance-wise that uses _u8 and over-reads just a touch.

It is still ~5x faster than C in the unaligned case when doing both filters.

BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
2016-09-15 | Restore vp8_sixtap_predict4x4_neon | Johann
This function was removed when clang started introducing alignment hints which caused the 32-bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421

The load has been rendered safe with an implementation ~indiscernible performance-wise that uses _u8 and over-reads just a touch.

The store, when unaligned, has a version that is ~25% slower but safe when xoffset = 0 (second pass filter only). When the first pass filter (or both) is in play, the new version is almost identical in speed.

Worst case performance (both filters, unaligned stores) is roughly 3-4x faster than C.

BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
2016-08-04 | Remove armv6 target | Johann
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-07-12 | deblock filter : moved from vp8 code branch | Jim Bankoski
The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9.

Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-05-06 | Remove sixtap/bilinear 4x4 neon implementations | Johann
These implementations rely on casting the pointers to load the data. Clang implemented optimizations which automatically add alignment hints to such loads. The 4x4 filters do not guarantee the necessary alignment so the resulting assembly is broken.

https://llvm.org/bugs/show_bug.cgi?id=24421

BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
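Loading through a cast pointer lets the compiler assume the address is suitably aligned for the wider type. A common portable alternative is to load and store through memcpy, which makes no alignment assumption. Note this is a different technique from the _u8 over-read that the 2016-09-15 restore commits used; a minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Alignment-safe 32-bit load. Unlike *(const uint32_t *)p, copying from a
 * uint8_t pointer carries no 4-byte alignment guarantee, so the compiler
 * cannot emit an aligned-only access; optimizing compilers usually turn
 * this into a single load on targets that permit unaligned access. */
static uint32_t load_u32_unaligned(const uint8_t *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Matching alignment-safe 32-bit store. */
static void store_u32_unaligned(uint8_t *p, uint32_t v) {
  memcpy(p, &v, sizeof(v));
}
```

Both helpers touch exactly four bytes, so unlike an over-reading load they never access memory beyond the block.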
2016-05-02 | Move vpx_add_plane from codec to vpx_dsp and dedup. | Jim Bankoski
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2015-09-30 | vp8: change build_intra4x4_predictors() to use vpx_dsp. | Ronald S. Bultje
I've added a few new functions (d45e, d63e, he, ve) to cover the filtered h/v 4x4 predictors that are vp8-specific, the "correct" d45 with the correctly filtered bottom-right pixel (as opposed to the unfiltered version in vp9), and the "broken" d63 with weirdly filtered bottom-right pixels (which is correctly filtered in vp9).

There may be a minor performance impact on all systems because we have to do an extra copy of the Above pixel array to incorporate the topleft pixel in the same array (thus fitting the vpx_dsp API). In addition, armv6 will have a more serious performance impact because I removed the armv6/vp8-specific assembly. I'm not sure anyone cares...

Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
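The vp8-specific "ve" predictor fills every row of the 4x4 block with a smoothed copy of the row above, using a [1, 2, 1] filter whose leftmost tap reaches the top-left pixel. That is why the commit copies the Above array with the top-left pixel in front of it. A sketch under that reading, with a hypothetical function name and an assumed edge layout (above[-1] is the top-left pixel, above[0..4] the four pixels above the block plus one to the right):

```c
#include <assert.h>
#include <stddef.h>

/* 3-tap [1, 2, 1] smoothing with rounding, as used by VP8's filtered
 * 4x4 predictors. */
#define AVG3(a, b, c) (((a) + 2 * (b) + (c) + 2) >> 2)

/* Hypothetical filtered vertical 4x4 predictor. `above` must point one
 * element into an edge array so that above[-1] is the top-left pixel and
 * above[0..4] are the pixels above the block plus one to the right. */
static void ve_filtered_4x4(unsigned char *dst, ptrdiff_t stride,
                            const unsigned char *above) {
  assert(stride >= 4);
  unsigned char row[4];
  for (int c = 0; c < 4; ++c)
    row[c] = (unsigned char)AVG3(above[c - 1], above[c], above[c + 1]);
  /* Every output row repeats the smoothed edge. */
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) dst[r * stride + c] = row[c];
}
```

With the top-left pixel stored at above[-1], the first column needs no special case, which matches the commit's motivation for the extra copy.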
2015-09-30 | vp8: change build_intra_predictors_mbuv_s to use vpx_dsp. | Ronald S. Bultje
Change-Id: I936c2430c3c5b1e0ab5dec0a20110525e925b5e4
2015-09-30 | vp8: change build_intra_predictors_mby_s to use vpx_dsp. | Ronald S. Bultje
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
2015-08-07 | Replace VP8 SSIM with VP9 derived vpx_dsp SSIM. | Alex Converse
Change-Id: Ic61f30af12d1b01c1d5adc4e08bc20e20ad38027
2015-08-01 | mips msa vp8 denoising filter optimization | Parag Salasakar
average improvement ~2x-3x

Change-Id: I6c17012c731fa4d56e0343f8de0df47b2dde289b
2015-07-31 | mips msa vp8 temporal filter optimization | Parag Salasakar
average improvement ~2x-3x

Change-Id: I05593bed583234dc7809aaec6cab82773a29505d
2015-07-31 | mips msa vp8 block subtract optimization | Parag Salasakar
average improvement ~2x-3x

Change-Id: I30abf4c92cddcc9e87b7a40d4106076e1ec701c2
2015-07-30 | mips msa vp8 quantize optimization | Parag Salasakar
average improvement ~2x-3x

Change-Id: I6fc37191bf9cb5a67e1af9787d0d27659c17bdba
2015-07-30 | mips msa vp8 fdct optimization | Parag Salasakar
average improvement ~2x-4x

Change-Id: Id0bc600440f7ef53348f585ebadb1ac6869e9a00
2015-07-29 | mips msa vp8 post proc optimization | Parag Salasakar
average improvement ~2x-4x

Change-Id: I93abc15389649c169bb8b69127c0b95407d34692