path: root/vpx_dsp
Age | Commit message | Author
2022-05-13 | vp9[loongarch]: Optimize avg_variance64x64/variance8x8 | yuanhecai
1. vpx_variance8x8_lsx 2. vpx_sub_pixel_avg_variance64x64_lsx Bug: webm:1755 Change-Id: I7d68c7f2f5c8d27dc31cfd32298aeefb68f5d560
2022-05-13 | vp9[loongarch]: Optimize fdct4x4/8x8_lsx | yuanhecai
1. vpx_fdct4x4_lsx 2. vpx_fdct8x8_lsx Bug: webm:1755 Change-Id: If283fc08f9bedcbecd2c4052adb210f8fe00d4f0
2022-05-13 | vp9[loongarch]: Optimize vpx_hadamard_16x16/8x8 | yuanhecai
1. vpx_hadamard_16x16_lsx 2. vpx_hadamard_8x8_lsx Bug: webm:1755 Change-Id: I3b1e0a2c026c3806b7bbbd191d0edf0e78912af7
2022-04-28 | Merge changes I99ee0ef3,Ie087e8be,I6b19d016,I6fb7771d,I54f83733, ... into main | James Zern
* changes:
  y4m_input_open: check allocs
  fastssim,fs_ctx_init: check alloc
  vp9_get_smooth_motion_field: check alloc
  vp9_row_mt_alloc_rd_thresh: check alloc
  simple_encode,init_encoder: check buffer_pool alloc
  VP9RateControlRTC::Create: check segmentation_map alloc
  vp9_speed_features.c: check allocations
  vp9_alloc_motion_field_info: check motion_field_array alloc
  vp9_enc_grp_get_next_job: check job queue alloc
  vp9: check postproc_state.limits allocs
  vp9,encode_tiles_buffer_alloc: fix allocation check
2022-04-28 | vp9[loongarch]: Optimize sad8x8/32x64/64x32x4d | yuanhecai
1. vpx_sad8x8x4d_lsx 2. vpx_sad32x64x4d_lsx 3. vpx_sad64x32x4d_lsx Bug: webm:1755 Change-Id: I08a2b8717ec8623ffdd4451a04e68fa3a7228668
2022-04-28 | vp9[loongarch]: Optimize sad64x64/32x32_avg,comp_avg_pred | yuanhecai
1. vpx_sad64x64_avg_lsx 2. vpx_sad32x32_avg_lsx 3. comp_avg_pred_lsx Bug: webm:1755 Change-Id: I58dabdcdd4265bd6ebd5670db8a132d2e838683f
2022-04-26 | fastssim,fs_ctx_init: check alloc | James Zern
Change-Id: Ie087e8be1e943b94327ed520db447a0e3a927738
2022-04-26 | vp9[loongarch]: Optimize fdct/get/variance16x16 | yuanhecai
1. vpx_fdct16x16_lsx 2. vpx_get16x16var_lsx 3. vpx_variance16x16_lsx Bug: webm:1755 Change-Id: I27090406dc28cfdca64760fea4bc16ae11b74628
2022-04-24 | vp9[loongarch]: Optimize sub_pixel_variance32x32/sad16x16 | yuanhecai
1. vpx_sad16x16_lsx 2. vpx_sub_pixel_variance32x32_lsx Bug: webm:1755 Change-Id: I9926ace710903993ccbb42caef320fa895e90127
2022-04-22 | vp9[loongarch]: Optimize horizontal/vertical_4/dual | yuanhecai
1. vpx_lpf_horizontal_4_lsx 2. vpx_lpf_vertical_4_lsx 3. vpx_lpf_horizontal_4_dual_lsx 4. vpx_lpf_vertical_4_dual_lsx Bug: webm:1755 Change-Id: I12e9f27cafd9514b24cfbf2354cc66c7d1238687
2022-04-22 | vp9[loongarch]: Optimize convolve8_avg_vert/convolve_copy | yuanhecai
1. vpx_convolve8_avg_vert_lsx 2. vpx_convolve_copy_lsx 3. vpx_idct32x32_135_add_lsx Bug: webm:1755 Change-Id: I6bdfe5836a91a5e361ab869b26641e86c5ebb68d
2022-04-22 | vp9[loongarch]: Optimize vertical/horizontal_8_dual | yuanhecai
1. vpx_lpf_vertical_8_dual_lsx 2. vpx_lpf_horizontal_8_dual_lsx Bug: webm:1755 Change-Id: I354df02cc215f36b4edf6558af0ff7fd6909deac
2022-04-19 | fdct16x16_neon.h,cosmetics: fix include-guard case | James Zern
Change-Id: I593735bb7f88d63f2ddab57484099479c8759a3d
2022-04-15 | vp9[loongarch]: Optimize idct32x32_1024/1/34_add | yuanhecai
1. vpx_idct32x32_1024_add_lsx 2. vpx_idct32x32_34_add_lsx 3. vpx_idct32x32_1_add_lsx Bug: webm:1755 Change-Id: I9c24f75e0d93613754d8e30da7e007b8d1374e60
2022-04-15 | vp9[loongarch]: Optimize vpx_fdct32x32/32x32_rd | yuanhecai
1. vpx_fdct32x32_lsx 2. vpx_fdct32x32_rd_lsx Bug: webm:1755 Change-Id: I83bce11c0d905cf137545a46cd756aef9cedce47
2022-04-12 | vp9[loongarch]: Optimize vpx_variance64x64/32x32 | yuanhecai
1. vpx_variance64x64_lsx 2. vpx_variance32x32_lsx Bug: webm:1755 Change-Id: I45c5aa94cbbf7128473894a990d931acaa40e102
2022-04-12 | vp9[loongarch]: Optimize sad64x64/32x32/16x16 | yuanhecai
1. vpx_sad64x64x4d_lsx 2. vpx_sad32x32x4d_lsx 3. vpx_sad16x16x4d_lsx 4. vpx_sad64x64_lsx 5. vpx_sad32x32_lsx Bug: webm:1755 Change-Id: Ief71c2216f697b261d7c1fc481c89c9f1a6098e6
2022-04-05 | Merge changes I0b6520be,I1f006daa,I7ee8e367 into main | James Zern
* changes:
  vp9[loongarch]: Optimize vpx_convolve8_avg_horiz_c
  vp8[loongarch]: Optimize dequant_idct_add_y/uv_block
  loongarch: Fix bugs
2022-03-31 | Revert "quantize: replace highbd versions" | James Zern
This reverts commit 2200039d33c49a9f7a5c438656df143755b022c4. This causes failures with VP9/EndToEndTestLarge.EndtoEndPSNRTest/*; it seems the assembly does not match the C code. Bug: webm:1586 Change-Id: I4c63beebf88d4c12789d681b0d38014510b147fe
2022-03-31 | Revert "quantize: remove highbd version" | James Zern
This reverts commit 89cfe3835c47dabf77d38edb3af190155984fa9a. This is a prerequisite for reverting 2200039d33c49a9f7a5c438656df143755b022c4 which causes high bitdepth test failures Bug: webm:1586 Change-Id: I28f3b98f3339f3573b1492b88bf733dade133fc0
2022-03-31 | vp9[loongarch]: Optimize vpx_convolve8_avg_horiz_c | yuanhecai
1. vpx_convolve8_avg_horiz_lsx Bug: webm:1755 Change-Id: I0b6520be0afa1689da329f56ec6cd95c1730250c
2022-03-31 | loongarch: Fix bugs | yuanhecai
Fix bugs from loopfilter_filters_lsx.c, vpx_convolve8_avg_lsx.c Bug: webm:1755 Change-Id: I7ee8e367d66a49f3be10d7e417837d3b6ef50bdb
2022-03-31 | quantize: remove highbd version | Johann
The only difference between the code is the clamp. For 8 bit it is purely an optimization. The values outside this range will still saturate. Change-Id: I2a770b140690d99e151b00957789bd72f7a11e13
2022-03-31 | Merge "remove sad x3,x8 specializations" into main | Johann Koenig
2022-03-31 | quantize: replace highbd versions | Johann
The optimized quantize functions were already built to handle highbd values. The only difference is the clamping. All highbd functions expand to 32bits when running in highbd mode. Removes vpx_highbd_quantize_32x32_sse2 as it is slower than the C version in the worst case. Bug: webm:1586 Change-Id: I49bf8a6a2041f78450bf43a4f655c67656b0f8d9
2022-03-29 | sad4d_avx2: fix VS 2014 build error | James Zern
after: d60b671a7 gcc 11 warning: mismatched bound error C2719: 'sums': formal parameter with requested alignment of 32 won't be aligned Change-Id: Iaba46d00ef2334a5e2d9ee69b5d03478fdc73a60
2022-03-30 | remove skip_block from quantize | Johann
Whether a block is skipped is handled by mi->skip. x->skip_block is kept exclusively to verify that the quantize functions are not called for skip blocks. Finishes the cleanup in 13eed991f Bug: libvpx:1612 Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
2022-03-29 | remove sad x3,x8 specializations | Johann
These would compute the sum of absolute differences (sad) for a group of 3 or 8 references. This was used as part of an exhaustive search. vp8 only uses these functions in speed 0 and best quality. For vp9 this is only used with the --enable-non-greedy-mv experiment. This removes the 3- and 8-at-a-time optimized functions and uses the fall back code which will process 1 or 4 (vpx_sadMxNx4d) at a time.

For configure --target=x86_64-linux-gcc --enable-realtime-only:

libvpx.a
  before: 3002424
  after:  2937622
  delta:  64802

after 'strip libvpx.a'
  before: 2116998
  after:  2073090
  delta:  43908

Change-Id: I566d06e027c327b3bede68649dd551bba81a848e
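To illustrate the fallback path described above, here is a minimal scalar sketch of a one-reference SAD and a four-reference wrapper in the spirit of vpx_sadMxNx4d. The names `sad_block` and `sad_block_x4d` and their parameter lists are hypothetical, chosen for this example; they are not the libvpx prototypes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD of one width x height block against one reference. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Hypothetical helper: the same source block against four references. */
static void sad_block_x4d(const uint8_t *src, int src_stride,
                          const uint8_t *const ref_array[4], int ref_stride,
                          int width, int height, uint32_t sad_array[4]) {
  for (int i = 0; i < 4; ++i) {
    sad_array[i] = sad_block(src, src_stride, ref_array[i], ref_stride,
                             width, height);
  }
}
```

A search that previously scored 3 or 8 candidates per call would instead loop over its candidates in groups of 4 (or singly) using calls of this shape.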
2022-03-29 | gcc 11 warning: mismatched bound | Johann
Clean up a new build warning with gcc11: argument 3 of type ‘const uint8_t * const[]’ with mismatched bound [-Warray-parameter=] Standardize sad functions with array sizes. Change-Id: Iea4144e61368f6a8279e2f3ae96c78aff06c8b41
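As a rough illustration of the kind of fix this describes, the sketch below keeps the array bounds identical between a declaration and its definition. The function `first_pixels_x4` is hypothetical and exists only for this example; the point is that GCC 11's -Warray-parameter compares the bounds written in each place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declaration: the array bounds are spelled out. */
void first_pixels_x4(const uint8_t *const ref_array[4], uint32_t out[4]);

/* Definition: repeats the same bounds. Declaring the first parameter as
 * `const uint8_t *const ref_array[]` here instead would be a mismatch of
 * the kind -Warray-parameter reports. */
void first_pixels_x4(const uint8_t *const ref_array[4], uint32_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    assert(ref_array[i] != NULL);
    out[i] = ref_array[i][0]; /* read the first pixel of each reference */
  }
}
```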
2022-03-24 | Merge "Make sure only NEON FDCT functions are called." into main | James Zern
2022-03-17 | Make sure only NEON FDCT functions are called. | Konstantinos Margaritis
[NEON]
Added vpx_fdct4x4_pass1_neon(),
Added vpx_fdct8x8_pass1_notranspose_neon(),
Added vpx_fdct8x8_pass1_neon() to avoid code duplication
Refactored vpx_fdct4x4_neon() and vpx_fdct8x8_neon() to use the above
Rename dct_body to vpx_fdct16x16_body to reuse later
Add transpose_s16_16x16()

I have run make test and all tests/configurations seem to pass.

Profiled using this command on an Ampere Altra VM:

    sudo perf record -g ./vpxenc --codec=vp9 --height=1080 --width=1920 \
      --fps=25/1 --limit=20 -o output.mkv \
      ../original_videos_Sports_1080P_Sports_1080P-0063.mkv --debug --rt

Before this optimization:

    1.32% 1.32% vpxenc vpxenc [.] vpx_fdct4x4_neon
    0.16% 0.16% vpxenc vpxenc [.] vpx_fdct4x4_c
    0.79% 0.79% vpxenc vpxenc [.] vpx_fdct8x8_c
    0.52% 0.52% vpxenc vpxenc [.] vpx_fdct8x8_neon
    1.23% 1.23% vpxenc vpxenc [.] vpx_fdct16x16_c
    0.54% 0.54% vpxenc vpxenc [.] vpx_fdct16x16_neon

So, even though a _neon() version exists, the C version was called as well.

After this patch:

    1.42% 1.36% vpxenc vpxenc [.] vpx_fdct4x4_neon
    0.87% 0.82% vpxenc vpxenc [.] vpx_fdct8x8_neon
    0.74% 0.74% vpxenc vpxenc [.] vpx_fdct16x16_neon

Change-Id: Id4e1dd315c67b4355fe4e5a1b59e181a349f16d0
2022-03-16 | vp9[loongarch]: Optimize convolve/convolve8_avg_c | yuanhecai
1. vpx_convolve8_avg_lsx 2. vpx_convolve_avg_lsx Bug: webm:1755 Change-Id: I4af5c362a94f11d0b5d1760e18326660bdbc0559
2022-03-16 | vp9[loongarch]: Optimize convolve8_horiz/vert/c | yuanhecai
1. vpx_convolve8_lsx 2. vpx_convolve8_vert_lsx 3. vpx_convolve8_horiz_lsx Bug: webm:1755 Change-Id: I9897e1ed6a904ac74d1078bd22b275af44db142d
2022-03-13 | ads2gas[_apple].pl: remove unused stanzas | Johann
Many of the features in ads2gas are no longer used. Remove all patterns which are no longer used in libvpx. Simplify between the two to minimize differences. Change-Id: Ia1151eb8b694cbe51845a1374a876cc7b798899c
2022-03-03 | vp9[loongarch]: Optimize horizontal/vertical_8_c | yuanhecai
1. vpx_lpf_vertical_8_lsx 2. vpx_lpf_horizontal_8_lsx Bug: webm:1755 Change-Id: I6b05d6b1b2ac4d2a75beb9c9ca9700976fc3af55
2022-02-25 | vp9[loongarch]: Optimize lpf_horizontal/vertical_16_dual with LSX | yuanhecai
Change-Id: I82c6bc16ea57c3f7ac5f4d212a12a5f70cb55ffc
2022-02-08 | vp8[loongarch]: Optimize vp8_loop/sixtap, vpx_dc with LSX. | Lu Wang
1. vp8_loop_filter_mbh, vp8_loop_filter_mbv 2. vp8_sixtap_predict16x16, vp8_sixtap_predict8x8 3. vpx_dc_predictor_16x16, vpx_dc_predictor_8x8 ./vpxdec --progress -o YUV_1920X1080.yuv original_1200f/VP8_1920X1080.webm before: 37.77fps after : 220.90fps Bug: webm:1755 Change-Id: I1a3ce16f0c872261d813b6531cfdf25bd59bb774
2021-12-21 | vpx_int_pro_row: normalize declaration w/aom | James Zern
this is a followup to: 7fbcee49d quiet -Warray-parameter warnings and conforms to aom in: 06e13e817 quiet -Warray-parameter warnings the sad functions are more varied in libvpx and will require a separate pass Change-Id: I765fd6704df615e836ba0b184ff8266ce926c394
2021-12-09 | quiet -Warray-parameter warnings | James Zern
w/gcc-11 this matches the definition of the function with the declaration Change-Id: I757b731b9560cb0b0ceec4ec258ec5af5a183b3d
2021-12-07 | clear -Wextra-semi/-Wextra-semi-stmt warnings x2 | James Zern
some additional neon file updates after: 31b954deb clear -Wextra-semi/-Wextra-semi-stmt warnings Bug: chromium:1257449 Change-Id: I3e2664f2bd8f6f7328ec91bf6595ba5fc09862bd
2021-12-02 | clear -Wextra-semi/-Wextra-semi-stmt warnings | James Zern
Bug: chromium:1257449 Change-Id: Ia9aafccc09b611521d4a7aedfe3723393a840c62
2021-11-08 | mem_sse2.h: loadu_uint32 -> loadu_int32 | James Zern
this changes the return to int32_t which matches the type with usage of this call as input to _mm_cvtsi32_si128(), _mm_set_epi32(), etc. fixes implicit conversion warning with clang-11 -fsanitize=undefined Change-Id: I1425f12d4f79155dd5d7af0eb00fbdb9f1940544
2021-11-08 | mem_sse2.h: storeu_uint32 -> storeu_int32 | James Zern
this changes the parameter to int32_t which matches the type with usage of this call using _mm_cvtsi128_si32() as a parameter. quiets an implicit conversion warning with clang-11 -fsanitize=undefined Change-Id: I1e9e9ffac5d2996962d29611458311221eca8ea0
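For context on the two mem_sse2.h renames, here is a minimal sketch of what unaligned 32-bit load/store helpers with `int32_t` types could look like. The memcpy-based bodies are an assumption made for this example, not a copy of the libvpx header; only the names `loadu_int32` and `storeu_int32` come from the commit titles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed implementation: read 4 bytes from a possibly unaligned address.
 * Returning int32_t lets the value feed an int-typed intrinsic argument
 * (the commit names _mm_cvtsi32_si128() and _mm_set_epi32()) without an
 * implicit unsigned-to-signed conversion. */
static int32_t loadu_int32(const void *src) {
  int32_t v;
  assert(src != NULL);
  memcpy(&v, src, sizeof(v));
  return v;
}

/* Assumed implementation: write 4 bytes to a possibly unaligned address.
 * Taking int32_t matches the int returned by _mm_cvtsi128_si32(). */
static void storeu_int32(void *dst, int32_t v) {
  assert(dst != NULL);
  memcpy(dst, &v, sizeof(v));
}
```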
2021-05-24 | Use 'ptrdiff_t' instead of 'int' for pointer offset parameters | Jonathan Wright
A number of the load/store functions in mem_neon.h use type 'int' for the 'stride' pointer offset parameter. This causes Clang to generate the following warning every time these functions are called with a wider type passed in for 'stride': warning: implicit conversion loses integer precision: 'ptrdiff_t' (aka 'long') to 'int' [-Wshorten-64-to-32] This patch changes all such instances of 'int' to 'ptrdiff_t'. Bug: b/181236880 Change-Id: I2e86b005219e1fbb54f7cf2465e918b7c077f7ee
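The change can be pictured with a small sketch. The helper `copy_4xh` below is hypothetical (it is not one of the mem_neon.h functions); it only shows a `stride` parameter typed as `ptrdiff_t`, so a caller holding a `ptrdiff_t` stride passes it through without narrowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `rows` rows of 4 bytes each. Both strides are
 * ptrdiff_t, so no -Wshorten-64-to-32 conversion occurs at the call site,
 * and negative strides (bottom-up traversal) remain valid. */
static void copy_4xh(const uint8_t *src, ptrdiff_t src_stride,
                     uint8_t *dst, ptrdiff_t dst_stride, int rows) {
  assert(rows >= 0);
  for (int r = 0; r < rows; ++r) {
    memcpy(dst + r * dst_stride, src + r * src_stride, 4);
  }
}
```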
2021-05-24 | Implement vpx_convolve8_avg_vert_neon using SDOT instruction | Jonathan Wright
Add an alternative AArch64 implementation of vpx_convolve8_avg_vert_neon for targets that implement the Armv8.4-A SDOT (signed dot product) instruction. The existing MLA-based implementation of vpx_convolve8_avg_vert_neon is retained and used on target CPUs that do not implement the SDOT instruction (or CPUs executing in AArch32 mode). The availability of the SDOT instruction is indicated by the feature macro __ARM_FEATURE_DOTPROD. Bug: b/181236880 Change-Id: I971c626116155e1384bff4c76fd3420312c7a15b
2021-05-24 | Merge transpose and permute in Neon SDOT vertical convolution | Jonathan Wright
The original dot-product implementation of vpx_convolve8_vert_neon used a separate transpose before and after the convolution operation. This patch merges the first transpose with the TBL permute (necessary before using SDOT to compute the convolution) to significantly reduce the amount of data re-arrangement. This new approach also allows for more effective data re-use between loop iterations. Co-authored by: James Greenhalgh <james.greenhalgh@arm.com> Bug: b/181236880 Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b
2021-05-18 | Implement vpx_convolve8_avg_horiz_neon using SDOT instruction | Jonathan Wright
Add an alternative AArch64 implementation of vpx_convolve8_avg_horiz_neon for targets that implement the Armv8.4-A SDOT (signed dot product) instruction. The existing MLA-based implementation of vpx_convolve8_avg_horiz_neon is retained and used on target CPUs that do not implement the SDOT instruction (or CPUs executing in AArch32 mode). The availability of the SDOT instruction is indicated by the feature macro __ARM_FEATURE_DOTPROD. Bug: b/181236880 Change-Id: Ib435107c47c485f325248da87ba5618d68b0c8ed
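The selection between the two implementations can be sketched as a compile-time switch on `__ARM_FEATURE_DOTPROD`. The function `dot4x4_accumulate` is hypothetical and is not part of libvpx; the fallback branch is a plain scalar model of what a signed dot-product step computes (four int8 products summed into each 32-bit lane), not the MLA-based code the commit retains.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>

/* Dot-product path: one vdotq_s32 accumulates four 4-element signed
 * dot products into the four 32-bit lanes of `acc`. */
static void dot4x4_accumulate(int32_t acc[4], const int8_t a[16],
                              const int8_t b[16]) {
  assert(acc && a && b);
  vst1q_s32(acc, vdotq_s32(vld1q_s32(acc), vld1q_s8(a), vld1q_s8(b)));
}
#else

/* Fallback path: scalar model with the same result. */
static void dot4x4_accumulate(int32_t acc[4], const int8_t a[16],
                              const int8_t b[16]) {
  assert(acc && a && b);
  for (int lane = 0; lane < 4; ++lane) {
    for (int k = 0; k < 4; ++k) {
      acc[lane] += (int32_t)a[4 * lane + k] * (int32_t)b[4 * lane + k];
    }
  }
}
#endif
```

Under this assumption an 8-tap filter output would take two such accumulation steps per lane, one for taps 0 to 3 and one for taps 4 to 7.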
2021-05-13 | Optimize remaining mse and sse functions in variance_neon.c | Jonathan Wright
Implement sum of squared difference calculations in vpx_mse16x16_neon and vpx_get4x4sse_cs_neon using the ABD and UDOT instructions - instead of widening subtracts followed by a sequence of MLAs. The existing implementation is retained for use on CPUs that do not implement the Armv8.4-A UDOT instruction. This commit also updates the variable names used in the existing implementations to be more descriptive. Bug: b/181236880 Change-Id: Id4ad8ea7c808af1ac9bb5f1b63327ab487e4b1c7
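A scalar sketch may help show why the absolute-difference approach works: |a - b| for 8-bit inputs always fits in 8 bits, so the difference vector can be multiplied with itself by an unsigned dot product without first widening to 16 bits. The function `sse_absdiff` is a hypothetical illustration, not the Neon code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: sum of squared differences computed from
 * 8-bit absolute differences (the ABD step) multiplied by themselves
 * and accumulated in 32 bits (the UDOT step). */
static uint32_t sse_absdiff(const uint8_t *a, const uint8_t *b, size_t n) {
  uint32_t sse = 0;
  assert(a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i) {
    const uint8_t d = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    sse += (uint32_t)d * d;
  }
  return sse;
}
```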
2021-05-12 | Implement vertical convolution using Neon SDOT instruction | Jonathan Wright
Add an alternative AArch64 implementation of vpx_convolve8_vert_neon for targets that implement the Armv8.4-A SDOT (signed dot product) instruction. The existing MLA-based implementation of vpx_convolve8_vert_neon is retained and used on target CPUs that do not implement the SDOT instruction (or CPUs executing in AArch32 mode). The availability of the SDOT instruction is indicated by the feature macro __ARM_FEATURE_DOTPROD. Bug: b/181236880 Change-Id: Iebb8c77aba1d45b553b5112f3d87071fef3076f0
2021-05-12 | Implement Neon variance functions using UDOT instruction | Jonathan Wright
Accelerate Neon variance functions by implementing the sum of squares calculation using the Armv8.4-A UDOT instruction instead of 4 MLAs. The previous implementation is retained for use on CPUs that do not implement the Armv8.4-A dot product instructions. Bug: b/181236880 Change-Id: I9ab3d52634278b9b6f0011f39390a1195210bc75
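To show where the sum of squares fits, here is a minimal scalar sketch of a block variance: the sum of squared differences minus the squared sum of differences divided by the pixel count. The function `block_variance` and its signature are hypothetical, written for this example rather than taken from libvpx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: returns the variance of (src - ref) over
 * a w x h block and writes the sum of squared differences to *sse. */
static uint32_t block_variance(const uint8_t *src, ptrdiff_t src_stride,
                               const uint8_t *ref, ptrdiff_t ref_stride,
                               int w, int h, uint32_t *sse) {
  int64_t sum = 0;      /* sum of differences */
  uint64_t sum_sq = 0;  /* sum of squared differences */
  assert(w > 0 && h > 0 && sse != NULL);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int d = src[y * src_stride + x] - ref[y * ref_stride + x];
      sum += d;
      sum_sq += (uint64_t)(d * d);
    }
  }
  *sse = (uint32_t)sum_sq;
  return (uint32_t)(sum_sq - (uint64_t)((sum * sum) / (w * h)));
}
```

The `sum_sq` accumulation is the part the commit message says is moved to UDOT; the rest of the formula is unchanged by that choice.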