Age | Commit message | Author |
|
* changes:
y4m_input_open: check allocs
fastssim,fs_ctx_init: check alloc
vp9_get_smooth_motion_field: check alloc
vp9_row_mt_alloc_rd_thresh: check alloc
simple_encode,init_encoder: check buffer_pool alloc
VP9RateControlRTC::Create: check segmentation_map alloc
vp9_speed_features.c: check allocations
vp9_alloc_motion_field_info: check motion_field_array alloc
vp9_enc_grp_get_next_job: check job queue alloc
vp9: check postproc_state.limits allocs
vp9,encode_tiles_buffer_alloc: fix allocation check
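The changes listed above all add a missing check on an allocation result. As a rough illustration only, the pattern looks something like the sketch below. The names `frame_limits`, `limits_alloc` and `limits_free` are hypothetical and are not taken from libvpx.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container; not a libvpx type. */
typedef struct {
  uint8_t *limits;
  size_t count;
} frame_limits;

/* Returns 0 on success, -1 if the allocation fails.
 * Assumes count > 0; calloc(0, ...) may legitimately return NULL. */
static int limits_alloc(frame_limits *fl, size_t count) {
  fl->limits = (uint8_t *)calloc(count, sizeof(*fl->limits));
  if (fl->limits == NULL) {
    fl->count = 0;
    return -1; /* Caller must not touch fl->limits. */
  }
  fl->count = count;
  return 0;
}

static void limits_free(frame_limits *fl) {
  free(fl->limits);
  fl->limits = NULL;
  fl->count = 0;
}
```

A caller would test the return value of `limits_alloc()` and bail out (or report an error) before using the buffer.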
|
|
1. vpx_sad8x8x4d_lsx
2. vpx_sad32x64x4d_lsx
3. vpx_sad64x32x4d_lsx
Bug: webm:1755
Change-Id: I08a2b8717ec8623ffdd4451a04e68fa3a7228668
|
|
1. vpx_sad64x64_avg_lsx
2. vpx_sad32x32_avg_lsx
3. comp_avg_pred_lsx
Bug: webm:1755
Change-Id: I58dabdcdd4265bd6ebd5670db8a132d2e838683f
|
|
Change-Id: Ie087e8be1e943b94327ed520db447a0e3a927738
|
|
1. vpx_fdct16x16_lsx
2. vpx_get16x16var_lsx
3. vpx_variance16x16_lsx
Bug: webm:1755
Change-Id: I27090406dc28cfdca64760fea4bc16ae11b74628
|
|
1. vpx_sad16x16_lsx
2. vpx_sub_pixel_variance32x32_lsx
Bug: webm:1755
Change-Id: I9926ace710903993ccbb42caef320fa895e90127
|
|
1. vpx_lpf_horizontal_4_lsx
2. vpx_lpf_vertical_4_lsx
3. vpx_lpf_horizontal_4_dual_lsx
4. vpx_lpf_vertical_4_dual_lsx
Bug: webm:1755
Change-Id: I12e9f27cafd9514b24cfbf2354cc66c7d1238687
|
|
1. vpx_convolve8_avg_vert_lsx
2. vpx_convolve_copy_lsx
3. vpx_idct32x32_135_add_lsx
Bug: webm:1755
Change-Id: I6bdfe5836a91a5e361ab869b26641e86c5ebb68d
|
|
1. vpx_lpf_vertical_8_dual_lsx
2. vpx_lpf_horizontal_8_dual_lsx
Bug: webm:1755
Change-Id: I354df02cc215f36b4edf6558af0ff7fd6909deac
|
|
Change-Id: I593735bb7f88d63f2ddab57484099479c8759a3d
|
|
1. vpx_idct32x32_1024_add_lsx
2. vpx_idct32x32_34_add_lsx
3. vpx_idct32x32_1_add_lsx
Bug: webm:1755
Change-Id: I9c24f75e0d93613754d8e30da7e007b8d1374e60
|
|
1. vpx_fdct32x32_lsx
2. vpx_fdct32x32_rd_lsx
Bug: webm:1755
Change-Id: I83bce11c0d905cf137545a46cd756aef9cedce47
|
|
1. vpx_variance64x64_lsx
2. vpx_variance32x32_lsx
Bug: webm:1755
Change-Id: I45c5aa94cbbf7128473894a990d931acaa40e102
|
|
1. vpx_sad64x64x4d_lsx
2. vpx_sad32x32x4d_lsx
3. vpx_sad16x16x4d_lsx
4. vpx_sad64x64_lsx
5. vpx_sad32x32_lsx
Bug: webm:1755
Change-Id: Ief71c2216f697b261d7c1fc481c89c9f1a6098e6
|
|
* changes:
vp9[loongarch]: Optimize vpx_convolve8_avg_horiz_c
vp8[loongarch]: Optimize dequant_idct_add_y/uv_block
loongarch: Fix bugs
|
|
This reverts commit 2200039d33c49a9f7a5c438656df143755b022c4.
This causes failures with VP9/EndToEndTestLarge.EndtoEndPSNRTest/*; it
seems the assembly does not match the C code.
Bug: webm:1586
Change-Id: I4c63beebf88d4c12789d681b0d38014510b147fe
|
|
This reverts commit 89cfe3835c47dabf77d38edb3af190155984fa9a.
This is a prerequisite for reverting
2200039d33c49a9f7a5c438656df143755b022c4 which causes high bitdepth test
failures
Bug: webm:1586
Change-Id: I28f3b98f3339f3573b1492b88bf733dade133fc0
|
|
1. vpx_convolve8_avg_horiz_lsx
Bug: webm:1755
Change-Id: I0b6520be0afa1689da329f56ec6cd95c1730250c
|
|
Fix bugs from loopfilter_filters_lsx.c, vpx_convolve8_avg_lsx.c
Bug: webm:1755
Change-Id: I7ee8e367d66a49f3be10d7e417837d3b6ef50bdb
|
|
The only difference between the code is the clamp. For
8 bit it is purely an optimization. The values outside
this range will still saturate.
Change-Id: I2a770b140690d99e151b00957789bd72f7a11e13
|
|
|
|
The optimized quantize functions were already built to handle
highbd values. The only difference is the clamping. All highbd
functions expand to 32bits when running in highbd mode.
Removes vpx_highbd_quantize_32x32_sse2 as it is slower than the
C version in the worst case.
Bug: webm:1586
Change-Id: I49bf8a6a2041f78450bf43a4f655c67656b0f8d9
|
|
after:
d60b671a7 gcc 11 warning: mismatched bound
error C2719: 'sums': formal parameter with requested alignment of 32
won't be aligned
Change-Id: Iaba46d00ef2334a5e2d9ee69b5d03478fdc73a60
|
|
Whether a block is skipped is handled by mi->skip. x->skip_block
is kept exclusively to verify that the quantize functions are not
called for skip blocks.
Finishes the cleanup in 13eed991f
Bug: libvpx:1612
Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
|
|
These would compute the sum of absolute differences (sad) for a
group of 3 or 8 references. This was used as part of an exhaustive
search.
vp8 only uses these functions in speed 0 and best quality.
For vp9 this is only used with the --enable-non-greedy-mv
experiment.
This removes the 3- and 8-at-a-time optimized functions and uses
the fall back code which will process 1 or 4 (vpx_sadMxNx4d) at
a time.
For configure --target=x86_64-linux-gcc --enable-realtime-only:
libvpx.a
before: 3002424 after: 2937622 delta: 64802
after 'strip libvpx.a'
before: 2116998 after: 2073090 delta: 43908
Change-Id: I566d06e027c327b3bede68649dd551bba81a848e
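For readers unfamiliar with the terminology, here is a plain C sketch of a sum of absolute differences over one reference and over four references at a time, in the spirit of the vpx_sadMxNx4d fallback mentioned above. The function names `sad_block` and `sad_block_x4d` and the explicit width/height parameters are assumptions for illustration; they are not the libvpx signatures.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between one source block and one reference. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride, int width,
                              int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Four references per call: one SAD result per reference. */
static void sad_block_x4d(const uint8_t *src, int src_stride,
                          const uint8_t *const ref_array[4], int ref_stride,
                          int width, int height, unsigned int sad_array[4]) {
  for (int i = 0; i < 4; ++i) {
    sad_array[i] =
        sad_block(src, src_stride, ref_array[i], ref_stride, width, height);
  }
}
```

A search over N candidate references would call the four-at-a-time form for each full group of four and the single-reference form for any remainder.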
|
|
Clean up a new build warning with gcc11:
argument 3 of type ‘const uint8_t * const[]’ with
mismatched bound [-Warray-parameter=]
Standardize sad functions with array sizes.
Change-Id: Iea4144e61368f6a8279e2f3ae96c78aff06c8b41
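A small sketch of what "standardize with array sizes" can look like: the declaration and the definition spell the same bound, which is what -Warray-parameter checks. The function `first_pixel_sum` is hypothetical and exists only to show the matching bounds.

```c
#include <stdint.h>

/* Declaration: the bound of ref_array is spelled out as [4]. */
unsigned int first_pixel_sum(const uint8_t *const ref_array[4]);

/* Definition: the same [4] bound. Writing ref_array[] here instead is the
 * kind of mismatch that the gcc 11 warning reports. */
unsigned int first_pixel_sum(const uint8_t *const ref_array[4]) {
  unsigned int sum = 0;
  for (int i = 0; i < 4; ++i) sum += ref_array[i][0];
  return sum;
}
```

Both spellings compile to the same pointer parameter; the bound only documents intent and lets the compiler compare the two declarations.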
|
|
|
|
[NEON]
Added vpx_fdct4x4_pass1_neon(),
Added vpx_fdct8x8_pass1_notranspose_neon(),
Added vpx_fdct8x8_pass1_neon() to avoid code duplication
Refactored vpx_fdct4x4_neon() and vpx_fdct8x8_neon() to use the above
Rename dct_body to vpx_fdct16x16_body to reuse later
Add transpose_s16_16x16()
I have run make test and all tests/configurations seem to pass.
Profiled using this command on an Ampere Altra VM:
sudo perf record -g ./vpxenc --codec=vp9 --height=1080 --width=1920 \
--fps=25/1 --limit=20 -o output.mkv \
../original_videos_Sports_1080P_Sports_1080P-0063.mkv --debug --rt
Before this optimization:
1.32% 1.32% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.16% 0.16% vpxenc vpxenc [.] vpx_fdct4x4_c
0.79% 0.79% vpxenc vpxenc [.] vpx_fdct8x8_c
0.52% 0.52% vpxenc vpxenc [.] vpx_fdct8x8_neon
1.23% 1.23% vpxenc vpxenc [.] vpx_fdct16x16_c
0.54% 0.54% vpxenc vpxenc [.] vpx_fdct16x16_neon
So, even though a _neon() version exists, the C version was called
as well. After this patch:
1.42% 1.36% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.87% 0.82% vpxenc vpxenc [.] vpx_fdct8x8_neon
0.74% 0.74% vpxenc vpxenc [.] vpx_fdct16x16_neon
Change-Id: Id4e1dd315c67b4355fe4e5a1b59e181a349f16d0
|
|
1. vpx_convolve8_avg_lsx
2. vpx_convolve_avg_lsx
Bug: webm:1755
Change-Id: I4af5c362a94f11d0b5d1760e18326660bdbc0559
|
|
1. vpx_convolve8_lsx
2. vpx_convolve8_vert_lsx
3. vpx_convolve8_horiz_lsx
Bug: webm:1755
Change-Id: I9897e1ed6a904ac74d1078bd22b275af44db142d
|
|
Many of the features in ads2gas are no longer used.
Remove all patterns which are no longer used in
libvpx.
Simplify between the two to minimize differences.
Change-Id: Ia1151eb8b694cbe51845a1374a876cc7b798899c
|
|
1. vpx_lpf_vertical_8_lsx
2. vpx_lpf_horizontal_8_lsx
Bug: webm:1755
Change-Id: I6b05d6b1b2ac4d2a75beb9c9ca9700976fc3af55
|
|
Change-Id: I82c6bc16ea57c3f7ac5f4d212a12a5f70cb55ffc
|
|
1. vp8_loop_filter_mbh, vp8_loop_filter_mbv
2. vp8_sixtap_predict16x16, vp8_sixtap_predict8x8
3. vpx_dc_predictor_16x16, vpx_dc_predictor_8x8
./vpxdec --progress -o YUV_1920X1080.yuv original_1200f/VP8_1920X1080.webm
before: 37.77fps
after : 220.90fps
Bug: webm:1755
Change-Id: I1a3ce16f0c872261d813b6531cfdf25bd59bb774
|
|
this is a followup to:
7fbcee49d quiet -Warray-parameter warnings
and conforms to aom in:
06e13e817 quiet -Warray-parameter warnings
the sad functions are more varied in libvpx and will require a separate
pass
Change-Id: I765fd6704df615e836ba0b184ff8266ce926c394
|
|
w/gcc-11
this matches the definition of the function with the declaration
Change-Id: I757b731b9560cb0b0ceec4ec258ec5af5a183b3d
|
|
some additional neon file updates after:
31b954deb clear -Wextra-semi/-Wextra-semi-stmt warnings
Bug: chromium:1257449
Change-Id: I3e2664f2bd8f6f7328ec91bf6595ba5fc09862bd
|
|
Bug: chromium:1257449
Change-Id: Ia9aafccc09b611521d4a7aedfe3723393a840c62
|
|
this changes the return to int32_t which matches the type with usage
of this call as input to _mm_cvtsi32_si128(), _mm_set_epi32(), etc.
fixes implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1425f12d4f79155dd5d7af0eb00fbdb9f1940544
|
|
this changes the parameter to int32_t which matches the type with usage
of this call using _mm_cvtsi128_si32() as a parameter. quiets an
implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1e9e9ffac5d2996962d29611458311221eca8ea0
|
|
A number of the load/store functions in mem_neon.h use type 'int' for
the 'stride' pointer offset parameter. This causes Clang to generate
the following warning every time these functions are called with a
wider type passed in for 'stride':
warning: implicit conversion loses integer precision: 'ptrdiff_t'
(aka 'long') to 'int' [-Wshorten-64-to-32]
This patch changes all such instances of 'int' to 'ptrdiff_t'.
Bug: b/181236880
Change-Id: I2e86b005219e1fbb54f7cf2465e918b7c077f7ee
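To illustrate the type change, here is a minimal sketch of a row-loading helper that takes its stride as 'ptrdiff_t', so a caller holding a 'ptrdiff_t' stride passes it through without narrowing. `load_two_rows` is a hypothetical helper written in portable C; it is not one of the functions in mem_neon.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 8 bytes from each of two consecutive rows.
 * 'stride' is ptrdiff_t, the natural type for a pointer offset. */
static void load_two_rows(const uint8_t *buf, ptrdiff_t stride,
                          uint8_t out[16]) {
  memcpy(out, buf, 8);              /* Row 0. */
  memcpy(out + 8, buf + stride, 8); /* Row 1, 'stride' bytes further on. */
}
```

With an 'int' parameter instead, passing a 'ptrdiff_t' argument on a 64-bit target is the implicit narrowing that -Wshorten-64-to-32 reports.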
|
|
Add an alternative AArch64 implementation of
vpx_convolve8_avg_vert_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_vert_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: I971c626116155e1384bff4c76fd3420312c7a15b
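The compile-time selection described above can be sketched roughly as follows. This is an illustration, not the libvpx code: `filter8` is a hypothetical function, and it takes signed 8-bit samples directly so that the signed dot product applies without any range adjustment. The intrinsics `vdot_s32`, `vld1_s8`, `vdup_n_s32` and `vaddv_s32` come from `arm_neon.h`.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical 8-tap filter: sum of samples[i] * taps[i]. */
static int32_t filter8(const int8_t samples[8], const int8_t taps[8]) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  /* SDOT path: each 32-bit lane accumulates four int8 products. */
  const int32x2_t sum =
      vdot_s32(vdup_n_s32(0), vld1_s8(samples), vld1_s8(taps));
  return vaddv_s32(sum); /* Add the two lanes. */
#else
  /* Fallback path for targets without the dot-product extension. */
  int32_t sum = 0;
  for (int i = 0; i < 8; ++i) sum += samples[i] * taps[i];
  return sum;
#endif
}
```

Because the choice is made by the preprocessor, only one of the two paths ends up in a given build.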
|
|
The original dot-product implementation of vpx_convolve8_vert_neon
used a separate transpose before and after the convolution operation.
This patch merges the first transpose with the TBL permute (necessary
before using SDOT to compute the convolution) to significantly reduce
the amount of data re-arrangement. This new approach also allows for
more effective data re-use between loop iterations.
Co-authored-by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b
|
|
Add an alternative AArch64 implementation of
vpx_convolve8_avg_horiz_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_horiz_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Ib435107c47c485f325248da87ba5618d68b0c8ed
|
|
Implement sum of squared difference calculations in vpx_mse16x16_neon
and vpx_get4x4sse_cs_neon using the ABD and UDOT instructions -
instead of widening subtracts followed by a sequence of MLAs.
The existing implementation is retained for use on CPUs that do not
implement the Armv8.4-A UDOT instruction. This commit also updates
the variable names used in the existing implementations to be more
descriptive.
Bug: b/181236880
Change-Id: Id4ad8ea7c808af1ac9bb5f1b63327ab487e4b1c7
|
|
Add an alternative AArch64 implementation of vpx_convolve8_vert_neon
for targets that implement the Armv8.4-A SDOT (signed dot product)
instruction.
The existing MLA-based implementation of vpx_convolve8_vert_neon is
retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Iebb8c77aba1d45b553b5112f3d87071fef3076f0
|
|
Accelerate Neon variance functions by implementing the sum of squares
calculation using the Armv8.4-A UDOT instruction instead of 4 MLAs.
The previous implementation is retained for use on CPUs that do not
implement the Armv8.4-A dot product instructions.
Bug: b/181236880
Change-Id: I9ab3d52634278b9b6f0011f39390a1195210bc75
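For context, a scalar sketch of the quantity these functions compute: the sum of squared differences minus the squared sum of differences divided by the pixel count. The sum of squares is the term the commit above moves to UDOT. `block_variance` and its parameter list are assumptions for illustration, not the libvpx signature.

```c
#include <stdint.h>

/* Returns sse - sum^2 / (w * h) for the difference between two blocks,
 * and writes the sum of squared differences to *sse. */
static uint32_t block_variance(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride, int w,
                               int h, uint32_t *sse) {
  int64_t sum = 0;     /* Sum of differences. */
  uint64_t sum_sq = 0; /* Sum of squared differences. */
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff = src[x] - ref[x];
      sum += diff;
      sum_sq += (uint64_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (uint32_t)sum_sq;
  return (uint32_t)(sum_sq - (uint64_t)((sum * sum) / (w * h)));
}
```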
|
|
Implementing sad16_neon using ABD, UDOT instead of ABAL, ABAL2 saves
a cycle and removes resource contention for a single SIMD pipe on
modern out-of-order Arm CPUs. The UDOT accumulation into 32-bit
elements also allows for a faster reduction at the end of each SAD
function.
The existing implementation is retained for CPUs that do not
implement the Armv8.4-A UDOT instruction, and CPUs executing in
AArch32 mode.
Bug: b/181236880
Change-Id: Ibd0da46e86751d2f808c7b1e424f82b046a1aa6f
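The ABD plus UDOT approach can be modelled in scalar C to show the data flow: take the absolute difference of 16 bytes, then dot it against a vector of ones so that each of four 32-bit lanes accumulates four bytes. `udot_model` and `sad16_model` are hypothetical names and this is a model of the idea, not the Neon implementation.

```c
#include <stdint.h>

/* Scalar model of UDOT: each 32-bit lane accumulates four byte products. */
static void udot_model(uint32_t acc[4], const uint8_t a[16],
                       const uint8_t b[16]) {
  for (int lane = 0; lane < 4; ++lane) {
    for (int i = 0; i < 4; ++i) {
      acc[lane] += (uint32_t)a[4 * lane + i] * b[4 * lane + i];
    }
  }
}

/* SAD of a 16-pixel-wide block: absolute difference, then UDOT with ones. */
static uint32_t sad16_model(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride, int height) {
  uint8_t ones[16];
  uint32_t acc[4] = { 0, 0, 0, 0 };
  for (int i = 0; i < 16; ++i) ones[i] = 1;
  for (int y = 0; y < height; ++y) {
    uint8_t abs_diff[16];
    for (int x = 0; x < 16; ++x) {
      abs_diff[x] =
          (uint8_t)(src[x] > ref[x] ? src[x] - ref[x] : ref[x] - src[x]);
    }
    udot_model(acc, abs_diff, ones);
    src += src_stride;
    ref += ref_stride;
  }
  /* Final reduction: only four 32-bit lanes to add. */
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The accumulator already holds 32-bit elements, so the final step is a short add across four lanes with no widening needed.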
|
|
Use the AArch64-only ADDV and ADDLV instructions to accelerate
reductions that add across a Neon vector in sum_neon.h. This commit
also refactors the inline functions to return a scalar instead of a
vector - allowing for optimization of the surrounding code at each
call site.
Bug: b/181236880
Change-Id: Ieed2a2dd3c74f8a52957bf404141ffc044bd5d79
|
|
Manually unrolling the inner loop is sufficient to stop the compiler
getting confused and emitting inefficient code.
Co-authored-by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I860768ce0e6c0e0b6286d3fc1b94f0eae95d0a1a
|