path: root/vpx_dsp
Age  Commit message  Author
2022-10-12  [NEON] Add highbd FDCT 8x8 function  Konstantinos Margaritis
    50% faster than C version in best/rt profiles
    Change-Id: I0f9504ed52b5d5f7722407e91108ed4056d66bc2
2022-10-12  [NEON] Add highbd FDCT 4x4 function  Konstantinos Margaritis
    ~80% faster than C version for both best/rt profiles.
    Change-Id: Ibb3c8e1862131d2a020922420d53c66b31d5c2c3
2022-10-12  [NEON] Move helper functions for reuse  Konstantinos Margaritis
    Move all butterfly functions to fdct_neon.h
    Slightly optimize load/scale/cross functions in fdct 16x16.
    These will be reused in highbd variants.
    Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073
2022-10-10  [NEON] move transpose_8x8 to reuse  Konstantinos Margaritis
    Change-Id: I3915b6c9971aedaac9c23f21fdb88bc271216208
2022-10-10  Merge "[NEON] highbd partial DCT functions" into main  James Zern
2022-10-10  [NEON] highbd partial DCT functions  Konstantinos Margaritis
    Change-Id: I7dd4e698469562f5b1f948cc36f8403b490dcb6a
2022-10-07  Add vpx_highbd_sad64x{64,32}_avx2.  Scott LaVarnway
    ~2.8x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: Ibc8e5d030ec145c9a9b742fff98fbd9131c9ede4
2022-10-06  Add vpx_highbd_sad32x{64,32,16}_avx2.  Scott LaVarnway
    2.7x to 3.1x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: Idff3284932f7ee89d036f38893205bf622a159a3
2022-10-05  Add vpx_highbd_sad16x{32,16,8}_avx2.  Scott LaVarnway
    1.9x to 2.4x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: I686452772f9b72233930de2207af36a0cd72e0bb
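For orientation, the operation these AVX2 kernels accelerate is a sum of absolute differences over 16-bit samples. The following is a minimal scalar sketch, not the libvpx code: the function name `highbd_sad_ref`, the explicit `width`/`height` parameters, and the use of plain `uint16_t` pointers are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences over a
 * width x height block of 16-bit (high bit depth) samples.
 * Strides are counted in samples, not bytes. */
unsigned int highbd_sad_ref(const uint16_t *src, int src_stride,
                            const uint16_t *ref, int ref_stride,
                            int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Widen to int before subtracting so the difference can go negative. */
      sad += (unsigned int)abs((int)src[x] - (int)ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A vectorized version computes the same total; only the number of samples handled per instruction changes.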
2022-09-30  vpx_subpixel_8t_intrin_avx2.c: quiet -Wuninitialized  Scott LaVarnway
    warning: ‘s2[3]’ may be used uninitialized
    and
    warning: ‘s1[3]’ may be used uninitialized
    The warnings exposed unused code.
    Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9
2022-09-26  quantize: standardize vp9_quantize_fp_sse2  Johann
    Match style for vpx_quantize_b_sse2 and prepare to rewrite ssse3 version in intrinsics.
    Need to evaluate the value of threshold breakout before going further.
    Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
2022-09-23  quantize: increase iscan by 1  Johann
    All of the assembly adds 1 to iscan to convert from a 0 based array to the EOB value.
    Add 1 to all iscan values and remove the extra instructions from the assembly.
    Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
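The idea in this commit can be sketched in scalar C: with a 0-based inverse scan table, the end-of-block (EOB) value needs a `+ 1` per coefficient, while a pre-incremented table makes that addition unnecessary. Both function names and the table contents used with them are hypothetical; the real change is in the assembly and the scan tables.

```c
#include <stdint.h>

/* Before: iscan holds 0-based scan positions, so each candidate needs + 1. */
int eob_from_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* After: the table already stores position + 1, so it is used directly. */
int eob_from_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1,
                       int n) {
  int eob = 0;
  for (int i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

Both functions return the same EOB for matching tables; the second simply drops one addition from the inner loop.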
2022-09-21  Merge "post_proc_sse2.c: quiet -Wuninitialized" into main  Scott LaVarnway
2022-09-21  post_proc_sse2.c: quiet -Wuninitialized  Scott LaVarnway
    In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
    In function ‘_mm_add_epi16’,
    inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
    /usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
    1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
         | ^~~~~~~~~~~
    ../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
    ../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
    39 | __m128i below_context;
    Change-Id: I2fc592f121c4e85d0aff1640014c3444f5eb09fd
2022-09-17  fwd_txfm: remove avx2 file from non-hbd  Johann
    Resolves warning on OS X:
    file: libvpx_g.a(fwd_txfm_avx2.c.o) has no symbols
    Change-Id: Ie8b290bb3ed329656beb883d552c98353f1ed5e5
2022-09-14  Add vpx_highbd_sad64x{64,32}x4d_avx2.  Scott LaVarnway
    ~2x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: I4742950ab7b90d7f09e8d4687e1e967138acee39
2022-09-13  Add vpx_highbd_sad32x{64,32,16}x4d_avx2.  Scott LaVarnway
    ~2.4x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: I6df2bd62b46e5e175c8ad80daa6de3a1c313db0f
2022-09-09  Add vpx_highbd_sad16x{32,16,8}x4d_avx2.  Scott LaVarnway
    1.98x to 2.3x faster than the sse2 version.
    Bug: b/245917257
    Change-Id: Ie4f9bb942ffaf4af7d395fb5a5978b41aabfc93c
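The `x4d` variants score one source block against four candidate reference blocks in a single call. A rough scalar sketch of that shape follows, assuming plain `uint16_t` pointers and explicit block dimensions; the name `highbd_sad_x4d_ref` is hypothetical and this is not the libvpx implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: SAD of one source block against four
 * reference blocks, writing one result per reference. */
void highbd_sad_x4d_ref(const uint16_t *src, int src_stride,
                        const uint16_t *const ref_array[4], int ref_stride,
                        int width, int height, uint32_t sad_array[4]) {
  for (int i = 0; i < 4; ++i) {
    const uint16_t *s = src;
    const uint16_t *r = ref_array[i];
    uint32_t sad = 0;
    for (int y = 0; y < height; ++y) {
      for (int x = 0; x < width; ++x) {
        sad += (uint32_t)abs((int)s[x] - (int)r[x]);
      }
      s += src_stride;
      r += ref_stride;
    }
    sad_array[i] = sad;
  }
}
```

Batching four references lets a SIMD version load each source row once and reuse it for all four comparisons, which is one plausible reason for the separate entry point.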
2022-09-06  Merge "x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256" into main  James Zern
2022-09-02  sad_neon: enable UDOT implementation w/aarch32  James Zern
    Change-Id: Ia28305ec5c61518b732cbacbd102acd2cb7f9d82
2022-09-02  variance_neon.cc: simplify __ARM_FEATURE_DOTPROD check  James Zern
    missed in 447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
    + fix #if comments
    only check that the macro is defined, the value doesn't have any effect.
    from https://arm-software.github.io/acle/main/acle.html:
    5.5.7.7. Dot Product extension
    __ARM_FEATURE_DOTPROD is defined if the dot product data manipulation instructions are supported and the vector intrinsics are available. Note that this implies:
    - __ARM_NEON == 1
    Change-Id: I098b96421b7de5928bb3b11612ca1f32e7b6cbc4
2022-09-02  x86,cosmetics: prefer _mm_setzero_si128/_mm256_setzero_si256  James Zern
    over *_set1_*(0)
    Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2
2022-09-02  vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check  James Zern
    only check that the macro is defined, the value doesn't have any effect.
    from https://arm-software.github.io/acle/main/acle.html:
    5.5.7.7. Dot Product extension
    __ARM_FEATURE_DOTPROD is defined if the dot product data manipulation instructions are supported and the vector intrinsics are available. Note that this implies:
    - __ARM_NEON == 1
    Change-Id: I164fe121ccefda99050a9b6a99738a2b518520f3
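As a sketch of the simplified check: per the ACLE text quoted in the message, the macro being defined is the whole signal, so `defined(...)` is enough and comparing its value adds nothing. The helper `built_with_dotprod` is hypothetical and exists only to make the preprocessor result observable.

```c
/* Hypothetical helper: reports whether this translation unit was compiled
 * with the Arm dot product extension available. */
int built_with_dotprod(void) {
#if defined(__ARM_FEATURE_DOTPROD)
  /* Only the definition matters; the macro's value is not inspected. */
  return 1;
#else
  return 0;
#endif  /* defined(__ARM_FEATURE_DOTPROD) */
}
```

On a non-Arm build, or an Arm build without the extension enabled, the helper returns 0.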
2022-09-01  neon,load_unaligned_*: use dup for lane 0  James Zern
    this produces better assembly with gcc (11.3.0-3); no change in assembly using clang from the r24 android sdk (Android (8075178, based on r437112b) clang version 14.0.1 (https://android.googlesource.com/toolchain/llvm-project 8671348b81b95fc603505dfc881b45103bee1731)
    Change-Id: Ifec252d4f499f23be1cd94aa8516caf6b3fbbc11
2022-08-26  highbd_variance_neon,cosmetics: reorder a few lines  James Zern
    Change-Id: Ia6fa54652d7f94687e64108482bb0f28ca06cf49
2022-08-26  Merge "[NEON] Add highbd *variance* functions" into main  James Zern
2022-08-25  [NEON] Add highbd *variance* functions  Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~7.2% for best profile
    * ~5.8% for rt profile
    Change-Id: I5b70415fb89d1bbb02a0c139eb317ba6b08adede
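As background, block variance is conventionally computed from the sum of differences and the sum of squared differences: variance = SSE - sum^2 / N. The sketch below is a scalar illustration under assumptions: the name `highbd_variance_ref` is hypothetical, pointers are plain `uint16_t`, and it omits any bit-depth-dependent rounding or normalization that a real high bit depth variant may apply.

```c
#include <stdint.h>

/* Hypothetical scalar reference: returns SSE - sum^2 / (w * h) for a block
 * of 16-bit samples and stores the raw SSE through *sse. */
uint64_t highbd_variance_ref(const uint16_t *src, int src_stride,
                             const uint16_t *ref, int ref_stride,
                             int w, int h, uint64_t *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff = (int)src[x] - (int)ref[x];
      sum += diff;
      sq += (uint64_t)((int64_t)diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = sq;
  /* 64-bit accumulators keep 12-bit content on large blocks from overflowing. */
  return sq - (uint64_t)((sum * sum) / (w * h));
}
```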
2022-08-24  Merge "[NEON] Improve vpx_quantize_b* functions" into main  James Zern
2022-08-23  .clang-format: update to clang-format-11  clang-format
    only store the deltas from --style Google in the file and reapply using Debian clang-format version 11.1.0-6+build1
    Bug: b/229626362
    Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db
2022-08-23  [NEON] Improve vpx_quantize_b* functions  Konstantinos Margaritis
    Slight optimization, prefetch gives a 1% improvement in 1st pass
    Change-Id: Iba4664964664234666406ab53893e02d481fbe61
2022-08-22  Merge "highbd_quantize_neon.c: remove unneeded assert.h" into main  James Zern
2022-08-22  Merge changes Iabed118b,I60a384b2 into main  James Zern
    * changes:
      use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions
      compiler_attributes.h: add VPX_NO_UNSIGNED_SHIFT_CHECK
2022-08-22  [NEON] Add vpx_highbd_subtract_block function  Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~1% for best and rt profile
    Change-Id: I4039120dc570baab1ae519a5e38b1acff38d81f0
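For reference, a subtract-block step produces the residual between a source block and its prediction. A minimal scalar sketch follows; the name `highbd_subtract_block_ref` and the plain `uint16_t` pointers are assumptions for illustration, not the libvpx signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: diff = src - pred for a rows x cols block
 * of 16-bit samples. With up to 12-bit samples the result fits in int16_t. */
void highbd_subtract_block_ref(int rows, int cols, int16_t *diff,
                               ptrdiff_t diff_stride, const uint16_t *src,
                               ptrdiff_t src_stride, const uint16_t *pred,
                               ptrdiff_t pred_stride) {
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      diff[c] = (int16_t)((int)src[c] - (int)pred[c]);
    }
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```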
2022-08-22  [NEON] Added vpx_highbd_sad* functions  Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~7.8% for best profile
    * ~10% for rt profile
    Change-Id: I89eda5c4372a5b628c9df84cdeb4c8486fc44789
2022-08-22  highbd_quantize_neon.c: remove unneeded assert.h  James Zern
    Change-Id: I041f5fb23b856a2b519669b5bf8a40d3772b4a6e
2022-08-20  [NEON] Added vpx_highbd_quantize_b* functions  Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~4.8% for best profile
    * ~6.2% for rt profile
    Change-Id: I61e646ab7aedf06a25db1365d6d1cf7b05101c21
2022-08-18  use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions  James Zern
    these shift values off the most significant bit as part of the process; vp8_regular_quantize_b_sse4_1 is included here for a special case of mask creation
    quiets warnings of the form:
    vp8/decoder/dboolhuff.h:81:11: runtime error: left shift of 2373679303235599696 by 3 places cannot be represented in type 'VP8_BD_VALUE' (aka 'unsigned long')
    vp8/encoder/bitstream.c:257:18: runtime error: left shift of 2147493041 by 1 places cannot be represented in type 'unsigned int'
    vp8/encoder/x86/quantize_sse4.c:114:18: runtime error: left shift of 4294967294 by 1 places cannot be represented in type 'unsigned int'
    vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of 4294967295 by 1 places cannot be represented in type 'unsigned int'
    Bug: b/229626362
    Change-Id: Iabed118b2a094232783e5ad0e586596d874103ca
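Shifting set bits out of an unsigned value is well defined in C, but clang's integer sanitizer can still report it. A function that does so on purpose can be annotated to suppress the report. The sketch below shows one way such a macro could be written; `NO_UNSIGNED_SHIFT_CHECK` and `shift_out_msb` are hypothetical names, and the clang 12 version gate is an assumption rather than the contents of compiler_attributes.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for an attribute macro like the one named in the
 * commit; expands to nothing on compilers without the attribute. */
#if defined(__clang__) && defined(__has_attribute)
#if __has_attribute(no_sanitize) && __clang_major__ >= 12
#define NO_UNSIGNED_SHIFT_CHECK \
  __attribute__((no_sanitize("unsigned-shift-base")))
#endif
#endif
#ifndef NO_UNSIGNED_SHIFT_CHECK
#define NO_UNSIGNED_SHIFT_CHECK
#endif

/* Intentionally shifts bits off the top; the result wraps modulo 2^32.
 * count must be in the range 0..31. */
NO_UNSIGNED_SHIFT_CHECK uint32_t shift_out_msb(uint32_t value, int count) {
  return value << count;
}
```

The annotation changes only what the sanitizer reports; the computed value is the same with or without it.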
2022-08-18  loopfilter.c: normalize flat func param type  James Zern
    flat/flat2 are stored as int8_t as returned by the filter_mask* functions.
    this quiets integer sanitizer warnings of the form:
    vpx_dsp/loopfilter.c:197:28: runtime error: implicit conversion from type 'int8_t' (aka 'signed char') of value -1 (8-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the value to 255 (8-bit, unsigned)
    Bug: b/229626362
    Change-Id: Iacb6ae052d4cb2b6e0ebccbacf59ece9501d3b5f
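The pattern behind this warning can be sketched as follows: a mask function returns an `int8_t` that is either 0 or -1, and passing it to a `uint8_t` parameter silently turns -1 into 255. Declaring the parameter as `int8_t` keeps the types consistent. Both functions are simplified, hypothetical stand-ins and not the loop filter code itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mask: -1 (all bits set) when the two samples are within
 * thresh of each other, 0 otherwise. */
int8_t flat_mask_sketch(uint8_t thresh, uint8_t p1, uint8_t p0) {
  return (abs((int)p1 - (int)p0) > thresh) ? 0 : -1;
}

/* Taking flat as int8_t matches the mask's type, so no implicit
 * -1 -> 255 conversion happens at the call site. */
uint8_t select_filtered(int8_t flat, uint8_t filtered, uint8_t original) {
  return flat ? filtered : original;
}
```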
2022-08-16  highbd_quantize_intrin_sse2: quiet int sanitizer warnings  James Zern
    add a missing cast in ^ operations; quiets warnings of the form:
    implicit conversion from type 'int' of value -1 (32-bit, signed) to type 'unsigned int' changed the value to 4294967295 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I56f74981050b2c9d00bad20e68f1b73ce7454729
2022-08-16  load_unaligned_u32: use an int w/_mm_cvtsi32_si128  James Zern
    this matches the type of the function parameter; quiets integer sanitizer warnings of the form:
    implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3215646151 (32-bit, unsigned) to type 'int' changed the value to -1079321145 (32-bit, signed)
    Bug: b/229626362
    Change-Id: Ia9a5dc5e1f57cbf4f8f8fa457bb674ef43369d37
2022-08-16  variance_sse2.c: add some missing casts  James Zern
    quiets integer sanitizer warnings of the form:
    ../vpx_dsp/x86/variance_sse2.c:100:10: runtime error: implicit conversion from type 'unsigned int' of value 4294966272 (32-bit, unsigned) to type 'int' changed the value to -1024 (32-bit, signed)
    Bug: b/229626362
    Change-Id: I150cc0a6a6b85143c3bf96886686fe3a40897db5
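The sanitizer warnings quoted in these commits flag implicit signed/unsigned conversions that change the value. An explicit cast documents that the conversion is intended. The sketch below uses hypothetical helper names; note that converting an out-of-range unsigned value to a signed type is implementation-defined in C before C23, and the two's complement wrap shown here is what gcc and clang do.

```c
#include <stdint.h>

/* Explicitly reinterpret an unsigned 32-bit value as signed. */
int32_t as_signed32(uint32_t v) { return (int32_t)v; }

/* Explicitly convert a signed 32-bit value to unsigned; this direction is
 * fully defined by the C standard (reduction modulo 2^32). */
uint32_t as_unsigned32(int32_t v) { return (uint32_t)v; }
```

With the values from the warnings above, `as_signed32(4294966272u)` yields -1024 and `as_unsigned32(-1)` yields 4294967295 on those compilers.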
2022-08-09  VPX: Fix vp9_quantize_fp_avx2() VS build error.  Scott LaVarnway
    Add build fix for _mm256_extract_epi16() being undefined.
    Bug: b/237714063
    Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
2022-08-05  VPX: Add vpx_subtract_block_avx2().  Scott LaVarnway
    ~1.3x faster than vpx_subtract_block_sse2().
    Based on aom_subtract_block_avx2().
    Bug: b/241580104
    Change-Id: I17da036363f213d53c6546c3e858e4c3cba44a5b
2022-07-29  Provide Arm SDOT optimizations for SAD functions  Konstantinos Margaritis
    Change-Id: I497ee1c45d1fc4d643cefad7d87e5aaacd77869c
2022-07-27  x86: normalize type with _mm_cvtsi128_si32  James Zern
    prefer int in most cases
    w/clang -fsanitize=integer fixes warnings of the form:
    implicit conversion from type 'int' of value -809931979 (32-bit, signed) to type 'uint32_t' (aka 'unsigned int') changed the value to 3485035317 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I0c6604efc188f2660c531eddfc7aa10060637813
2022-07-27  variance_avx2.c: fix implicit conversion warnings  James Zern
    w/clang -fsanitize=integer fixes warnings of the form:
    implicit conversion from type 'int' of value -1323 (32-bit, signed) to type 'unsigned int' changed the value to 4294965973 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I7291d9bd5cacea0d88d9f4c4624c096764f4a472
2022-07-26  VPX: Add vpx_highbd_quantize_b_32x32_avx2().  Scott LaVarnway
    Up to 11.78x faster than vpx_quantize_b_32x32_sse2() for full calculations.
    ~1.7% overall encoder improvement for the test clip used.
    Bug: b/237714063
    Change-Id: Ib759056db94d3487239cb2748ffef1184a89ae18
2022-07-25  VPX: Add vpx_highbd_quantize_b_avx2().  Scott LaVarnway
    Up to 3.61x faster than vpx_highbd_quantize_b_sse2() for full calculations.
    ~2.3% overall encoder improvement for the test clip used.
    Bug: b/237714063
    Change-Id: I23f88d2a7f96aaa4103778372f4f552207f73cee
2022-07-25  Merge "VPX: Add vpx_quantize_b_32x32_avx2()." into main  Scott LaVarnway
2022-07-20  avg_intrin_avx2: rm dead store in highbd_hadamard_8x8  James Zern
    missed in:
    53dd1e8e7 avg_intrin_{sse2,avg2}: rm dead store in hadamard_8x8
    Change-Id: I378e4a388ceb193a4cfee4d9d317fc62fcc4b39e