path: root/vpx_dsp
Age | Commit message | Author
2022-08-26 | highbd_variance_neon,cosmetics: reorder a few lines | James Zern
    Change-Id: Ia6fa54652d7f94687e64108482bb0f28ca06cf49
2022-08-26 | Merge "[NEON] Add highbd *variance* functions" into main | James Zern
2022-08-25 | [NEON] Add highbd *variance* functions | Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~7.2% for best profile
    * ~5.8% for rt profile
    Change-Id: I5b70415fb89d1bbb02a0c139eb317ba6b08adede
2022-08-24 | Merge "[NEON] Improve vpx_quantize_b* functions" into main | James Zern
2022-08-23 | .clang-format: update to clang-format-11 | clang-format
    only store the deltas from --style Google in the file and reapply using
    Debian clang-format version 11.1.0-6+build1
    Bug: b/229626362
    Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db
2022-08-23 | [NEON] Improve vpx_quantize_b* functions | Konstantinos Margaritis
    Slight optimization, prefetch gives a 1% improvement in 1st pass
    Change-Id: Iba4664964664234666406ab53893e02d481fbe61
2022-08-22 | Merge "highbd_quantize_neon.c: remove unneeded assert.h" into main | James Zern
2022-08-22 | Merge changes Iabed118b,I60a384b2 into main | James Zern
    * changes:
      use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions
      compiler_attributes.h: add VPX_NO_UNSIGNED_SHIFT_CHECK
2022-08-22 | [NEON] Add vpx_highbd_subtract_block function | Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~1% for best and rt profile
    Change-Id: I4039120dc570baab1ae519a5e38b1acff38d81f0
2022-08-22 | [NEON] Added vpx_highbd_sad* functions | Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~7.8% for best profile
    * ~10% for rt profile
    Change-Id: I89eda5c4372a5b628c9df84cdeb4c8486fc44789
2022-08-22 | highbd_quantize_neon.c: remove unneeded assert.h | James Zern
    Change-Id: I041f5fb23b856a2b519669b5bf8a40d3772b4a6e
2022-08-20 | [NEON] Added vpx_highbd_quantize_b* functions | Konstantinos Margaritis
    Total gain for 12-bit encoding:
    * ~4.8% for best profile
    * ~6.2% for rt profile
    Change-Id: I61e646ab7aedf06a25db1365d6d1cf7b05101c21
2022-08-18 | use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions | James Zern
    these shift values off the most significant bit as part of the process;
    vp8_regular_quantize_b_sse4_1 is included here for a special case of mask creation
    quiets warnings of the form:
    vp8/decoder/dboolhuff.h:81:11: runtime error: left shift of 2373679303235599696 by 3 places cannot be represented in type 'VP8_BD_VALUE' (aka 'unsigned long')
    vp8/encoder/bitstream.c:257:18: runtime error: left shift of 2147493041 by 1 places cannot be represented in type 'unsigned int'
    vp8/encoder/x86/quantize_sse4.c:114:18: runtime error: left shift of 4294967294 by 1 places cannot be represented in type 'unsigned int'
    vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of 4294967295 by 1 places cannot be represented in type 'unsigned int'
    Bug: b/229626362
    Change-Id: Iabed118b2a094232783e5ad0e586596d874103ca
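The warnings quoted in this entry come from shifts on unsigned types that deliberately push set bits past the most significant bit. The following is a minimal sketch of how one function might be exempted from that check. `EXAMPLE_NO_UNSIGNED_SHIFT_CHECK` and `shift_out_msb` are hypothetical names used only for illustration, and the attribute spelling assumes clang 12 or newer; the actual `VPX_NO_UNSIGNED_SHIFT_CHECK` definition added to compiler_attributes.h may differ.

```c
#include <stdint.h>

/* Hypothetical macro for illustration; not the libvpx definition.
 * Assumes clang >= 12, which accepts the "unsigned-shift-base" name. */
#if defined(__clang__) && defined(__clang_major__) && __clang_major__ >= 12
#define EXAMPLE_NO_UNSIGNED_SHIFT_CHECK \
  __attribute__((no_sanitize("unsigned-shift-base")))
#else
#define EXAMPLE_NO_UNSIGNED_SHIFT_CHECK
#endif

/* Left-shifting an unsigned value is well defined in C: bits shifted past
 * the top are discarded. The sanitizer check only flags it as suspicious,
 * so the attribute silences the report without changing the result. */
static EXAMPLE_NO_UNSIGNED_SHIFT_CHECK uint32_t shift_out_msb(uint32_t value,
                                                              int count) {
  return value << count;
}
```

Applying the attribute per function, rather than disabling the check for a whole build, keeps the sanitizer active everywhere the shift-out behavior is not intended.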
2022-08-18 | loopfilter.c: normalize flat func param type | James Zern
    flat/flat2 are stored as int8_t as returned by the filter_mask* functions.
    this quiets integer sanitizer warnings of the form:
    vpx_dsp/loopfilter.c:197:28: runtime error: implicit conversion from type 'int8_t' (aka 'signed char') of value -1 (8-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the value to 255 (8-bit, unsigned)
    Bug: b/229626362
    Change-Id: Iacb6ae052d4cb2b6e0ebccbacf59ece9501d3b5f
2022-08-16 | highbd_quantize_intrin_sse2: quiet int sanitizer warnings | James Zern
    add a missing cast in ^ operations; quiets warnings of the form:
    implicit conversion from type 'int' of value -1 (32-bit, signed) to type 'unsigned int' changed the value to 4294967295 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I56f74981050b2c9d00bad20e68f1b73ce7454729
2022-08-16 | load_unaligned_u32: use an int w/_mm_cvtsi32_si128 | James Zern
    this matches the type of the function parameter; quiets integer sanitizer warnings of the form:
    implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 3215646151 (32-bit, unsigned) to type 'int' changed the value to -1079321145 (32-bit, signed)
    Bug: b/229626362
    Change-Id: Ia9a5dc5e1f57cbf4f8f8fa457bb674ef43369d37
2022-08-16 | variance_sse2.c: add some missing casts | James Zern
    quiets integer sanitizer warnings of the form:
    ../vpx_dsp/x86/variance_sse2.c:100:10: runtime error: implicit conversion from type 'unsigned int' of value 4294966272 (32-bit, unsigned) to type 'int' changed the value to -1024 (32-bit, signed)
    Bug: b/229626362
    Change-Id: I150cc0a6a6b85143c3bf96886686fe3a40897db5
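The integer sanitizer reports conversions like the ones quoted in these entries only when they are implicit. The following is a minimal sketch of making such a conversion explicit. The helper names are hypothetical and this is not the code from the commits; the signed result assumes a two's complement target, as on the x86 builds these files are compiled for.

```c
/* Hypothetical helpers for illustration; not taken from libvpx. */

/* Explicit unsigned -> signed conversion. The value is reinterpreted on
 * two's complement targets, and the cast marks that as intentional. */
static int as_signed(unsigned int value) { return (int)value; }

/* Explicit signed -> unsigned conversion. This direction is fully defined
 * by the C standard (reduction modulo 2^N). */
static unsigned int as_unsigned(int value) { return (unsigned int)value; }
```

With the values from the quoted warnings, `as_signed(4294966272u)` gives -1024 and `as_unsigned(-1)` gives 4294967295 on such targets; the results are the same as before, only the intent is now visible to the sanitizer.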
2022-08-09 | VPX: Fix vp9_quantize_fp_avx2() VS build error. | Scott LaVarnway
    Add build fix for _mm256_extract_epi16() being undefined.
    Bug: b/237714063
    Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
2022-08-05 | VPX: Add vpx_subtract_block_avx2(). | Scott LaVarnway
    ~1.3x faster than vpx_subtract_block_sse2().
    Based on aom_subtract_block_avx2().
    Bug: b/241580104
    Change-Id: I17da036363f213d53c6546c3e858e4c3cba44a5b
2022-07-29 | Provide Arm SDOT optimizations for SAD functions | Konstantinos Margaritis
    Change-Id: I497ee1c45d1fc4d643cefad7d87e5aaacd77869c
2022-07-27 | x86: normalize type with _mm_cvtsi128_si32 | James Zern
    prefer int in most cases
    w/clang -fsanitize=integer fixes warnings of the form:
    implicit conversion from type 'int' of value -809931979 (32-bit, signed) to type 'uint32_t' (aka 'unsigned int') changed the value to 3485035317 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I0c6604efc188f2660c531eddfc7aa10060637813
2022-07-27 | variance_avx2.c: fix implicit conversion warnings | James Zern
    w/clang -fsanitize=integer fixes warnings of the form:
    implicit conversion from type 'int' of value -1323 (32-bit, signed) to type 'unsigned int' changed the value to 4294965973 (32-bit, unsigned)
    Bug: b/229626362
    Change-Id: I7291d9bd5cacea0d88d9f4c4624c096764f4a472
2022-07-26 | VPX: Add vpx_highbd_quantize_b_32x32_avx2(). | Scott LaVarnway
    Up to 11.78x faster than vpx_quantize_b_32x32_sse2() for full calculations.
    ~1.7% overall encoder improvement for the test clip used.
    Bug: b/237714063
    Change-Id: Ib759056db94d3487239cb2748ffef1184a89ae18
2022-07-25 | VPX: Add vpx_highbd_quantize_b_avx2(). | Scott LaVarnway
    Up to 3.61x faster than vpx_highbd_quantize_b_sse2() for full calculations.
    ~2.3% overall encoder improvement for the test clip used.
    Bug: b/237714063
    Change-Id: I23f88d2a7f96aaa4103778372f4f552207f73cee
2022-07-25 | Merge "VPX: Add vpx_quantize_b_32x32_avx2()." into main | Scott LaVarnway
2022-07-20 | avg_intrin_avx2: rm dead store in highbd_hadamard_8x8 | James Zern
    missed in:
    53dd1e8e7 avg_intrin_{sse2,avg2}: rm dead store in hadamard_8x8
    Change-Id: I378e4a388ceb193a4cfee4d9d317fc62fcc4b39e
2022-07-19 | VPX: Add vpx_quantize_b_32x32_avx2(). | Scott LaVarnway
    Up to 1.36x faster than vpx_quantize_b_32x32_avx() for full calculations.
    Up to 1.29x faster for VP9_HIGHBITDEPTH builds.
    Bug: b/237714063
    Change-Id: I97aa6a18d4dc2f3187b76800f91bbba7be447ef1
2022-07-18 | avg_intrin_{sse2,avg2}: rm dead store in hadamard_8x8 | James Zern
    this quiets a couple static analysis warnings with clang 11:
    vpx_dsp/x86/avg_intrin_sse2.c:278:45: warning: Although the value stored to 'src_diff' is used in the enclosing expression, the value is never actually read from 'src_diff' [deadcode.DeadStores]
    src[7] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
    ^ ~~~~~~~~~~
    vpx_dsp/x86/avg_intrin_avx2.c:307:49: warning: Although the value stored to 'src_diff' is used in the enclosing expression, the value is never actually read from 'src_diff' [deadcode.DeadStores]
    src[7] = _mm256_loadu_si256((const __m256i *)(src_diff += src_stride));
    ^ ~~~~~~~~~~
    Bug: b/229626362
    Change-Id: I4b0201bd39775885df0afc03fa5da70910b9dad6
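The dead store flagged here is the final `src_diff += src_stride`: the updated pointer is used for the last load but never read afterwards. The following is a portable sketch of the pattern and one way to avoid it, using plain `int16_t` loads in place of the SSE2/AVX2 intrinsics. `load_row_heads` is a hypothetical helper, not the libvpx function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads the first element of 8 consecutive rows.
 * Rows 1..6 advance the pointer because later loads depend on it. The
 * last row uses plain addition, so no value is stored that is never
 * read again. */
static void load_row_heads(const int16_t *src_diff, ptrdiff_t src_stride,
                           int16_t out[8]) {
  out[0] = *src_diff;
  out[1] = *(src_diff += src_stride);
  out[2] = *(src_diff += src_stride);
  out[3] = *(src_diff += src_stride);
  out[4] = *(src_diff += src_stride);
  out[5] = *(src_diff += src_stride);
  out[6] = *(src_diff += src_stride);
  out[7] = *(src_diff + src_stride); /* was: src_diff += src_stride */
}
```

The loaded values are identical either way; only the unused write to the local pointer goes away.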
2022-07-18 | vpx_int_pro_row_c: add an assert for height | James Zern
    this quiets a static analysis warning with clang 11:
    vpx_dsp/avg.c:353:15: warning: Assigned value is garbage or undefined [core.uninitialized.Assign]
    hbuf[idx] /= norm_factor;
    ^ ~~~~~~~~~~~
    the same fix was applied in libaom:
    1ad0889bc aom_int_pro_row_c: add an assert for height
    Bug: b/229626362
    Change-Id: Ic8a249f866b33b02ec9f378581e51ac104d97169
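The following is a rough sketch of the kind of change this entry describes, written from the commit message rather than from the libvpx source. `example_int_pro_row` and its body are assumptions for illustration; in particular, deriving `norm_factor` as `height >> 1` is a guess at how a zero divisor could arise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: sums each of 16 columns over `height` rows and
 * divides the sum by a normalization factor derived from height. */
static void example_int_pro_row(int16_t hbuf[16], const uint8_t *ref,
                                int ref_stride, int height) {
  int idx;
  const int norm_factor = height >> 1; /* assumed derivation */
  /* States the precondition explicitly: height >= 2 keeps norm_factor
   * nonzero, which is something a static analyzer can then rely on. */
  assert(height >= 2);
  for (idx = 0; idx < 16; ++idx) {
    int i;
    hbuf[idx] = 0;
    for (i = 0; i < height; ++i) hbuf[idx] += ref[i * ref_stride + idx];
    hbuf[idx] /= norm_factor;
  }
}
```

An assert costs nothing in release builds compiled with `NDEBUG`, so it documents the caller contract without affecting optimized code.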
2022-07-11 | VPX: Add vpx_quantize_b_avx2(). | Scott LaVarnway
    Up to 1.58x faster than vpx_quantize_b_avx() depending on the size.
    Bug: b/237714063
    Change-Id: I595a6bb32ebee63f69f27b5a15322fdeae1bf70e
2022-06-01 | vp9,encoder: fix some integer sanitizer warnings | James Zern
    the issues fixed in this change are related to implicit conversions between int / unsigned int:
    vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion from type 'int' of value -9 (32-bit, signed) to type 'unsigned int' changed the value to 4294967287 (32-bit, unsigned)
    vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
    vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type 'int' changed the value to -1 (32-bit, signed)
    vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error: implicit conversion from type 'uint32_t' (aka 'unsigned int') of value 4290576316 (32-bit, unsigned) to type 'int' changed the value to -4390980 (32-bit, signed)
    vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka 'unsigned short') changed the value to 64512 (16-bit, unsigned)
    unsigned is forced for masks and int is used with intel intrinsics
    Bug: webm:1767
    Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee
    Fixed: webm:1767
2022-05-25 | loongarch: Remove redundant code | yuanhecai
    Simplify architecture support code and remove redundant code to improve efficiency.
    Bug: webm:1755
    Change-Id: I03bc251aca115b0379fe19907abd165e0876355b
2022-05-20 | loongarch: Modify the representation of macros | yuanhecai
    Some macros have been changed to "#define do {...} while (0)", change the rest to "static INLINE ..."
    Bug: webm:1755
    Change-Id: I445ac0c543f12df38f086b479394b111058367d0
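The two macro representations named in this entry can be contrasted with a small sketch. The names below are hypothetical and unrelated to the LSX macros in libvpx, and C99 `inline` stands in for the project's `INLINE` macro.

```c
#include <stdint.h>

/* Statement-like macro: the do { ... } while (0) wrapper makes the
 * expansion behave as a single statement, for example inside an unbraced
 * if/else. Arguments are still substituted textually. */
#define EXAMPLE_SWAP_I16(a, b) \
  do {                         \
    int16_t tmp_ = (a);        \
    (a) = (b);                 \
    (b) = tmp_;                \
  } while (0)

/* Function form: arguments are evaluated exactly once and type checked,
 * and the compiler remains free to inline the call. */
static inline void example_swap_i16(int16_t *a, int16_t *b) {
  const int16_t tmp = *a;
  *a = *b;
  *b = tmp;
}
```

The macro form suits helpers that must write to several caller variables in place; the function form is usually preferable when the helper computes a value from its inputs.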
2022-05-19 | loongarch: Reduce the number of instructions | yuanhecai
    Replace some redundant instructions to improve the efficiency of the program.
    1. txfm_macros_lsx.h
    2. vpx_convolve8_avg_lsx.c
    3. vpx_convolve8_horiz_lsx.c
    4. vpx_convolve8_lsx.c
    5. vpx_convolve8_vert_lsx.c
    6. vpx_convolve_copy_lsx.c
    7. vpx_convolve_lsx.h
    Bug: webm:1755
    Change-Id: I9b7fdf6900338a26f9b1775609ad387648684f3d
2022-05-18 | vp9[loongarch]: Optimize vpx_quantize_b/b_32x32 | yuanhecai
    1. vpx_quantize_b_lsx
    2. vpx_quantize_b_32x32_lsx
    Bug: webm:1755
    Change-Id: I476c8677a2c2aed7248e088e62c3777c9bed2adb
2022-05-17 | vp8[loongarch]: Optimize fdct8x4/diamond_search_sad | yuanhecai
    1. vp8_short_fdct8x4_lsx
    2. vp8_diamond_search_sad_lsx
    3. vpx_sad8x8_lsx
    Bug: webm:1755
    Change-Id: Ic9df84ead2d4fc07ec58e9730d6a12ac2b2d31c1
2022-05-17 | vp8[loongarch]: Optimize sub_pixel_variance8x8/16x16 | yuanhecai
    1. vpx_sub_pixel_variance8x8_lsx
    2. vpx_sub_pixel_variance16x16_lsx
    3. vpx_mse16x16_lsx
    Bug: webm:1755
    Change-Id: Iaedd8393c950c13042a0597d0d47b534a2723317
2022-05-17 | vp8[loongarch]: Optimize vp8 encoding partial function | Hao Chen
    1. vp8_short_fdct4x4
    2. vp8_regular_quantize_b
    3. vp8_block_error
    4. vp8_mbblock_error
    5. vpx_subtract_block
    Bug: webm:1755
    Change-Id: I3dbfc7e3937af74090fc53fb4c9664e6cdda29ef
2022-05-13 | vp9[loongarch]: Optimize avg_variance64x64/variance8x8 | yuanhecai
    1. vpx_variance8x8_lsx
    2. vpx_sub_pixel_avg_variance64x64_lsx
    Bug: webm:1755
    Change-Id: I7d68c7f2f5c8d27dc31cfd32298aeefb68f5d560
2022-05-13 | vp9[loongarch]: Optimize fdct4x4/8x8_lsx | yuanhecai
    1. vpx_fdct4x4_lsx
    2. vpx_fdct8x8_lsx
    Bug: webm:1755
    Change-Id: If283fc08f9bedcbecd2c4052adb210f8fe00d4f0
2022-05-13 | vp9[loongarch]: Optimize vpx_hadamard_16x16/8x8 | yuanhecai
    1. vpx_hadamard_16x16_lsx
    2. vpx_hadamard_8x8_lsx
    Bug: webm:1755
    Change-Id: I3b1e0a2c026c3806b7bbbd191d0edf0e78912af7
2022-04-28 | Merge changes I99ee0ef3,Ie087e8be,I6b19d016,I6fb7771d,I54f83733, ... into main | James Zern
    * changes:
      y4m_input_open: check allocs
      fastssim,fs_ctx_init: check alloc
      vp9_get_smooth_motion_field: check alloc
      vp9_row_mt_alloc_rd_thresh: check alloc
      simple_encode,init_encoder: check buffer_pool alloc
      VP9RateControlRTC::Create: check segmentation_map alloc
      vp9_speed_features.c: check allocations
      vp9_alloc_motion_field_info: check motion_field_array alloc
      vp9_enc_grp_get_next_job: check job queue alloc
      vp9: check postproc_state.limits allocs
      vp9,encode_tiles_buffer_alloc: fix allocation check
2022-04-28 | vp9[loongarch]: Optimize sad8x8/32x64/64x32x4d | yuanhecai
    1. vpx_sad8x8x4d_lsx
    2. vpx_sad32x64x4d_lsx
    3. vpx_sad64x32x4d_lsx
    Bug: webm:1755
    Change-Id: I08a2b8717ec8623ffdd4451a04e68fa3a7228668
2022-04-28 | vp9[loongarch]: Optimize sad64x64/32x32_avg,comp_avg_pred | yuanhecai
    1. vpx_sad64x64_avg_lsx
    2. vpx_sad32x32_avg_lsx
    3. comp_avg_pred_lsx
    Bug: webm:1755
    Change-Id: I58dabdcdd4265bd6ebd5670db8a132d2e838683f
2022-04-26 | fastssim,fs_ctx_init: check alloc | James Zern
    Change-Id: Ie087e8be1e943b94327ed520db447a0e3a927738
2022-04-26 | vp9[loongarch]: Optimize fdct/get/variance16x16 | yuanhecai
    1. vpx_fdct16x16_lsx
    2. vpx_get16x16var_lsx
    3. vpx_variance16x16_lsx
    Bug: webm:1755
    Change-Id: I27090406dc28cfdca64760fea4bc16ae11b74628
2022-04-24 | vp9[loongarch]: Optimize sub_pixel_variance32x32/sad16x16 | yuanhecai
    1. vpx_sad16x16_lsx
    2. vpx_sub_pixel_variance32x32_lsx
    Bug: webm:1755
    Change-Id: I9926ace710903993ccbb42caef320fa895e90127
2022-04-22 | vp9[loongarch]: Optimize horizontal/vertical_4/dual | yuanhecai
    1. vpx_lpf_horizontal_4_lsx
    2. vpx_lpf_vertical_4_lsx
    3. vpx_lpf_horizontal_4_dual_lsx
    4. vpx_lpf_vertical_4_dual_lsx
    Bug: webm:1755
    Change-Id: I12e9f27cafd9514b24cfbf2354cc66c7d1238687
2022-04-22 | vp9[loongarch]: Optimize convolve8_avg_vert/convolve_copy | yuanhecai
    1. vpx_convolve8_avg_vert_lsx
    2. vpx_convolve_copy_lsx
    3. vpx_idct32x32_135_add_lsx
    Bug: webm:1755
    Change-Id: I6bdfe5836a91a5e361ab869b26641e86c5ebb68d
2022-04-22 | vp9[loongarch]: Optimize vertical/horizontal_8_dual | yuanhecai
    1. vpx_lpf_vertical_8_dual_lsx
    2. vpx_lpf_horizontal_8_dual_lsx
    Bug: webm:1755
    Change-Id: I354df02cc215f36b4edf6558af0ff7fd6909deac