libvpx.git

Date	Commit message	Author
2019-07-17	Fix comment typos.	Wan-Teh Chang
	Fix comment typos in transpose_s16_4x4q() and transpose_u16_4x4q(). Change-Id: I21bcc1fb3fb880798e5a3927c3dbe81dd518c83b
2019-03-21	vp9 postproc neon: Remove the condition on mb cols.	Jerome Jiang
	VP8 and VP9 pad the buffer stride differently. A VP8 macroblock is 16x16, so the buffer stride must be divisible by 16; the UV buffer stride (half the luma stride with 4:2:0 subsampling) is therefore divisible by 8. VP9 works in 8x8 blocks, so its buffer stride is only extended to a multiple of 8, and the UV buffer stride is then not necessarily divisible by 8. Change-Id: I6fa953feb951f2fb2e48f72a623786b85e23822f
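The C sketch below works through the stride argument above with concrete numbers. It assumes a simplified model: the luma stride is the frame width rounded up to the block size, and the UV stride is half of that (4:2:0). Real libvpx frame buffers also add borders and extra alignment, which are ignored here. `align_up` and `uv_stride_for` are hypothetical helpers.

```c
#include <assert.h>

/* Hypothetical helper: round n up to a multiple of align (a power of two). */
static int align_up(int n, int align) {
  assert(align > 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

/* Simplified model (assumption): luma stride = width rounded up to the block
 * size; UV stride = half the luma stride (4:2:0). Borders are ignored. */
static int uv_stride_for(int width, int block_size) {
  return align_up(width, block_size) / 2;
}
```

For a 360-pixel-wide frame, rounding to 16 gives a UV stride of 184, which is a multiple of 8. Rounding to 8 gives 180, which is not.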
2019-01-09	highbd idct: resolve missing declarations	Johann
	BUG=webm:1584 Change-Id: I596f5f0e1a1c152493cd8177b32d416cc79937e0
2019-01-07	arm neon: resolve missing declarations	Johann
	BUG=webm:1584 Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3
2018-12-03	quantize neon: fix hbd builds	Johann
	BUG=webm:1448 Change-Id: I2140fb9b6ce92716d2d9509f3031244088a62127
2018-11-12	quantize: use aarch64 vmaxv	Johann
	Simplify the max value calculation on aarch64 by using vmaxv. About 8% faster for 4x4, with diminishing returns as the block size grows (no change at 32x32). Only the vp9 quantize has a speed test hooked up; similar results are anticipated for the other quantize versions. Results of NEON/VP9QuantizeTest.DISABLED_Speed (Speed/2 covers 4x4 to 16x16, Speed/3 covers 32x32; "Bypass calculations" and "Full calculations" timed identically, all ±0.0 ms):
	Block    Before    After
	4x4      31.6 ms   29.1 ms
	8x8      17.7 ms   16.9 ms
	16x16    14.2 ms   14.1 ms
	32x32    18.6 ms   18.6 ms
	Speed/2 total: 1906 ms before, 1803 ms after. Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
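Below is a scalar C sketch of the reduction being simplified. It assumes the maximum being computed is the end-of-block (eob) candidate: the largest `iscan + 1` among nonzero quantized coefficients in one 8-lane vector.

- `max_across` models what a single across-vector max such as aarch64's `vmaxvq_u16` returns.
- `max_pairwise` models the sequence of pairwise max steps that 32-bit Arm code needs instead.

All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Per-lane eob candidate: iscan[i] + 1 where qcoeff[i] is nonzero, else 0. */
static void eob_candidates(const int16_t *qcoeff, const int16_t *iscan,
                           uint16_t *out) {
  for (int i = 0; i < LANES; ++i) {
    assert(iscan[i] >= 0);
    out[i] = qcoeff[i] != 0 ? (uint16_t)(iscan[i] + 1) : 0;
  }
}

/* Single across-vector max: the result aarch64's vmaxvq_u16 gives. */
static uint16_t max_across(const uint16_t *v) {
  uint16_t m = v[0];
  for (int i = 1; i < LANES; ++i) {
    if (v[i] > m) m = v[i];
  }
  return m;
}

/* Pairwise reduction tree: each round keeps the max of adjacent pairs,
 * halving the number of live lanes until one remains. */
static uint16_t max_pairwise(const uint16_t *v) {
  uint16_t tmp[LANES];
  for (int i = 0; i < LANES; ++i) tmp[i] = v[i];
  for (int n = LANES; n > 1; n /= 2) {
    for (int i = 0; i < n / 2; ++i) {
      tmp[i] = tmp[2 * i] > tmp[2 * i + 1] ? tmp[2 * i] : tmp[2 * i + 1];
    }
  }
  return tmp[0];
}
```

Both forms give the same answer. The gain on aarch64 presumably comes from replacing several dependent steps with one instruction. That matters most for small blocks, where the reduction is a larger share of the work.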
2018-11-01	clang-tidy: fix vpx_dsp parameters	Johann
	BUG=webm:1444 Change-Id: Iee19be068afc6c81396c79218a89c469d2e66207
2018-10-31	clang-tidy: normalize variance functions	Johann
	Always use src/ref and _ptr/_stride suffixes. Normalize to [xy]_offset and second_pred. Drop some stray source/recon_strides. BUG=webm:1444 Change-Id: I32362a50988eb84464ab78686348610ea40e5c80
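For reference, the computation behind the renamed parameters can be sketched in scalar C. The sketch assumes the usual libvpx definition of block variance, sse - sum^2 / (w * h). `variance_ref` is a hypothetical name, and the sub-pixel (`x_offset`/`y_offset`) and `second_pred` variants are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference variance for a w x h block using the normalized
 * src_ptr/src_stride and ref_ptr/ref_stride names. Stores the sum of squared
 * differences in *sse and returns sse - sum^2 / (w * h). */
static uint32_t variance_ref(const uint8_t *src_ptr, int src_stride,
                             const uint8_t *ref_ptr, int ref_stride, int w,
                             int h, uint32_t *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  assert(w > 0 && h > 0);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int diff = src_ptr[x] - ref_ptr[x];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
    src_ptr += src_stride;
    ref_ptr += ref_stride;
  }
  *sse = (uint32_t)sq;
  return (uint32_t)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```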
2018-09-15	cosmetics: normalize include guards	James Zern
	Use the recommended format [1]: <PROJECT>_<PATH>_<FILE>_H_ [1] https://google.github.io/styleguide/cppguide.html#The__define_Guard "All header files should have #define guards to prevent multiple inclusion. The format of the symbol name should be <PROJECT>_<PATH>_<FILE>_H_." Change-Id: I2e8ab0b32fb23c30fa43cff5fec12d043c0d2037
2018-07-28	arm: Consistently use unified syntax for asm	Martin Storsjo
	The ".syntax unified" directives in a few source files aren't valid ADS assembly directives, and they break compilation for Windows, since ads2armasm_ms.pl doesn't handle them. Explicitly add them via ads2gas.pl and ads2gas_apple.pl instead, and tweak one instruction to be valid unified syntax. Change-Id: I37f1709f163d11474597161fe02eb433859cb9b8
2018-07-26	Add New Neon Assemblies for Motion Compensation	Venkatarama NG. Avadhani
	This commit adds NEON assembly for motion compensation that improves on the existing NEON code. Performance improvement on a Nexus 6 @ 2.65 GHz:
	Resolution   1 Thread   4 Threads
	720p         12.16%     7.21%
	1080p        18.00%     15.28%
	Change-Id: Ic0b0412eeb01c8317642b20bb99092c2f5baba37
2018-07-17	vpx_sum_squares_2d_i16_neon(): Make |s2| a uint64x1_t.	Raphael Kubo da Costa
	This fixes the build with at least GCC 7.3, where it was previously failing with:
	sum_squares_neon.c: In function 'vpx_sum_squares_2d_i16_neon':
	sum_squares_neon.c: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
	    s2 = vpaddl_u32(s1);
	sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
	    s2 = vpaddl_u32(s1);
	sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
	    s2 = vadd_u64(vget_low_u64(s1), vget_high_u64(s1));
	sum_squares_neon.c: incompatible type for argument 1 of 'vget_lane_u64'
	    return vget_lane_u64(s2, 0);
	The generated assembly was verified to remain identical with both GCC and LLVM. Bug: chromium:819249 Change-Id: I2778428ee1fee0a674d0d4910347c2a717de21ac
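The type fix above comes down to keeping the accumulator unsigned and 64-bit. `vpaddl_u32()` returns a `uint64x1_t`, so assigning it to an `int64x1_t` needs a lax vector conversion. The scalar sketch below (hypothetical name `sum_squares_2d_i16_ref`) shows why 64 bits are needed at all: a 4x4 block of `INT16_MIN` samples already sums to 2^34.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of squares over a size x size block of
 * int16_t samples, accumulated in an unsigned 64-bit value. */
static uint64_t sum_squares_2d_i16_ref(const int16_t *src, int stride,
                                       int size) {
  uint64_t ss = 0;
  assert(size > 0);
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t v = src[c];
      ss += (uint64_t)(v * v); /* v * v <= 2^30, so it fits in int32_t */
    }
    src += stride;
  }
  return ss;
}
```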
2018-05-10	vpx_subtract_block_neon: add explicit cast	James Zern
	Quiets a ptrdiff_t -> int conversion warning. Change-Id: If6b545a736fc19e48e290961736b1618df97db3e
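Here is a scalar sketch of what a block-subtract function computes. Strides are kept as `ptrdiff_t`, so pointer arithmetic needs no narrowing to `int`. The parameter order is modeled on the C reference `vpx_subtract_block_c()` but should be treated as an assumption. `subtract_block_ref` is a hypothetical name, and the log does not say where the explicit cast was added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: diff = src - pred over a rows x cols block.
 * All strides stay ptrdiff_t, so advancing the pointers involves no
 * ptrdiff_t -> int conversion. */
static void subtract_block_ref(int rows, int cols, int16_t *diff,
                               ptrdiff_t diff_stride, const uint8_t *src,
                               ptrdiff_t src_stride, const uint8_t *pred,
                               ptrdiff_t pred_stride) {
  assert(rows > 0 && cols > 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) diff[c] = (int16_t)(src[c] - pred[c]);
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```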
2018-05-11	Merge "Update vpx_subtract_block_neon()"	James Zern

2018-05-10	Update vpx_subtract_block_neon()	Linfeng Zhang
	Change-Id: Ie2ac06c090c8f92268e9a799e96aa5192a1bdcd2
2018-05-08	Update vpx_comp_avg_pred_neon()	Linfeng Zhang
	Separate the width 4 and 8 cases to reduce jumps inside the loop with clang. Change-Id: I6ffc6f1555f2ad08b72a8dba35a78b9fd5f95a73
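Compound-average prediction is a per-pixel rounding average of two predictors. The scalar sketch below (hypothetical name `comp_avg_pred_ref`) makes these assumptions:

- `pred` and `comp_pred` are stored contiguously, with stride equal to `width`.
- `ref` has its own stride.

Per lane, this is the same result a NEON rounding halving add (`vrhaddq_u8`) produces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: comp_pred = (pred + ref + 1) >> 1 per pixel.
 * comp_pred and pred use stride == width; ref uses ref_stride. */
static void comp_avg_pred_ref(uint8_t *comp_pred, const uint8_t *pred,
                              int width, int height, const uint8_t *ref,
                              int ref_stride) {
  assert(width > 0 && height > 0);
  for (int i = 0; i < height; ++i) {
    for (int j = 0; j < width; ++j)
      comp_pred[j] = (uint8_t)((pred[j] + ref[j] + 1) >> 1);
    comp_pred += width;
    pred += width;
    ref += ref_stride;
  }
}
```

Splitting out widths 4 and 8, as the commit does, changes only how the NEON loop is laid out for clang, not these results.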
2018-05-08	Update SadMxNx4 NEON functions	Linfeng Zhang
	Change-Id: Ia313a6da00a05837fcd4de6ece31fa1c0016438c
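The x4d SAD functions (see also the sad4d entries further down) compare one source block against four reference candidates in a single call. A scalar sketch of the result follows. `sad_mxn_x4d_ref` is a hypothetical name; `m` is the width and `n` the height.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model: four sums of absolute differences between one
 * m x n source block and four reference blocks sharing ref_stride. */
static void sad_mxn_x4d_ref(const uint8_t *src_ptr, int src_stride,
                            const uint8_t *const ref_array[4], int ref_stride,
                            int m, int n, uint32_t sad_array[4]) {
  assert(m > 0 && n > 0);
  for (int k = 0; k < 4; ++k) {
    const uint8_t *src = src_ptr;
    const uint8_t *ref = ref_array[k];
    uint32_t sad = 0;
    for (int y = 0; y < n; ++y) {
      for (int x = 0; x < m; ++x) sad += (uint32_t)abs(src[x] - ref[x]);
      src += src_stride;
      ref += ref_stride;
    }
    sad_array[k] = sad;
  }
}
```

A NEON version can presumably load each source row once and reuse it for all four references, which is the main appeal of the 4-way form.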
2018-05-07	Add vpx_sum_squares_2d_i16_neon()	Linfeng Zhang
	Perf shows CPU time of this function dropped from 0.81% to 0.15%. Change-Id: I8a7649ca5c15af2fc65cfb848f5befa0cc5e64f2
2018-03-13	Add vp9_highbd_iht16x16_256_add_neon()	Linfeng Zhang
	BUG=webm:1403 Change-Id: I2293c11666786be276909d48ee78dacb40a89e25
2018-02-27	Add vp9_iht16x16_256_add_neon()	Linfeng Zhang
	BUG=webm:1403 Change-Id: I1413cc3dfcb62143ba04fe9b0f8d8b010fdf69b6
2018-02-26	Fix a bug in create_s16x4_neon()	Linfeng Zhang
	The bug shows up when the second argument is negative: the upper 32 bits of the result are then all 1s. Change-Id: I189ee8cd3753fde00a34847e7a37cde2caa4ba72
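The log does not show create_s16x4_neon() itself, so the sketch below only reproduces the described failure mode. It assumes the helper assembles four int16_t values into one 64-bit word (the kind of value vcreate_s16() accepts) by shifting and ORing. `pack_s16x4_buggy`, `pack_s16x4_fixed` and `lane_u16` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: c1 is widened as a signed value, so a negative c1 sign-extends and
 * sets every bit above bit 31, clobbering lanes 2 and 3 when ORed in. */
static uint64_t pack_s16x4_buggy(int16_t c0, int16_t c1, int16_t c2,
                                 int16_t c3) {
  const int64_t hi1 = (int32_t)c1 * 65536; /* c1 << 16 without shifting a
                                              negative value */
  return (uint64_t)(uint16_t)c0 | (uint64_t)hi1 |
         ((uint64_t)(uint16_t)c2 << 32) | ((uint64_t)(uint16_t)c3 << 48);
}

/* Fixed: each value is reinterpreted as uint16_t before widening, so it can
 * only set bits in its own 16-bit lane. */
static uint64_t pack_s16x4_fixed(int16_t c0, int16_t c1, int16_t c2,
                                 int16_t c3) {
  return (uint64_t)(uint16_t)c0 | ((uint64_t)(uint16_t)c1 << 16) |
         ((uint64_t)(uint16_t)c2 << 32) | ((uint64_t)(uint16_t)c3 << 48);
}

/* Extract lane i (0 = lowest 16 bits) as its raw bit pattern. */
static uint16_t lane_u16(uint64_t v, int i) {
  assert(i >= 0 && i < 4);
  return (uint16_t)(v >> (16 * i));
}
```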
2018-02-20	Add vp9_highbd_iht8x8_16_add_neon()	Linfeng Zhang
	BUG=webm:1403 Change-Id: I11efb652f1aee371c71eee2d29e33793e4736832
2018-02-08	Update iadst NEON functions	Linfeng Zhang
	Use scalar multiplies. No impact with clang, but improves code generation with gcc. BUG=webm:1403 Change-Id: I4922e7e033d9e93282c754754100850e232e1529
2018-02-05	Add vp9_highbd_iht4x4_16_add_neon()	Linfeng Zhang
	BUG=webm:1403 Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
2018-01-29	Update vp9_iht8x8_64_add_neon()	Linfeng Zhang
	Change-Id: Ie70ed8b9273df5e1fd06bc93cb469e80630941d2
2018-01-29	Clean dct_const_round_shift() related neon code	Linfeng Zhang
	Change-Id: I8f4e0fc6ecb77b623519f2dd3cd2886f89218ddd
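dct_const_round_shift() is a rounding right shift by DCT_CONST_BITS, which is 14 in libvpx. The cospi_*_64 constants are cosines scaled by 2^14; for example, cospi_16_64 = 11585. The scalar sketch below uses a hypothetical `_ref` name and assumes an arithmetic right shift for negative inputs. NEON code can get the same rounding with `vrshrn_n_s32(x, 14)` when narrowing to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14 /* fixed-point precision of the cospi_*_64 values */

/* Round-to-nearest (ties up) right shift by DCT_CONST_BITS. Assumes >> on a
 * negative int64_t is arithmetic, as GCC and Clang implement it. */
static int64_t dct_const_round_shift_ref(int64_t input) {
  return (input + ((int64_t)1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}
```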
2018-01-24	cosmetic: clean idct neon functions	Linfeng Zhang
	Change-Id: I9c7c52567850aded0437b13ba1260e94441bc49d
2018-01-18	clang-format v5.0.0 vpx_dsp/	Johann
	Remove comments above #define statements because they get indented unnecessarily. https://bugs.llvm.org/show_bug.cgi?id=35930 Add blank lines to prevent comments from being treated as blocks. Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
2017-10-26	vpx: hadamard: use ptrdiff_t instead of int for stride	Scott LaVarnway
	Eliminates the following instruction for the x86 (64 bit) intrinsic code: movslq %esi,%rax Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
2017-09-27	Merge "fix signed integer overflow of idct"	James Zern

2017-09-27	fix signed integer overflow of idct	Linfeng Zhang
	Exposed by a fuzz test in high bitdepth. The bug was introduced in commit 64653fa. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
2017-09-26	Add vpx_scaled_2d_neon()	Linfeng Zhang
	BUG=webm:1419 Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
2017-09-26	Merge changes Ib9105462,Idfac00ed,If8d8a0e2	Linfeng Zhang
	* changes: cosmetics: NEON scaling code Refactor convolve NEON code Refactor convolve code
2017-09-20	Remove the unnecessary cast of (int16_t)cospi_{1...31}_64	Linfeng Zhang
	BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
2017-09-20	Remove the unnecessary upcasts of (int)cospi_{1...31}_64	Linfeng Zhang
	BUG=webm:1450 Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
2017-09-20	Change cospi_{1...31}_64 from tran_high_t to tran_coef_t	Linfeng Zhang
	The unnecessary upcast to (int) will be cleaned up later. BUG=webm:1450 Change-Id: Ia234575206d5a74540526924b06ed3939322d063
2017-09-19	cosmetics: NEON scaling code	Linfeng Zhang
	Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
2017-09-19	Refactor convolve NEON code	Linfeng Zhang
	Rename a couple of hbd static functions. Move the position of NEON function convolve8_4(). Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349
2017-09-11	Add 4 to 1 scaling NEON optimization	Linfeng Zhang
	BUG=webm:1419 Change-Id: If82a93935d2453e61b7647aae70983db1740bec7
2017-09-06	Refactor convolve8 NEON functions	Linfeng Zhang
	Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c
2017-09-05	Remove get_filter_base() and get_filter_offset() in convolve	Linfeng Zhang
	This makes the convolve functions independent of the filter table's alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
2017-08-23	Merge "quantize neon: round dqcoeff towards zero"	Johann Koenig

2017-08-23	quantize neon: round dqcoeff towards zero	Johann
	Add 1 to negative values before shifting so that dqcoeff rounds towards zero. This is 10-15% faster than converting to positive before shifting. Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
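A scalar sketch of the trick, under two assumptions:

- As in the C reference for the 32x32 quantizer, dqcoeff is (qcoeff * dequant) / 2.
- `>>` on negative values is an arithmetic shift, as GCC and Clang implement it.

`half_toward_zero` is the add-one-if-negative form described above, and `half_via_abs` is the convert-to-positive alternative. Both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Halve, rounding toward zero. A plain arithmetic shift rounds toward
 * negative infinity (-3 >> 1 == -2), so 1 is added to negative inputs first. */
static int32_t half_toward_zero(int32_t x) {
  return (x + (x < 0)) >> 1;
}

/* Alternative: shift the magnitude, then restore the sign.
 * Valid for x > INT32_MIN. */
static int32_t half_via_abs(int32_t x) {
  const int32_t sign = x < 0 ? -1 : 1;
  return sign * ((sign * x) >> 1);
}
```

Both match C's truncating division `x / 2`. The first needs only an add and a shift per lane, with no sign restoration afterwards.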
2017-08-21	quantize: ignore skip_block in arm	Johann
	Change-Id: Icfb70687476b2edb25d255793ba325b261d40584
2017-08-08	neon: vpx_quantize_b_32x32	Johann
	With skip block, the NEON version is about twice as fast as C. The NEON version has no shortcut for coeff < zbin, so it always takes the same amount of time. Even when the C version can take that shortcut, NEON is over twice as fast; when it can't, the gap widens to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
2017-07-31	neon: vpx_quantize_b	Johann
	With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
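The scalar dead-zone quantizer below illustrates the zbin shortcut mentioned above. Coefficients with magnitude below zbin are zeroed immediately: this is the early-out a C loop can take but the NEON version does not.

This is a simplified sketch, not libvpx's arithmetic. The reference also uses separate DC/AC parameters, a scan order, and a two-stage quant/quant_shift multiply, all omitted here. `quantize_ref` and its single Q16 `quant` factor are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified dead-zone quantizer over n coefficients (scan order assumed to
 * be the identity). quant is a Q16 reciprocal of dequant. Returns the eob:
 * one past the last nonzero qcoeff. */
static int quantize_ref(const int16_t *coeff, int n, int zbin, int round,
                        int quant, int dequant, int32_t *qcoeff,
                        int32_t *dqcoeff) {
  int eob = 0;
  assert(n >= 0 && zbin >= 0 && quant >= 0);
  for (int i = 0; i < n; ++i) {
    const int32_t c = coeff[i];
    const int32_t abs_c = c < 0 ? -c : c;
    qcoeff[i] = 0;
    dqcoeff[i] = 0;
    if (abs_c < zbin) continue; /* dead zone: the early-out */
    /* 64-bit product so large coefficients cannot overflow. */
    const int32_t q = (int32_t)(((int64_t)(abs_c + round) * quant) >> 16);
    qcoeff[i] = c < 0 ? -q : q;
    dqcoeff[i] = qcoeff[i] * dequant;
    if (q != 0) eob = i + 1;
  }
  return eob;
}
```

With dequant = 10, quant = 6554 (about 65536 / 10), round = 5 and zbin = 8, the inputs -7 and 3 fall in the dead zone and never reach the multiply.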
2017-07-12	sad4d neon: 64x[32,64]	Johann
	Rewrite 64x64. BUG=webm:1425 Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
2017-07-12	sad4d neon: 32x[16,32,64]	Johann
	Rewrite 32x32. Use half the accumulator registers. BUG=webm:1425 Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
2017-07-12	sad4d neon: 16x[8,16,32]	Johann
	Rewrite 16x16. Use half the accumulator registers. BUG=webm:1425 Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
2017-07-12	sad4d neon: 8x[4,8,16]	Johann
	BUG=webm:1425 Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4