Age | Commit message (Collapse) | Author |
|
Seen with arm-linux-gnueabihf-gcc-8 (8.3.0 & 8.4.0)
Without reworking the code or adding an additional branch this warning
cannot be silenced otherwise. The loopfilter is only called when needed
for a block so these output pixels will be set.
BUG=b/176822719
Change-Id: I9cf6e59bd5de901e168867ccbe021d28d0c04933
|
|
Fix comment typos in transpose_s16_4x4q() and transpose_u16_4x4q().
Change-Id: I21bcc1fb3fb880798e5a3927c3dbe81dd518c83b
|
|
VP8 and VP9 have different padding on buffer stride.
VP8 microblock is 16x16 so the buffer stride needs to be divisible by
16. Thus UV buffer stride is divisible by 8.
VP9 microblock is 8x8 so the buffer stride is only extended to be
divisible by 8. Then UV buffer stride isn't divisible by 8.
Change-Id: I6fa953feb951f2fb2e48f72a623786b85e23822f
|
|
BUG=webm:1584
Change-Id: I596f5f0e1a1c152493cd8177b32d416cc79937e0
|
|
BUG=webm:1584
Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3
|
|
BUG=webm:1448
Change-Id: I2140fb9b6ce92716d2d9509f3031244088a62127
|
|
Simplify max value calculation on aarch64 by using vmaxv. Much
faster for 4x4 but diminishing returns as the block size grows.
Only the vp9 quantize has a speed test hooked up. Anticipate
similar results for the other quantize versions.
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
|
|
BUG=webm:1444
Change-Id: Iee19be068afc6c81396c79218a89c469d2e66207
|
|
Always use src/ref and _ptr/_stride suffixes.
Normalize to [xy]_offset and second_pred.
Drop some stray source/recon_strides.
BUG=webm:1444
Change-Id: I32362a50988eb84464ab78686348610ea40e5c80
|
|
use the recommended format [1] of:
<PROJECT>_<PATH>_<FILE>_H_
[1] https://google.github.io/styleguide/cppguide.html#The__define_Guard
"All header files should have #define guards to prevent multiple
inclusion. The format of the symbol name should be
<PROJECT>_<PATH>_<FILE>_H_."
Change-Id: I2e8ab0b32fb23c30fa43cff5fec12d043c0d2037
|
|
The ".syntax unified" directives in a few source files aren't valid
ADS assembly directives, and they break compilation for windows,
since ads2armasm_ms.pl doesn't handle them.
Explicity add them via ads2gas.pl and ads2gas_apple.pl instead,
and tweak one instruction to be valid unified syntax.
Change-Id: I37f1709f163d11474597161fe02eb433859cb9b8
|
|
Commit adds neon assemblies for motion compensation which show an improvement
over the existing neon code.
Performance Improvement -
Platform Resolution 1 Thread 4 Threads
Nexus 6 720p 12.16% 7.21%
@2.65 GHz 1080p 18.00% 15.28%
Change-Id: Ic0b0412eeb01c8317642b20bb99092c2f5baba37
|
|
This fixes the build with at least GCC 7.3, where it was previously failing
with:
sum_squares_neon.c: In function 'vpx_sum_squares_2d_i16_neon':
sum_squares_neon.c: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
s2 = vpaddl_u32(s1);
^~
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vpaddl_u32(s1);
^
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vadd_u64(vget_low_u64(s1), vget_high_u64(s1));
^
sum_squares_neon.c: incompatible type for argument 1 of 'vget_lane_u64'
return vget_lane_u64(s2, 0);
^~
The generated assembly was verified to remain identical with both GCC and
LLVM.
Bug: chromium:819249
Change-Id: I2778428ee1fee0a674d0d4910347c2a717de21ac
|
|
quiets ptrdiff_t -> int conversion warning
Change-Id: If6b545a736fc19e48e290961736b1618df97db3e
|
|
|
|
Change-Id: Ie2ac06c090c8f92268e9a799e96aa5192a1bdcd2
|
|
Separate width 4 and 8 cases to reduce jumps in loop in clang.
Change-Id: I6ffc6f1555f2ad08b72a8dba35a78b9fd5f95a73
|
|
Change-Id: Ia313a6da00a05837fcd4de6ece31fa1c0016438c
|
|
Perf shows CPU time of this function dropped from 0.81% to 0.15%.
Change-Id: I8a7649ca5c15af2fc65cfb848f5befa0cc5e64f2
|
|
BUG=webm:1403
Change-Id: I2293c11666786be276909d48ee78dacb40a89e25
|
|
BUG=webm:1403
Change-Id: I1413cc3dfcb62143ba04fe9b0f8d8b010fdf69b6
|
|
This bug exposes when 2nd argument is negative, and the higher 32 bits
would be all 1s.
Change-Id: I189ee8cd3753fde00a34847e7a37cde2caa4ba72
|
|
BUG=webm:1403
Change-Id: I11efb652f1aee371c71eee2d29e33793e4736832
|
|
Use scalar multiply. No impact on clang, but improves gcc compiling.
BUG=webm:1403
Change-Id: I4922e7e033d9e93282c754754100850e232e1529
|
|
BUG=webm:1403
Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
|
|
Change-Id: Ie70ed8b9273df5e1fd06bc93cb469e80630941d2
|
|
Change-Id: I8f4e0fc6ecb77b623519f2dd3cd2886f89218ddd
|
|
Change-Id: I9c7c52567850aded0437b13ba1260e94441bc49d
|
|
Remove comments above #define statements because they get
indented unnecessarily.
https://bugs.llvm.org/show_bug.cgi?id=35930
Add blank lines to prevent comments from being treated as
blocks.
Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
|
|
Eliminates the following instruction for the x86 (64 bit)
intrinsic code:
movslq %esi,%rax
Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
|
|
|
|
Exposed by fuzz test in high bitdepth.
The bug is introduced in commit 64653fa.
BUG=webm:1466
Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
|
|
BUG=webm:1419
Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
|
|
* changes:
cosmetics: NEON scaling code
Refactor convolve NEON code
Refactor convolve code
|
|
BUG=webm:1450
Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
|
|
BUG=webm:1450
Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
|
|
The unnecessary upcast to (int) will be cleaned later.
BUG=webm:1450
Change-Id: Ia234575206d5a74540526924b06ed3939322d063
|
|
Change-Id: Ib91054622c1f09c4ca523bc6837d7d8ab9f03618
|
|
Rename a couple of hbd static functions.
Move the position of NEON function convolve8_4().
Change-Id: Idfac00edf2e99cdd8e0a73b9f895402f60be6349
|
|
BUG=webm:1419
Change-Id: If82a93935d2453e61b7647aae70983db1740bec7
|
|
Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c
|
|
so that the convolve functions are independent of table alignment.
Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
|
|
|
|
Add 1 if negative to get dqcoeff to round towards zero.
10-15% faster than converting to positive before shifting.
Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
|
|
Change-Id: Icfb70687476b2edb25d255793ba325b261d40584
|
|
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
|
|
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
|
|
Rewrite 64x64.
BUG=webm:1425
Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
|
|
Rewrite 32x32. Use half the accumulator registers.
BUG=webm:1425
Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
|
|
Rewrite 16x16. Use half the accumulator registers.
BUG=webm:1425
Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
|