Age | Commit message | Author |
|
50% faster than C version in best/rt profiles
Change-Id: I0f9504ed52b5d5f7722407e91108ed4056d66bc2
|
|
~80% faster than C version for both best/rt profiles.
Change-Id: Ibb3c8e1862131d2a020922420d53c66b31d5c2c3
|
|
Move all butterfly functions to fdct_neon.h
Slightly optimize load/scale/cross functions
in fdct 16x16.
These will be reused in highbd variants.
Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073
|
|
Change-Id: I3915b6c9971aedaac9c23f21fdb88bc271216208
|
|
|
|
Change-Id: I7dd4e698469562f5b1f948cc36f8403b490dcb6a
|
|
~2.8x faster than the sse2 version.
Bug: b/245917257
Change-Id: Ibc8e5d030ec145c9a9b742fff98fbd9131c9ede4
|
|
2.7x to 3.1x faster than the sse2 version.
Bug: b/245917257
Change-Id: Idff3284932f7ee89d036f38893205bf622a159a3
|
|
1.9x to 2.4x faster than the sse2 version.
Bug: b/245917257
Change-Id: I686452772f9b72233930de2207af36a0cd72e0bb
|
|
warning: ‘s2[3]’ may be used uninitialized
and
warning: ‘s1[3]’ may be used uninitialized
The warnings exposed unused code.
Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9
|
|
Match style for vpx_quantize_b_sse2 and prepare to rewrite
ssse3 version in intrinsics.
Need to evaluate the value of threshold breakout before
going further.
Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
|
|
All of the assembly adds 1 to iscan to convert from
a 0 based array to the EOB value.
Add 1 to all iscan values and remove the extra
instructions from the assembly.
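
A minimal sketch of the idea in C, not the actual assembly or table code. The helper name and the `iscan_plus1` array are hypothetical; the table is assumed to already hold each scan position plus 1, so the running maximum is the EOB value with no extra add per coefficient.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the end-of-block (EOB) value, i.e. one past
 * the largest scan position that holds a nonzero coefficient, or 0 when all
 * coefficients are zero. iscan_plus1[i] is assumed to already store
 * (scan position of coefficient i) + 1. */
static int eob_from_iscan_plus1(const int16_t *qcoeff,
                                const int16_t *iscan_plus1, int n) {
  int eob = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    /* No "+ 1" here: the table entries are already 1-based. */
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

With a 0-based table the loop would have to add 1 to the maximum for every block; folding it into the table moves that work out of the hot path.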
Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
|
|
|
|
In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
| ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
39 | __m128i below_context;
Change-Id: I2fc592f121c4e85d0aff1640014c3444f5eb09fd
|
|
Resolves warning on OS X:
file: libvpx_g.a(fwd_txfm_avx2.c.o) has no symbols
Change-Id: Ie8b290bb3ed329656beb883d552c98353f1ed5e5
|
|
~2x faster than the sse2 version.
Bug: b/245917257
Change-Id: I4742950ab7b90d7f09e8d4687e1e967138acee39
|
|
~2.4x faster than the sse2 version.
Bug: b/245917257
Change-Id: I6df2bd62b46e5e175c8ad80daa6de3a1c313db0f
|
|
1.98x to 2.3x faster than the sse2 version.
Bug: b/245917257
Change-Id: Ie4f9bb942ffaf4af7d395fb5a5978b41aabfc93c
|
|
|
|
Change-Id: Ia28305ec5c61518b732cbacbd102acd2cb7f9d82
|
|
missed in
447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
+ fix #if comments
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
Change-Id: I098b96421b7de5928bb3b11612ca1f32e7b6cbc4
|
|
over *_set1_*(0)
Change-Id: I136e1798a2ce286480ebb9418db67a2f1e92b9a2
|
|
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
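
A short sketch of a definedness-only check, under stated assumptions: `dot16_u8` is a hypothetical helper, and the extra `__aarch64__` test is only there because the sketch uses `vaddvq_u32`. The fallback branch is plain C so the block builds on any target.

```c
#include <assert.h>
#include <stdint.h>

/* Only test that the macro is defined; its value is never consulted. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two 16-byte vectors. */
static uint32_t dot16_u8(const uint8_t *a, const uint8_t *b) {
  assert(a && b);
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  const uint32x4_t acc = vdotq_u32(vdupq_n_u32(0), vld1q_u8(a), vld1q_u8(b));
  return vaddvq_u32(acc);
#else
  uint32_t sum = 0;
  for (int i = 0; i < 16; ++i) sum += (uint32_t)a[i] * b[i];
  return sum;
#endif  // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
}
```

The trailing `#endif` comment repeats the condition that opened the block, which is the kind of comment the follow-up change keeps in sync.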
Change-Id: I164fe121ccefda99050a9b6a99738a2b518520f3
|
|
this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project
8671348b81b95fc603505dfc881b45103bee1731))
Change-Id: Ifec252d4f499f23be1cd94aa8516caf6b3fbbc11
|
|
Change-Id: Ia6fa54652d7f94687e64108482bb0f28ca06cf49
|
|
|
|
Total gain for 12-bit encoding:
* ~7.2% for best profile
* ~5.8% for rt profile
Change-Id: I5b70415fb89d1bbb02a0c139eb317ba6b08adede
|
|
|
|
only store the deltas from --style Google in the file and reapply using
Debian clang-format version 11.1.0-6+build1
Bug: b/229626362
Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db
|
|
Slight optimization; prefetch gives a 1% improvement in the 1st pass
Change-Id: Iba4664964664234666406ab53893e02d481fbe61
|
|
|
|
* changes:
use VPX_NO_UNSIGNED_SHIFT_CHECK with entropy functions
compiler_attributes.h: add VPX_NO_UNSIGNED_SHIFT_CHECK
|
|
Total gain for 12-bit encoding:
* ~1% for best and rt profile
Change-Id: I4039120dc570baab1ae519a5e38b1acff38d81f0
|
|
Total gain for 12-bit encoding:
* ~7.8% for best profile
* ~10% for rt profile
Change-Id: I89eda5c4372a5b628c9df84cdeb4c8486fc44789
|
|
Change-Id: I041f5fb23b856a2b519669b5bf8a40d3772b4a6e
|
|
Total gain for 12-bit encoding:
* ~4.8% for best profile
* ~6.2% for rt profile
Change-Id: I61e646ab7aedf06a25db1365d6d1cf7b05101c21
|
|
these shift values off the most significant bit as part of the process;
vp8_regular_quantize_b_sse4_1 is included here for a special case of
mask creation
quiets warnings of the form:
vp8/decoder/dboolhuff.h:81:11: runtime error: left shift of
2373679303235599696 by 3 places cannot be represented in type
'VP8_BD_VALUE' (aka 'unsigned long')
vp8/encoder/bitstream.c:257:18: runtime error: left shift of 2147493041
by 1 places cannot be represented in type 'unsigned int'
vp8/encoder/x86/quantize_sse4.c:114:18: runtime error: left shift of
4294967294 by 1 places cannot be represented in type 'unsigned int'
vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of
4294967295 by 1 places cannot be represented in type 'unsigned int'
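
A sketch of how such an annotation might look. The macro below is an assumed stand-in with a hypothetical name; the real VPX_NO_UNSIGNED_SHIFT_CHECK lives in compiler_attributes.h and may be defined differently. It assumes clang 12 or newer, where the `unsigned-shift-base` sanitizer check is available.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for illustration only; not the libvpx definition. */
#if defined(__clang__) && defined(__clang_major__) && __clang_major__ >= 12
#define SKETCH_NO_UNSIGNED_SHIFT_CHECK \
  __attribute__((no_sanitize("unsigned-shift-base")))
#else
#define SKETCH_NO_UNSIGNED_SHIFT_CHECK
#endif

/* Hypothetical function that intentionally shifts bits off the top of an
 * unsigned value. This is well defined in C (the result wraps modulo 2^32),
 * so the attribute only silences the sanitizer report for this function. */
SKETCH_NO_UNSIGNED_SHIFT_CHECK
static uint32_t shift_off_msb(uint32_t value, int count) {
  assert(count >= 0 && count < 32);
  return value << count;
}
```

Annotating only the functions that rely on this behavior keeps the check active everywhere else.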
Bug: b/229626362
Change-Id: Iabed118b2a094232783e5ad0e586596d874103ca
|
|
flat/flat2 are stored as int8_t as returned by the filter_mask*
functions.
this quiets integer sanitizer warnings of the form:
vpx_dsp/loopfilter.c:197:28: runtime error: implicit conversion from
type 'int8_t' (aka 'signed char') of value -1 (8-bit, signed) to type
'uint8_t' (aka 'unsigned char') changed the value to 255 (8-bit,
unsigned)
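
A simplified sketch of the type choice, not the code in vpx_dsp/loopfilter.c. Both function names are hypothetical, and the mask helper only loosely mirrors the filter_mask* functions: it returns -1 (all bits set) or 0 as an int8_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns -1 when both pixel steps are within thresh, else 0. */
static int8_t flat_mask_sketch(uint8_t thresh, uint8_t p1, uint8_t p0,
                               uint8_t q0, uint8_t q1) {
  int8_t flat = 0;
  flat |= (abs(p1 - p0) > thresh) * -1;
  flat |= (abs(q1 - q0) > thresh) * -1;
  return ~flat;
}

/* The mask parameter is int8_t, matching the helper's return type, so a
 * value of -1 is never implicitly converted to 255. */
static uint8_t pick_sketch(uint8_t filtered, uint8_t original, int8_t flat) {
  assert(flat == 0 || flat == -1);
  return flat ? filtered : original;
}
```

Storing the mask in a uint8_t instead would still work, but the -1 to 255 conversion is what the integer sanitizer reports.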
Bug: b/229626362
Change-Id: Iacb6ae052d4cb2b6e0ebccbacf59ece9501d3b5f
|
|
add a missing cast in ^ operations; quiets warnings of the form:
implicit conversion from type 'int' of value -1 (32-bit, signed) to type
'unsigned int' changed the value to 4294967295 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I56f74981050b2c9d00bad20e68f1b73ce7454729
|
|
this matches the type of the function parameter; quiets integer
sanitizer warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3215646151 (32-bit, unsigned) to type 'int' changed the value to
-1079321145 (32-bit, signed)
Bug: b/229626362
Change-Id: Ia9a5dc5e1f57cbf4f8f8fa457bb674ef43369d37
|
|
quiets integer sanitizer warnings of the form:
../vpx_dsp/x86/variance_sse2.c:100:10: runtime error: implicit
conversion from type 'unsigned int' of value 4294966272 (32-bit,
unsigned) to type 'int' changed the value to -1024 (32-bit, signed)
Bug: b/229626362
Change-Id: I150cc0a6a6b85143c3bf96886686fe3a40897db5
|
|
Add build fix for _mm256_extract_epi16() being undefined.
Bug: b/237714063
Change-Id: I855b1828ce1b6b2b2f063fe097999481881bf074
|
|
~1.3x faster than vpx_subtract_block_sse2().
Based on aom_subtract_block_avx2().
Bug: b/241580104
Change-Id: I17da036363f213d53c6546c3e858e4c3cba44a5b
|
|
Change-Id: I497ee1c45d1fc4d643cefad7d87e5aaacd77869c
|
|
prefer int in most cases
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -809931979 (32-bit, signed)
to type 'uint32_t' (aka 'unsigned int') changed the value to 3485035317
(32-bit, unsigned)
Bug: b/229626362
Change-Id: I0c6604efc188f2660c531eddfc7aa10060637813
|
|
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -1323 (32-bit, signed) to
type 'unsigned int' changed the value to 4294965973 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I7291d9bd5cacea0d88d9f4c4624c096764f4a472
|
|
Up to 11.78x faster than vpx_quantize_b_32x32_sse2() for full
calculations.
~1.7% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: Ib759056db94d3487239cb2748ffef1184a89ae18
|
|
Up to 3.61x faster than vpx_highbd_quantize_b_sse2() for full
calculations.
~2.3% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I23f88d2a7f96aaa4103778372f4f552207f73cee
|
|
|
|
missed in:
53dd1e8e7 avg_intrin_{sse2,avg2}: rm dead store in hadamard_8x8
Change-Id: I378e4a388ceb193a4cfee4d9d317fc62fcc4b39e
|