This allows AArch64 to be correctly detected when building with Visual
Studio (cl.exe) and fixes a crash in vp9_diamond_search_sad_neon.c.
There are still test failures, however.
Microsoft's compiler doesn't define __ARM_FEATURE_*. To use those paths
we may need to rely on _M_ARM64_EXTENSION.
Bug: webm:1788
Bug: b/277255076
Change-Id: I4d26f5f84dbd0cbcd1cdf0d7d932ebcf109febe5
|
|
BUG=webm:1584
Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3
|
|
Simplify the max value calculation on aarch64 by using vmaxv. Roughly 8%
faster for 4x4 (31.6 ms to 29.1 ms), with diminishing returns as the block
size grows; 32x32 is unchanged. Only the vp9 quantize has a speed test
hooked up, but similar results are expected for the other quantize versions.
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
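The max value being simplified is, as we understand the NEON quantizers, the end-of-block (eob) position: the largest (scan position + 1) among non-zero quantized coefficients. A vector version keeps per-lane maxima and needs one final horizontal max; on AArch64 `vmaxvq_u16` does that reduction in a single instruction, which matters most when the block is small. The scalar sketch below only shows what is being reduced; the name `hypo_eob_scalar` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for the eob reduction.
 * qcoeff: quantized coefficients in raster order.
 * iscan:  iscan[i] is the scan-order position of coefficient i.
 * Returns 0 if every coefficient is zero. */
static uint16_t hypo_eob_scalar(const int16_t *qcoeff, const int16_t *iscan,
                                int n) {
  uint16_t eob = 0;
  assert(qcoeff && iscan && n > 0);
  for (int i = 0; i < n; ++i) {
    /* Non-zero coefficients contribute their scan position + 1. */
    const uint16_t candidate =
        qcoeff[i] != 0 ? (uint16_t)(iscan[i] + 1) : 0;
    /* Running max; a NEON version does this per lane, then reduces
     * the lanes with a horizontal max (vmaxvq_u16 on AArch64). */
    if (candidate > eob) eob = candidate;
  }
  return eob;
}
```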
Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
|
|
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
|
|
Change-Id: I7605b6678014a5426ceb45c27b54885e0c4e06ed
|
|
This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization code and unit tests.
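A single subtract routine can replace several fixed-size ones because it takes the block dimensions and per-buffer strides as parameters. The sketch below is a scalar illustration of that shape (diff = src - pred over a rows x cols block); the name `hypo_subtract_block` is hypothetical, and the argument order only loosely follows vpx_subtract_block as we understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic block subtraction: one routine covers any
 * rows x cols block, so per-size variants are no longer needed. */
static void hypo_subtract_block(int rows, int cols, int16_t *diff,
                                ptrdiff_t diff_stride, const uint8_t *src,
                                ptrdiff_t src_stride, const uint8_t *pred,
                                ptrdiff_t pred_stride) {
  assert(rows > 0 && cols > 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      /* Residual can be negative, hence the signed 16-bit output. */
      diff[c] = (int16_t)(src[c] - pred[c]);
    }
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```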
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
|
|
subpel functions will be moved in another patch.
Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
|
|
Using 0xff as the mask for a 16-bit short did not set the high bits. When
comparing the output with vtst to find non-zero elements, it skipped
values that had no low bits set, such as -512 / 0xFE00.
Using -8191 as the first element of coeff will generate this condition.
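A scalar model makes the failure easy to see. NEON `vtst` sets a lane to all ones when `(a & mask) != 0`, so a 0x00FF mask only tests the low byte of each 16-bit lane, while a mask of -1 (0xFFFF) tests every bit. The helper names below are hypothetical and only model one lane at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one vtst lane: all ones if (a & mask) is non-zero, else 0. */
static uint16_t hypo_vtst_lane(int16_t a, int16_t mask) {
  return (uint16_t)((a & mask) != 0 ? 0xFFFF : 0);
}

/* Count coefficients that the vtst-based test reports as non-zero. */
static int hypo_count_nonzero(const int16_t *coeff, int n, int16_t mask) {
  int count = 0;
  assert(coeff && n >= 0);
  for (int i = 0; i < n; ++i) {
    count += hypo_vtst_lane(coeff[i], mask) != 0;
  }
  return count;
}
```

With a 0x00FF mask, -512 (0xFE00) is reported as zero; with -1 it is correctly counted.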
BUG=883
Change-Id: Ia1e10fb809d1e7866f28c56769fe703e6231a657
|
|
The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and, due
to improvements in intrinsics, no longer offers a significant
performance benefit.
Source files will remain until the various third-party builds are updated.
The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.
Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.
The pack_tokens routines were the only optimizations for the armv5te
targets, so those targets will be removed after the test infrastructure
has been updated.
BUG=710
Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
|
|
The intrinsics version of the pair quant is slower than running it
individually.
Change-Id: I7b4ea8599d4aab04be0a5a0c59b8b29a7fc283f4
|
|
Use intrinsics for neon quantization. Slight loss (<5%) of performance
compared to the assembly. Roughly 10x faster on arm64 because that was
running C code before.
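For context, the loop that the NEON intrinsics vectorize is a per-coefficient quantizer over one 4x4 block. The sketch below is modeled on VP8's "fast" quantizer as we recall it (take |z|, add a rounding term, multiply by a Q16 quantizer, restore the sign, dequantize, track eob in zig-zag order). The flattened signature and the name `hypo_fast_quantize` are assumptions rather than libvpx's exact interface.

```c
#include <assert.h>
#include <stdint.h>

/* VP8 4x4 zig-zag scan order. */
static const int kZigzag[16] = {0, 1,  4,  8,  5, 2,  3,  6,
                                9, 12, 13, 10, 7, 11, 14, 15};

/* Hypothetical scalar fast quantizer for one 4x4 block.
 * Returns eob: 1 + the scan index of the last non-zero output. */
static int hypo_fast_quantize(const int16_t *coeff_ptr,
                              const int16_t *round_ptr,
                              const int16_t *quant_ptr,
                              const int16_t *dequant_ptr,
                              int16_t *qcoeff_ptr, int16_t *dqcoeff_ptr) {
  int eob = -1;
  assert(coeff_ptr && qcoeff_ptr && dqcoeff_ptr);
  for (int i = 0; i < 16; ++i) {
    const int rc = kZigzag[i];
    const int z = coeff_ptr[rc];
    const int sz = z < 0 ? -1 : 0;  /* sign mask */
    const int x = (z ^ sz) - sz;    /* |z| */
    /* Q16 multiply: round, scale, drop the fractional bits. */
    const int y = ((x + round_ptr[rc]) * quant_ptr[rc]) >> 16;
    const int q = (y ^ sz) - sz;    /* restore the sign */
    qcoeff_ptr[rc] = (int16_t)q;
    dqcoeff_ptr[rc] = (int16_t)(q * dequant_ptr[rc]);
    if (y) eob = i;
  }
  return eob + 1;
}
```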
Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
|
|
The version of gcc 4.6 included with the Android NDK through r10b
fails to compile this function. Replace it with C code.
BUG=860
Change-Id: Ifcc0476664071aec46a171cdd5ad17305930986a
|
|
Use the right return values - vadd_s64 returns int64x1_t, not
a normal int64_t.
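The type mismatch is easy to reproduce: `vadd_s64` operates on `int64x1_t` vectors, so getting a plain `int64_t` out requires an explicit lane extract with `vget_lane_s64`. The sketch below shows that pattern; the function name is hypothetical, and the scalar branch exists only so the example builds on non-NEON targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: add two 64-bit values and return a scalar. */
static int64_t hypo_add_s64(int64_t a, int64_t b) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  /* vadd_s64 returns int64x1_t, a one-lane vector, not int64_t. */
  const int64x1_t sum = vadd_s64(vdup_n_s64(a), vdup_n_s64(b));
  return vget_lane_s64(sum, 0); /* extract lane 0 as a plain int64_t */
#else
  return a + b; /* portable fallback for non-NEON builds */
#endif
}
```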
Change-Id: Ife17213087c1dfb5faaa647f804d2fd140f3a0eb
|
|
Add vp8_mse16x16_neon.c
- vp8_mse16x16_neon
- vp8_get4x4sse_cs_neon
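As a reference point for what a 16x16 MSE routine computes, the scalar sketch below accumulates the sum of squared differences, stores it through `sse` and returns it; unlike a variance, no mean term is subtracted. The name and signature are assumptions loosely modeled on VP8's variance-style interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar 16x16 MSE: returns (and stores) the raw SSE. */
static unsigned int hypo_mse16x16(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  unsigned int *sse) {
  unsigned int total = 0;
  assert(src && ref && sse);
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = src[c] - ref[c];
      total += (unsigned int)(d * d); /* at most 255^2 per pixel */
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = total;
  return total;
}
```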
Change-Id: I108952f60a9ae50613f0ce3903c2c81df19d99d0
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Use generic C implementation instead of neon-specific code
Change-Id: Ib322b4ece9cdbd4de76a9eed3d2e9fd1d8542406
|
|
Add shortfdct_neon.c
- vp8_short_fdct4x4_neon
- vp8_short_fdct8x4_neon
Change-Id: I90152c803b484f5fab839473d632c50af0524e68
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add subtract_neon.c
- vp8_subtract_b_neon
- vp8_subtract_mby_neon
- vp8_subtract_mbuv_neon
Change-Id: If9a17a093478552e3e3276eeaa3f098b9021d08c
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp8_shortwalsh4x4_neon.c
- vp8_short_walsh4x4_neon
Change-Id: Ica5f584be608c9e636f62db14f563757e94be09b
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Issue: https://code.google.com/p/webm/issues/detail?id=829
Change-Id: I580308f8aa4af194b5d8990a9692ebd18db68ee8
|
|
The encoder performance improved by 5% (vs "C")
for the test clip used.
Change-Id: I866b35eb2a06092edce7b37fc409562d0dacd7e7
|
|
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)
Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
|
|
If increase_denoising is set, vp8_denoiser_filter_neon() produced
incorrect results.
Change-Id: I645f78e48b8f6657fa8a4b69d2c4d3488a0581dc
|
|
Change-Id: I96ed73e109c4f89dd06f3583cf7ecf9277401fae
|
|
This reverts commit 06e6d56fa138d84759e8bdfd4c721ead000051b4
Change-Id: If95598385b693945d6b144d03b6da8f6a57dac98
|
|
This reverts commit e516a42527098a26798dbb3663a5bcdd38793839
Change-Id: I7c78712acc737ad5f580181cdab3aa76b23f3ca5
|
|
This eliminates the asm_offsets dependency for future
all-assembly versions of this function.
Change-Id: I3227073ecfcb8ee6e593934fab941e9081abdda0
|
|
Used horizontal add instructions instead of adding
byte lanes. The encoder performance improved by
~4% for the test clip used.
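The idea behind horizontal (pairwise) adds can be modeled in scalar code: each step adds adjacent lanes and halves the lane count (16, 8, 4, 2, 1), widening as it goes so nothing overflows, much like NEON `vpaddl`. A vector unit performs each step in one instruction, instead of adding the byte lanes one at a time. The function name is hypothetical, and this is a model of the reduction order, not the actual NEON code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a pairwise reduction of 16 byte lanes. */
static uint32_t hypo_pairwise_sum16(const uint8_t lanes[16]) {
  uint32_t acc[16];
  int n = 16;
  assert(lanes);
  for (int i = 0; i < n; ++i) acc[i] = lanes[i]; /* widen u8 -> u32 */
  while (n > 1) {
    /* One pairwise step: lane i = lane 2i + lane 2i+1. In-place is safe
     * because each write index i never exceeds the read index 2i. */
    for (int i = 0; i < n / 2; ++i) acc[i] = acc[2 * i] + acc[2 * i + 1];
    n /= 2;
  }
  return acc[0];
}
```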
Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
|
|
Recent compilers can generate optimized code that uses NEON registers
for operations other than floating point. Therefore, saving the
callee-saved registers d8 - d15 only at the beginning of the
encoder/decoder is no longer enough. This patch adds register-saving
code to the VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
|
|
vector types
This fixes building with MSVC for arm.
Change-Id: Iffae0408e0c68760e87e96b9e17d9df8e8cadb1a
|
|
Change-Id: I951abd4ad0078f78949f3cb79453ac334fb82a7e
|
|
datatype is optional for the instruction but clang refuses it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHIJIHC.html
It is still required when using an immediate.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGGEEB.html
Change-Id: I0fae956c8c0fa3f97578ce80abea247f7fc88705
|
|
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
|
|
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9
$ configure --disable-vp8 --disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so that will be tackled in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
|
|
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
|
|
For non-static functions, change the prefix to vp9_. For static functions,
remove the prefix. Also fix some comments, remove unused code or unused
function prototypes.
Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
|
|
This change encompasses VP8_PTR, VP8_COMP, VP8D_COMP, VP8_COMMON,
VP8Decompressor and VP8Common.
Change-Id: I514ef4ad4e682370f36d656af1c09ee20da216ad
|
|
Change-Id: Ie2e3652591b010ded10c216501ce24fd95d0aec5
|
|
Remove the fdct invoke macro calls
Change-Id: Ica2431c655819fa012133ee7abc75a16761e5fd6
|
|
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.
Also removes an experimental filter.
Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
|
|
Approximate the Google style guide[1] so that there's a written
document to follow and tools to check compliance[2].
[1]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
[2]: http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py
Change-Id: Idf40e3d8dddcc72150f6af127b13e5dab838685f
|
|
Besides imposing a performance penalty at startup in most
configurations, these relocations break the dynamic linker for
native Fennec, since it does not support them at all.
Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
|
|
Fix code for the following warnings:
-Wimplicit-function-declaration
-Wuninitialized
-Wunused-but-set-variable
-Wunused-variable
Change-Id: I2be434f22fdecb903198e8b0711255b4c1a2947a
|
|
Change-Id: Ic7782707afed38c3ec7e996a4a11dc2d55226691
|
|
This is a code snapshot of experimental work currently ongoing for a
next-generation codec.
The codebase has been cut down considerably from the libvpx baseline.
For example, we are currently only supporting VBR 2-pass rate control
and have removed most of the code relating to coding speed, threading,
error resilience, partitions and various other features. This is in
part to make the codebase easier to work on and experiment with, but
also because we want to have an open discussion about how the bitstream
will be structured and partitioned and not have that conversation
constrained by past work.
Our basic working pattern has been to initially encapsulate experiments
using configure options linked to #if CONFIG_XXX statements in the
code. Once experiments have matured and we are reasonably happy that
they give benefit and can be merged without breaking other experiments,
we remove the conditional compile statements and merge them in.
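A minimal sketch of that pattern, assuming configure emits each option as a macro defined to 0 or 1. `CONFIG_HYPOTHETICAL_EXPERIMENT` is an invented name, and the local default stands in for what a generated config header would supply; the 6-tap versus 8-tap choice echoes the interpolation filter experiment listed below but is purely illustrative.

```c
#include <assert.h>

/* Normally provided by a configure-generated header; defaulted here so
 * the sketch builds on its own. */
#ifndef CONFIG_HYPOTHETICAL_EXPERIMENT
#define CONFIG_HYPOTHETICAL_EXPERIMENT 0
#endif

static int hypo_pick_filter_taps(void) {
#if CONFIG_HYPOTHETICAL_EXPERIMENT
  return 8; /* experimental path */
#else
  return 6; /* baseline path */
#endif
}
```

Merging the experiment then amounts to deleting the `#if`, the `#else` branch and the macro, leaving only the experimental path.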
Current changes include:
* Temporal coding experiment for segments (though still only 4 max, it
will likely be increased).
* Segment feature experiment - to allow various bits of information to
be coded at the segment level. Features tested so far include mode
and reference frame information, limiting end of block offset and
transform size, alongside Q and loop filter parameters, but this set
is very fluid.
* Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
in MBs using 16x16 prediction modes within inter frames.
* Compound prediction (combination of signals from existing predictors
to create a new predictor).
* 8 tap interpolation filters and 1/8th pel motion vectors.
* Loop filter modifications.
* Various entropy modifications and changes to how entropy contexts and
updates are handled.
* Extended quantizer range matched to transform precision improvements.
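To illustrate the "2nd order 2x2 haar" item above: in a 16x16 macroblock coded with four 8x8 transforms, the four DC coefficients form a 2x2 array that can be transformed once more. The sketch below uses an unnormalized 2x2 Haar (equivalently, a 4-point Walsh-Hadamard) transform; the scaling and rounding are assumptions, and the experimental codec's integer version may differ.

```c
#include <assert.h>

/* Forward 2x2 Haar on four DC values laid out as [0 1]
 *                                                [2 3]. */
static void hypo_haar2x2_fwd(const int in[4], int out[4]) {
  out[0] = in[0] + in[1] + in[2] + in[3]; /* sum of the four DCs */
  out[1] = in[0] - in[1] + in[2] - in[3]; /* left minus right */
  out[2] = in[0] + in[1] - in[2] - in[3]; /* top minus bottom */
  out[3] = in[0] - in[1] - in[2] + in[3]; /* diagonal difference */
}

/* The transform matrix H is symmetric with H * H = 4I, so applying it
 * again and dividing by 4 inverts it exactly for integer input. */
static void hypo_haar2x2_inv(const int in[4], int out[4]) {
  int tmp[4];
  hypo_haar2x2_fwd(in, tmp);
  for (int i = 0; i < 4; ++i) {
    assert(tmp[i] % 4 == 0);
    out[i] = tmp[i] / 4;
  }
}
```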
There are also ongoing further experiments that we hope to merge in the
near future: For example, coding of motion and other aspects of the
prediction signal to better support larger image formats, use of larger
block sizes (e.g. 32x32 and up) and lossless non-transform based coding
options (especially for key frames). It is our hope that we will be
able to make regular updates and we will warmly welcome community
contributions.
Please be warned that the codebase is currently slower than the VP8
stable branch, as most new code has not been optimized, and even the
'C' has been deliberately written to be simple and obvious, not fast.
The following graphs have the initial test results; numbers in the
tables measure the compression improvement as a percentage. The
build has the following optional experiments configured:
--enable-experimental --enable-enhanced_interp --enable-uvintra
--enable-high_precision_mv --enable-sixteenth_subpel_uv
CIF Size clips:
http://getwebm.org/tmp/cif/
HD size clips:
http://getwebm.org/tmp/hd/
(stable_20120309 represents encoding results of WebM master branch
build as of commit#7a15907)
They were encoded using the following encode parameters:
--good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
--end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
--kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
--minsection-pct=0 --maxsection-pct=800 --sharpness=0
--arnr-maxframes=7 --arnr-strength=3 (for HD, 6 for CIF)
--arnr-type=3
Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
|
|
The MFQE function of the postprocessor depends on these.
Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
|
|
In the variance calculations the difference is summed and later squared.
When the magnitude of the sum exceeds sqrt(2^31), its square no longer
fits in a signed 32-bit int and is treated as negative when it is
shifted, which gives incorrect results.
To fix this we cast the result of the multiplication to unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However, that would reduce precision.
For 16x16 blocks the maximum sum is 65280 (255 * 256) and sqrt(2^31) is
about 46340.95.
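The arithmetic can be checked with a scalar 16x16 variance, variance = sse - sum^2 / 256. At the extreme, sum^2 = 65280^2 = 4,261,478,400, which exceeds INT_MAX (2,147,483,647) but is below 2^32. This sketch casts the operands to unsigned before multiplying, so the multiplication itself is done in unsigned arithmetic and stays exact. The function name is hypothetical, and this is an illustration of the fix, not the patched libvpx code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar 16x16 variance with an overflow-safe square. */
static unsigned int hypo_variance16x16(const uint8_t *src, int src_stride,
                                       const uint8_t *ref, int ref_stride) {
  int sum = 0;           /* |sum| <= 255 * 256 = 65280 */
  unsigned int sse = 0;  /* <= 256 * 255^2, fits in 32 bits */
  assert(src && ref);
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = src[c] - ref[c];
      sum += d;
      sse += (unsigned int)(d * d);
    }
    src += src_stride;
    ref += ref_stride;
  }
  /* Unsigned multiply gives sum^2 modulo 2^32, which equals sum^2 exactly
   * here (sum^2 < 2^32), even when sum is negative. */
  const unsigned int sum_sq = (unsigned int)sum * (unsigned int)sum;
  return sse - (sum_sq >> 8); /* divide by 256 pixels */
}
```

An all-255 block compared against an all-0 block hits the maximum sum yet correctly yields a variance of 0.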
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
|
|
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.
Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: Id8a287fdd4bd050ea4452e1582ad85520f3081be
|