summaryrefslogtreecommitdiff
path: root/vp8/encoder/arm
AgeCommit message (Collapse)Author
2023-05-03s/__aarch64__/VPX_ARCH_AARCH64/James Zern
This allows AArch64 to be correctly detected when building with Visual Studio (cl.exe) and fixes a crash in vp9_diamond_search_sad_neon.c. There are still test failures, however. Microsoft's compiler doesn't define __ARM_FEATURE_*. To use those paths we may need to rely on _M_ARM64_EXTENSION. Bug: webm:1788 Bug: b/277255076 Change-Id: I4d26f5f84dbd0cbcd1cdf0d7d932ebcf109febe5
2019-01-07arm neon: resolve missing declarationsJohann
BUG=webm:1584 Change-Id: I2dcf39f2327b72b58be72c27f952ea781a790dd3
2018-11-12quantize: use aarch64 vmaxvJohann
Simplify max value calculation on aarch64 by using vmaxv. Much faster for 4x4 but diminishing returns as the block size grows. Only the vp9 quantize has a speed test hooked up. Anticipate similar results for the other quantize versions. Before: [ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2 [ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms ) [ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms ) [ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms ) [ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms ) [ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms ) [ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms ) [ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms) [ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3 [ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms ) [ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms ) After: [ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2 [ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms ) [ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms ) [ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms ) [ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms ) [ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms ) [ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms ) [ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms) [ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3 [ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms ) [ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms ) Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
2016-08-04Remove armv6 targetJohann
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-07-15vp8: apply clang-formatclang-format
Change-Id: I7605b6678014a5426ceb45c27b54885e0c4e06ed
2015-07-07Unify subtract function used in VP8/9Jingning Han
This commit replaces the vp8_ prefixed subtract function with the common vpx_subtract_block function. It removes redundant SIMD optimization codes and unit tests. Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
2015-05-26Move variance functions to vpx_dspJohann
subpel functions will be moved in another patch. Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2014-11-20Correctly initialize "ones" value in neon quantizeJohann
By using 0xff for a short it was not setting the high bits. When comparing the output with vtst to find non-zero elements it was skipping vaules which had no low bits set such as -512 / 0xFE00. Using -8191 as the first element of coeff will generate this condition. BUG=883 Change-Id: Ia1e10fb809d1e7866f28c56769fe703e6231a657
2014-11-06Remove asm offset dependenciesJohann
The obj_int_extract code is no longer worth maintaining. It creates significant issues when adapting for different build systems and no longer offers as significant of a performance benefit due to improvements in intrinsics. Source files will remain until the various third-party builds are updated. The neon fast quantizer has been moved to intrinsics. The armv6 version has been removed because so few remaining targets require it. Compilers and processors have improved significantly since the pack_tokens code was written. The assembly is no longer faster than the C code. pack_tokens were the only optimizations for the armv5te targets so the targets will be removed after the test infrastructure has been updated. BUG=710 Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
2014-10-31Remove pair quantizationJohann
The intrinsics version of the pair quant is slower than running it individually. Change-Id: I7b4ea8599d4aab04be0a5a0c59b8b29a7fc283f4
2014-10-31vp8 quantization -> intrinsicsJohann
Use intrinsics for neon quantization. Slight loss (<5%) of performance compared to the assembly. Roughly 10x faster on arm64 because that was running C code before. Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
2014-09-25Fix build failure with Android NDKJohann
The version of gcc4.6 included with the Android NDK through r10b fails to compile this function. Replace it with C code. BUG=860 Change-Id: Ifcc0476664071aec46a171cdd5ad17305930986a
2014-09-16arm: Fix building vp8_mse16x16_neon.c with MSVCScott LaVarnway
Use the right return values - vadd_s64 returns int64x1_t, not a normal int64_t. Change-Id: Ife17213087c1dfb5faaa647f804d2fd140f3a0eb
2014-09-15VP8 encoder for ARMv8 by using NEON intrinsics 1Scott LaVarnway
Add vp8_mse16x16_neon.c - vp8_mse16x16_neon - vp8_get4x4sse_cs_neon Change-Id: I108952f60a9ae50613f0ce3903c2c81df19d99d0 Signed-off-by: James Yu <james.yu@linaro.org>
2014-09-08vp8 encoder: remove vp8_yv12_copy_partial_frame_neonJia Jia
Use generic C implementation instead of neon-specific code Change-Id: Ib322b4ece9cdbd4de76a9eed3d2e9fd1d8542406
2014-08-20VP8 encoder for ARMv8 by using NEON intrinsics 6James Yu
Add shortfdct_neon.c - vp8_short_fdct4x4_neon - vp8_short_fdct8x4_neon Change-Id: I90152c803b484f5fab839473d632c50af0524e68 Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20VP8 encoder for ARMv8 by using NEON intrinsics 3James Yu
Add subtract_neon.c - vp8_subtract_b_neon - vp8_subtract_mby_neon - vp8_subtract_mbuv_neon Change-Id: If9a17a093478552e3e3276eeaa3f098b9021d08c Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20VP8 encoder for ARMv8 by using NEON intrinsics 2Scott LaVarnway
Add vp8_shortwalsh4x4_neon.c - vp8_short_walsh4x4_neon Change-Id: Ica5f584be608c9e636f62db14f563757e94be09b Signed-off-by: James Yu <james.yu@linaro.org>
2014-07-23Fix clang compiler warning in denoising_neon.Marco Paniconi
Issue: https://code.google.com/p/webm/issues/detail?id=829 Change-Id: I580308f8aa4af194b5d8990a9692ebd18db68ee8
2014-06-27Neon version of vp8_denoiser_filter_uv()Scott LaVarnway
The encoder performance improved by 5% (vs "C") for the test clip used. Change-Id: I866b35eb2a06092edce7b37fc409562d0dacd7e7
2014-05-28Neon match to vp8 temporal denoiser fixScott LaVarnway
Now match the "C" version of "Fix to reduce block artifacts from vp8 temporal denoiser." (see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f) Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
2014-05-26neon matches "C" when using increase_denoisingScott LaVarnway
If increase_denoising is set, vp8_denoiser_filter_neon() produced incorrect results. Change-Id: I645f78e48b8f6657fa8a4b69d2c4d3488a0581dc
2014-05-16vp8: Add increase_denoising parameter to denoiser.Marco Paniconi
Change-Id: I96ed73e109c4f89dd06f3583cf7ecf9277401fae
2014-05-14Revert "Revert "Remove struct params from vp8_denoiser_filter""Marco Paniconi
This reverts commit 06e6d56fa138d84759e8bdfd4c721ead000051b4 Change-Id: If95598385b693945d6b144d03b6da8f6a57dac98
2014-05-07Revert "Remove struct params from vp8_denoiser_filter"Frank Galligan
This reverts commit e516a42527098a26798dbb3663a5bcdd38793839 Change-Id: I7c78712acc737ad5f580181cdab3aa76b23f3ca5
2014-05-02Remove struct params from vp8_denoiser_filterScott LaVarnway
This eliminates the asm_offsets dependency for future all-assembly versions of this function. Change-Id: I3227073ecfcb8ee6e593934fab941e9081abdda0
2014-05-02Merge "Improved intrinsic version of vp8_denoiser_filter_neon"Scott LaVarnway
2014-04-30Improved intrinsic version of vp8_denoiser_filter_neonScott LaVarnway
Used horizonal add instructions instead of adding byte lanes. The encoder performance improved by ~4% for the test clip used. Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
2014-04-28Save NEON registers in VP8 NEON functionsYunqing Wang
The recent compiler can generate optimized code that uses NEON registers for various operations besides floating-point operations. Therefore, only saving callee-saved registers d8 - d15 at the beginning of the encoder/decoder is not enough anymore. This patch added register saving code in VP8 NEON functions that use those registers. Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
2014-01-22arm: Use vreinterpret instead of a plain cast for converting between neon ↵Martin Storsjo
vector types This fixes building with MSVC for arm. Change-Id: Iffae0408e0c68760e87e96b9e17d9df8e8cadb1a
2014-01-02ARM NEON version of denoiser.Christian Duvivier
Change-Id: I951abd4ad0078f78949f3cb79453ac334fb82a7e
2013-05-23Remove type from vmvnJohann
datatype is optional for the instruction but clang refuses it. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHIJIHC.html It is still required when using an immediate. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGGEEB.html Change-Id: I0fae956c8c0fa3f97578ce80abea247f7fc88705
2012-11-15support building vp8 and vp9 into a single libJohn Koleszar
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-07Rough merge of master into experimentalJohn Koleszar
Creates a merge between the master and experimental branches. Fixes a number of conflicts in the build system to allow *either* VP8 or VP9 to be built. Specifically either: $ configure --disable-vp9 $ configure --disable-vp8 --disable-unit-tests VP9 still exports its symbols and files as VP8, so that will be resolved in the next commit. Unit tests are broken in VP9, but this isn't a new issue. They are fixed upstream on origin/experimental as of this writing, but rebasing this merge proved difficult, so will tackle that in a second merge commit. Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
2012-11-01Rename vp8/ codec directory to vp9/.Ronald S. Bultje
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
2012-11-01Remove vp8 in local symbols.Ronald S. Bultje
For non-static functions, change the prefix to vp9_. For static functions, remove the prefix. Also fix some comments, remove unused code or unused function prototypes. Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
2012-10-31Change name of common top-level structures from VP8 to VP9.Ronald S. Bultje
This change encompasses VP8_PTR, VP8_COMP, VP8D_COMP, VP8_COMMON, VP8Decompressor and VP8Common. Change-Id: I514ef4ad4e682370f36d656af1c09ee20da216ad
2012-10-30Change encoder vp8_ and vp8cx_ public symbol prefixes to vp9_.Ronald S. Bultje
Change-Id: Ie2e3652591b010ded10c216501ce24fd95d0aec5
2012-10-29remove fdct invoke macrosJim Bankoski
Remove the fdct invoke macro calls Change-Id: Ica2431c655819fa012133ee7abc75a16761e5fd6
2012-08-08Merging in the sixteenth subpel uv experimentDeb Mukherjee
Merges this experiment in to make it easier to run tests on filter precision, vectorized implementation etc. Also removes an experimental filter. Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
2012-07-17Restyle codeJohn Koleszar
Approximate the Google style guide[1] so that that there's a written document to follow and tools to check compliance[2]. [1]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml [2]: http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py Change-Id: Idf40e3d8dddcc72150f6af127b13e5dab838685f
2012-05-02Merge "Fix compiler warnings" into eiderJohn Koleszar
2012-05-02Fix TEXTRELs in the ARM asm.Timothy B. Terriberry
Besides imposing a performance penalty at startup in most configurations, these relocations break the dynamic linker for native Fennec, since it does not support them at all. Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
2012-05-02Fix compiler warningsAttila Nagy
Fix code for following warnings: -Wimplicit-function-declaration -Wuninitialized -Wunused-but-set-variable -Wunused-variable Change-Id: I2be434f22fdecb903198e8b0711255b4c1a2947a
2012-03-29remove unused BOOL_CODER::valueJohn Koleszar
Change-Id: Ic7782707afed38c3ec7e996a4a11dc2d55226691
2012-03-15WebM Experimental Codec Branch SnapshotYaowu Xu
This is a code snapshot of experimental work currently ongoing for a next-generation codec. The codebase has been cut down considerably from the libvpx baseline. For example, we are currently only supporting VBR 2-pass rate control and have removed most of the code relating to coding speed, threading, error resilience, partitions and various other features. This is in part to make the codebase easier to work on and experiment with, but also because we want to have an open discussion about how the bitstream will be structured and partitioned and not have that conversation constrained by past work. Our basic working pattern has been to initially encapsulate experiments using configure options linked to #IF CONFIG_XXX statements in the code. Once experiments have matured and we are reasonably happy that they give benefit and can be merged without breaking other experiments, we remove the conditional compile statements and merge them in. Current changes include: * Temporal coding experiment for segments (though still only 4 max, it will likely be increased). * Segment feature experiment - to allow various bits of information to be coded at the segment level. Features tested so far include mode and reference frame information, limiting end of block offset and transform size, alongside Q and loop filter parameters, but this set is very fluid. * Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used in MBs using 16x16 prediction modes within inter frames. * Compound prediction (combination of signals from existing predictors to create a new predictor). * 8 tap interpolation filters and 1/8th pel motion vectors. * Loop filter modifications. * Various entropy modifications and changes to how entropy contexts and updates are handled. * Extended quantizer range matched to transform precision improvements. There are also ongoing further experiments that we hope to merge in the near future: For example, coding of motion and other aspects of the prediction signal to better support larger image formats, use of larger block sizes (e.g. 32x32 and up) and lossless non-transform based coding options (especially for key frames). It is our hope that we will be able to make regular updates and we will warmly welcome community contributions. Please be warned that, at this stage, the codebase is currently slower than VP8 stable branch as most new code has not been optimized, and even the 'C' has been deliberately written to be simple and obvious, not fast. The following graphs have the initial test results, numbers in the tables measure the compression improvement in terms of percentage. The build has the following optional experiments configured: --enable-experimental --enable-enhanced_interp --enable-uvintra --enable-high_precision_mv --enable-sixteenth_subpel_uv CIF Size clips: http://getwebm.org/tmp/cif/ HD size clips: http://getwebm.org/tmp/hd/ (stable_20120309 represents encoding results of WebM master branch build as of commit#7a15907) They were encoded using the following encode parameters: --good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63 --end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0 --maxsection-pct=800 --sharpness=0 --arnr-maxframes=7 --arnr-strength=3(for HD,6 for CIF) --arnr-type=3 Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
2012-03-05Move SAD and variance functions to commonJohann
The MFQE function of the postprocessor depends on these Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
2012-02-09Fix variance overflowJohann
In the variance calculations the difference is summed and later squared. When the sum exceeds sqrt(2^31) the value is treated as a negative when it is shifted which gives incorrect results. To fix this we cast the result of the multiplication as unsigned. The alternative fix is to shift sum down by 4 before multiplying. However that will reduce precision. For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and change). PPC change is untested. Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
2012-01-30RTCD: finalize removal of old RTCD systemJohn Koleszar
This is the final commit in the series converting to the new RTCD system. It removes the encoder csystemdependent files and the remaining global function pointers that didn't conform to the old RTCD system. Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
2012-01-30RTCD: add block subtraction functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: Id8a287fdd4bd050ea4452e1582ad85520f3081be