Age | Commit message | Author |
|
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).
There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact because
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...
Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
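The extra copy described above can be sketched in plain C. This is a minimal illustration, not the code from the commit: the helper name pack_above_with_topleft, the function ve_predictor_4x4_sketch, and the buffer sizes are hypothetical, and the (1, 2, 1) smoothing taps are an assumption about how the filtered vertical predictor behaves. The point is only that once the top-left pixel sits directly before the above row, a predictor can read above[-1] from a single array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libvpx function): copy the top-left pixel and
 * n above pixels into buf so that the returned pointer p satisfies
 * p[-1] == top_left and p[0..n-1] == above_row[0..n-1]. */
static const uint8_t *pack_above_with_topleft(uint8_t *buf, size_t buf_len,
                                              uint8_t top_left,
                                              const uint8_t *above_row,
                                              size_t n) {
  assert(buf_len >= n + 1);
  buf[0] = top_left;
  memcpy(buf + 1, above_row, n);
  return buf + 1;
}

/* Sketch of a filtered vertical 4x4 predictor (assumed taps: 1, 2, 1).
 * Each output column is the smoothed above pixel, repeated down four rows.
 * Reads above[-1] through above[4]. */
static void ve_predictor_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                                    const uint8_t *above) {
  uint8_t row[4];
  for (int c = 0; c < 4; ++c) {
    row[c] = (uint8_t)((above[c - 1] + 2 * above[c] + above[c + 1] + 2) >> 2);
  }
  for (int r = 0; r < 4; ++r) memcpy(dst + r * stride, row, 4);
}
```

The copy costs one small memcpy per block, which is consistent with the minor performance impact the message mentions.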
|
|
Change-Id: I936c2430c3c5b1e0ab5dec0a20110525e925b5e4
|
|
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
|
|
|
|
Change-Id: I914b456558edbdee5eefdfba731bc70d3d5f5d53
|
|
Avoid conflict with vpx_dsp version
Change-Id: I041b1532a9276400a5547de8dfed1de43ad4e83d
|
|
Chromium puts all the yasm output in the same directory. Looking at ways
to improve this but in the meantime get rid of collisions.
Change-Id: I923c5231d14e895ab96521eb89807ede868a0753
|
|
average improvement ~2x-4x
Change-Id: I93abc15389649c169bb8b69127c0b95407d34692
|
|
average improvement ~3x-5x
Change-Id: Ia808ae56b118e0e1b293901447aa5a0f597b405b
|
|
average improvement ~3x-5x
Change-Id: I73306863e9bf172d5adc06b8dd54e43985d1e063
|
|
average improvement ~3x-4x
Change-Id: I8c0b3d5c86c9eb4f802b87c971864d2cfceeb7cc
|
|
average improvement ~2x-4x
Change-Id: I3af3ecced96c5b8e0cb811256e5089e28fe013a2
|
|
average improvement ~3x-5x
Change-Id: I5fd88cb088814be443d04be384b9fca99b22adef
|
|
average improvement ~2x-4x
Change-Id: I20c4f900ef95d99b18f9cf4db592cd352c2212eb
|
|
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
|
|
average improvement ~2x-5x
Change-Id: I19e82f78772993bcd67fcf975fe180232172f86d
|
|
There is a naming conflict in the chromium build system.
The rest of the variance functions will move to vpx_dsp soon.
Change-Id: Iff78da2aafb0d7380eda73e38d7dac72110a1e47
|
|
subpel functions will be moved in another patch.
Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
|
|
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
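For readers unfamiliar with SAD (sum of absolute differences), a scalar reference looks roughly like the sketch below. The function name sad_sketch and its signature are hypothetical and chosen for illustration; they are not the vpx_dsp API. Because the computation depends only on pixel data, it is a natural candidate for sharing between codecs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences between a
 * width x height source block and a reference block. */
static unsigned int sad_sketch(const uint8_t *src, ptrdiff_t src_stride,
                               const uint8_t *ref, ptrdiff_t ref_stride,
                               int width, int height) {
  unsigned int sad = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```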
|
|
This reverts commit 677fb5123e0ece1d6c30be9d0282e1e1f594a56d
Compiles with 4.6.
Change-Id: I7f87048911b6bc28a61741d95501fa45ee97b819
|
|
Change-Id: Ib89107fb824b5fe58afef6841104d5a27b2e0f2d
|
|
vp8_build_intra_predictors_mbuv_s().
This patch replaces the assembly version with an intrinsic
version.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~2.6%.
Change-Id: I9ef65bad929450c0215253fdae1c16c8b4a8f26f
|
|
Add vp8_subpixelvariance_neon.c
- vp8_sub_pixel_variance16x16_neon_func
- vp8_variance_halfpixvar16x16_h_neon
- vp8_variance_halfpixvar16x16_v_neon
- vp8_variance_halfpixvar16x16_hv_neon
- vp8_sub_pixel_variance8x8_neon
Change-Id: I3e5d85b2eafc26be0eef6a777789b80e4579257b
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
reverts commit 81ad047ee57ecb0e2c1ee4dcebda54a44ea54ae9.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb.""
|
|
This reverts commit 928ff03889dadc3f63883553b443c08e625b4885
Compiles with 4.6 now.
Change-Id: Ib455da1098bb0e0623248be07579882a425fcbd1
|
|
commit 81ad047ee57ecb0e2c1ee4dcebda54a44ea54ae9.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb."
This reverts commit 920f803f2e2f41395311f96fec1b4a0c1b2b631a
Change-Id: I410d9036214a1b18427cca70b4bc6d8239740737
|
|
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.
Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744
Change-Id: I130566837d5bfc9e54187ebe9807350d1a7dab2a
|
|
Change-Id: I80630a7350e884ebc4fef73fb5b52ec25f908523
|
|
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
|
|
These optimizations are currently disabled.
Change-Id: I19c58c9cb82d017638b86196641b9e001dfa798b
|
|
This reverts commit 81ad047ee57ecb0e2c1ee4dcebda54a44ea54ae9.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb.
This exposes a bug in gcc 4.9 regarding register allocation. Will reland
when 4.9 is fixed.
Change-Id: I2d8a04e4edde93719280e41550f4c0765608ec4d
|
|
Allow selectively building just the intrinsics for armv8
Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
|
|
This reverts commit c500fc22c1bb2a3ae5c318bfb806f7e9bd57ce25
There is an issue with gcc 4.6 in the Android NDK:
loopfiltersimpleverticaledge_neon.c: In function 'vp8_loop_filter_bvs_neon':
loopfiltersimpleverticaledge_neon.c:176:1: error: insn does not satisfy its constraints:
Change-Id: I95b6509d12f075890308914cc691b813d2e5cd9f
|
|
This reverts commit a5d79f43b963ced59b462206faf3b7857bdeff7b
There is an issue with gcc 4.6 in the Android NDK:
loopfilter_neon.c: In function 'vp8_loop_filter_vertical_edge_y_neon':
loopfilter_neon.c:394:1: error: insn does not satisfy its constraints:
Change-Id: I2b8c6ee3fa595c152ac3a5c08dd79bd9770c7b52
|
|
Add variance_neon.c
- vp8_variance16x16_neon
- vp8_variance16x8_neon
- vp8_variance8x16_neon
- vp8_variance8x8_neon
Change-Id: Idfb9c96134a1c6a696a98ce68b4f7ed593a00660
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add idct_dequant_0_2x_neon.c
- idct_dequant_0_2x_neon
Change-Id: I8e129172ef1b2517cf72ff267788921f1a792586
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add sixtappredict_neon.c
- vp8_sixtap_predict16x16_neon
- vp8_sixtap_predict8x8_neon
- vp8_sixtap_predict8x4_neon
- vp8_sixtap_predict4x4_neon
Change-Id: I3b02fce48ae2e6c6099041ba5ddd7b090f1463b9
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add shortidct4x4llm_neon.c
- vp8_short_idct4x4llm_neon
Change-Id: I5a734bbffca8dacf8633c2b0ff07b98aa2f438ba
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add sad_neon.c
- vp8_sad16x16_neon
- vp8_sad16x8_neon
- vp8_sad8x8_neon
- vp8_sad8x16_neon
- vp8_sad4x4_neon
Change-Id: I08eaae49ec03fb91b394354660a5df0367cea311
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add mbloopfilter_neon.c
- vp8_mbloop_filter_horizontal_edge_y_neon
- vp8_mbloop_filter_horizontal_edge_uv_neon
- vp8_mbloop_filter_vertical_edge_y_neon
- vp8_mbloop_filter_vertical_edge_uv_neon
Change-Id: Ia9084e0892d4d49412d9cf2b165a0f719f2382d7
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add loopfiltersimpleverticaledge_neon.c
- vp8_loop_filter_bvs_neon
- vp8_loop_filter_mbvs_neon
Change-Id: I7cf0a161ad4ae37c881b94cc0122f895d3baae79
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add loopfiltersimplehorizontaledge_neon.c
- vp8_loop_filter_bhs_neon
- vp8_loop_filter_mbhs_neon
Change-Id: I77f9721b20585da8bf3869a3850ff0ae4b4bfeea
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add loopfilter_neon.c
- vp8_loop_filter_horizontal_edge_y_neon
- vp8_loop_filter_horizontal_edge_uv_neon
- vp8_loop_filter_vertical_edge_y_neon
- vp8_loop_filter_vertical_edge_uv_neon
Change-Id: I50b57dedabd42d2a3c183c1738cc5346f0e71ed8
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add iwalsh_neon.c
- vp8_short_inv_walsh4x4_neon
Change-Id: I8beda6ce11ad8ce9e80cc0a38d40161938359162
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add idct_dequant_full_2x_neon.c
- idct_dequant_full_2x_neon
==== Summary of applying the VP8 decode patch series ====
Benchmark on Samsung Chromebook, Cortex-A15, 1.7GHz, dual core
Toolchain: linaro-1.13.1-4.8-2014.01
Compile argument: CROSS=arm-linux-gnueabihf- ../libvpx/configure
                  --target=armv7-linux-gcc --prefix=$HOME/out
                  --enable-shared --cpu=cortex-a7
Test argument: vpxdec --summary --noblit ./tears_of_steel_1080p.webm

Configuration     fps     Change
NEON assembly     46.68
Apply patch 06    46.65   -0.03
Apply patch 07    46.86   +0.21
Apply patch 08    46.58   -0.28
Apply patch 09    46.57   -0.01
Apply patch 10    46.51   -0.06
Apply patch 11    46.13   -0.38
Apply patch 12    45.42   -0.71
Apply patch 13    46.06   +0.64
Apply patch 14    45.19   -0.87
Apply patch 15    45.93   +0.74
Apply patch 16    45.48   -0.45
Apply patch 17    45.84   +0.36
Apply patch 18    45.91   +0.07   <= With all NEON intrinsics patches
Total: -0.77 fps, a 1.65% performance regression
Change-Id: I77bfc9eaccfb97b8d401e949ceff8795e26ca6b7
Signed-off-by: James Yu <james.yu@linaro.org>
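The regression figure in the summary follows from the first and last fps values alone: (46.68 - 45.91) / 46.68 is roughly 1.65%. A small sketch of that arithmetic, with a hypothetical helper name, in case the numbers need to be re-derived for another run:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: percent regression from the first measurement
 * (baseline) to the last one in a series of fps values. A positive result
 * means the last measurement is slower than the baseline. */
static double percent_regression(const double *fps, size_t n) {
  assert(n >= 2 && fps[0] > 0.0);
  return (fps[0] - fps[n - 1]) / fps[0] * 100.0;
}
```

Passing the 14 fps values from the summary (baseline plus patches 06 to 18) yields a value that rounds to 1.65.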
|
|
This patch cleans up after the commit "Save NEON registers in VP8
NEON functions". The pushing/popping of callee-saved NEON registers
was moved into the individual NEON functions, so we no longer need
to save those registers at the beginning of the codec.
The related code was removed.
Change-Id: I5648166514fc9beffb780aa138495597731f49ea
|
|
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
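To put the timings above into ratios, the "real" values can be converted to seconds and divided. The sketch below uses hypothetical helper names and assumes the shell's "XmY.YYYs" format shown in the listing; it is a convenience for reading the numbers, not part of the change.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse a time of the form "XmY.YYYs" into seconds.
 * Returns -1.0 if the string does not match that form. */
static double parse_real_time(const char *s) {
  int minutes = 0;
  double seconds = 0.0;
  assert(s != NULL);
  if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2) return -1.0;
  return minutes * 60.0 + seconds;
}

/* Hypothetical helper: how many times faster "after" is than "before". */
static double speedup(const char *before, const char *after) {
  double b = parse_real_time(before);
  double a = parse_real_time(after);
  assert(b > 0.0 && a > 0.0);
  return b / a;
}
```

For example, speedup("0m10.102s", "0m0.087s") for vp9_rtcd.h on Linux comes out at a little over 116.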
|
|
Add dequantizeb_neon.c
- vp8_dequantize_b_loop_neon
vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.23 (fps)
Change-Id: Iebe3b0c6ed2359c778b0570763c5681ae25fef0c
Signed-off-by: James Yu <james.yu@linaro.org>
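As context for what this function computes, a scalar sketch is given below. It assumes the operation is an element-wise multiply of the 16 quantized coefficients of a 4x4 block by their dequantization factors; the name dequantize_block_sketch and the signature are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (assumed behaviour): multiply each of the
 * 16 quantized coefficients of a 4x4 block by its dequantization factor. */
static void dequantize_block_sketch(const int16_t *q, const int16_t *dqc,
                                    int16_t *dq) {
  assert(q && dqc && dq);
  for (int i = 0; i < 16; ++i) {
    dq[i] = (int16_t)(q[i] * dqc[i]);
  }
}
```

A loop this small leaves little room for a measurable gain from either assembly or intrinsics, which fits the near-identical before and after fps figures.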
|
|
Add dequant_idct_neon.c
- vp8_dequant_idct_add_neon
vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.22 (fps)
Change-Id: Id48f39e1da58dd3d8d37658e94989411997f4f7c
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add dc_only_idct_add_neon.c
- vp8_dc_only_idct_add_neon
vpxdec --summary --noblit ../videos/tears_of_steel_1080p.webm
Before => After, 13.25 => 13.24 (fps)
Change-Id: I5e9e277ec3a3ca67e13c8cc4c324a6fbe8a897fc
Signed-off-by: James Yu <james.yu@linaro.org>
|