Age | Commit message (Collapse) | Author |
|
average improvement ~4x-6x
Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa
|
|
average improvement ~5x-8x
Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4
|
|
average improvement ~4x-6x
Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9
|
|
average improvement ~3x-4x
moved assert to respective files
Change-Id: I6c915059d456a00bdd76fab0dd2eede8b6c6ea58
|
|
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
|
|
Updated sources according to improved version of common MSA macros.
Enabled idct MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: I1f488ab2c741f6c622b7a855388a202168082209
|
|
|
|
~90% faster over 20M pixels
Change-Id: Iab791510cc57c8332c2f9a5da0ed50702e5f5763
|
|
Done little restructuring/styling changes to the sources like generic macro definitions, their use to reduce code lines, better code alignments etc.
Disabled all MSA hooks and tests
Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190
|
|
collect the vp9_convolve function definition macros there; this will
allow some relocation of functions from vp9_asm_stubs.c
Change-Id: Idadd117fa256dd48748379856973fd985b8204e8
|
|
average improvement ~4x-6x
Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91
|
|
average improvement ~4x-6x
Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17
|
|
average improvement ~4x-6x
Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022
|
|
|
|
The rotation computation using 2X of cos(pi/16) has a potential to
overflow 32 bit, this commit disable the function to allow further
investigation and optimization.
Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
|
|
average improvement ~3x-5x
Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb
|
|
|
|
Some build systems use just the basename for object files.
Change-Id: I333e1107ee866f3906cc46476ef8d04c6200a8a0
|
|
average improvement ~6x-8x
Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18
|
|
average improvement ~5x-8x
Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b
|
|
average improvement ~6x-10x
Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa
|
|
|
|
Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223
|
|
exclude files that only contain functions for non-high-bitdepth builds.
this removes some warnings related to missing prototypes
Change-Id: Ic6642998c46a7b808c6c53b2f9c34bcd4d037abe
|
|
Renames the files to allow more common thread code to
be moved to vp9/common.
Change-Id: I7386e64e221086e3cdc087e79812f993c423413b
|
|
|
|
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.
Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
|
|
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
|
|
Investigating https://code.google.com/p/chromium/issues/detail?id=443839
Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb
|
|
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.
Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
|
|
Re-write
- vp9_lpf_horizontal_4_dual_neon
in vp9_loopfilter_16_neon.c
Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_reconintra_neon.c
- vp9_v_predictor_4x4_neon
- vp9_v_predictor_8x8_neon
- vp9_v_predictor_16x16_neon
- vp9_v_predictor_32x32_neon
- vp9_h_predictor_4x4_neon
- vp9_h_predictor_8x8_neon
- vp9_h_predictor_16x16_neon
- vp9_h_predictor_32x32_neon
- vp9_tm_predictor_4x4_neon
- vp9_tm_predictor_8x8_neon
- vp9_tm_predictor_16x16_neon
- vp9_tm_predictor_32x32_neon
Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Delete vp9_dc_only_idct_add_neon.c
The function was merged with vp9_short_idct4x4_1_add (later
vp9_idct4x4_1_add) in d2de1ca and should have been deleted then.
Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon
The assembly did not previously implement tx_type 0
BUG=716
Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon
The assembly did not previously implement tx_type 0
BUG=715
Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
|
|
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add flag --enable-vp9-postproc to config the project.
In decoder, use flag --mfqe in the command line to enable
MFQE in postproc.
Note: Need to have key frame with low quality to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in key frame.
Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
|
|
Add vp9_idct32x32_add_neon.c
- vp9_idct32x32_1024_add_neon
Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct16x16_add_neon.c
- vp9_idct16x16_256_add_neon_pass1
- vp9_idct16x16_256_add_neon_pass2
- vp9_idct16x16_10_add_neon_pass1
- vp9_idct16x16_10_add_neon_pass2
Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct8x8_add_neon.c
- vp9_idct8x8_64_add_neon
- vp9_idct8x8_10_add_neon
Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct4x4_add_neon.c
- vp9_idct4x4_16_add_neon
Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct16x16_1_add_neon.c
- vp9_idct16x16_1_add_neon
Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct32x32_1_add_neon.c
- vp9_idct32x32_1_add_neon
Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct8x8_1_add_neon.c
- vp9_idct8x8_1_add_neon
Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_idct4x4_1_add_neon.c
- vp9_idct4x4_1_add_neon
Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon
Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon
Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_copy_neon.c
- vp9_convolve_copy_neon
Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_avg_neon.c
- vp9_convolve_avg_neon
Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon
Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
|