Age | Commit message (Collapse) | Author |
|
This reverts commit f9404f240642222775a371acde8fc0721b3812df.
This patch caused some ASAN error.
Change-Id: If15b7e581310e19061d111c69f2931809662ed19
|
|
This reverts commit b645257121da20b422dbbebf02aae0fc6dff95d4.
Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
|
|
This reverts commit 511d218c60b9b6c1ab9383db746815e907af0359.
In their current form, the intrinsics break the borg build.
Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
|
|
|
|
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of
  processing 8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives user-level gains of 2.4% for 480p input
and 1.6% for 720p.
This optimization was done only for 64-bit.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
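A scalar sketch of what an 8-tap horizontal filter computes may help in
reading the "2x8 elements" point: each output pixel consumes 8 input
bytes (64 bits), so two such groups fit in one 128-bit register. The
helper names below are hypothetical, and the 7-bit tap scaling (taps
summing to 128) is an assumption, not something stated in this commit.

```c
#include <stdint.h>

/* Assumption: taps carry 7 fractional bits, i.e. they sum to 128. */
#define SKETCH_FILTER_BITS 7

/* Hypothetical scalar reference: one output pixel from 8 input pixels. */
static uint8_t filter8_scalar(const uint8_t *src, const int16_t *taps) {
  int sum = 0;
  for (int k = 0; k < 8; ++k) sum += src[k] * taps[k];
  sum += 1 << (SKETCH_FILTER_BITS - 1); /* round to nearest */
  if (sum < 0) return 0;                /* clamp low before shifting */
  sum >>= SKETCH_FILTER_BITS;
  return (uint8_t)(sum > 255 ? 255 : sum); /* clamp high to 8 bits */
}

/* Hypothetical row filter: src must have 7 readable pixels past the
 * last output position. */
static void filter_row_h8_scalar(const uint8_t *src, uint8_t *dst,
                                 int width, const int16_t *taps) {
  for (int x = 0; x < width; ++x) dst[x] = filter8_scalar(src + x, taps);
}
```

A SIMD version would evaluate several of these 8-byte windows per
instruction; the scalar loop only fixes the arithmetic being vectorized.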
|
|
|
|
More intra optimizations will be added.
Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
|
|
Moving all code from those files to vp9_mvref_common.{h, c}.
Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
|
|
Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c
Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
|
|
Writing custom coeff branch count calculation (which is much clearer) in
adapt_coef_probs() function. Removing vp9_treecoder.c file.
Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
|
|
Moving all probability tables from removed file to vp9_entropy.c.
Change-Id: I12846f1da778c3016d96b82e53384d4634883430
|
|
The multiply by 3 was performed on 8-bit vectors when it should have
been performed on 16-bit vectors.
Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
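Why the lane width matters for a multiply by 3 can be sketched in plain
C: a signed 8-bit lane cannot hold 3 * x once |x| exceeds 42, while a
16-bit lane holds every product of an 8-bit input. The helper names are
hypothetical, and wrapping (rather than saturating) 8-bit behaviour is
an assumption; the commit message does not say which the original code
exhibited.

```c
#include <stdint.h>

/* Reduce an int to the value a signed 8-bit lane would hold after
 * wrapping modulo 256. */
static int wrap_to_int8(int v) {
  const unsigned low = (unsigned)v & 0xffu;
  return low >= 128u ? (int)low - 256 : (int)low;
}

/* Hypothetical: multiply by 3 with the result kept in an 8-bit lane. */
static int8_t mul3_in_8bit_lane(int8_t x) {
  return (int8_t)wrap_to_int8(3 * x);
}

/* Hypothetical: widen first, so the product always fits. */
static int16_t mul3_in_16bit_lane(int8_t x) {
  return (int16_t)(3 * x);
}
```

For small inputs the two agree; for an input such as 50 the 8-bit lane
yields a value with the wrong sign, which is the class of error the fix
addresses.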
|
|
The change caused mismatches with some test vectors on neon.
Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/
Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
|
|
|
|
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
|
|
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
|
|
|
|
Since they are used in the encoder only. This commit also reorders the
includes in the files that include vp9_extend.h.
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
|
|
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Tests using the tulip clip showed a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
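The "call neon loopfilters twice" approach can be sketched as a 16-pixel
wrapper that invokes an 8-pixel filter on each half of the edge. Both
function names are hypothetical, and the averaging body is a stand-in
chosen only to make the example checkable; it is not the VP9 loop
filter.

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-pixel-wide horizontal-edge filter:
 * blends the row above the edge (s - pitch) with the row below (s). */
static void lpf8_stub(uint8_t *s, int pitch) {
  for (int i = 0; i < 8; ++i) {
    const int avg = (s[i - pitch] + s[i] + 1) >> 1;
    s[i - pitch] = (uint8_t)avg;
    s[i] = (uint8_t)avg;
  }
}

/* Hypothetical 16-pixel wrapper: two 8-pixel calls, the second one
 * offset by 8 columns. */
static void lpf16_by_two_calls(uint8_t *s, int pitch) {
  lpf8_stub(s, pitch);
  lpf8_stub(s + 8, pitch);
}
```

A dedicated 16-wide implementation, as described for SSE2 above, would
instead process both halves in the same registers.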
|
|
|
|
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
|
|
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
|
|
I63df79a13cf62aa2c9360a7a26933c100f9ebda3."
|
|
cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3.
Change-Id: I034848cf05031618818f7df2e7f9c35102686948
|
|
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
|
|
Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
|
|
|
|
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. A test using a 1080p
clip showed decoder speeds of ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with C filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
|
|
|
|
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
|
|
The entropy code is now separate from the scan/iscan code. The next
step is to move the iscan code from the common part to the encoder.
Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
|
|
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
|
|
|
|
Replace the current code, which corrupts the stack, with a
duplicate of the vp8 code that saves and restores neon
registers.
Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
|
|
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.
Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
|
|
Moving all code from that file to vp9_variace_c.c in the encoder.
Change-Id: Ic803d5b4c78d5191e4d25541b3df97337878fc3e
|
|
This is incompatible with most toolchains other than gcc.
Revert "Deleted #include <inttypes.h>"
This reverts commit 4d018be950ef8b056a7c797a22ee58012443df26.
This reverts commit d22a504d11a15dc3eab666859db0046b5a7d75c5.
Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
|
|
|
|
Change-Id: I963dd4a6e8671957403ccbb9a16ea7de703e3530
|
|
Lots of TODOs, which will be taken care of in upcoming changes. As is,
about 6x faster than the C version.
Change-Id: Ie2557b72fd2d8edca376dbf400a4d173aa5e63e0
|
|
Reformatted version of a patch submitted by Erik/Tamar
from Intel. For the test clips used, the decoder
performance improved by ~2%.
Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
|
|
|
|
Change-Id: I42c497b68ae1ee645b59c9968ad805db0a43e37e
|
|
VP9 postproc is disabled for now as it has not been shown to help and
may be merged with vp8.
Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
|
|
Change-Id: Ib9354c1d975d03e8081df20d50b6a77dfe2dc7e5
|
|
Change-Id: I0b15d5e3b0eb97abb9ab5ec08e88b61f8723aaf4
|
|
Change-Id: I6ecb5c4a1a472feb8e84e9f3352b536d5e28a4a5
|
|
|
|
|
|
|