Age | Commit message (Collapse) | Author |
|
|
|
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
|
|
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
|
|
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
|
|
Allow selectively building just the intrinsics for armv8
Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
|
|
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
|
|
Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2
|
|
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
|
|
Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271
|
|
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
|
|
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
|
|
|
|
|
|
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.
Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
|
|
CONFIG_USE_X86INC is available to every makefile, there's no need to
duplicate its value with USE_X86INC
Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
|
|
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real time encoder was speeded up by ~1%.
Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
|
|
|
|
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.
Tests showed that the real-time encoder was speeded up by 3%.
Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
|
|
Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe
|
|
Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
|
|
This reverts commit f9404f240642222775a371acde8fc0721b3812df.
This patch caused some ASAN error.
Change-Id: If15b7e581310e19061d111c69f2931809662ed19
|
|
This reverts commit b645257121da20b422dbbebf02aae0fc6dff95d4.
Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
|
|
This reverts commit 511d218c60b9b6c1ab9383db746815e907af0359.
In current form intrinsics break borg build.
Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
|
|
|
|
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization done only for 64bit.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
|
|
|
|
More intra optimizations will be added.
Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
|
|
Moving all code from that files to vp9_mvref_common.{h, c}.
Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
|
|
Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c
Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
|
|
Writing custom coeff branch count calculation (which is much clearer) in
adapt_coef_probs() function. Removing vp9_treecoder.c file.
Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
|
|
Moving all probability tables from removed file to vp9_entropy.c.
Change-Id: I12846f1da778c3016d96b82e53384d4634883430
|
|
Multiply by 3 was on 8bit vectors when it should have been on
16bit vectors.
Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
|
|
The change caused mismatches with some test vectors on neon.
Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/
Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
|
|
|
|
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
|
|
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
|
|
|
|
Since they used in encoder only. This commit also re-order includes
for the files that include vp9_extend.h
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
|
|
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
|
|
|
|
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
|
|
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
|
|
I63df79a13cf62aa2c9360a7a26933c100f9ebda3."
|
|
cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3.
Change-Id: I034848cf05031618818f7df2e7f9c35102686948
|
|
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
|
|
Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
|
|
|
|
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
|
|
|
|
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
|