Age | Commit message | Author |
|
* changes:
add vp9_loop_filter_data_reset
move LFWorkerData allocation to VP9LfSync
vp9_loop_filter_frame_mt: remove pbi dependency
vp9_loop_filter_frame_mt: pass planes directly
vp9_loop_filter_frame_mt: pass VP9LfSync directly
vp9: store TileWorkerData allocations separately
|
|
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
|
|
This patch allocates frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
the encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
|
|
When profiling, gprof can't distinguish between matching labels in
different files.
Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f
|
|
All SAD functions that process 32 or more consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64.
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 by 90%;
together they gave an overall ~2.3% user-level gain.
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
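For reference, a minimal scalar sketch of what the SAD functions listed above compute. The names sad_block and sad_block_avg and their parameter lists are illustrative, not the libvpx signatures, and the layout of the second predictor (contiguous, stride equal to the block width) is an assumption. The AVX2 versions would compute the same sums with wide vector loads.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block.
 * Illustrative scalar reference; not the libvpx function. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (unsigned int)abs(src[x] - ref[x]);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* "_avg" style variant: compares the source against the rounded average
 * of the reference and a second predictor. The second predictor is
 * assumed to be stored contiguously (stride == width). */
static unsigned int sad_block_avg(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  const uint8_t *second_pred,
                                  int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  return sad;
}
```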
|
|
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch.
Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
|
|
this removes an assumption that worker->data1 would be pointing to a
TileWorkerData allocation.
additionally, within the multi-threaded loopfilter, pass VP9LfSync as a
parameter to the worker hook, removing the need for a shadow pointer in
LFWorkerData.
Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
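A rough sketch of the idea, assuming a worker whose hook takes two opaque pointers. All type and function names below are hypothetical stand-ins, not the libvpx definitions, and which of the two slots carries the sync object is an assumption. The point is that the sync object arrives as a hook argument, so the per-worker data no longer needs its own pointer to it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the sync object and per-worker data. */
typedef struct {
  int rows_done;
} LfSyncSketch;

typedef struct {
  int start_row;
  int stop_row; /* exclusive */
} LfWorkerDataSketch;

/* A worker hook receives two opaque pointers. */
typedef int (*WorkerHookSketch)(void *data1, void *data2);

typedef struct {
  WorkerHookSketch hook;
  void *data1;
  void *data2;
} WorkerSketch;

/* The sync object is passed directly as data1 (an assumption), so
 * LfWorkerDataSketch carries no shadow pointer to it. */
static int loop_filter_hook(void *data1, void *data2) {
  LfSyncSketch *const lf_sync = (LfSyncSketch *)data1;
  const LfWorkerDataSketch *const lf_data =
      (const LfWorkerDataSketch *)data2;
  for (int row = lf_data->start_row; row < lf_data->stop_row; ++row) {
    /* Real code would filter the row and synchronize with neighbours;
     * this single-threaded sketch only counts rows. */
    ++lf_sync->rows_done;
  }
  return 1;
}

static int run_worker(WorkerSketch *worker) {
  return worker->hook(worker->data1, worker->data2);
}
```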
|
|
This is based on the 64-bit ssse3 quantizer.
1.1x speedup for screen content at speed 7.
Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
|
|
Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f
|
|
Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0
|
|
Change-Id: I5e79c276d8953ae17cd35b2846e6e40660c037c3
|
|
Change-Id: If2de420f8123a4e8bf635dd29205dd74ee174eee
|
|
Uses highbd_ prefix convention consistently.
Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
|
|
vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build
Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd
|
|
The concept:
There's too much noise in the source pixels for variance, and at low bitrate
the reconstructed frame looks nothing like the source, so we have problems
getting good partitionings with either. This skirts the issue by using
a box-blurred, scaled-down version for variance calculations. To compare
against source_var_ moved keyframe to be rd based like source_var.
Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
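The concept can be sketched as follows, under stated assumptions: a 2x2 box filter and a 2:1 downscale are chosen here for illustration only, and the function names are hypothetical. The commit message does not give the actual filter size or scale factor.

```c
#include <stdint.h>

/* Downscale by 2 in each dimension with a 2x2 box filter (rounded
 * average). The 2x2 kernel and 2:1 ratio are assumptions. */
static void box_downscale_2x(const uint8_t *src, int src_stride,
                             int width, int height,
                             uint8_t *dst, int dst_stride) {
  for (int y = 0; y < height / 2; ++y) {
    for (int x = 0; x < width / 2; ++x) {
      const uint8_t *p = src + 2 * y * src_stride + 2 * x;
      const int sum = p[0] + p[1] + p[src_stride] + p[src_stride + 1];
      dst[y * dst_stride + x] = (uint8_t)((sum + 2) >> 2);
    }
  }
}

/* Variance scaled by the pixel count: sse - sum^2 / n. */
static unsigned int block_variance(const uint8_t *buf, int stride,
                                   int width, int height) {
  int64_t sum = 0;
  int64_t sse = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const int v = buf[y * stride + x];
      sum += v;
      sse += v * v;
    }
  }
  return (unsigned int)(sse - (sum * sum) / (width * height));
}
```

With a pixel-level checkerboard as the "noise", the full-resolution block has non-zero variance while the blurred, downscaled block is flat, which is the effect the message describes.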
|
|
This commit breaks the overly broad header files into more
targeted and smaller ones, to help better structure the system
layout.
Change-Id: I7b24559d3ea6e582cf5d9bbe8f71459f9824d71b
|
|
Change-Id: I3a7f83ab1dbfcedc8a82fe798c2fa30dd9c7d696
|
|
Change-Id: I6f2865bb8ba9295f5c45a4cad065aecbe1e63c32
|
|
The basic data defs should be above block operation level.
Change-Id: I7dd9836d01120ab75e0c472baac9f15495ed0db5
|
|
Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5
|
|
The functions b_width_log2 and b_height_log2 only do a direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
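A small sketch of the before/after pattern. The enum, the table name, its contents, and the 4-pixel unit are all illustrative assumptions, not the libvpx definitions.

```c
#include <stdint.h>

/* Hypothetical, reduced set of block sizes. */
typedef enum {
  BLK_4X4,
  BLK_8X8,
  BLK_16X16,
  BLK_32X32,
  BLK_64X64,
  BLK_SIZES
} BlockSizeSketch;

/* log2 of the block width, measured in 4-pixel units (assumed unit). */
static const uint8_t width_log2_lookup[BLK_SIZES] = { 0, 1, 2, 3, 4 };

/* Before: a wrapper that only forwards to the table. */
static int width_log2_wrapper(BlockSizeSketch bsize) {
  return width_log2_lookup[bsize];
}

/* After: callers index the table directly, so the wrapper can go. */
static int block_width_pixels(BlockSizeSketch bsize) {
  return 4 << width_log2_lookup[bsize];
}
```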
|
|
Add comments on the use case of these definitions. Further reduce
the scope of header file in vp9_context_tree.h.
Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75
|
|
Also fixes a case of distortion becoming negative and messing
up the RDCOST computation.
Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53
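To illustrate why a negative distortion is a problem, here is a sketch of a rate-distortion cost of the usual "scaled rate plus shifted distortion" shape. The exact form of the cost, the function names, and the clamp are assumptions for illustration; the commit message does not say how the fix was made.

```c
#include <stdint.h>

/* Assumed cost shape: rounded (rate * rdmult) / 256 plus distortion
 * shifted left by rddiv. Left-shifting a negative value is undefined
 * behaviour in C, and a negative distortion would also let a worse
 * candidate look cheaper. */
static int64_t rd_cost_sketch(int rdmult, int rddiv, int rate,
                              int64_t dist) {
  return ((128 + (int64_t)rate * rdmult) >> 8) + (dist << rddiv);
}

/* One possible guard (an assumption, not necessarily the actual fix):
 * clamp the distortion at zero before it reaches the cost. */
static int64_t clamp_distortion(int64_t dist) {
  return dist < 0 ? 0 : dist;
}
```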
|
|
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.
By adding this SSE2 code, encoder speed improves by around
20% (C code vs. SSE2 code), varying across clips.
The unit test for the VP9 denoiser confirms that the SSE2 code is
bit-exact with the C code. The unit test covers all block sizes.
Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
|
|
Bit-stream clarification related to Issue 868.
Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514
|
|
Resolves a Visual Studio warning, and includes some cleanups.
Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d
|
|
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation through the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also includes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
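A minimal sketch of the kind of wraparound such a macro can model: an intermediate transform value is reduced to a fixed-width signed register so the C model overflows the same way the hardware would. The 16-bit width and the name wrap_low16 are assumptions for illustration; the commit message does not state the width.

```c
#include <stdint.h>

/* Wrap a value to a signed 16-bit range, as a two's-complement
 * hardware register of that width would (16 bits is an assumption). */
static int32_t wrap_low16(int64_t x) {
  /* Conversion to unsigned is well defined (modulo 2^64), so masking
   * keeps exactly the low 16 bits: 0..65535. */
  const int32_t low = (int32_t)((uint64_t)x & 0xFFFFu);
  /* Reinterpret the top half of that range as negative values. */
  return low >= 0x8000 ? low - 0x10000 : low;
}
```

Values already inside the 16-bit range pass through unchanged, so a model built without hardware emulation and one built with it agree whenever no overflow occurs.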
|
|
This commit changes the tables to be read only, which fixes
issue #866
Change-Id: I85bbe03f9d344f50570f8c1c61699bdc5cee248f
|
|
Change-Id: I5a566d6ade720f212a60c0ad5d6f1ee1d1d37f2e
|
|
Change-Id: Id92544762e7b96d3c729dfc8e04ecff91cbcc7f9
|
|
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.
Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
|