Age | Commit message | Author |
|
-.220 BDRATE derf: https://x20web.corp.google.com/~aconverse/results/cost256_derf.html
-.675 BDRATE hevcmr: https://x20web.corp.google.com/~aconverse/results/cost256_hevcmr.html
Change-Id: Ifb1646d8ce65ffe0eff9953a911b1b88735b335f
|
|
|
|
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
|
|
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
|
|
The decoder does not use this function.
Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
|
|
This commit enables the encoder to avoid 8x4 and 4x8 partitions for
scaled reference frames when libvpx is configured and built with
--enable-better-hw-compatibility.
Change-Id: I02ad65c386f5855f4325d72570c49164ed52f413
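A minimal C sketch of gating the sub8x8 rectangular partitions on the build option. It assumes that --enable-better-hw-compatibility surfaces as a CONFIG_BETTER_HW_COMPATIBILITY macro (libvpx's configure maps --enable-foo to CONFIG_FOO); the block-size enum and sub8x8_partition_allowed() are hypothetical names for illustration only.

```c
#include <assert.h>

/* Assumed to come from the generated build config; default to off here. */
#ifndef CONFIG_BETTER_HW_COMPATIBILITY
#define CONFIG_BETTER_HW_COMPATIBILITY 0
#endif

/* Hypothetical sub8x8 block sizes for this sketch. */
typedef enum { BSIZE_4X4, BSIZE_4X8, BSIZE_8X4, BSIZE_8X8 } sub8x8_bsize;

/* Returns 1 if the encoder may try this block size against the given
 * reference, 0 if it must be skipped. */
static int sub8x8_partition_allowed(sub8x8_bsize bsize, int ref_is_scaled) {
  assert(bsize <= BSIZE_8X8);
#if CONFIG_BETTER_HW_COMPATIBILITY
  /* Hardware-friendly builds avoid 8x4 and 4x8 when the reference is scaled. */
  if (ref_is_scaled && (bsize == BSIZE_8X4 || bsize == BSIZE_4X8)) return 0;
#else
  (void)ref_is_scaled; /* the restriction is compiled out */
#endif
  return 1;
}
```

Because the check is resolved at compile time, builds without the option carry no extra per-block test.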
|
|
Change-Id: I12de2dd5e5f375551804166188d76a9ad8067b41
|
|
This commit makes the sub8x8 block rate-distortion optimization
scheme use precise motion-compensated prediction to compute the rd
cost. It fixes a potential buffer overflow issue related to sub8x8
motion search on scaled reference frames.
Change-Id: I4274992ef4f54eaacfde60db045e269c13aaa2de
|
|
Found with clang -fsanitize=integer
Change-Id: I2538e7483cb2d5f06bceecbd3326bdd88bfecfa1
|
|
This change alters the nature and use of exhaustive motion search.
Firstly, any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.
Secondly, the simple +/- 64 exhaustive search is replaced by a
multi-stage mesh-based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.
For example:
stage 1: Range +/- 64 interval 4
stage 2: Range +/- 32 interval 2
stage 3: Range +/- 15 interval 1
This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).
For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.
Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2-0.5 dB with only a small impact on encode
speed. On most clips, though, the quality gain and speed impact are small.
Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
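A minimal C sketch of the threshold-gated, multi-stage mesh search described in this commit, using its three example stages. The types, refine_with_mesh() and the toy cost function are hypothetical names for illustration, not the libvpx implementation; the cost callback stands in for the encoder's real distortion metric.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch (full-pel units). */
typedef struct { int row, col; } mv_t;
typedef struct { int range; int interval; } mesh_stage_t;
typedef unsigned (*mv_cost_fn)(mv_t mv, void *ctx);

/* One grid pass: every candidate within +/- range of center, spaced by interval. */
static mv_t mesh_stage(mv_t center, mesh_stage_t st, mv_cost_fn cost, void *ctx,
                       unsigned *best_cost) {
  mv_t best = center;
  assert(st.interval > 0);
  for (int dr = -st.range; dr <= st.range; dr += st.interval) {
    for (int dc = -st.range; dc <= st.range; dc += st.interval) {
      const mv_t cand = { center.row + dr, center.col + dc };
      const unsigned d = cost(cand, ctx);
      if (d < *best_cost) {
        *best_cost = d;
        best = cand;
      }
    }
  }
  return best;
}

/* Refine the step-search result only if its distortion exceeds thresh;
 * each stage is centred on the best position found by the previous one. */
static mv_t refine_with_mesh(mv_t step_best, unsigned step_cost, unsigned thresh,
                             const mesh_stage_t *stages, size_t num_stages,
                             mv_cost_fn cost, void *ctx, unsigned *out_cost) {
  mv_t best = step_best;
  unsigned best_cost = step_cost;
  if (step_cost > thresh) {
    for (size_t s = 0; s < num_stages; ++s)
      best = mesh_stage(best, stages[s], cost, ctx, &best_cost);
  }
  *out_cost = best_cost;
  return best;
}

/* Toy cost for the example: squared distance to a target vector in ctx. */
static unsigned toy_cost(mv_t mv, void *ctx) {
  const mv_t *t = (const mv_t *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return (unsigned)(dr * dr + dc * dc);
}
```

In this sketch the example stages visit 33 x 33 = 1089, 33 x 33 = 1089 and 31 x 31 = 961 points, about 3.1k in total, versus 129 x 129 = 16641 for a step-1 search over +/- 64.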
|
|
On derflr, +0.1% for VP10; however, -0.03% on VP9.
Change-Id: I09c724232ede74254043d61d3cadc506256af0af
|
|
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive three-operand format encoding of the 128-bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Two further optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case, however, is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
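A scalar C sketch of the speculative 32-bit accumulation with a conservative overflow check and a 64-bit re-execution. This is not the AVX assembly; the function names are hypothetical, and the overflow test (an OR of magnitudes used as an upper bound on every value) is one simple detection scheme chosen for illustration. Like the assembly, it relies on coefficients lying in [-(2^15-1), 2^15-1] and on dqcoeff sharing the sign of coeff, which keeps |coeff - dqcoeff| within that bound.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference path: full 64-bit precision. */
static int64_t block_error_64(const int32_t *coeff, const int32_t *dqcoeff,
                              int n, int64_t *ssz) {
  int64_t err = 0, sq = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += d * d;
    sq += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sq;
  return err;
}

/* Speculative path: accumulate in 32 bits, then verify no overflow was possible. */
static int64_t block_error_speculative(const int32_t *coeff, const int32_t *dqcoeff,
                                       int n, int64_t *ssz) {
  uint32_t err = 0, sq = 0, mag = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    const int32_t d = coeff[i] - dqcoeff[i]; /* |d| <= 2^15-1 given equal signs */
    err += (uint32_t)(d * d);
    sq += (uint32_t)(coeff[i] * coeff[i]);
    /* OR of magnitudes is >= the largest magnitude seen. */
    mag |= (uint32_t)abs(coeff[i]) | (uint32_t)abs(dqcoeff[i]);
  }
  /* Both sums are bounded by n * mag^2; if that may exceed 32 bits, redo. */
  if ((uint64_t)mag * mag * (uint64_t)n > UINT32_MAX)
    return block_error_64(coeff, dqcoeff, n, ssz);
  *ssz = sq;
  return err;
}
```

The fallback is only taken for blocks with very large coefficients, so the common case stays on the cheap 32-bit path.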
|
|
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
|
|
Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec
|
|
Coding gain:
derflr 0.142%
hevclr 0.153%
hevcmr 0.124%
Change-Id: I63b56ae3a9002c3a266e10e2964135ed43b0ba53
|
|
Access the scaled reference frame in the sub8x8 rate-distortion
optimization loop only when the current test mode is an inter mode.
This prevents an ioc (integer overflow checker) warning triggered by
passing the intra_frame index when fetching a scaled reference frame.
Change-Id: I6177ecc946651dd86c7ce362e3f65c4074444604
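A small C sketch of the guard. It assumes VP9-style reference indices with INTRA_FRAME equal to 0 and a scaled-reference table indexed by ref_frame - 1, so an intra mode would otherwise compute index -1; scaled_ref_for_mode() and the table layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed VP9-style reference frame indices. */
enum { INTRA_FRAME = 0, LAST_FRAME = 1, GOLDEN_FRAME = 2, ALTREF_FRAME = 3, MAX_REF_FRAMES = 4 };

/* Hypothetical: one scaled-buffer index per inter reference (LAST..ALTREF). */
static const int *scaled_ref_for_mode(const int scaled_ref_idx[MAX_REF_FRAMES - 1],
                                      int ref_frame) {
  /* Intra modes have no reference frame; ref_frame - 1 would be -1. */
  if (ref_frame <= INTRA_FRAME) return NULL;
  assert(ref_frame < MAX_REF_FRAMES);
  return &scaled_ref_idx[ref_frame - 1];
}
```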
|
|
This commit allows the encoder to include sub8x8 inter modes with
scaled reference frames in the rate-distortion optimization scheme.
Change-Id: Ibbe9678801592826ef22566566dcdeeb008350d5
|
|
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
|
|
prevents redeclaration warnings;
vp8 has its own define which will be resolved in a future commit
Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
|
|
stdhd +0.226 hevchr +0.091 hevcmr +0.052 derflr +0.033
Change-Id: I84034209c5760609a99bd6e0ce55e02534b72cac
|
|
Change-Id: Ia069da11eebb271063e9eb837bdb3e7175ecce13
|
|
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.
Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
|
|
Change-Id: I77e397ac9f594c9c4c1db442e334a6ea5f53f588
|
|
Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e
|
|
Change-Id: Iaa43aeeb7a2074495e00cdb83bb551c3f13d3ed2
|
|
Change-Id: Ic1ce346a053800ae3b2d77178f46e6a388357f6d
|
|
Change-Id: Ic0b4e92cbaf813bcca8a8e9052c936c2e025e114
|
|
|
|
This is using a define instead of an enum to keep byte packing.
Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792
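A C sketch of why a define can keep a struct tightly packed: the size of an enum type is implementation-defined, whereas values stored in a uint8_t field occupy one byte. The names below are hypothetical and not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Enum-typed field: sizeof(pred_mode_enum) is chosen by the compiler. */
typedef enum { PRED_DC_E, PRED_V_E, PRED_H_E } pred_mode_enum;
struct info_enum {
  pred_mode_enum mode;
  uint8_t skip;
};

/* Define-based values stored in a byte-sized field. */
#define PRED_DC 0
#define PRED_V 1
#define PRED_H 2
typedef uint8_t pred_mode;
struct info_define {
  pred_mode mode;
  uint8_t skip;
};
```

On mainstream ABIs the enum is typically 4 bytes, so struct info_enum grows to 8 bytes with padding while struct info_define stays at 2.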
|
|
Don't run rate_block (cost_coeffs) if distortion alone is enough to
surpass best_rd.
This decreases 2nd pass runtime on HD at speed 2 by about 2%. There is
zero effect on output if tx_cache is removed.
Change-Id: Ia3b1cc77bfbe6ee988c395fde06c0eb92940b784
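A C sketch of the early termination: since the rate term is non-negative, distortion alone is a lower bound on the rd cost, so the expensive rate computation can be skipped once distortion reaches best_rd. The cost formula lambda * rate + dist and all names here are simplified, hypothetical stand-ins for the encoder's rd cost macro and rate_block().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the expensive coefficient costing (rate_block). */
typedef int (*rate_fn)(void *ctx);

/* Returns the rd cost, or INT64_MAX if the block cannot beat best_rd. */
static int64_t rd_with_early_exit(int64_t dist, int64_t lambda, int64_t best_rd,
                                  rate_fn rate_block, void *ctx) {
  assert(lambda >= 0 && dist >= 0);
  /* rate >= 0, so lambda * rate + dist >= dist: skip costing coefficients. */
  if (dist >= best_rd) return INT64_MAX;
  const int rate = rate_block(ctx);
  const int64_t rd = lambda * rate + dist;
  return rd < best_rd ? rd : INT64_MAX;
}

/* Example rate callback that counts how often it is invoked. */
static int counting_rate(void *ctx) {
  int *calls = (int *)ctx;
  ++*calls;
  return 10;
}
```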
|
|
1. The RD scores obtained during the tx size selection were stored in the
tx cache, and used to help make the tx decision for the following frames.
This was no longer used in the VP9 encoder. Restoring the related
decision-making code from 1.5+ years ago did not show any quality gain
in borg tests, so this patch removes it to lower the complexity.
2. An optimization was done after the above refactoring. If the tx_mode
is not TX_MODE_SELECT, we only need to test the chosen tx size instead
of all possible tx sizes. This gave a 1.5% average speed gain at speed 2,
and a 1% average speed gain at speed 3.
Change-Id: Id8cd650e066a8cef33829d8c15388a8138adc78c
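A C sketch of restricting the transform-size search when tx_mode is fixed. The enum orderings follow VP9's ONLY_4X4 ... TX_MODE_SELECT convention, but choose_tx_size(), the largest-size table and the rd callback are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 } tx_size_t;
typedef enum { ONLY_4X4, ALLOW_8X8, ALLOW_16X16, ALLOW_32X32, TX_MODE_SELECT } tx_mode_t;
typedef int64_t (*tx_rd_fn)(tx_size_t tx, void *ctx);

/* Largest transform each fixed tx_mode permits. */
static const tx_size_t kLargestTxForMode[] = { TX_4X4, TX_8X8, TX_16X16, TX_32X32 };

/* Evaluates every size up to max_tx under TX_MODE_SELECT; otherwise only
 * the single size the frame-level tx_mode forces for this block. */
static tx_size_t choose_tx_size(tx_mode_t tx_mode, tx_size_t max_tx,
                                tx_rd_fn rd_for_tx, void *ctx) {
  int first = TX_4X4, last = max_tx;
  assert(tx_mode <= TX_MODE_SELECT);
  if (tx_mode != TX_MODE_SELECT) {
    const tx_size_t cap = kLargestTxForMode[tx_mode];
    first = last = (cap < max_tx) ? cap : max_tx;
  }
  tx_size_t best = (tx_size_t)first;
  int64_t best_rd = INT64_MAX;
  for (int t = first; t <= last; ++t) {
    const int64_t rd = rd_for_tx((tx_size_t)t, ctx);
    if (rd < best_rd) {
      best_rd = rd;
      best = (tx_size_t)t;
    }
  }
  return best;
}

/* Toy rd callback: counts evaluations and prefers 16x16. */
static int64_t toy_tx_rd(tx_size_t tx, void *ctx) {
  int *evals = (int *)ctx;
  ++*evals;
  return tx > TX_16X16 ? (int64_t)(tx - TX_16X16) : (int64_t)(TX_16X16 - tx);
}
```

With a fixed tx_mode the loop body runs once instead of up to four times, which is where the speed gain comes from.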
|
|
|
|
Change-Id: I8877025e172fff29bc4e270790211463b676b4d7
|
|
Change-Id: I9d613cbe9e76b5dd15e935878ef9fd04521690ba
|
|
Clean up the forward 2D-DCT function names in vpx_dsp.
Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
|
|
Replace vp9_ with vpx_ in names, as they are not codec specific.
Change-Id: I2e583aa63dee769353ada4b42417aa15c4074ebb
|
|
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.
Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
|
|
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.
Change-Id: If00142512eb88992613d6609356dfd73ba390138
|
|
Changes to allow more use of rectangular partitions at
speeds 1 and 2 for content classed by the first pass as
animation and for blocks near the active image edge.
This has quite a big impact on quality for the animated
test sequence but also hurts encode speed for speed 2.
For other content types the impact on both speed and
quality is small.
Added some plumbing for detection of internal vertical
image edges.
Change-Id: I3fc48de2349f8cb87946caaf0b06dbb0ea261a9a
|
|
Change speed features / behavior for split mode when there
is an internal active edge (e.g. formatting bars).
Remove some threshold constraints in rd code near the active
edge of the image.
Add some plumbing for left and right active edge detection.
Patch set 5. Limit rd pass through for sub 8x8 to internal active edges.
This takes away any speed penalty for most clips but keeps the enhanced
edge coding for the more critical case of internal image edges.
Change-Id: If644e4762874de4fe9cbb0a66211953fa74c13a5
|
|
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
|
|
|
|
expose filter_kernels[] and do the table lookup directly
Change-Id: I0b10bff0327c3e01a723736141a9ffd377cd3d20
|
|
Factor out the subtraction operator as common function.
Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
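A C sketch of a shared block-subtraction helper that produces the residual src - pred with independent strides. subtract_block() and its argument order are assumptions for illustration, not necessarily the exact signature this commit introduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes diff = src - pred for a rows x cols block; each buffer has its own stride. */
static void subtract_block(int rows, int cols, int16_t *diff, ptrdiff_t diff_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           const uint8_t *pred, ptrdiff_t pred_stride) {
  assert(rows > 0 && cols > 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) diff[c] = (int16_t)(src[c] - pred[c]);
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```

Keeping one helper means the C and SIMD variants can share a single signature and be swapped behind a dispatch table.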
|
|
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
|
|
and added to MACROBLOCKD.
Change-Id: I0e60aaa9f84bcc9f2376d71bd934f251baee38db
|
|
and change name.
Change-Id: I706645cf9d9dc04f1b3b6ac80df80edb7f101854
|
|
and changed name.
Change-Id: Ie023ca66cc2c823032f58d4faeb53fd1863c94f3
|
|
Reduced size from 124 bytes to 104 bytes. For decode only builds,
it is reduced to 68 bytes.
Change-Id: If9e6b92285459425fa086ab5a743d0a598a69de3
|
|
Various header/test files had to be re-worked in order to
build "Remove cm parameter from vp9_decode_block_tokens()".
This patch reverts the "Remove cm" part and only contains
the re-worked header files.
Change-Id: I520958a88d1991fee988a3c784d0eac40e117a32
|