Age | Commit message (Collapse) | Author |
|
Do full pixel MV search around all 3 MV candidates.
Coding gains for speed 0:
avg_psnr ovr_psnr ssim
lowres -0.088% -0.095% -0.117%
midres -0.175% -0.177% -0.148%
hdres -0.115% -0.146% -0.146%
Coding gains for speed 1:
avg_psnr ovr_psnr ssim
lowres -0.089% -0.104% -0.124%
midres -0.151% -0.171% -0.195%
hdres -0.110% -0.105% -0.132%
Tested encoding speed with speed 1 QP=30,40 over 10 midres sequences,
average speed loss is about 1%.
Change-Id: I9e6de035f4ed2e814e6494aefc2f84aae333a6b4
|
|
Before this patch, pred_mv is used only when the
adaptive_motion_search speed feature is on(speed>=1).
This patch enables pred_mv for speed 0 as well.
Coding gains:
avg_psnr ovr_psnr ssim
lowres -0.31% -0.32% -0.38%
midres -0.37% -0.41% -0.42%
hdres -0.30% -0.31% -0.29%
Tested encoding speed over 18 midres sequences with QP=40. The
overall speed loss is about 0.6%.
Change-Id: I8987e9efb5a70d2bf8779fc2a43838009f9bbd8a
|
|
Adapt the Lagrangian multipler based on the spatial variance in
the temporal dependency model. The functionality is disabled by
default. To turn on, set enable_tpl_model to 1.
Change-Id: I1b50606d9e2c8eb9c790c49eacc12c00d3d7c211
|
|
To save a branch.
Change-Id: Ifa2be7583e95c6991784731c654bbd4cce31e993
|
|
Instead of doing it in every transform search loop.
Change-Id: I12dc402a6633d1a27d32cb6b58710b8c0ebf0fd4
|
|
The intra mode rd penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size but
this was not done correctly for the SB level or sub 8x8.
The code did a weird double scaling in regard to bit depth that
has been removed. Given that it is a rate penalty the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
|
|
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
|
|
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
|
|
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on STDHD(using 4 tiles) set show that the
average speedups of the encoding pass(second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
|
|
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
|
|
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
|
|
Change-Id: I7217c90d5cf38c51b76759a2dc4f10070f3a40ac
|
|
Change-Id: I9afa02cae671bd3527cf344695e53d0cc767f549
|
|
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.
This brings a 1.22x first pass speedup.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
|
|
The bit to error transformation got doubled as a result of going from
8-bit to 9-bit costs (change d13385c).
Use defines to derive the scale numbers and comment some of the fields.
derf: -0.023 BDRATE
hevcmr: +0.067 BDRATE
stdhd: +0.098 BDRATE
(These are substantially smaller than than the original gains from 8 to
9 bit costing.)
Change-Id: I6a2b3b029b2f1415e4f90a05709b2333ec0eea9b
|
|
|
|
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
|
|
This is a pure-refactor in preparation to potentially raise the bit-cost
resolution.
Verified at good speed 0 and rt speed -6.
Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
|
|
On derflr, +0.1% for VP10; however, -0.03% on VP9.
Change-Id: I09c724232ede74254043d61d3cadc506256af0af
|
|
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
|
|
prevents redeclaration warnings;
vp8 has its own define which will be resolved in a future commit
Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
|
|
In high bitdepth setting, the rate multipier may be set as 0. In
lossless mode, the RD cost would always be 0, resulting in bad
partition and prediction mode choices.
Change-Id: I297014dd8bfa8a07ff0ab480119f75678300ff68
|
|
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.
Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
|
|
Replace vp9_ in names to vpx_ as they are not codec specific.
Change-Id: I2e583aa63dee769353ada4b42417aa15c4074ebb
|
|
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
|
|
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
|
|
Moved the frame_type check to the tile level and stored
the prob ptr in MACROBLOCKD.
Change-Id: I10b5a4abd58213dc7610e3ade1a1583c01526842
|
|
silences missing prototype warnings
Change-Id: Idaf68d83d2cb03847f3ee002c4d00c2ac79da604
|
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
vestigial. replace instances with memcpy() which they already were being
defined to.
Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
|
|
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)
For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.
Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
|
|
1. skip near if it is same as nearest
2. correct rounding for converting mv to fullpel position
3. update pred_mv_sad after new mv search.
Overall .1%~.25% compression gains on rtc set for speed 5, 6, 7, 8.
Change-Id: Ic300ca53f7da18073771f1bb993c58cde9deee89
|
|
add an assert to validate pred_mv array size
Change-Id: I532b882b71e2baff3ac76e07ed133ec5a11bd0fc
|
|
While turning on "--aq_mode=3", the quantizers are updated by each
thread. Fixed the me consts initialization function to make sure
that the correct thread data are updated.
Change-Id: Ied27bb7bae76fc3fa2cda4f8c35ac0b46271bef4
|
|
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
|
|
The block partition rate cost should be updated when recursive
partition search is needed.
Change-Id: I7bc5ad1fc2cbd3577dee7f7e8da111a2742bdeb9
|
|
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
|
|
Change-Id: I78ef7f89586a329787f6bc4c58ec83af210989a3
|
|
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
|
|
Use left shift to replace one multiplication. The computation
outcome remains identical.
Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1
|
|
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
|
|
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
|
|
Prepare for the introduction of frame-size change
logic into the recode loop.
Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.
Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
|
|
|
|
Reduce the scope of function parameters.
Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c
|
|
|
|
Adaptively adjust the mode thresholds after each mode search round
to skip checking less likely selected modes. Local tests indicate
5% - 10% speed-up in speed -5 and -6. Average coding performance
loss is -1.055%.
speed -5
vidyo1 720p 1000 kbps
16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms
nik 720p 1000 kbps
33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms
speed -6
vidyo1 720p 1000 kbps
16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms
nik 720p 1000 kbps
33271 b/f, 38.433 dB, 7886 ms -> 33279 b/f, 38.416 dB, 7843 ms
Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
|
|
Change-Id: I4bf0f9a38697f5aea564a47afd7f02bb8b2888b6
|
|
|
|
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.
Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
|