Age | Commit message (Collapse) | Author |
|
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
|
|
master branch."
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7
Change-Id: I053dae04c761b04a36dc239558503905a14d2470
|
|
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
|
|
|
|
These two parameters are used to control the denoiser cut-off
thresholds. They should be properly initialized when starting
mode search of a given block.
Change-Id: Iba8a25487026a0dbe0d350c347d7e4e4e237b637
|
|
This fixes the issue that sub8x8 inter blocks always end up
with GOLDEN_FRAME.
Change-Id: Id0c25cbb9c2003f43b4dff8fb1572512c246e077
|
|
Replace ref_frame++ with ++ref_frame.
Change-Id: Ic39793081156c314bf1b85d5ab76def97f3bff52
|
|
|
|
Use mbmi->segment_id directly in vp9_pick_inter_mode. The value is
set outside this function, hence no need to assign it again.
Change-Id: I3d63cdd2e4fadf62ccdefada638b00d979eb3741
|
|
|
|
|
|
This commit simplifies the reference motion vector part for sub8x8
block coding in RTC mode and reduces the required local variables.
Change-Id: I470d1482092563b68af22404dc1f497e7457b0a8
|
|
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
|
|
Properly set the corresponding scaling factor of the reference
frame in the non-RD mode decision process. This allows the mode
search process to account for the scaled reference frame when
selecting coding mode.
Change-Id: I9d41bff6931c98e5a82b413e37ac5e6e14b93b23
|
|
This commit adds a guard condition to the intra mode test skip
control in RTC coding mode. If all inter modes are skipped, force
the encoder to check intra mode. It avoids situations where the
encoder processes without properly assigning required mode
information.
Change-Id: Ibb349fee997d6584ce901d08b06e8df3ca9c01b1
|
|
The alternate reference frame is disabled in non-RD mode. No need
to keep the related entries in the THR_MODES array.
Change-Id: I53386f4bb1c6284f582801f27246c5edf55bc24b
|
|
In RTC coding mode, the alternate reference frame modes and compound
inter prediction modes are disabled. This commit reworks the
related mode search threshold update process to skip interacting
with these coding modes. It provides about 1.5% speed-up for speed
-6 on average.
vidyo1
16551 b/f, 40.451 dB, 6261 ms -> 16550 b/f, 40.459 dB, 6190 ms
nik720p
33316 b/f, 38.795 dB, 6335 ms -> 33310 b/f, 38.798 dB, 6237 ms
mmmoving
33265 b/f, 41.055 dB, 7176 ms -> 33267 b/f, 41.064 dB, 7084 ms
dark720
33329 b/f, 39.729 dB, 11235 ms -> 33331 b/f, 39.733 dB, 10731 ms
Change-Id: If2a4090a371cd28f579be219c013b972d7d9b97f
|
|
Use a temporary variable to store the transform size associated
with the best intra mode and restore the mode_info if the overall
best mode is intra mode.
Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea
|
|
When multiple intra modes are tested, the previous mode info
update process may overwrite the selected best intra mode and make
the final selection use an inter mode. This commit fixes this
issue by moving the mode_info reset outside the intra mode search
loop.
Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b
|
|
If vp9_pick_inter_mode works properly, it should at least check
one coding mode and hence get best_tx_size assigned a valid value.
There is no need to initialize best_tx_size with a legitimate
value before starting the mode search.
Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba
|
|
Make the variable reduction_fac log2 based and explicitly use
right shift when computing intra_cost_penalty.
Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86
|
|
Use left shift to replace one multiplication. The computation
outcome remains identical.
Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1
|
|
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.
Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686
|
|
The initial reset of this_rdc in vp9_pick_inter_mode is not needed,
since it will be re-assign when used.
Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b
|
|
Compare the current best mode rate-distortion cost with the skip
threshold to decide if performing motion search.
Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f
|
|
Change-Id: I0222f7abc61202f4a83b117bbfb042ada6304562
|
|
Bug was introduced in https://gerrit.chromium.org/gerrit/#/c/72122/
Change-Id: Idb500ea619a30e7bc50e22fb8ee03be5282f41db
|
|
Change-Id: Ic072585ebffdb36982ed7b8b9f875ca6c1c656c4
|
|
This commit allows the encoder to increase the mode test kick-off
thresholds if the previous best mode renders all zero quantized
coefficients, thereby saving motion search runs when possible.
The compression performance of speed -5 and -6 is down by -0.446%
and 0.591%, respectively. The runtime of speed -6 is improved by
10% for many test clips.
vidyo1, 1000 kbps
16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms
nik720p, 1000 kbps
33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms
dark720p, 1000 kbps
33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms
mmoving, 1000 kbps
33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms
Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
|
|
|
|
|
|
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
|
|
The intra mode penalty is covered by intra_cost_penalty. This
commit removes the other intra cost threshold, provided that the
constant 50 is negligible in normal rate-distortion cost.
Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
|
|
This commit makes a non-RD coding mode decision process for key
frame coding. It can be optionally turned on in speed -6 and above.
Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
|
|
This commit allows more aggressive decision to skip forward
transform and quantization for luma component in RTC coding mode.
The chroma components remains going through the normal coding
routine, since they are not included in the non-RD mode search
process.
It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms
nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms
dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms
The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.
Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
|
|
6.3% better compression
less than 1% compression time increase
Change-Id: Ie83c059436e54c09de9e7c87e06e0a6d40dc38fe
|
|
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.
Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995
|
|
|
|
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.
Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.
Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
|
|
Change-Id: I39d9f13fa34984ee9dad0c4f303ef672635f420e
|
|
This commit makes the RTC coding mode to conditionally skip the
reference frame mode search, when the predicted motion vector of
the current reference frame gives more than two times sum of
absolute difference compared to that of other reference frames.
It reduces the runtim by 1% - 4% for speed -5 and -6. The average
compression performance is improved by about 0.1% in both settings.
It is of particular benefit to light change scenarios. The
compression performance of test clip mmmovingvga.y4m is improved by
6.39% and 15.69% at high bit rates for speed -5 and -6, respectively.
Speed -5
vidyo1 16555 b/f, 40.818 dB, 12422 ms ->
16552 b/f, 40.804 dB, 12100 ms
nik 33211 b/f, 39.138 dB, 11341 ms ->
33228 b/f, 39.139 dB, 11023 ms
mmmoving 33263 b/f, 40.935 dB, 13508 ms ->
33256 b/f, 41.068 dB, 12861 ms
Speed -6
vidyo1 16541 b/f, 40.227 dB, 8437 ms ->
16540 b/f, 40.220 dB, 8216 ms
nik 33272 b/f, 38.399 dB, 7610 ms ->
33267 b/f, 38.414 dB, 7490 ms
mmmoving 33255 b/f, 40.555 dB, 7523 ms ->
33257 b/f, 40.975 dB, 7493 ms
Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91
|
|
Fix the alignment of entries fo intra prediction modes.
Change-Id: Ie32ad87cf90694efd591a4b1cc29c916c4cd56f7
|
|
|
|
|
|
|
|
This improves coding performance of speed -5 and -6 by 0.6%,
respectively.
Change-Id: Ic5a7746a88c73285f0b14333d35dc16b02152c25
|
|
Reduce the scope of function parameters.
Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c
|
|
This commit makes the inter prediction buffer system to support
hybrid partition search. It reduces the runtime of speed -5 by
about 3%. No compression performance change.
vidyo1 720p 1000 kbps
11831 ms -> 11497 ms
nik 720p 1000 kbps
10919 ms -> 10645 ms
Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36
|
|
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."
|
|
Adaptively adjust the mode thresholds after each mode search round
to skip checking less likely selected modes. Local tests indicate
5% - 10% speed-up in speed -5 and -6. Average coding performance
loss is -1.055%.
speed -5
vidyo1 720p 1000 kbps
16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms
nik 720p 1000 kbps
33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms
speed -6
vidyo1 720p 1000 kbps
16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms
nik 720p 1000 kbps
33271 b/f, 38.433 dB, 7886 ms -> 33279 b/f, 38.416 dB, 7843 ms
Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b
|