path: root/vp9/decoder/vp9_dthread.c
Age  Commit message  Author
2015-02-04  Mute the harmless tsan error in frame parallel decode.  hkuang
Change-Id: I52565fd90461221f89134997a0782cb1b681df01
2015-01-30  Try again to merge branch 'frame-parallel' into master branch.  hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it only returns a frame when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls.
Currently, frame parallel decode only speeds up decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to the lack of a loopfilter worker thread. There are still some known issues that need to be addressed. For example, decoding frame parallel videos with segmentation enabled is sometimes incorrect.
* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
  test/codec_factory.h
  test/decode_test_driver.cc
  test/decode_test_driver.h
  test/invalid_file_test.cc
  test/test-data.sha1
  test/test.mk
  test/test_vectors.cc
  vp8/vp8_dx_iface.c
  vp9/common/vp9_alloccommon.c
  vp9/common/vp9_entropymode.c
  vp9/common/vp9_loopfilter_thread.c
  vp9/common/vp9_loopfilter_thread.h
  vp9/common/vp9_mvref_common.c
  vp9/common/vp9_onyxc_int.h
  vp9/common/vp9_reconinter.c
  vp9/decoder/vp9_decodeframe.c
  vp9/decoder/vp9_decodeframe.h
  vp9/decoder/vp9_decodemv.c
  vp9/decoder/vp9_decoder.c
  vp9/decoder/vp9_decoder.h
  vp9/encoder/vp9_encoder.c
  vp9/encoder/vp9_pickmode.c
  vp9/encoder/vp9_rdopt.c
  vp9/vp9_cx_iface.c
  vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
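The output timing described in this commit message (no frame returned until every worker is busy, everything returned on flush) can be modelled in a few lines. The following is only a sketch of that timing under one plausible reading of the message: the `FrameParallelModel` type and the `fpm_*` functions are hypothetical names, no decoding or threading takes place, and this is not the libvpx API.

```c
#include <stddef.h>

#define MAX_WORKERS 8

/* Hypothetical model of output timing only; no decoding happens here. */
typedef struct {
  int pending[MAX_WORKERS]; /* frame ids in flight, kept as a ring */
  int head;                 /* index of the oldest pending frame */
  int count;                /* frames currently in flight */
  int num_workers;          /* frames that may be in flight at once */
} FrameParallelModel;

void fpm_init(FrameParallelModel *m, int num_workers) {
  if (num_workers < 1) num_workers = 1;
  if (num_workers > MAX_WORKERS) num_workers = MAX_WORKERS;
  m->head = 0;
  m->count = 0;
  m->num_workers = num_workers;
}

/* Submit one frame. Returns the id of the frame that comes out, or -1
 * when nothing is returned because a worker was still free. */
int fpm_decode(FrameParallelModel *m, int frame_id) {
  int out = -1;
  if (m->count == m->num_workers) {
    /* All workers busy: the oldest frame is returned to make room. */
    out = m->pending[m->head];
    m->head = (m->head + 1) % m->num_workers;
    --m->count;
  }
  m->pending[(m->head + m->count) % m->num_workers] = frame_id;
  ++m->count;
  return out;
}

/* Flush: writes every pending frame id to out[] in submission order and
 * returns how many were written. out[] must hold num_workers entries. */
int fpm_flush(FrameParallelModel *m, int *out) {
  int n = 0;
  while (m->count > 0) {
    out[n++] = m->pending[m->head];
    m->head = (m->head + 1) % m->num_workers;
    --m->count;
  }
  return n;
}
```

With four workers in this model, the first four `fpm_decode()` calls return -1, the fifth returns frame 0, and a later `fpm_flush()` drains whatever is still pending.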
2015-01-23  Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."  Johann
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7
Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-22  Merge branch 'frame-parallel' to enable frame parallel decode in master branch.  hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it only returns a frame when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls.
VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading, which will make it easier for devices to play 1080P VP9 videos.
* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
  test/codec_factory.h
  test/decode_test_driver.cc
  test/decode_test_driver.h
  test/invalid_file_test.cc
  test/test-data.sha1
  test/test.mk
  test/test_vectors.cc
  vp8/vp8_dx_iface.c
  vp9/common/vp9_alloccommon.c
  vp9/common/vp9_entropymode.c
  vp9/common/vp9_loopfilter_thread.c
  vp9/common/vp9_loopfilter_thread.h
  vp9/common/vp9_mvref_common.c
  vp9/common/vp9_onyxc_int.h
  vp9/common/vp9_reconinter.c
  vp9/decoder/vp9_decodeframe.c
  vp9/decoder/vp9_decodeframe.h
  vp9/decoder/vp9_decodemv.c
  vp9/decoder/vp9_decoder.c
  vp9/decoder/vp9_decoder.h
  vp9/encoder/vp9_encoder.c
  vp9/encoder/vp9_pickmode.c
  vp9/encoder/vp9_rdopt.c
  vp9/vp9_cx_iface.c
  vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-16  vp9_ethread: add parallel loopfilter  Yunqing Wang
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder to common;
3. Merged multi-threaded loopfilter code, and made encoder/decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode (at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode (at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
2014-12-08  Add error handling for frame parallel decode and unit test for that.  hkuang
Change-Id: I6e309e11f1641618d2424b7a2c0fe744b8974dec
2014-10-23  add vp9_loop_filter_data_reset  James Zern
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
2014-10-22  Implement frame parallel decode for VP9.  Hangyu Kuang
Using 4 threads, frame parallel decode is ~3x faster than single thread decode and around 30% faster than tile parallel decode for frame parallel encoded video, on both Android and desktop. Decode speed also scales with the number of threads, so decode could be even faster with more threads. Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-16  move LFWorkerData allocation to VP9LfSync  James Zern
this removes an assumption that worker->data1 would be pointing to a TileWorkerData allocation. additionally, within the multi-threaded loopfilter pass VP9LfSync as a parameter to the worker hook, removing the need for a shadow pointer in LFWorkerData. Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
2014-10-16  vp9_loop_filter_frame_mt: remove pbi dependency  James Zern
Change-Id: I44d42a5098305a2d050ce8ff3c76baf7798c48af
2014-10-16  vp9_loop_filter_frame_mt: pass planes directly  James Zern
one less dependency on pbi Change-Id: I3f3a392416d3523f4aea6682c3965885baf85197
2014-10-16  vp9_loop_filter_frame_mt: pass VP9LfSync directly  James Zern
a step towards removing the pbi dependency Change-Id: I10747b325e81c172f5e67031ea5159159fc26e91
2014-10-07  Resolves some static analysis / undefined warnings  Deb Mukherjee
Also fixes a case of distortion becoming negative and messing up the RDCOST computation. Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53
2014-09-19  Remove mi_grid_* structures.  hkuang
mi_grid_* are arrays of pointers to pointers. They store pointers to the MIs in cm->mi, but they are unnecessary and complicated. The original goal was to avoid copying MODE_INFO_t, but the same goal can be achieved with an extra MODE_INFO_t pointer inside MODE_INFO_t. This commit removes the mi_grid_* structures entirely. There are still many dummy MODE_INFO_t entries inside cm->mi, which waste memory. The next commit will allocate MODE_INFO_t on demand to save that memory. Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
2014-09-09  Merge changes If8887e1d,I36bfc9c8,I3d1e6c42  James Zern
* changes:
  vp9_dthread: simplify loop_filter_row_worker signature
  simplify vp9_loop_filter_worker signature
  vp9_decodeframe: simplify tile_work_hook signature
2014-09-08  vp9_loop_filter_frame_mt: defer allocations  James Zern
the code currently checks whether the allocation has been done instead of allocating on the first frame.
since:
  4f27202 vp9: fix crash in mt loopfilter w/corrupt file
this change defers the allocation until the loop filter is used.
Change-Id: I660c1b7f34e713a8dd9884483f01d23b9847366e
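The deferral described here is the usual allocate-on-first-use pattern: the filter entry point makes sure its buffers exist instead of assuming an earlier frame created them. A minimal sketch follows, assuming a hypothetical `LfState` struct and hypothetical `lf_*` functions (not the libvpx ones). Reallocating when the row count changes is an assumption added for illustration.

```c
#include <stdlib.h>

/* Hypothetical state; libvpx keeps its equivalent in VP9LfSync. */
typedef struct {
  int *cur_col; /* per-row progress, NULL until the filter is first used */
  int rows;     /* number of rows actually allocated */
} LfState;

/* Allocate on demand. Returns 0 on success, -1 on allocation failure. */
static int lf_ensure_allocated(LfState *s, int rows) {
  if (s->cur_col != NULL && s->rows == rows) return 0; /* already set up */
  free(s->cur_col); /* free(NULL) is a no-op */
  s->cur_col = calloc((size_t)rows, sizeof(*s->cur_col));
  s->rows = (s->cur_col != NULL) ? rows : 0;
  return (s->cur_col != NULL) ? 0 : -1;
}

/* Entry point for one frame: never touches cur_col before it exists,
 * even if every earlier frame was corrupt and skipped the filter. */
int lf_filter_frame(LfState *s, int rows) {
  if (rows <= 0 || lf_ensure_allocated(s, rows) != 0) return -1;
  for (int r = 0; r < rows; ++r) s->cur_col[r] = -1; /* reset progress */
  /* ... the actual row filtering would run here ... */
  return 0;
}

void lf_release(LfState *s) {
  free(s->cur_col);
  s->cur_col = NULL;
  s->rows = 0;
}
```

A zero-initialized `LfState` is a valid starting point, so a caller that never reaches the loop filter never allocates anything.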
2014-09-08  vp9_loop_filter_alloc: reorder parameters  James Zern
VP9LfSync lf_sync is being operated on, make it the first parameter as in dealloc Change-Id: Id3cdf6b6a48157627780ae0d5d4b7dfa94a78078
2014-09-08  vp9_dthread: simplify loop_filter_row_worker signature  James Zern
use the type names directly in the function declaration rather than (void *arg1, void *arg2) Change-Id: If8887e1dbcdf84842783a92f91668bef6223c9e5
2014-08-29  vp9: fix m/t loop filter invalid free  James Zern
store the number of allocated rows in VP9LfSync, the calculated values can not be relied on when dealing with corrupt material. Change-Id: I13b8bcec9738c299a71df726772ab7ac05511e5b
2014-08-27  vp9: fix crash in mt loopfilter w/corrupt file  James Zern
if the first frame was corrupt and loop filter not called, the next call would assume the necessary allocations had been done and segfault when accessing a NULL pointer Change-Id: Ib6ef505e5c594e6f0fe65ab0700172bcf06b92a6
2014-07-01  update vp9_thread.[hc]  James Zern
pull the latest from WebP, which adds a worker interface abstraction allowing an application to override init/reset/sync/launch/execute/end
this has the side effect of removing a harmless, but annoying, TSan warning.
Original source:
 http://git.chromium.org/webm/libwebp.git
 100644 blob 08ad4e1fecba302bf1247645e84a7d2779956bc3 src/utils/thread.c
 100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h
Local modifications:
 - s/WebP/VP9/g
 - camelcase functions -> lower with _'s
 - associate '*' with the variable, not the type
Change-Id: I875ac5a74ed873cbcb19a3a100b5e0ca6fcd9aed
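A worker interface abstraction of this kind is typically a table of function pointers that the application may replace. The sketch below shows the general shape only: the six slot names follow the list in the commit message, but the `Worker` and `WorkerInterface` types, the getter/setter, and the synchronous defaults are simplified, hypothetical stand-ins rather than the declarations in vp9_thread.h.

```c
#include <stddef.h>

/* Hypothetical, simplified types; not the vp9_thread.h declarations. */
typedef int (*WorkerHook)(void *data); /* returns nonzero on success */

typedef struct {
  WorkerHook hook;
  void *data;
  int had_error;
} Worker;

typedef struct {
  void (*init)(Worker *w);
  int (*reset)(Worker *w);    /* returns 1 when the worker is ready */
  int (*sync)(Worker *w);     /* returns 1 when the work had no error */
  void (*launch)(Worker *w);  /* may run the hook asynchronously */
  void (*execute)(Worker *w); /* runs the hook in the calling thread */
  void (*end)(Worker *w);
} WorkerInterface;

/* Defaults for this sketch run everything synchronously (no threads). */
static void default_init(Worker *w) {
  w->hook = NULL;
  w->data = NULL;
  w->had_error = 0;
}
static int default_reset(Worker *w) { w->had_error = 0; return 1; }
static int default_sync(Worker *w) { return !w->had_error; }
static void default_execute(Worker *w) {
  if (w->hook != NULL) w->had_error |= !w->hook(w->data);
}
static void default_launch(Worker *w) { default_execute(w); }
static void default_end(Worker *w) { (void)w; }

static WorkerInterface g_interface = { default_init,   default_reset,
                                       default_sync,   default_launch,
                                       default_execute, default_end };

/* Install an application-supplied table; rejects incomplete tables. */
int set_worker_interface(const WorkerInterface *wi) {
  if (wi == NULL || !wi->init || !wi->reset || !wi->sync || !wi->launch ||
      !wi->execute || !wi->end) {
    return 0;
  }
  g_interface = *wi;
  return 1;
}

const WorkerInterface *get_worker_interface(void) { return &g_interface; }

/* Example override: count launches, then run the hook synchronously. */
static int g_launch_count = 0;
static void counting_launch(Worker *w) {
  ++g_launch_count;
  default_execute(w);
}

/* Example hook: increments the int that data points to. */
static int add_one(void *data) {
  ++*(int *)data;
  return 1;
}
```

An application would copy the current table, replace the slots it cares about (for example `launch`, to hand work to its own thread pool), and install the copy with `set_worker_interface()`.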
2014-06-29  silence unused parm warning for worker thread in loop filter  Jim Bankoski
Change-Id: Id51468f99f8970b8795ce2d254344f4b8d7817d0
2014-05-16  Removing MACROBLOCKD dependency from loop filter.  Dmitry Kovalev
Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce
2014-05-15  cleanup -wextra warnings:  Yaowu Xu
vp9_decoder.c vp9_dthread.c Change-Id: Iaafe941545db98e9e3559096a955894646084ac2
2014-05-12  Moving loopfilter call to vp9_decode_frame().  Dmitry Kovalev
Inline loopfilter has been already handled in vp9_decode_frame(). Collecting all similar code in one place now. Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
2014-05-08  Removing VP9DecoderConfig.  Dmitry Kovalev
We only used two members from that struct: max_threads and inv_tile_order. Moving them directly to VP9Decoder struct. Change-Id: If696a4e5b5b41868a55f3cc971e1d7c1dd9d5f69
2014-04-10  Cleaning up vp9_dthread.{c, h}.  Dmitry Kovalev
Change-Id: If33087462293605f79d9281af133091fff33a876
2014-04-08  Merge "Fix decoder resolution change with tiles"  Frank Galligan
2014-04-08  Fix decoder resolution change with tiles  Frank Galligan
There was a bug in the decoder: if you started the decoder with more threads than the first frame had tile columns, and afterwards tried to decode a frame with more tile columns than the first frame, the decoder would hang. E.g. run vpxdec --threads=4. If the first frame had two tile columns and the next key frame had 4 tile columns, the decoder would hang. If you started with 4 tiles and switched to 2 tiles the decoder would be fine. The issue is that the worker used by the thread loop is stale. I added a test vector "vp90-2-14-resize-848x480-1280x720.webm" that exhibited the bug. Change-Id: I7bdd47241a52ac0fe1c693a609bc779257e94229
2014-04-08  Renaming VP9D_COMP & VP9Decompressor to VP9Decoder.  Dmitry Kovalev
Change-Id: Ieb9b455b8aaef9884391021b7f640ef24c554687
2014-04-01  Renaming two members in MACROBLOCKD struct.  Dmitry Kovalev
Renames:
  mi_8x8 -> mi
  mode_info_stride -> mi_stride
Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be
2014-03-10  Merge "Renaming vp9_onyxd.h and vp9_onyxd_if.c to vp9_decoder.{h, c}."  Dmitry Kovalev
2014-03-10  Merge "vp9_reconinter.h static functions in header converted to global"  Jim Bankoski
2014-03-06  Renaming vp9_onyxd.h and vp9_onyxd_if.c to vp9_decoder.{h, c}.  Dmitry Kovalev
Change-Id: Ibd0892be1ddadd93b8a22fa2c2e2053001f2948f
2014-03-05  Removing vp9_onyxd_int.h file.  Dmitry Kovalev
Moving VP9Decompressor struct from vp9_onyxd_int.h to vp9_onyxd.h. Change-Id: Ic86c15e44130541a7f692db43ef9109293f99ae8
2014-03-03  vp9_reconinter.h static functions in header converted to global  Jim Bankoski
Change-Id: I916944950deb22f4c2301d83a803b732bf3ecd77
2014-02-06  vp9_dthread: interleave mutex/cond alloc+init  James Zern
this ensures both are properly initialized when calling _dealloc(). + check the arrays before access Change-Id: I789af39b41c271b5cb3c029526581b4d9903b895
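The point of interleaving is that each array is initialized as soon as it is allocated, so a cleanup routine can destroy whatever exists without ever touching uninitialized objects. A minimal sketch of that ordering follows, using a hypothetical `RowLocks` struct and hypothetical `row_locks_*` functions (the real code lives in the loop filter sync alloc/dealloc routines). It assumes `rows` is greater than zero.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical container; field and function names are illustrative. */
typedef struct {
  pthread_mutex_t *mutex; /* one per row, NULL when not allocated */
  pthread_cond_t *cond;   /* one per row, NULL when not allocated */
} RowLocks;

/* Safe on a partially allocated RowLocks: each array is checked before
 * access, and any array that exists has been fully initialized. */
void row_locks_dealloc(RowLocks *l, int rows) {
  if (l->mutex != NULL) {
    for (int i = 0; i < rows; ++i) pthread_mutex_destroy(&l->mutex[i]);
    free(l->mutex);
  }
  if (l->cond != NULL) {
    for (int i = 0; i < rows; ++i) pthread_cond_destroy(&l->cond[i]);
    free(l->cond);
  }
  l->mutex = NULL;
  l->cond = NULL;
}

/* Returns 0 on success, -1 on failure (after releasing what was set up). */
int row_locks_alloc(RowLocks *l, int rows) {
  l->mutex = NULL;
  l->cond = NULL;

  /* Allocate the mutex array and initialize it right away... */
  l->mutex = malloc(sizeof(*l->mutex) * (size_t)rows);
  if (l->mutex == NULL) return -1;
  for (int i = 0; i < rows; ++i) pthread_mutex_init(&l->mutex[i], NULL);

  /* ...so a failure here leaves no uninitialized mutex for dealloc. */
  l->cond = malloc(sizeof(*l->cond) * (size_t)rows);
  if (l->cond == NULL) {
    row_locks_dealloc(l, rows);
    return -1;
  }
  for (int i = 0; i < rows; ++i) pthread_cond_init(&l->cond[i], NULL);
  return 0;
}
```

Had both arrays been allocated first and initialized afterwards, a failed second allocation would leave the first array allocated but uninitialized, and destroying its entries would be undefined behavior.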
2014-01-31  Rename a loopfilter parameter  Yunqing Wang
As pointed out by Dmitry and James, "partial" is a Microsoft-specific C++ keyword, so the parameter is renamed. Change-Id: Ia0fc11ceb89e54b3195287f89f7e26edbbe9beb8
2014-01-31  vp9 decoder: row-based multi-threaded loopfilter  Yunqing Wang
Implemented parallel loopfiltering, which uses existing tile-decoding threads. Each thread works on one row, and when that row is loopfiltered, it moves to next unattended row. To ensure the correct filtering order, threads are synchronized and one superblock is filtered only if the superblocks it depends on are filtered already. To reduce synchronization overhead and speed up the decoder, we use nsync > 1 for high resolution.
Performance tests:
1. on desktop:
   8-tile 4k video using 8 threads, speedup: 70% - 80%
   4-tile HD video using 4 threads, speedup: ~35%
2. on mobile device (Nexus 7):
   4-tile 1080p video using 4 threads, speedup: 18% - 25%
   4-tile 1080p video using 2 threads, speedup: 10% - 15%
Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
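The scheme in this message (threads pull the next unattended row and wait on the row above, checking only every nsync columns) can be sketched with one mutex and condition variable per row. Everything below is a hypothetical illustration: the `RowSync` struct, the function names, and the dependency rule (a block waits for the row above to finish its whole column group) are assumptions, and the "filtering" is just a flag write used to verify the ordering.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical sketch; names and layout are not taken from libvpx. */
typedef struct {
  pthread_mutex_t *row_mutex; /* one per row */
  pthread_cond_t *row_cond;   /* one per row */
  int *done;                  /* columns published as finished, per row */
  unsigned char *filtered;    /* rows * cols flags, used for checking */
  int rows, cols, nsync;      /* nsync >= 1: columns between sync points */
  int next_row;               /* next unattended row */
  int violations;             /* blocks filtered before the block above */
  pthread_mutex_t job_mutex;  /* guards next_row and violations */
} RowSync;

/* Wait until the row above has published the column group covering c. */
static void sync_read(RowSync *s, int r, int c) {
  if (r == 0 || c % s->nsync != 0) return;
  const int need = (c + s->nsync < s->cols) ? c + s->nsync : s->cols;
  pthread_mutex_lock(&s->row_mutex[r - 1]);
  while (s->done[r - 1] < need)
    pthread_cond_wait(&s->row_cond[r - 1], &s->row_mutex[r - 1]);
  pthread_mutex_unlock(&s->row_mutex[r - 1]);
}

/* Publish progress once per nsync columns and at the end of the row. */
static void sync_write(RowSync *s, int r, int c) {
  if ((c + 1) % s->nsync != 0 && c != s->cols - 1) return;
  pthread_mutex_lock(&s->row_mutex[r]);
  s->done[r] = c + 1;
  pthread_cond_signal(&s->row_cond[r]); /* only row r + 1 ever waits */
  pthread_mutex_unlock(&s->row_mutex[r]);
}

static void *row_worker(void *arg) {
  RowSync *s = (RowSync *)arg;
  for (;;) {
    pthread_mutex_lock(&s->job_mutex);
    const int r = s->next_row++; /* claim the next unattended row */
    pthread_mutex_unlock(&s->job_mutex);
    if (r >= s->rows) return NULL;
    for (int c = 0; c < s->cols; ++c) {
      sync_read(s, r, c);
      if (r > 0 && !s->filtered[(r - 1) * s->cols + c]) {
        pthread_mutex_lock(&s->job_mutex);
        ++s->violations;
        pthread_mutex_unlock(&s->job_mutex);
      }
      s->filtered[r * s->cols + c] = 1; /* stand-in for real filtering */
      sync_write(s, r, c);
    }
  }
}

/* Returns the number of ordering violations (0 if the synchronization
 * held), or -1 if setup failed or no thread could be started. */
int filter_rows_demo(int rows, int cols, int nsync, int num_threads) {
  RowSync s = { 0 };
  pthread_t *threads = calloc((size_t)num_threads, sizeof(*threads));
  int started = 0;
  s.rows = rows;
  s.cols = cols;
  s.nsync = nsync;
  s.row_mutex = calloc((size_t)rows, sizeof(*s.row_mutex));
  s.row_cond = calloc((size_t)rows, sizeof(*s.row_cond));
  s.done = calloc((size_t)rows, sizeof(*s.done));
  s.filtered = calloc((size_t)rows * (size_t)cols, 1);
  if (threads && s.row_mutex && s.row_cond && s.done && s.filtered) {
    pthread_mutex_init(&s.job_mutex, NULL);
    for (int i = 0; i < rows; ++i) {
      pthread_mutex_init(&s.row_mutex[i], NULL);
      pthread_cond_init(&s.row_cond[i], NULL);
    }
    for (int i = 0; i < num_threads; ++i)
      if (pthread_create(&threads[started], NULL, row_worker, &s) == 0)
        ++started;
    for (int i = 0; i < started; ++i) pthread_join(threads[i], NULL);
    for (int i = 0; i < rows; ++i) {
      pthread_mutex_destroy(&s.row_mutex[i]);
      pthread_cond_destroy(&s.row_cond[i]);
    }
    pthread_mutex_destroy(&s.job_mutex);
  }
  free(threads);
  free(s.row_mutex);
  free(s.row_cond);
  free(s.done);
  free(s.filtered);
  return started > 0 ? s.violations : -1;
}
```

A larger `nsync` means fewer lock acquisitions per row at the cost of coarser overlap between adjacent rows, which matches the trade-off the message describes for high resolutions.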