summaryrefslogtreecommitdiff
path: root/vp9/decoder/vp9_decoder.h
AgeCommit message (Collapse)Author
2023-05-05Add AVX2 intrinsic for idct16x16 and idct32x32 functionsAnupam Pandey
Added AVX2 intrinsic optimization for the following functions 1. vpx_idct16x16_256_add 2. vpx_idct32x32_1024_add 3. vpx_idct32x32_135_add The module level scaling w.r.t C function (timer based) for existing (SSE2) and new AVX2 intrinsics: Scaling Function Name SSE2 AVX2 vpx_idct32x32_1024_add 3.62x 7.49x vpx_idct32x32_135_add 4.85x 9.41x vpx_idct16x16_256_add 4.82x 7.70x This is a bit-exact change. Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2
2019-11-14Move buffer from extend_and_predict into TileWorkerDataVitaly Buka
This avoids unneeded initializations. extend_and_predict is called from multiple nested loops, allocate large buffer on stack and use just a portion of it. -ftrivial-auto-var-init= inserts initializations which performed on multiple iterations of loops causing 258.5% regression on webrtc_perf_tests decode_time/pc_vp9svc_3sl_low_alice-video. Bug: 1020220, 977230 Change-Id: I7e5bb3c3780adab74dd8b5c8bd2a96bf45e0c231
2019-01-30Modify map read/write to sync logic in row_mt caseRitu Baldwa
Adds conditional wait/signal instead of sched_yield. Change-Id: I49a760eacdd6b6ac690e797ea5f10febf6a1a084
2019-01-19Revert "Revert "Add Tile-SB-Row based Multi-threading in Decoder""Ritu Baldwa
This reverts commit 06983668cf41f66765528db044419f954e5a5d64. Fixes Visual Studio build errors introduced by earlier row mt commit BUG=webm:1587 Change-Id: I792df86e8254cd6b2a511955b691af619a569cd0
2018-12-22Revert "Add Tile-SB-Row based Multi-threading in Decoder"James Zern
This reverts commit 02b3ef7faee5be5ee519856fbb3523d3ab49f6e7. Reason for revert: fails to build under visual studio Original change's description: > Add Tile-SB-Row based Multi-threading in Decoder > > Add the multi-thread function that decodes a video row by row instead > of a tile at a time. Create a job queue for queueing all parse and recon jobs. > Each SB row of a tile is a job. > > Performance Improvement: > > Platform Resolution 3 Threads 4 Threads > ARM 720p 36.81% 18.37% > 1080p 32.27% 14.76% > > ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz > > Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e BUG=webm:1587 Change-Id: Ia4c8f5128922a205cd9fd83aaef8a2e73764d4a7
2018-12-18Add Tile-SB-Row based Multi-threading in DecoderRitu Baldwa
Add the multi-thread function that decodes a video row by row instead of a tile at a time. Create a job queue for queueing all parse and recon jobs. Each SB row of a tile is a job. Performance Improvement: Platform Resolution 3 Threads 4 Threads ARM 720p 36.81% 18.37% 1080p 32.27% 14.76% ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
2018-10-30clang-tidy: fix vp9/decoder parametersJohann
BUG=webm:1444 Change-Id: I9c7c0a4161aaf52436bd5c01d30b035b2ff5508c
2018-10-25Add Memory to Enable Row DecodeRitu Baldwa
Row based multi-thread needs extra memory to store the parsed co-efficients, partitions and eob. This commit adds memory for the same. Change-Id: I13fa4a6ada2ec3048bc973e465055b832429388f
2018-10-08Loopfilter Multi-Thread OptimizationSupradeep T R
Take the original loopfilter multi-thread optimization (dafe064289a917977439ab6f4f002b9946496084) along with the fixes for bugs 1558 and 1562. BUG=webm:1558 BUG=webm:1562 Change-Id: Ibbf6bd13f4ffff0e79184ccfd6b85a49e067a6d8
2018-09-22Revert "Revert "Revert "Loopfilter MultiThread Optimization"""James Zern
This reverts commit bf6299010e815e111d7326530c249e9d99611f34. segfaults, causes an assertion failure with corrupt input: get_uv_tx_size: Assertion `mi->sb_type < BLOCK_8X8 || ss_size_lookup[mi->sb_type][pd->subsampling_x][pd->subsampling_y] != BLOCK_INVALID BUG=webm:1562 Change-Id: I05a711cad3d8e7f1a8e64422b4356bdf4edb3d12
2018-09-15cosmetics: normalize include guardsJames Zern
use the recommended format [1] of: <PROJECT>_<PATH>_<FILE>_H_ [1] https://google.github.io/styleguide/cppguide.html#The__define_Guard "All header files should have #define guards to prevent multiple inclusion. The format of the symbol name should be <PROJECT>_<PATH>_<FILE>_H_." Change-Id: I2e8ab0b32fb23c30fa43cff5fec12d043c0d2037
2018-09-12Revert "Revert "Loopfilter MultiThread Optimization""Venkatarama NG. Avadhani
This reverts commit 753fd86e86ac727dccac88376260b8f54502f2a3. This also has the fix for the DoS reported in bug 1558. BUG=webm:1558 Change-Id: I65ea84e0c11d6bd40d8cb0587dfe934b3ac11dce
2018-08-30Revert "Loopfilter MultiThread Optimization"James Zern
This reverts commit dafe064289a917977439ab6f4f002b9946496084. Corrupted files may cause the decoder to hang as row progress in the loopfilter is used to progress each thread. BUG=webm:1558 Change-Id: I0674ce9af14d3fb7b2da8124e7b600616c8e734a
2018-08-20Loopfilter MultiThread OptimizationSupradeep T R
Adding LPF within the tileworker hook. This means that LPF will be done immediately after decode, without waiting for all threads to sync. Performance Improvement - Platform Resolution 2 Threads 4 Threads X86 720p 7.24% 22.04% 1080p 5.29% 17.02% ARM 720p 4.61% 8.75% 1080p 5.55% 12.03% x86 Improvement measured on Intel Core i7-6700 CPU @ 2.10GHz set in performance with turbo mode off ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz Change-Id: Ifa73c71b40db3fa7fa16f54f4e3aa06d1258caae
2018-07-20Add Flag to Enable Row Based MultiThreadingVenkatarama NG. Avadhani
This commit adds a command line argument "--row-mt". Passing "--row-mt=1" will set the row_mt flag in the decoder context. This flag will be used to determine whether row-wise multi-threading path is to be taken when the row-wise multi-threading functions are added. Change-Id: I35a5393a2720254437daa5e796630709049e0bc2
2017-11-09vp9: Fix mem rel for non-ref for external buffer.Jerome Jiang
Release frame buffers for non-ref when the decoder is destroyed. Enable the non ref test. BUG=b/68819248 Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
2017-07-05vp9: remove FrameWorkerData & vp9_dthread.hJames Zern
the file was empty after the struct removal. the only remaining use was within vp9_dx_iface, but the wrapper became unnecessary after the removal of frame_parallel_decode. BUG=webm:1395 Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd
2017-06-30vp9_onyxc_int,RefCntBuffer: rm unused membersJames Zern
the last frame_worker_owner, row and col references were removed in: 131bd06e6 remove vp9_dthread.c BUG=webm:1395 Change-Id: Ia7fb2e8782b12a58d2a2263849d20a8abf06aef6
2017-06-30VP9Decoder: rm frame_parallel_decodeJames Zern
this has been 0 since the removal of frame_parallel_decode in vp9_dx_iface. BUG=webm:1395 Change-Id: I3f579766ecfa4777395b99686738e1c5610f86ef
2016-08-03vp9/decoder,vp9/*.[hc]: apply clang-formatclang-format
Change-Id: Ic38ea06c7b2fb3e8e94a4c0910e82672a1acaea7
2016-04-07VP9: Combine TileData with TileWorkerDataScott LaVarnway
Change-Id: I83536734a54ef7b85f90f56a51878d94fac7ff22
2015-11-17Fixed a few sanity checks.Zoe Liu
Change-Id: Ieec4a7be5945dc6de192e2d8292ab978baf47f53 (cherry picked from commit 2096296421c7fa56abb49470c0fbe7c4337b8a71)
2015-10-06vp9/tile_worker_hook: pass pbi directlyJames Zern
reduces the size of TileWorkerData reusing the storage in the worker itself Change-Id: If8a62fcb35167037c3da5814ab84fb81893f9cab
2015-10-06vp9/tile_worker_hook: add multiple tile decodingJames Zern
this reduces the number of synchronizations in decode_tiles_mt() and improves overall performance when the number of threads is less than the number of tiles Change-Id: Iaee6082673dc187ffe0e3d91a701d1e470c62924
2015-09-04VP9Decoder: remove duplicate tile_worker_infoJames Zern
unnecessary since: 86f4a3d Remove tile param Change-Id: Iff75d3acf6c5aade833ea0a214c919279403cf97
2015-07-20vpx_dsp/bitreader.h: vp9_->vpx_Yaowu Xu
Replace vp9_ in names to vpx_ as they are not codec specific. Change-Id: I2e583aa63dee769353ada4b42417aa15c4074ebb
2015-07-20Removed vp9_ prefix from vpx_dsp/bitreader file namesYaowu Xu
Change-Id: I0426126d0a65f13f9250983e44cc366b1b1a9c4a
2015-07-17Move bit reader files to vpx_dspYaowu Xu
Change-Id: Ib1cb1fbe92a39ff5312cee069559be6d3ea458d0
2015-07-08Don't allocate dqcoeff in MACROBLOCKD.Alex Converse
The encoder gets its dqcoeff from the context tree. In the decoder move it to directly after MACROBLOCKD. Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
2015-07-02Rename vpx_thread to vpx_utilJingning Han
Change the dir name to include more util tools. Change-Id: Id5b16062803ce5eed872fe2edb36d7e56b32eed8
2015-07-02Use vpx prefix for codec independent threading functionsJingning Han
Replace vp9_ prefix with vpx_ for common multi-threading functions. Change-Id: I941a5ead9bfe8213fdad345511d2061b07797b55
2015-07-01Move multi-threading module functions into vpx_thread folderJingning Han
This commit moves the primitive multi-threading files from vp9 folder to vpx_thread, which will be accessible by all vpx codec. Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
2015-03-06Remove some unnecessary code in thread context copy.hkuang
Change-Id: Iddf098e1bae9c10fc2f325f84156f50a0bd0055a
2015-02-06Rename loopfilter_thread files to thread_common filesYunqing Wang
Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b
2015-02-04vp9_dthread: remove frame_parallel_decoding_mode requirementYunqing Wang
This patch continues the work to remove frame_parallel_decoding_mode requirement in VP9 multi-threaded tile decoder. In order to do that, the frame counts associated to each thread need to be accumulated together after the frame is decoded. Change-Id: Idba1a756cedfed3c154aef52ed82c8da3bbf9e0c
2015-01-30Try again to merge branch 'frame-parallel' into master branch.hkuang
In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example: decode frame parallel videos with segmentation enabled is not right sometimes. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-23Revert "Merge branch 'frame-parallel' to enable frame parallel decode in ↵Johann
master branch." This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7 Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-22Merge branch 'frame-parallel' to enable frame parallel decode in master branch.hkuang
In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading which will makes devices play 1080P VP9 videos more easily. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-16vp9_ethread: add parallel loopfilterYunqing Wang
1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
2014-12-08Add error handling for frame parallel decode and unit test for that.hkuang
Change-Id: I6e309e11f1641618d2424b7a2c0fe744b8974dec
2014-10-22Implement frame parallel decode for VP9.Hangyu Kuang
Using 4 threads, frame parallel decode is ~3x faster than single thread decode and around 30% faster than tile parallel decode for frame parallel encoded video on both Android and desktop with 4 threads. Decode speed is scalable to threads too which means decode could be even faster with more threads. Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-16vp9: store TileWorkerData allocations separatelyJames Zern
move them from VP9Worker::data[12] to allow the structure to be reused a bit more naturally by the multi-threaded loopfilter. Change-Id: I31b49c9e93ca744fd7f6d6ed8696671188fb2c1d
2014-09-09vp9: wait for key/intra-only frame after corruptionJames Zern
don't bother decoding any further after receiving an earlier decode error until a key/intra-only frame is encountered. Change-Id: I381917b70d7a9e6f8d6de42e3d181bb113a4cec4
2014-08-18[spatial svc]Add a few different encode frame tests.Minghai Shang
1. Clean the code for encode frame tests 2. Add encode w/ and w/o alt reference frame test 3. Add encode SNR layers test 4. Add encode multiple layers but decode partial layers test Change-Id: Ibd2c9bc02525db584a6f931a98405f2d851b3cd6
2014-08-08Common encode/decode function to get reference frameAdrian Grange
Replaced encoder and decoder functions to get a pointer to a reference frame with a common function, vp9_get_ref_frame, and simplified it. Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee
2014-07-11Move vp9_thread.* to common.hkuang
Prepare for frame parallel decoding, the reference count buffers need to be protected by mutex. Move vp9_thread.* to common folder so that those buffers could use cross-platform mutex from vp9_thread.*. (cherry picked from commit 337e8015c9deaf8ab7e8d0c3c132160a77dd1590) Change-Id: I0587a08447925f4554d7788686a31483c2ae3f37
2014-07-09Merge "Move vp9_thread.* to common."hkuang
2014-07-08Fix decoder handling of intra-only framesAdrian Grange
This patch fixes bug 633: https://code.google.com/p/webm/issues/detail?id=633 The first decoded frame does not have to be a keyframe, it could be an inter-frame that is coded intra-only. This patch fixes the handling of intra-only frames. A test vector has also been added that encodes 3 intra-only frames at the start of the clip. The test vector was generated using the code in the following patch: https://gerrit.chromium.org/gerrit/#/c/70680/ Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
2014-07-07Move vp9_thread.* to common.hkuang
Prepare for frame parallel decoding, the reference count buffers need to be protected by mutex. Move vp9_thread.* to common folder so that those buffers could use cross-platform mutex from vp9_thread.*. Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-02Seperate the frame buffers from VP9 encoder/decoder structure.hkuang
Prepare for frame parallel decoding, the frame buffers must be separated from the encoder and decoder structure, while the encoder and decoder will hold the pointer of the BufferPool. Change-Id: I172c78f876e41fb5aea11be5f632adadf2a6f466