summaryrefslogtreecommitdiff
path: root/vp9/encoder/vp9_encoder.h
AgeCommit message (Collapse)Author
2015-01-30Try again to merge branch 'frame-parallel' into master branch.hkuang
In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example: decode frame parallel videos with segmentation enabled is not right sometimes. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-23Revert "Merge branch 'frame-parallel' to enable frame parallel decode in ↵Johann
master branch." This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7 Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-22Merge branch 'frame-parallel' to enable frame parallel decode in master branch.hkuang
In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compare with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading which will makes devices play 1080P VP9 videos more easily. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-16vp9_ethread: add parallel loopfilterYunqing Wang
1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
2015-01-14Add encoder control for setting color spaceYaowu Xu
This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
2015-01-13Merge "Added plumbing for setting color space"Yaowu Xu
2015-01-09Added plumbing for setting color spaceYaowu Xu
Change-Id: If64052cc6e404abc8a64a889f42930d14fad21d3
2015-01-07Use 64 bit to accumulate frame sse.Paul Wilkins
When testing frame sse to choose a loop filter value and when checking ambient error in kf Q selection, use 64 bit values for accumulating the sse, to avoid risk of overflow for large image formats. Change-Id: I03765d16c843d0ade61a45b0cd46312472697e57
2014-12-19resolve visual studio warnings around initializersJim Bankoski
Change-Id: Id2ad4fb24242f7ca8fa7a152f0889fded4113613
2014-12-18Remove mode dependent zbin boost.Paul Wilkins
Initial patch to remove get_zbin_mode_boost() and cpi->zbin_mode_boost. For now sets a dummy value of 0 for zbin extra pending a further clean up patch. Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
2014-12-12vp9: move encoder-only member from commonJames Zern
allow_comp_inter_inter VP9_COMMON -> VP9_COMP Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6
2014-12-09Substantial restructuring of AQ mode 2.Paul Wilkins
The restructure moves the decision into the rd pick modes loop and makes a decision based at the 16x16 block level instead of only the 64x64 level. This gives finer granularity and better visual results on the clips I have tested. Metrics results are worse than the old AQ2 especially for PSNR and this mode now falls between AQ0 and AQ1 in terms of visual impact and metrics results. Further tuning of this to follow. It should be noted that if there are multiple iterations of the recode loop the segment for a MB could change in each loop if the previous loop causes a change in the complexity / variance bin of the block. Also where a block gets a delta Q this will alter the rd multiplier for this block in subsequent recode iterations and frames where the segmentation is applied. Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
2014-12-04vp9_ethread: the tile-based multi-threaded encoderYunqing Wang
Currently, VP9 supports column-tile encoding, which allows a frame to be encoded in multiple column tiles independently. The number of column tiles are set by encoder option "--tile-columns". This provides a way to encode a frame in parallel. Based on previous set of patches, this patch implemented the tile- based multi-threaded encoder. Each thread processes one or more tiles. Usage: For HD clips: --tile-columns=2 --threads=1/2/3/4 While using 4 threads, tests showed that the encoder achieved 2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at realtime speed 5. Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
2014-11-25vp9_ethread: calculate and save the tok starting address for tilesYunqing Wang
Each tile's tok starting address is calculated before the encoding process. These addresses are stored so that the same calculation won't be done again in packing bit stream. Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
2014-11-24vp9_ethread: modify VP9_COMP structureYunqing Wang
This patch modified struct VP9_COMP. Created a struct ThreadData to include data that need to be copied for each thread. In multiple thread case, one thread processes one tile. all threads share one copy of VP9_COMP, (refer to VP9_COMP *cpi in the code) but each thread has its own copy of ThreadData, (refer to ThreadData *td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In single thread case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
2014-11-20Revert "vp9_ethread: include a pointer to mb in VP9_COMP"Yunqing Wang
This reverts commit 6906d218ddd1af97228a797f4558e402231d94f1. Another way will be used to handle mb struct. Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-17change to call vp9_refining_search_sad() directlyYaowu Xu
The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
2014-11-14vp9_ethread: combine encoder counts in separate structYunqing Wang
Several frame counters in encoder are updated at SB level. Combine those counters and put them in a separate struct, which allows us to allocate one copy for each thread. Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
2014-11-14vp9_ethread: include a pointer to mb in VP9_COMPYunqing Wang
Modified VP9_COMP struct to include MACROBLOCK *mb. This change makes it feasible in multi-thread case to allocate a mb for each thread. Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
2014-11-13Prepare for dynamic frame resizing in the recode loopAdrian Grange
Prepare for the introduction of frame-size change logic into the recode loop. Separated the speed dependent features into separate static and dynamic parts, the latter being those features that are dependent on the frame size. Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
2014-11-04[spatial svc] Make spatial svc working for one pass rate controlMinghai Shang
Change-Id: Ibd9114485c3d747f9d148f64f706bf873ea473ac
2014-10-28Merge "Add a new control of golden frame boost in CBR mode"Yaowu Xu
2014-10-27Refactor encoder tile data structureJingning Han
Make the common tile info as one element in the encoder tile data struct. Change-Id: I8c474b4ba67ee3e2c86ab164f353ff71ea9992be
2014-10-27Add a new control of golden frame boost in CBR modeYaowu Xu
0 means that golden boost is off, and uses average frame target rate, a non-zero number means the percentage of boost over average frame bitrate is given initially to golden frames in CBR mode. Change-Id: If4334fe2cc424b65ae0cce27f71b5561bf1e577d
2014-10-27Merge "Add a new control of max bitrate for inter frame"Yaowu Xu
2014-10-24Merge "Tile based adaptive mode search in RD loop"Jingning Han
2014-10-24Add a new control of max bitrate for inter frameYaowu Xu
Change-Id: I205de3611622cff7f751ea8baf9f82784581730a
2014-10-24Tile based adaptive mode search in RD loopJingning Han
Make the spatially adaptive mode search in rate-distortion optimization loop inter tile independent. Experiments suggest that this does not significantly change the coding staticstics. Single tile, speed 3: pedestrian_area 1080p 1500 kbps 59192 b/f, 40.611 dB, 101689 ms blue_sky 1080p 1500 kbps 58505 b/f, 36.347 dB, 62458 ms mobile_cal 720p 1000 kbps 13335 b/f, 35.646 dB, 45655 ms as compared to 4 column tiles, speed 3: pedestrian_area 1080p 1500 kbps 59329 b/f, 40.597 dB, 101917 ms blue_sky 1080p 1500 kbps 58712 b/f, 36.320 dB, 62693 ms mobile_cal 720p 1000 kbps 13191 b/f, 35.485 dB, 45319 ms Change-Id: I35c6e1e0a859fece8f4145dec28623cbc6a12325
2014-10-23Move frame re-sizing into the recode loopAdrian Grange
The point at which frames are scaled to their coded dimensions is moved into the re-code loop. This is in preparation for a further patch that will add logic into the re-code loop to reduce the coded frame size if the encoder is struggling to hit the target data rate at the native frame size. Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1
2014-10-21Merge "Extend --auto-alt-ref so it can enable multi-alt ref."Paul Wilkins
2014-10-20Extend --auto-alt-ref so it can enable multi-alt ref.Paul Wilkins
Extend --auto-alt-ref from parameter so we can use it to turn multi-arf on and off from the command line. For now the range is 0-off, 1-on, 2-multi-arf on. Rename play_alternate to enable_auto_arf Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
2014-10-17Remove the dependency in token storing locationsYunqing Wang
Currently, the tokens for a tile are stored immediately after its preceding tile, which causes a dependency. This is unnecessary since we always allocate enough memory for tokens. Removing the dependency allows token writing done in parallel. This patch doesn't change encoding result. Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
2014-10-16Revert "Move input frame scaling into the recode loop"Paul Wilkins
This reverts commit 452dc21500a2339ee685cb28efbd2af1b856ea12. This change has introduced a significant quality regression on content with forced key frames. (e.g. the YT and yt-hd set). It is most noticeable in static content where the kf bits dominate. Here, despite key frames being apparently coded at the same Q, there is a drop in all metrics of ~20% (e.g clXR and BFa0). Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
2014-10-14Move input frame scaling into the recode loopAdrian Grange
Move the point at which input frames are scaled into the recode loop. This will allow us to change the coded frame size dynamically in response to previous attempts to encode the frame at a higher resolution. A following patch will implement a scheme for resizing the frame in the recode loop. Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
2014-10-09Merge "Subpel search cleanups and enhancements"Deb Mukherjee
2014-10-08Subpel search cleanups and enhancementsDeb Mukherjee
- Some fixes to surface fit. - Returns variance function as cost rather than sad in the pattern search and diamond search functions. Only vp9_pattern_search_sad function used in bigdia search uses sad as integer 1-away costs. - Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+. Results: derf [Speed 3]: About +0.036% in coding efficiency without any discernible speed loss. derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency. derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency. Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
2014-10-07Remove redundant header file from vp9_encoder.hJingning Han
Change-Id: Ia212390cf8d36db5436bb0f0e1b696f70066341a
2014-09-30Merge "Skip the partition search for still frames"Yunqing Wang
2014-09-26Skip the partition search for still framesYunqing Wang
This patch re-enabled the feature in Pengchong's patch (commit 12861260732a4fd5f6b667ce9d5105dc9b606eda). Originally, it was turned on while use_lastframe_partitioning > 0(not used anymore). Now it was added as a feature, and turned on while speed >= 2. As described in the original patch, this feature helps speed up the slideshows in YouTube. Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
2014-09-23Adds high bit-depth psnr/sse functionsDeb Mukherjee
Also adds some miscellaneous high bit-depth setup functions. Change-Id: I66488b08a5a2a8cb9518ca10497cf1c1501ceded
2014-09-11Adds high bitdepth transform functions and testsDeb Mukherjee
Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform cofficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map tp int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-03Merge "[svc] Temporal svc with two pass rate control"Minghai Shang
2014-09-02Merge "Adds config opt for highbitdepth + misc. vpx"Deb Mukherjee
2014-09-02Adds config opt for highbitdepth + misc. vpxDeb Mukherjee
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles. Also includes most vpx level high bit-depth functions. However encode/decode in the highbitdepth profiles will not work until the rest of the code is in place. Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
2014-09-02[svc] Temporal svc with two pass rate controlMinghai Shang
It's built based on current spatial svc code. We only support one spatial two temporal layers at this time. Change-Id: I1fdc8584354b910331e626bfae60473b3b701ba1
2014-09-02Merge "Removing 'frames' field from VP9_COMP."Dmitry Kovalev
2014-08-29Removing dummy_packing member from VP9_COMP.Dmitry Kovalev
Change-Id: I571ce84c97087f8a1a36a10058393bfdcefbf72a
2014-08-29Minor fix in vp9_encoder.hYunqing Wang
Added the missing "int". Change-Id: I7c8af3dee700837b40f010d53e1431a59370ae3a
2014-08-28Merge "Removing unused arnr_type from VP9EncoderConfig and vp9_extracfg."Dmitry Kovalev
2014-08-28Merge "Updates vp9_pattern search to return integer sads"Deb Mukherjee