path: root/vp9/encoder/vp9_firstpass.c
Age  Commit message  Author
2015-02-23Account for rate error in GF group Q calculation.paulwilkins
When GF group adaptive maxQ is enabled, this patch partly accounts for accumulated error in the rate control. This improves accuracy considerably on many clips, especially when there is overshoot. Examples with the overshoot and undershoot command line parameters set to 100: Hall @ 1200 overshoot is reduced from 67% to 24%. Akiyo @ 400 undershoot is reduced from 28% to 15%. Setting a lower value for undershoot or overshoot reduces the error further. Impact on metrics is mixed, with some gains in average PSNR but generally slightly lower results (e.g. 0.5%) on overall PSNR and SSIM. The GF group adaptation is still off by default in this patch. Compared with the head, enabling this mode now gives big average PSNR gains on the YT sets (e.g. YT_HD >11.2%), a drop in overall PSNR (YT-HD 3.9%) and a smaller drop or neutral result for SSIM. Change-Id: If4b32cd0740d3fb941317b374f9c2951954eee90
2015-02-19Fix control string in firstpass stats fprintfAdrian Grange
20 items in the control string but only 19 arguments. Change-Id: I51dab9aa1c58c653b52395005a9cb41f09feb484
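A mismatch like the one fixed here is undefined behaviour in C, because each conversion in the control string consumes one argument. The following is only a minimal sketch of the pattern, not the actual first pass stats code: `demo_stats` and `demo_format_stats` are hypothetical names, and the record is cut down to three fields so the one-to-one pairing of conversions and arguments is easy to check by eye.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, cut-down stats record; the real first pass stats
 * structure carries many more fields. */
typedef struct {
  double frame;
  double intra_error;
  double coded_error;
} demo_stats;

/* Formats one stats line into buf. Returns the number of characters that
 * the full line needs (snprintf semantics), or a negative value on error. */
int demo_format_stats(char *buf, size_t size, const demo_stats *s) {
  assert(buf != NULL && s != NULL);
  /* Three conversions in the control string, three arguments after it. */
  return snprintf(buf, size, "%12.0f %12.4f %12.4f\n",
                  s->frame, s->intra_error, s->coded_error);
}
```

Compiling with `-Wall` (which enables `-Wformat` in GCC and Clang) reports this class of mismatch when the control string is a literal.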
2015-02-10Auto-adaptive encoder frame resizing logicAdrian Grange
Note: This feature is still in development. Add an option for the encoder to decide the resolution at which to encode each frame. Each KF/GF/ARF group is tested to see if it would be better encoded at a lower resolution. At present, each KF/GF/ARF is coded first at full size, and if the coded size exceeds a threshold (twice the target data rate) at the maximum active Q, then the entire group is encoded at lower resolution. This feature is enabled in vpxenc by setting: --resize-allowed=1 In addition, if the vpxenc command line also specifies valid frame dimensions using: --resize-width=XXXX & --resize-height=YYYY then *all* frames will be encoded at this resolution. Change-Id: I13f341e0a82512f9e84e144e0f3b5aed8a65402b
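The decision rule described above can be sketched as follows. This is an illustration under stated assumptions, not the encoder's code: `demo_should_downscale` and its parameters are hypothetical, and the only facts taken from the commit message are the "twice target" threshold and the requirement that the frame was already coded at the maximum active Q.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the group should be re-encoded at a lower resolution,
 * 0 otherwise. All names are illustrative. */
int demo_should_downscale(int64_t coded_bits, int64_t target_bits,
                          int q, int active_max_q) {
  assert(target_bits > 0);
  /* While Q has headroom, raising Q is the cheaper remedy. */
  if (q < active_max_q) return 0;
  /* Threshold from the commit message: twice the target data rate. */
  return coded_bits > 2 * target_bits;
}
```

For example, a frame coded at the maximum active Q that uses 2500 bits against a 1000-bit target would trigger the lower-resolution path, while the same frame coded below the maximum active Q would not.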
2015-01-30Try again to merge branch 'frame-parallel' into master branch.hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it only returns a frame when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, in frame parallel mode the libvpx decoder stays busy between decode calls. Current frame parallel decode only speeds up decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to the lack of a loopfilter worker thread. There are still some known issues that need to be addressed. For example: decoding frame parallel videos with segmentation enabled is sometimes incorrect. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Separate the frame buffers from VP9 encoder/decoder structure. 
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-26Adjust active maxq for GF groups.Paul Wilkins
Currently disabled by default: enabled using #define GROUP_ADAPTIVE_MAXQ In this patch the active max Q is adjusted for each GF group based on the vbr bit allocation and raw first pass group error. This will tend to give a lower Q for easy sections and a higher value for very hard sections. As such it is expected to improve quality in some of the easier sections where quality issues have been reported. This change tends to hurt overall PSNR but help average PSNR. SSIM also shows a small gain. Average results for the derf, yt, std-hd and yt-hd test sets were as follows (% change):
Test set   Average PSNR   Overall PSNR   SSIM
derf       +0.291         -0.252         -0.021
yt         +6.466         -1.436         +0.552
std-hd     +0.490         +0.014         +0.380
yt-hd      +5.565         -1.573         +0.099
Change-Id: Icc015499cebbf2a45054a05e8e31f3dfb43f944a
2015-01-23Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."Johann
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7 Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-22Merge branch 'frame-parallel' to enable frame parallel decode in master branch.hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it only returns a frame when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, in frame parallel mode the libvpx decoder stays busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading, which will make it easier for devices to play 1080P VP9 videos. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Separate the frame buffers from VP9 encoder/decoder structure. 
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-21Bug when last group before forced key frame is short.Paul Wilkins
Just before a forced key frame we often get a foreshortened arf/gf group. In such a case, we do not want to update rc->last_boosted_qindex, which is used to define the Q range for the forced key frame itself. This gives a small average metrics gain for the YT and YT-HD sets (e.g. YT SSIM +0.141%). Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9
2014-11-24vp9_ethread: modify VP9_COMP structureYunqing Wang
This patch modifies struct VP9_COMP. It creates a struct ThreadData to hold the data that needs to be copied for each thread. In the multi-threaded case, one thread processes one tile. All threads share one copy of VP9_COMP (refer to VP9_COMP *cpi in the code), but each thread has its own copy of ThreadData (refer to ThreadData *td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In the single-threaded case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
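The shared-versus-per-thread split described above can be sketched with stand-in types. Everything prefixed `demo_` is hypothetical and far smaller than the real VP9_COMP, VP9_COMMON, ThreadData and FRAME_COUNTS; the sketch only shows where the counts pointer lands in the single-threaded and multi-threaded cases.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NUM_COUNTS 4

/* Stand-ins for FRAME_COUNTS, VP9_COMMON and VP9_COMP. */
typedef struct { unsigned int bins[DEMO_NUM_COUNTS]; } demo_frame_counts;
typedef struct { demo_frame_counts counts; } demo_common;
typedef struct { demo_common common; } demo_comp;

/* Stand-in for ThreadData: state that each thread owns privately. */
typedef struct {
  demo_frame_counts *counts;    /* where this thread accumulates counts */
  demo_frame_counts own_counts; /* private storage, used when threaded */
} demo_thread_data;

/* Single thread: point at the shared counts. Otherwise: use private ones. */
void demo_thread_data_init(demo_comp *cpi, demo_thread_data *td,
                           int single_thread) {
  assert(cpi != NULL && td != NULL);
  memset(&td->own_counts, 0, sizeof(td->own_counts));
  td->counts = single_thread ? &cpi->common.counts : &td->own_counts;
}

/* Both the shared encoder state and the per-thread state are parameters,
 * mirroring the "pass both cpi and td" rule in the commit message. */
void demo_encode_tile(const demo_comp *cpi, demo_thread_data *td,
                      int tile_idx) {
  (void)cpi; /* read-only shared state; unused in this sketch */
  assert(tile_idx >= 0);
  td->counts->bins[tile_idx % DEMO_NUM_COUNTS]++;
}
```

In the threaded case the private counts would still have to be merged into the shared ones after all tiles finish; that step is omitted here.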
2014-11-20Add adaptive midpoint for AQ1.Paul Wilkins
Make the midpoint variance used in AQ mode 1 segmentation depend on the overall complexity of the frame in two pass. Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
2014-11-20Merge "Fix bug in calculating number of mbs with scaling."Paul Wilkins
2014-11-20Fix bug in calculating number of mbs with scaling.Paul Wilkins
Correct calculation of number of mbs in two pass code when frame resizing is enabled. Always use initial number of mbs if scaling is enabled, as this is what was used in the first pass. Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
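A minimal sketch of the rule stated above, assuming 16x16 macroblocks and hypothetical helper names (`demo_mbs_for_size`, `demo_two_pass_num_mbs`): when scaling is enabled, the second pass keeps using the macroblock count of the initial frame size, because the first pass statistics were gathered at that size.

```c
#include <assert.h>

/* Number of 16x16 macroblocks needed to cover a frame, rounding up. */
int demo_mbs_for_size(int width, int height) {
  assert(width > 0 && height > 0);
  return ((width + 15) >> 4) * ((height + 15) >> 4);
}

/* MB count to use when interpreting first pass stats in the second pass. */
int demo_two_pass_num_mbs(int initial_mbs, int current_mbs,
                          int resize_enabled) {
  /* The first pass always ran at the initial size. */
  return resize_enabled ? initial_mbs : current_mbs;
}
```

For a 1920x1080 source scaled down to 960x540, the helper keeps returning the 1920x1080 count while resizing is enabled.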
2014-11-20Revert "vp9_ethread: include a pointer to mb in VP9_COMP"Yunqing Wang
This reverts commit 6906d218ddd1af97228a797f4558e402231d94f1. Another way will be used to handle mb struct. Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-14vp9_ethread: include a pointer to mb in VP9_COMPYunqing Wang
Modified VP9_COMP struct to include MACROBLOCK *mb. This change makes it feasible in multi-thread case to allocate a mb for each thread. Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
2014-11-13Merge "Prepare for dynamic frame resizing in the recode loop"Adrian Grange
2014-11-13Prepare for dynamic frame resizing in the recode loopAdrian Grange
Prepare for the introduction of frame-size change logic into the recode loop. Separated the speed dependent features into separate static and dynamic parts, the latter being those features that are dependent on the frame size. Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
2014-11-13Fix 32 bit build emms problem.Paul Wilkins
Add extra vp9_clear_system_state() calls to fix double / mmx issue introduced into first pass code for 32 bit builds. Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03
2014-11-07AQ1 - remove first pass weights.Paul Wilkins
Removed redundant weighting function tied to AQ1 from first pass code. Improvement in baseline AQ1 results:- Derf opsnr +0.142% SSIM +0.258% YT opsnr +0.173% SSIM +0.3% Change-Id: I16ef91caf2d7f302cd5940cc5e2626d48ebcb212
2014-11-06Add intra complexity and brightness weight to first pass.Paul Wilkins
The aim of this patch is to apply a positive weighting to frames that have a significant number of blocks that are of low spatial complexity and are dark. The rationale behind this is that artifacts tend to be more visible in such frames. In this patch the weight is only applied in regard to the distribution of bits between frames. Hence if all the frames share similar characteristics (as is the case for most of our short test clips) there will be little or no net effect. However, the effect can be seen on some longer form test content. For example, Tears of Steel baseline test: 2323.09 Kbit/s, opsnr 39.915, ssim 74.729. With this patch: 2213.34 Kbit/s, opsnr 39.963, ssim 74.808 (slightly better metrics and about 5% smaller). The weighting may well need some further tuning alongside changes to the aq modes. Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c
2014-10-28Relax maximum Q for extreme overshoot.Paul Wilkins
Added code to relax the active maximum Q in response to extreme local overshoot to reduce bandwidth peaks. The impact is small in metrics terms, but this helps reduce bandwidth spikes and overall overshoot in a number of clips in our test sets (especially the YT test set). In particular this should help prevent very big spikes where a clip is mainly easy but has a short hard section. In such a case a choice of maximum Q for the clip as a whole may allow us to hit the overall target rate but give some extreme spikes. The chunked encoding in YT mitigates this problem but it can show up where a longer clip is coded as a single chunk. Change-Id: I213d09950ccb8489d10adf00fda1e53235b39203
2014-10-23Move frame re-sizing into the recode loopAdrian Grange
The point at which frames are scaled to their coded dimensions is moved into the re-code loop. This is in preparation for a further patch that will add logic into the re-code loop to reduce the coded frame size if the encoder is struggling to hit the target data rate at the native frame size. Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1
2014-10-17Alter adjustment of two pass GF/ARF boost with Q.Paul Wilkins
Delete gfboost_qadjust() and move Q based adjustment into calc_frame_boost(). Also remove clamping. Making the adjustment here means that it influences not just the boost level but also the selection of the GF/ARF interval. This change gives a small average gain in PSNR but larger gains in SSIM, especially for the harder std-hd set (1.5%). Change-Id: I3aa81b8feccaeff93d915e19fb9cf5cd64c86327
2014-10-16[spatial svc]Another workaround to avoid using prev_miMinghai Shang
We encode an empty invisible frame in front of the base layer frame to avoid using prev_mi. Since there's a restriction on the reference frame scaling factor, we have to make it gradually smaller until its size is 16x16. Change remerged. Change-Id: I9efab38bba7da86e056fbe8f663e711c5df38449
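The gradual shrinking can be illustrated with a short sketch. It assumes the scaling restriction means a frame may be at most a factor of two smaller than its reference in each dimension, which matches the VP9 reference scaling limits as I understand them but is not stated in the commit message; the function names are hypothetical.

```c
#include <assert.h>

#define DEMO_MIN_DIM 16

/* Next dimension for the empty frame: halve (rounding up), floor at 16. */
int demo_next_dim(int dim) {
  assert(dim >= DEMO_MIN_DIM);
  int half = (dim + 1) / 2;
  return half < DEMO_MIN_DIM ? DEMO_MIN_DIM : half;
}

/* Counts how many successive empty frames are needed to reach 16x16. */
int demo_shrink_steps(int width, int height) {
  int steps = 0;
  while (width > DEMO_MIN_DIM || height > DEMO_MIN_DIM) {
    width = demo_next_dim(width);
    height = demo_next_dim(height);
    ++steps;
  }
  return steps;
}
```

Under that assumption a 64x64 frame reaches 16x16 in two steps (64, 32, 16), and a 1920x1080 frame in seven.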
2014-10-16Revert "Move input frame scaling into the recode loop"Paul Wilkins
This reverts commit 452dc21500a2339ee685cb28efbd2af1b856ea12. This change introduced a significant quality regression on content with forced key frames (e.g. the YT and yt-hd sets). It is most noticeable in static content where the kf bits dominate. Here, despite key frames apparently being coded at the same Q, there is a drop in all metrics of ~20% (e.g. clXR and BFa0). Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
2014-10-16Revert "[spatial svc]Another workaround to avoid using prev_mi"Paul Wilkins
This reverts commit c113457af9880b8e15a36cdaabfd414d1c245693. Temporary revert to allow clean revert of another commit. Change-Id: Ia9b7b755e6c48e1b6e383329f121fef175a24b27
2014-10-14[spatial svc]Another workaround to avoid using prev_miMinghai Shang
We encode an empty invisible frame in front of the base layer frame to avoid using prev_mi. Since there's a restriction on the reference frame scaling factor, we have to make it gradually smaller until its size is 16x16. Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
2014-10-14Move input frame scaling into the recode loopAdrian Grange
Move the point at which input frames are scaled into the recode loop. This will allow us to change the coded frame size dynamically in response to previous attempts to encode the frame at a higher resolution. A following patch will implement a scheme for resizing the frame in the recode loop. Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
2014-10-13Clamp rate error estimate.Paul Wilkins
Add back clamp which ensures that the Q adaptation is turned off when the over_shoot_pct and under_shoot_pct parameters are set to 100. Change-Id: Id0161b114d39a3029cd3eb28020caab0c3914922
2014-10-13Add adaptation option for VBR.Paul Wilkins
Allow min and maxQ to creep when the undershoot or overshoot exceeds thresholds controlled by the command line under_shoot_pct and over_shoot_pct values. Default is 100%,100% which ~disables adaptation. Derf results for example undershoot% / overshoot% settings:- Head: mean abs(%rate error) = 14.4%. This check-in: 25% / 25%: mean abs(%rate error) = 6.7%, PSNR hit -1%, SSIM -0.1%; 5% / 5%: mean abs(%rate error) = 2.2%, PSNR hit -3.3%, SSIM -1.1%. Most of the remaining error and most of the quality hit is at extreme data rates. The adaptation code still has an exception for material that is in effect static so that we don't over adjust and over spend on YT slide show type content. (Rebase of If25a2449a415449c150acff23df713e9598d64c9 to resolve an auto-merge error) Change-Id: Iec4e1613ef0d067454751d8220edb7058dfbd816
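One way to picture the threshold behaviour is the sketch below. It is an assumption-laden illustration, not the rate control code: the function names, the one-step Q creep, the 0..255 Q index range and the strict comparisons are all hypothetical choices. What it does show is how clamping the rate error to +/-100% makes thresholds of 100 switch the adaptation off.

```c
#include <assert.h>
#include <stdint.h>

/* Signed rate error in percent, clamped to [-100, 100]. */
int demo_rate_error_pct(int64_t actual_bits, int64_t target_bits) {
  assert(target_bits > 0);
  int64_t err = (actual_bits - target_bits) * 100 / target_bits;
  if (err > 100) err = 100;
  if (err < -100) err = -100;
  return (int)err;
}

/* Lets max Q creep by one step when the error passes a threshold. */
int demo_adapt_max_q(int max_q, int err_pct, int under_shoot_pct,
                     int over_shoot_pct) {
  if (err_pct > over_shoot_pct) return max_q < 255 ? max_q + 1 : 255;
  if (err_pct < -under_shoot_pct) return max_q > 0 ? max_q - 1 : 0;
  return max_q; /* within tolerance: leave Q alone */
}
```

With thresholds of 25%/25%, a 30% overshoot raises max Q by one step; with the default 100%/100%, even a clamped 100% error leaves it unchanged.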
2014-10-10Revert "Add adaptation option for VBR."Alex Converse
This reverts commit 869d4ca51957614dcf5093ebb9e322cc8a8405ca. This breaks the build via conflict with e18edd5eb651f9b7563cbd829744807402bfe0d8. Change-Id: If544b99e367a449452834eb8cce600f58c34ec0d
2014-10-10Merge "Add adaptation option for VBR."Paul Wilkins
2014-10-10Add adaptation option for VBR.Paul Wilkins
Allow min and maxQ to creep when the undershoot or overshoot exceeds thresholds controlled by the command line under_shoot_pct and over_shoot_pct values. Default is 100%,100% which ~disables adaptation. Derf results for example undershoot% / overshoot% settings:- Head: mean abs(%rate error) = 14.4%. This check-in: 25% / 25%: mean abs(%rate error) = 6.7%, PSNR hit -1%, SSIM -0.1%; 5% / 5%: mean abs(%rate error) = 2.2%, PSNR hit -3.3%, SSIM -1.1%. Most of the remaining error and most of the quality hit is at extreme data rates. The adaptation code still has an exception for material that is in effect static so that we don't over adjust and over spend on YT slide show type content. Change-Id: If25a2449a415449c150acff23df713e9598d64c9
2014-10-09Merge "Rename highbitdepth functions to use highbd prefix"Deb Mukherjee
2014-10-09Rename highbitdepth functions to use highbd prefixDeb Mukherjee
Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-08Allow mode search breakout at very low prediction errorsYunqing Wang
In model_rd_for_sb function, the spatial domain SSE and variance are checked to see if transform coefficients are quantized to 0. Besides that, this patch adds another set of thresholds that are much more strict. These thresholds are used to conduct a partition block level check to measure if all its TX blocks are skippable for YUV planes. If it is true, x->skip is set for this partition block, and thus its mode search is terminated. This speeds up the encoding at very low prediction error case, such as screen sharing application. This patch covers what rd_encode_breakout_test() does, so that function is removed. Borg test at speed 3 shows: For stdhd set, psnr: +0.008%, ssim: +0.014%; For derf set, psnr: +0.018%, ssim: +0.025%. No noticeable speed change. Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
2014-10-06Improve two pass VBR accuracy.Paul Wilkins
Adjustments to the GF interval choice and minimum boost. Adjustment to the calculation of 2 pass worst q. Compared to the 09/29 head there is a metrics hit on derf of (-0.123%, -0.191%). The accuracy of the VBR rate control measured on the derf set, compared with the September 29 head and a September 18 baseline, is as follows (mean error % / mean abs(error %)):- Sept 18 baseline (-7.0% / 14.76%), Sept 29 head (-15.7% / 19.8%), this check-in (-1.5% / 14.4%). The mean undershoot is reduced slightly but the worst case overshoot on e.g. harbour/highway is increased. This will be addressed in a later patch. Change-Id: Iffd9b0ab7432a131c98fbaaa82d1e5b40be72b58
2014-09-26Two pass rc changes.Paul Wilkins
Adjustments to the GF interval choice and minimum boost. Change-Id: I29951621484e1ee339adfb73ab430aa65f310ad8
2014-09-25Adds various high bit-depth encode functionsDeb Mukherjee
Change-Id: I6f67b171022bbc8199c6d674190b57f6bab1b62f
2014-09-19Remove mi_grid_* structures.hkuang
mi_grid_* are arrays of pointers to pointers. They store pointers to the MIs in cm->mi, but they are unnecessary and complicated. The original goal was to remove the MODE_INFO_t copy, but with an extra MODE_INFO_t pointer inside MODE_INFO_t the same goal can be achieved. This commit removes the mi_grid_* structures entirely. There are still many dummy MODE_INFO_t inside cm->mi which waste memory. The next commit will do on-demand MODE_INFO_t allocation to save this memory. Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
2014-09-18Merge "[spatial svc] Use same golden frame for all temporal layers"Minghai Shang
2014-09-18[spatial svc] Use same golden frame for all temporal layersMinghai Shang
Overhead goes down from 8% to 3% for 1080 60p Change-Id: Idf3e5ca8712402a914a8cb79df17d3cdab63b163
2014-09-18Substantial reworking of code for arf and kf groups.Paul Wilkins
Substantial restructuring of the way we estimate the rate of decay in prediction quality and determine the arf interval and amount of boost used. Also other changes to support moving to a lower first pass Q which exposes some new features and allows us to better distinguish genuinely static blocks from low motion or noisy blocks. Net gains now visible on all the test sets with std-hd PSNR up 1.87%. There are still some bad outlier cases but most of these are low motion or slide show type content where the metrics are already high at any given rate. The best + case is up by more than 10%. Change-Id: I18e25170053bdf3188f493ff8062f48a74515815
2014-09-16Adds high bitdepth quantization functionsDeb Mukherjee
Adds various high bitdepth quantization functions. Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c
2014-09-10[spatial svc]Add golden frame to first pass rate controlMinghai Shang
Change-Id: If3035f0e7dfcfe88c4bbf4eec66761e070476df0
2014-09-02[svc] Temporal svc with two pass rate controlMinghai Shang
It is built on the current spatial svc code. We only support one spatial layer and two temporal layers at this time. Change-Id: I1fdc8584354b910331e626bfae60473b3b701ba1
2014-09-02Merge "Removing lookup_next_frame_stats()."Dmitry Kovalev
2014-08-25Merge "Replacing int_mv with MV inside the first pass code."Dmitry Kovalev
2014-08-22Removing source field from VP9_COMP.Dmitry Kovalev
Using local variables instead. Change-Id: I68737f7e392b81492ffd3ef2c2ff9afbf55fb097
2014-08-22Replacing int_mv with MV inside the first pass code.Dmitry Kovalev
Change-Id: Ia3be6b5a18e1ff6cc5c5f4d37e4a5d0972388308
2014-08-21Removing lookup_next_frame_stats().Dmitry Kovalev
Change-Id: Ib6b51b3d106de38a9ccbcd4a835025db185877e9