summaryrefslogtreecommitdiff
path: root/vp9/encoder/vp9_encodemb.c
AgeCommit message (Collapse)Author
2013-04-18Fixing member names inside TOKENVALUE and TOKENEXTRA structs.Dmitry Kovalev
Change-Id: I183ec5819d4d80966c92db36db75b8c3be0d381d
2013-04-18Merge "Make the use of pred buffers consistent in MB/SB" into experimentalJingning Han
2013-04-18Make the use of pred buffers consistent in MB/SBJingning Han
Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction. This makes the macroblock buffer handling consistent with those of superblock. Remove predictor buffer MACROBLOCKD. Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
2013-04-16Replacing VP9_COMBINEENTROPYCONTEXTS macro with function.Dmitry Kovalev
Change-Id: I3bbc31840af69481e1d9bb4427c9ee25abf82946
2013-04-11Remove subtract_mb* functions.Ronald S. Bultje
Use subtract_sb* instead. Change-Id: I3f34140ab97061063a4452945347ef1fe37e13d1
2013-04-11Remove unused macroblock versions of reconstruction functions.Ronald S. Bultje
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*, vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the generic _sb* functions that take a size argument, and call them with BLOCK_SIZE_MB16X16. Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
2013-04-10Make usage of sb_type independent of literal values.Ronald S. Bultje
Change-Id: I0d12f9ef9d960df0172a1377f8e5236eb6d90492
2013-04-09Make SB coding size-independent.Ronald S. Bultje
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code gives identical encoder results before and after. There are a few macros for rectangular block sizes under the sbsegment experiment; this experiment is not yet functional and should not yet be used. Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
2013-04-04Move EOB to per-plane dataJohn Koleszar
Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane structures. Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
2013-04-04Move qcoeff, dqcoeff from BLOCKD to per-plane dataJohn Koleszar
Start grouping data per-plane, as part of refactoring to support additional planes, and chroma planes with other-than 4:2:0 subsampling. Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
2013-03-28Framework changes in nzc to allow more flexibilityDeb Mukherjee
The patch adds the flexibility to use standard EOB based coding on smaller block sizes and nzc based coding on larger blocksizes. The tx-sizes that use nzc based coding and those that use EOB based coding are controlled by a function get_nzc_used(). By default, this function uses nzc based coding for 16x16 and 32x32 transform blocks, which seem to bridge the performance gap substantially. All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before. Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
2013-03-26Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs.Ronald S. Bultje
These are mostly just for experimental purposes. I saw small gains (in the 0.1% range) when playing with this on derf. Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
2013-03-26Redo banding for all transforms.Ronald S. Bultje
Now that the first AC coefficient in both directions use the same DC as their context, there no longer is a purpose in letting both have their own band. Merging these two bands allows us to split bands for some of the very high-frequency AC bands. In addition, I'm redoing the banding for the 1D-ADST col/row scans. I don't think the old banding made any sense at all (it merged the last coefficient of the first row/col in the same band as the first two of the second row/col), which was clearly an oversight from the band being applied in scan-order (rather than in their actual position). Now, coefficients at the same position will be in the same band, regardless what scan order is used. I think this makes most sense for the purpose of banding, which is basically "predict energy for this coefficient depending on the energy of context coefficients" (i.e. pt). After full re-training, together with previous patch, derf gains about 1.2-1.3%, and hd/stdhd gain about 0.9-1.0%. Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
2013-03-26Use above/left (instead of previous in scan-order) as token context.Ronald S. Bultje
Pearson correlation for above or left is significantly higher than for previous-in-scan-order (absolute values depend on position in scan, but in general, we gain about 0.1-0.2 by using either above or left; using both basically just makes this even better). For eob branch skipping, we continue to use the previous token in scan order. This helps about 0.9% on derf after re-training on a limited data set. Full re-training and results on larger-resolution clips are pending. Note that this commit breaks trellis, so we can probably get further gains out of it by fixing trellis at some later point. Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
2013-03-07Re-add support for ADST in superblocks.Ronald S. Bultje
This also changes the RD search to take account of the correct block index when searching (this is required for ADST positioning to work correctly in combination with tx_select). Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
2013-03-07Coding con-zero count rather than EOB for coeffsDeb Mukherjee
This patch revamps the entropy coding of coefficients to code first a non-zero count per coded block and correspondingly remove the EOB token from the token set. STATUS: Main encode/decode code achieving encode/decode sync - done. Forward and backward probability updates to the nzcs - done. Rd costing updates for nzcs - done. Note: The dynamic progrmaming apporach used in trellis quantization is not exactly compatible with nzcs. A suboptimal approach has been used instead where branch costs are updated to account for changes in the nzcs. TODO: Training the default probs/counts for nzcs Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
2013-03-04Make superblocks independent of macroblock code and data.Ronald S. Bultje
Split macroblock and superblock tokenization and detokenization functions and coefficient-related data structs so that the bitstream layout and related code of superblock coefficients looks less like it's a hack to fit macroblocks in superblocks. In addition, unify chroma transform size selection from luma transform size (i.e. always use the same size, as long as it fits the predictor); in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma transform will now use the 16x16 (instead of the 8x8) chroma transform, and 64x64 superblocks using the 32x32 luma transform will now use the 32x32 (instead of the 16x16) chroma transform. Lastly, add a trellis optimize function for 32x32 transform blocks. HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's a few negative points here and there that I might want to analyze a little closer. Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
2013-02-28Code cleanup.Dmitry Kovalev
Removing redundant 'extern' keyword, better formatting, code simplification. Change-Id: I132fea14f08c706ee9ea147d19464d03f833f25b
2013-02-27Move eob from BLOCKD to MACROBLOCKD.Ronald S. Bultje
Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
2013-02-26Spatial resamping of ZEROMV predictorsJohn Koleszar
This patch allows coding frames using references of different resolution, in ZEROMV mode. For compound prediction, either reference may be scaled. To test, I use the resize_test and enable WRITE_RECON_BUFFER in vp9_onyxd_if.c. It's also useful to apply this patch to test/i420_video_source.h: --- a/test/i420_video_source.h +++ b/test/i420_video_source.h @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource { virtual void FillFrame() { // Read a frame from input_file. + if (frame_ != 3) if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) { limit_ = frame_; } This forces the frame that the resolution changes on to be coded with no motion, only scaling, and improves the quality of the result. Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
2013-02-26Merge "Merge cnvcontext experiment." into experimentalRonald S. Bultje
2013-02-26Merge "Refactor inter recon functions to support scaling" into experimentalJohn Koleszar
2013-02-26Merge cnvcontext experiment.Ronald S. Bultje
Change-Id: I35e64998b25694a3bb4a62164bba3c03c1db4bc7
2013-02-26Refactor inter recon functions to support scalingJohn Koleszar
Ensure that all inter prediction goes through a common code path that takes scaling into account. Removes a bunch of duplicate 1st/2nd predictor code. Also introduces a 16x8 mode for 8x8 MVs, similar to the 8x4 trick we were doing before. This has an unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that case for now. Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
2013-02-25Changing pitch value meaning for fht and iht transforms.Dmitry Kovalev
Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
2013-02-25clean up forward and inverse hybrid transformJingning Han
Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
2013-02-23Split coefficient token tables intra vs. inter.Ronald S. Bultje
Change-Id: I5416455f8f129ca0f450d00e48358d2012605072
2013-02-23Further changes to coefficient contexts.Paul Wilkins
This patch alters the balance of context between the coefficient bands (reflecting the position of coefficients within a transform blocks) and the energy of the previous token (or tokens) within a block. In this case the number of coefficient bands is reduced but more previous token energy bands are supported. Some initial rebalancing of the default tables has been by running multiple derf clips at multiple data rates using the ENTOPY_STATS macro. Further balancing needs to be done using larger image formatsd especially in regard to the bigger transform sizes which are not as well represented in encodings of smaller image formats. Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
2013-02-21Forward butterfly hybrid transformJingning Han
This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
2013-02-15Remove Y2 and Y-no-DC token types from the bitstream.Ronald S. Bultje
Change-Id: I7a5314daca993d46b8666ba1ec2ff3766c1e5042
2013-02-15Remove some Y2-related code.Ronald S. Bultje
Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
2013-02-14Moved vp9_get_coef_band to header fileScott LaVarnway
allowing the compiler to inline. Change-Id: I66e5caf5e7fefa68a223ff0603aa3f9e11e35dbb
2013-02-13Abstract selection of coef band.Paul Wilkins
This patch abstracts the selection of the coefficient band context into a function as a precursor to further experiments with the coefficient context. It also removes the large per TX size coefficient band structures and uses a single matrix for all block sizes within the test function. This may have an impact on quality (results to follow) but is only an intermediate step in the process of redefining the context. Also the quality impact will be larger initially because the default tables will be out of step with the new banding. In particular the 4x4 will in this case only use 7 bands. If needed we can add back block size dependency localized within the function, but this can follow on after the other changes to the definition of the context. Change-Id: Id7009c2f4f9bb1d02b861af85fd8223d4285bde5
2013-02-13Abstract the selection of coefficient context.Paul Wilkins
This is an initial step to facilitate experimentation with changes to the prior token context used to code coefficients to take better account of the energy of preceding tokens. This patch merely abstracts the selection of context into two functions and does not alter the output. Change-Id: I117fff0b49c61da83aed641e36620442f86def86
2013-02-13Merge "Remove NEWCOEFCONTEXT experiment." into experimentalPaul Wilkins
2013-02-13enable bitstream lossless supportYaowu Xu
1. Added a bit in frame header to to indicate if a frame is encoded in lossless mode, so decoder does not make the decision based on Q0 2. Minor changes to make sure that lossy coding works same as when the lossless experiment is not enabled. 3. Renamed function pointers for transforms to be consistent, using prefix fwd_txm and inv_txm for forward and inverse respectively To encode in lossless mode, using "--lossless=1 --min-q=0 --max-q=0" with vpxenc. Change-Id: Ifae53b26d2ffbe378d707e29d96817b8a5e6c068
2013-02-13Remove NEWCOEFCONTEXT experiment.Paul Wilkins
Removal of the NEWCOEFCONTEXT experiment to reduce code clutter and make it easier to experiment with some other changes to the coefficient coding context. Change-Id: Icd17b421384c354df6117cc714747647c5eb7e98
2013-02-06Use fdct8x4 instead of fdct4x4 where the block size allows it.Ronald S. Bultje
This allows for faster SIMD implementations in the future (currently there is no speed impact). Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329
2013-01-10Merge tx32x32 experiment.Ronald S. Bultje
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
2013-01-09New prediction filterAdrian Grange
This patch removes the old pred-filter experiment and replaces it with one that is implemented using the switchable filter framework. If the pred-filter experiment is enabled, three interopolation filters are tested during mode selection; the standard 8-tap interpolation filter, a sharp 8-tap filter and a (new) 8-tap smoothing filter. The 6-tap filter code has been preserved for now and if the enable-6tap experiment is enabled (in addition to the pred-filter experiment) the original 6-tap filter replaces the new 8-tap smooth filter in the switchable mode. The new experiment applies the prediction filter in cases of a fractional-pel motion vector. Future patches will apply the filter where the mv is pel-aligned and also to intra predicted blocks. Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
2013-01-08Merge superblocks (32x32) experiment.Ronald S. Bultje
Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587
2013-01-08Merge vp9-preview changes into experimental branchJohn Koleszar
Incorportate vp9-preview changes by merging master branch into experimental. Conflicts: test/test.mk vp9/common/vp9_filter.c vp9/common/vp9_idctllm.c vp9/common/vp9_invtrans.h vp9/common/vp9_mbpitch.c vp9/common/vp9_rtcd_defs.sh vp9/common/vp9_systemdependent.h vp9/common/vp9_type_aliases.h vp9/common/x86/vp9_asm_stubs.c vp9/common/x86/vp9_subpixel_mmx.asm vp9/decoder/vp9_decodframe.c vp9/decoder/vp9_dequantize.c vp9/decoder/vp9_dequantize.h vp9/decoder/vp9_onyxd_int.h vp9/encoder/vp9_bitstream.c vp9/encoder/vp9_encodeframe.c vp9/encoder/vp9_rdopt.c Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
2012-12-26Build fixes to merge vp9-preview into masterJohn Koleszar
Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-19New previous coef context experimentDeb Mukherjee
Adds an experiment to derive the previous context of a coefficient not just from the previous coefficient in the scan order but from a combination of several neighboring coefficients previously encountered in scan order. A precomputed table of neighbors for each location for each scan type and block size is used. Currently 5 neighbors are used. Results are about 0.2% positive using a strategy where the max coef magnitude from the 5 neigbors is used to derive the context. Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
2012-12-18Use standard integer types for pixel values and coefficients.Ronald S. Bultje
For coefficients, use int16_t (instead of short); for pixel values in 16-bit intermediates, use uint16_t (instead of unsigned short); for all others, use uint8_t (instead of unsigned char). Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
2012-12-18Give 4x4 scan and coef_band tables a _4x4 suffix.Ronald S. Bultje
This matches the names of tables for all other transform sizes. Change-Id: Ia7681b7f8d34c97c27b0eb0e34d490cd0f8d02c6
2012-12-11clean up tokenize_b() and stuff_b()Yaowu Xu
Change-Id: I0c1be01aae933243311ad321b6c456adaec1a0f5
2012-12-07experiment with CONTEXT conversionYaowu Xu
This commit changed the ENTROPY_CONTEXT conversion between MBs that have different transform sizes. In additioin, this commit also did a number of cleanup/bug fix: 1. removed duplicate function vp9_fix_contexts() and changed to use vp8_reset_mb_token_contexts() for both encoder and decoder 2. fixed a bug in stuff_mb_16x16 where wrong context was used for the UV. 3. changed reset all context to 0 if a MB is skipped to simplify the logic. Change-Id: I7bc57a5fb6dbf1f85eac1543daaeb3a61633275c
2012-12-0732x32 transform for superblocks.Ronald S. Bultje
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds code all over the place to wrap that in the bitstream/encoder/decoder/RD. Some implementation notes (these probably need careful review): - token range is extended by 1 bit, since the value range out of this transform is [-16384,16383]. - the coefficients coming out of the FDCT are manually scaled back by 1 bit, or else they won't fit in int16_t (they are 17 bits). Because of this, the RD error scoring does not right-shift the MSE score by two (unlike for 4x4/8x8/16x16). - to compensate for this loss in precision, the quantizer is halved also. This is currently a little hacky. - FDCT and IDCT is double-only right now. Needs a fixed-point impl. - There are no default probabilities for the 32x32 transform yet; I'm simply using the 16x16 luma ones. A future commit will add newly generated probabilities for all transforms. - No ADST version. I don't think we'll add one for this level; if an ADST is desired, transform-size selection can scale back to 16x16 or lower, and use an ADST at that level. Additional notes specific to Debargha's DWT/DCT hybrid: - coefficient scale is different for the top/left 16x16 (DCT-over-DWT) block than for the rest (DWT pixel differences) of the block. Therefore, RD error scoring isn't easily scalable between coefficient and pixel domain. Thus, unfortunately, we need to compute the RD distortion in the pixel domain until we figure out how to scale these appropriately. Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
2012-12-03merged optimiz_b_16x16() into optmize_b()Yaowu Xu
The commit changed the trellis quantization function optimize_b() to work for MBs using all transform sizes, and eliminated the function for MB using 16x16 transform only, optimize_b_16x16. Change-Id: I3fa650587ab5198ed16315b38754783a72b33ba2