summaryrefslogtreecommitdiff
path: root/vp9/encoder/vp9_quantize.c
AgeCommit message (Collapse)Author
2014-07-08Re-design quantization process for 32x32 transform blockJingning Han
This commit enables a new quantization process for 32x32 2D-DCT transform coefficient blocks. It improves the compression performance of speed 5 by 1.4%. The overall compression gains of speed 5 due to the new quantization scheme is 4.7%. It also includes the SSSE3 implementation of the 32x32 quantization process. Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-02Split vp9_rdopt into vp9_rdopt and vp9_rd.Alex Converse
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all other rd related routines. Anything used outside of making an rd optimal decision belongs in rd. Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
2014-07-01Re-design quantization processJingning Han
This commit re-designs the quantization process for transform coefficient blocks of size 4x4 to 16x16. It improves compression performance for speed 7 by 3.85%. The SSSE3 version for the new quantization process is included. The average runtime of the 8x8 block quantization is reduced from 285 cycles -> 255 cycles, i.e., over 10% faster. Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-06-12Fast computation path for forward transform and quantizationJingning Han
This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-05-14vp9_quantizer.c: cleanup -wextra warningsYaowu Xu
Change-Id: If5a3c48a8c554018a5d63c1541a2900f15767a00
2014-04-22Renaming "onyx" to "encoder".Dmitry Kovalev
Actual renames: vp9_onyx_if.c -> vp9_encoder.c vp9_onyx_int.h -> vp9_encoder.h Change-Id: I80532a80b118d0060518e6c6a0d640e3f411783c
2014-04-22Fix the CONFIG_ALPHA build.Alex Converse
Change-Id: Ib89fe34812c17cd6294ce3c38f87d43a79abb16f
2014-04-16Remove old activity masking code.Paul Wilkins
Delete code relating to the old VP8_TUNE_SSIM flag as this code does not currently work and is largely made redundant in VP9 by the various AQ modes. Change-Id: I71f28e1f680573d296422254489000678552b17b
2014-04-09Moving q_trans[] table to vp9_quantize.{c, h}.Dmitry Kovalev
Change-Id: I1324c339815a47004ddccdaf651d24c60382b92f
2014-04-01Renaming two members in MACROBLOCKD struct.Dmitry Kovalev
Renames: mi_8x8 -> mi mode_info_stride -> mi_stride Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be
2014-03-28Moving encoder quantization parameters into separate struct.Dmitry Kovalev
Change-Id: I2a169535489aeda3943fb5a46ab53e7a12abaa36
2014-02-28Fixing include order in vp9_quantize.cDmitry Kovalev
Change-Id: Ic32eb103d0d7f98c0a16c4e7bdec117faf05df02
2014-02-28Cleaning up vp9_quantize.c.Dmitry Kovalev
Change-Id: I9a38af32f16f196b83dd69755eafb9543edf5691
2014-02-14A couple more V.S. warnings silenced.Paul Wilkins
Change-Id: Ica1b583d69810182f621de757d2543b2a3b35566
2014-02-10Merge "Encoder quantization cleanup."Dmitry Kovalev
2014-02-03Encoder quantization cleanup.Dmitry Kovalev
Change-Id: I633205c95f0e81ce0589580501d0be4425a3cb8e
2014-01-29Removing ENC_DEBUG.Dmitry Kovalev
Change-Id: I101017621003314f000a454725ea13fc9db43177
2013-12-03Moving eob array to the encoder.Dmitry Kovalev
In the decoder we don't need to save eobs, we can pass eob as an argument. That's why removing eob arrays from VP9Decompressor and TileWorkerData, and moving eob pointer from macroblockd_plane to macroblock_plane. Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a
2013-12-02Remove plane_block_idx.Alex Converse
Its last remaining caller can be passed its results directly without any additional work. Also, it's not non-4:2:0 safe. Change-Id: Ia5089ba5f7f66c7617270483c619c9271aefd868
2013-11-26Removing qcoeff buffers from the decoder.Dmitry Kovalev
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct and VP9Decompressor struct sizes by 24K. Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
2013-11-12Moving q_index from MACROBLOCKD to MACROBLOCK.Dmitry Kovalev
Moving because q_index is used only by encoder. Change-Id: I0b96175614ed4fd3d76ee56a0ba36258e1e896f6
2013-11-05Cleaning up vp9_quantize_b_c() function.Dmitry Kovalev
Change-Id: I42c75530a8c9cff68480657f074131e6b60d9fca
2013-10-29Adding const to vp9_quantize_b_{32x32,} parameters.Dmitry Kovalev
Change-Id: I56f8c50ac382202f66040cd9cfaa05d889572fc7
2013-10-28Cleaning up vp9_regular_quantize_b_4x4.Dmitry Kovalev
Passing scan & iscan as parameters, adding useful local variables. Change-Id: Ia2a87906941db9557350d273669ce5c3cdb7235d
2013-10-16Get rid of "this_mi", use "mi_8x8[0]" everywhere insteadGuillaume Martres
The only case where they were intentionally pointing to different structures was in mbgraph, and this didn't have the expected behavior because both of these pointers are used interchangeably through the code Change-Id: I979251782f90885fe962305bcc845bc05907f80c
2013-10-16Implement variance-based adaptive quantizationGuillaume Martres
This should be similar to what x264 does with --aq-mode 1. It works well with clips like parkjoy and touhou (http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv). At low bitrates, the segmentation signaling overhead may negate the benefits of this feature. (PGW) Default changed to feature OFF to allow provisional merge. Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
2013-09-24Removing redundant 'extern' keyword.Dmitry Kovalev
Change-Id: Ie51306689c0dc527a8aa12d3984389dd8f360dea
2013-09-13Merge "New mode_info_context storage -- undo revert"Scott LaVarnway
2013-09-11New mode_info_context storage -- undo revertScott LaVarnway
mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of pointers to MODE_INFO structs. The MODE_INFO structs are now stored as a stream (decoder only), eliminating unnecessary copies and is a little more cache friendly. Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
2013-09-10Remove redundant condition check in 32x32 quantJingning Han
The c code implementation of 32x32 quantization does the zbin check of all coefficients prior to the quant/dequant loop, hence removing the redundant zbin check inside the loop. This only affects the c code version. SSSE3 version does not separate the zbin check out. Change-Id: Ic197a7d61d0b25fcac3cc092987651378cb56e4e
2013-09-09Merge "Revert "New mode_info_context storage""James Zern
2013-09-09Revert "New mode_info_context storage"James Zern
This reverts commit dae17734ece414091ba1184f7becd0aa6c0004f1 Encode crashes, leaks and increases integer overflow errors. Change-Id: I595aa2649bb8d0b6552ff91652837a74c103fda2
2013-09-08Merge "New mode_info_context storage"Jim Bankoski
2013-09-06Fix overflow issue in 16x16 quantization SSSE3Jingning Han
The 16x16 transform unit test suggested that the peak coefficient value can reach 32639. This could cause potential overflow issue in the SSSE3 implmentation of 16x16 block quantization. This commit fixes this issue by replacing addition with saturated addition. Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
2013-09-06New mode_info_context storageScott LaVarnway
mode_info_context was stored as a grid of MODE_INFO structs. The grid now constists of a pointer to a MODE_INFO struct and a "in the image" flag. The MODE_INFO structs are now stored as a stream, eliminating unnecessary copies and is a little more cache friendly. For the test clips used, the decoder performance improved by ~4.3% (1080p) and ~9.7% (720p). Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p) and 5.9% (720p). Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
2013-09-05Use saturated addition in SSSE3 of 32x32 quantJingning Han
The 32x32 forward transform can potentially reach peak coefficient value close to 32700, while the rounding factor can go upto 610. This could cause overflow issue in the SSSE3 implementation of 32x32 quantization process. This commit resolves this issue by replacing the addition operations with saturated addition operations in 32x32 block quantization. Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
2013-08-29Fix overflow issue in SSSE3 32x32 quantizationJingning Han
The 32x32 quantization process can potentially have the intermediate stacks over 16-bit range, thereby causing enc/dec mismatch. This commit fixes this overflow issue in the SSSE3 implementation, as well as the prototype, of 32x32 quantization. This fixes issue 607 from webm@googlecode. Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
2013-08-19Moving plane_block_idx from vp9_blockd.h to vp9_quantize.c.Dmitry Kovalev
Change-Id: Ib8af21f2e7f603c2fb407e5d15a3bba64b545b49
2013-08-15Moving segmentation struct from MACROBLOCKD to VP9_COMMON.Dmitry Kovalev
VP9_COMMON is the right place to segmentatation struct because it has global segmentation parameters, not something specific to macroblock processing. Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
2013-08-12Quantization code cleanup.Dmitry Kovalev
Change-Id: I77b42418b852093f79260cbd880533a0bd86678f
2013-08-09Inlining 16 as a stride for BLOCK_OFFSET macro.Dmitry Kovalev
Change-Id: I7f23d174eb089e5500f268a10db09648634c1b82
2013-07-15Inline vp9_quantize() in xform_quant().Ronald S. Bultje
Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-11Moving segmentation related vars into separate struct.Dmitry Kovalev
Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-01Update quantize SSSE3 SIMD to cover 32x32 transform case also.Ronald S. Bultje
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01Quantize (64-bit only, for now) SSSE3 SIMD.Ronald S. Bultje
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-06-28Make coefficient skip condition an explicit RD choice.Ronald S. Bultje
This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-27Inline quantize so idiv instruction gets removed from inner loop.Ronald S. Bultje
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
2013-06-19Add two-pass quantizationYunqing Wang
Optimized the quantization function by making it a two-pass process. The first pass does a quick checking of the transform coefficients against the base ZBIN, and only keep the good enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in first pass. This reduces the computation. Furthermore, an alternitive method is used for large transform size, which often has sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
2013-05-23Merge Scatter Scan experiment.Paul Wilkins
Removal from under configure flag. A bit renaming Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa
2013-05-16Initial version of alpha channel supportJohn Koleszar
This is a mostly-working implementation of an extra channel in the bitstream. Configure with --enable-alpha to test. Notable TODOs: - Add extra channel to all mismatch tests, PSNR, SSIM, etc - Configurable subsampling - Variable number of planes (currently always uses all 4) - Loop filtering - Per-plane lossless quantizer - ARNR support This implementation just uses the same contents as the Y channel for the A channel, due to lack of content and general pain in playing back 4 channel content. A later patch will use the actual alpha channel passed in from outside the codec. Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592