path: root/vp9
Age | Commit message | Author
2014-12-04 | vp9_ethread: the tile-based multi-threaded encoder | Yunqing Wang
Currently, VP9 supports column-tile encoding, which allows a frame to be encoded in multiple column tiles independently. The number of column tiles is set by the encoder option "--tile-columns". This provides a way to encode a frame in parallel. Building on the previous set of patches, this patch implements the tile-based multi-threaded encoder. Each thread processes one or more tiles. Usage for HD clips: --tile-columns=2 --threads=1/2/3/4. With 4 threads, tests showed that the encoder achieved a 2.3X - 2.5X speedup at good-quality speed 3, and a 2X speedup at realtime speed 5. Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
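The relationship between tile columns and threads described above can be sketched as follows. This is only an illustration, not the libvpx implementation: the helper names `tile_cols_from_log2` and `tile_owner` are hypothetical, and round-robin distribution of tiles over threads is an assumption. The one fact relied on is that "--tile-columns" takes a log2 value, so "--tile-columns=2" requests 4 tile columns.

```c
#include <assert.h>

/* Hypothetical helper: --tile-columns is given as a log2 value. */
static int tile_cols_from_log2(int log2_tile_cols) {
  assert(log2_tile_cols >= 0 && log2_tile_cols < 31);
  return 1 << log2_tile_cols;
}

/* Hypothetical helper: index of the worker thread that encodes a given
 * tile column, ASSUMING tiles are handed out round-robin. */
static int tile_owner(int tile_col, int num_threads) {
  assert(tile_col >= 0 && num_threads > 0);
  return tile_col % num_threads;
}
```

Under this assumed policy, 4 tile columns with 2 threads would give each thread two tiles, which matches the "one or more tiles" per thread wording of the commit message.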
2014-12-01 | Cyclic refresh: factor segment delta-q into rate control. | Marco Paniconi
Incorporate segment delta-q into estimated bits. This generally improves the rate control under cyclic refresh (aq=3) mode. Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037
2014-12-01 | Merge "Remove repeated search_type_check_frequency assign" | Jingning Han
2014-12-01 | Merge "vp9_ethread: calculate and save the tok starting address for tiles" | Yunqing Wang
2014-11-25 | Remove repeated search_type_check_frequency assign | Jingning Han
This parameter is initialized as 50. No need to re-assign the same value in speed -6. Change-Id: I8735a5593412df2fdcee53ae45c8ebd1c3d792e7
2014-11-25 | vp9_ethread: calculate and save the tok starting address for tiles | Yunqing Wang
Each tile's tok starting address is calculated before the encoding process. These addresses are stored so that the same calculation does not have to be repeated when packing the bitstream. Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
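The idea of computing each tile's token start once and reusing it can be sketched as a simple prefix sum. This is a hedged illustration only: `compute_tile_tok_starts` and its parameters are hypothetical names, and it works with offsets into one shared token buffer rather than the actual pointers used by the encoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: given the token capacity reserved for each tile,
 * store the starting offset of every tile in start[] and return the total
 * capacity. The stored offsets can then be reused by a later stage
 * (for example bitstream packing) without recomputing them. */
static size_t compute_tile_tok_starts(const size_t *tile_tok_capacity,
                                      int num_tiles, size_t *start) {
  size_t offset = 0;
  int i;
  assert(num_tiles >= 0);
  for (i = 0; i < num_tiles; ++i) {
    start[i] = offset;              /* tile i begins where tile i-1 ended */
    offset += tile_tok_capacity[i];
  }
  return offset;
}
```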
2014-11-25 | Separate rate_correction_factor for boosted GFs | Yaowu Xu
When the golden frame is boosted, its rate correction factor does not correlate well with that of other inter frames, even in CBR mode. This commit switches to a GF-specific rate_correction_factor when gf_cbr_boost is greater than 20%. Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab
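A minimal sketch of the selection rule described above, assuming a simplified state holder. The `RateFactors` type, its field names, and the function `pick_rate_correction_factor` are hypothetical; only the "greater than 20%" condition comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rate control state. */
typedef struct {
  double inter_factor; /* correction factor for ordinary inter frames */
  double gf_factor;    /* separate factor for boosted golden frames */
} RateFactors;

/* Use the GF-specific factor only for golden frames whose CBR boost
 * (in percent) exceeds 20; otherwise use the normal inter factor. */
static double pick_rate_correction_factor(const RateFactors *rf,
                                          int is_golden_frame,
                                          int gf_cbr_boost_pct) {
  assert(rf != NULL);
  if (is_golden_frame && gf_cbr_boost_pct > 20) return rf->gf_factor;
  return rf->inter_factor;
}
```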
2014-11-25 | Cosmetic change in vp9_pick_inter_mode | Jingning Han
Change-Id: Ic072585ebffdb36982ed7b8b9f875ca6c1c656c4
2014-11-25 | Adaptively adjust mode test kick-off thresholds in RTC coding | Jingning Han
This commit allows the encoder to increase the mode test kick-off thresholds if the previous best mode renders all zero quantized coefficients, thereby saving motion search runs when possible. The compression performance of speed -5 and -6 is down by -0.446% and 0.591%, respectively. The runtime of speed -6 is improved by 10% for many test clips.
vidyo1, 1000 kbps: 16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms
nik720p, 1000 kbps: 33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms
dark720p, 1000 kbps: 33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms
mmoving, 1000 kbps: 33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms
Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
2014-11-25 | Merge "Rework forward txfm/quantization skip system in RTC coding mode" | Jingning Han
2014-11-24 | Merge "Remove redundant intra mode penalty from vp9_pick_inter_mode" | Jingning Han
2014-11-24 | vp9_ethread: modify VP9_COMP structure | Yunqing Wang
This patch modifies struct VP9_COMP and creates a struct ThreadData to hold the data that needs to be copied for each thread. In the multi-threaded case, one thread processes one tile. All threads share one copy of VP9_COMP (see VP9_COMP *cpi in the code), but each thread has its own copy of ThreadData (see ThreadData *td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In the single-threaded case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
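The shared-versus-per-thread split can be sketched with heavily simplified stand-in types. Everything below is hypothetical: the real VP9_COMP, VP9_COMMON, ThreadData and FRAME_COUNTS structs are far larger, and giving each thread private counts storage in the multi-threaded case is an assumption; the commit message only states where the pointer points in the single-threaded case.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the real libvpx definitions. */
typedef struct { unsigned int placeholder; } FrameCountsSketch; /* FRAME_COUNTS */
typedef struct { FrameCountsSketch counts; } CommonSketch;      /* VP9_COMMON */
typedef struct { CommonSketch common; } CompSketch;             /* VP9_COMP, shared */
typedef struct { FrameCountsSketch *counts; } ThreadDataSketch; /* ThreadData, per thread */

/* Single-threaded case (private_counts == NULL): point at the shared counts
 * in the common struct, as the commit message describes.
 * Multi-threaded case: ASSUMED to use per-thread storage supplied by the
 * caller. */
static void init_thread_data(ThreadDataSketch *td, CompSketch *cpi,
                             FrameCountsSketch *private_counts) {
  assert(td != NULL && cpi != NULL);
  td->counts = private_counts ? private_counts : &cpi->common.counts;
}
```

Functions called during tile encoding would then take both the shared `cpi` and the per-thread `td`, mirroring the parameter passing the message mentions for encode_tiles().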
2014-11-24 | Merge "Fix a tautological assert." | Alex Converse
2014-11-24 | Fix a tautological assert. | Alex Converse
Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38
2014-11-24 | Remove redundant intra mode penalty from vp9_pick_inter_mode | Jingning Han
The intra mode penalty is covered by intra_cost_penalty. This commit removes the other intra cost threshold, provided that the constant 50 is negligible in normal rate-distortion cost. Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
2014-11-24 | Merge "Key frame non-RD mode decision process" | Jingning Han
2014-11-24 | Merge "Refactored idct routines and headers" | Debargha Mukherjee
2014-11-24 | Refactored idct routines and headers | Peter de Rivaz
This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32-bit SSE instructions where possible, but fall back to the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
2014-11-24 | Key frame non-RD mode decision process | Jingning Han
This commit makes a non-RD coding mode decision process for key frame coding. It can be optionally turned on in speed -6 and above. Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
2014-11-24 | Merge "Only allow for cyclic refresh (aq=3 mode) for base layer." | Marco
2014-11-21 | Merge "Fix some minor nits." | Paul Wilkins
2014-11-21 | Merge "Added highbitdepth sse2 acceleration for quantize" | Debargha Mukherjee
2014-11-21 | Merge changes Ie077edd0,Id31a74fc | Paul Wilkins
* changes:
  Remove rate component adjustment for AQ1
  Switch AQ1 segment basis from q ratio to rate ratio.
2014-11-21 | Merge "Add adaptive midpoint for AQ1." | Paul Wilkins
2014-11-21 | Merge "Add variance restriction to AQ2." | Paul Wilkins
2014-11-21 | Only allow for cyclic refresh (aq=3 mode) for base layer. | Marco
Condition existed for temporal case, added it for spatial as well. Issue: https://code.google.com/p/webm/issues/detail?id=878. Change-Id: I38339207f9a94924f5568a081eabe64f867a686d
2014-11-21 | Fix some minor nits. | Paul Wilkins
Change-Id: Ib8810d431fa20a2c78e0caaa28eb2c99903e60fb
2014-11-21 | Merge "Further AQ1 clean up." | Paul Wilkins
2014-11-21 | Rework forward txfm/quantization skip system in RTC coding mode | Jingning Han
This commit allows a more aggressive decision to skip the forward transform and quantization for the luma component in RTC coding mode. The chroma components still go through the normal coding routine, since they are not included in the non-RD mode search process. It reduces the runtime cost by 2% - 10%. In speed -6:
vidyo1, 1000 kbps: 16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms
nik720p, 1000 kbps: 33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms
dark720p, 1000 kbps: 33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms
The compression performance of speed -6 is improved by 0.44% in PSNR and 1.31% in SSIM. Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
2014-11-21 | Merge "Initial AQ1 restructuring." | Paul Wilkins
2014-11-21 | Remove rate component adjustment for AQ1 | Paul Wilkins
In AQ1 a rate adjustment was applied for blocks coded with a deltaq. This tends to skew the partition selection and cause rate overshoot. For example, consider a 64x64 super block where some but not all sub blocks are in a low q segment and some are in a high q segment. The choice of Q when considering large partition and transform sizes is defined by the lowest sub block segment id (currently this implies the lowest Q). If some parts of the larger partition are very hard, this will cause a high rate component. The correct behavior here is for the rd code to discard the large partition choice and break down to sub blocks where some have low and some have high Q. However, the rate correction factor above masks the high cost of coding at a larger partition size. Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960
2014-11-21 | Switch AQ1 segment basis from q ratio to rate ratio. | Paul Wilkins
In defining the Q deltas for segments in AQ1 use a rate ratio rather than a q ratio. Change-Id: Id31a74fcf2b7e55437e42a51c21b3cbcb57028d4
2014-11-20 | Add adaptive midpoint for AQ1. | Paul Wilkins
Make the midpoint variance used in AQ mode 1 segmentation depend on the overall complexity of the frame in two pass. Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
2014-11-20 | Allow DC/H/V/TM on screen content. | Alex Converse
6.3% better compression; less than 1% compression time increase. Change-Id: Ie83c059436e54c09de9e7c87e06e0a6d40dc38fe
2014-11-20 | Drop special inter mode selection for screen content. | Alex Converse
Better mode selection was implemented for all content. Change-Id: I479778ed21d3968892f4dce396c83733583f4f23
2014-11-20 | Merge "vp9_ethread: move filter_cache out of RD_OPT struct" | Yunqing Wang
2014-11-20 | Add variance restriction to AQ2. | Paul Wilkins
Add an additional restriction, based on spatial variance, to the bit/complexity based segmentation. Only lower Q when both the number of bits spent in the initial encoding pass and the spatial complexity are below a threshold. This prevents the low Q segments from being used just because there is a surfeit of bits. Small metrics gains, especially opsnr: derf ~0.2%, std-hd ~0.3%. Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce
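The combined condition can be written as a small predicate. This is a sketch under stated assumptions: the function name `aq2_allow_lower_q`, its parameters, and the use of two independent thresholds are hypothetical; the commit message only says that both the bits spent and the spatial complexity must be below a threshold.

```c
#include <assert.h>

/* Hypothetical predicate: permit a lower Q segment only when BOTH the bits
 * spent in the initial encoding pass and the spatial variance are below
 * their thresholds. A block that is cheap in bits but spatially complex
 * does not qualify. */
static int aq2_allow_lower_q(double bits_spent, double bits_thresh,
                             double spatial_var, double var_thresh) {
  assert(bits_thresh >= 0.0 && var_thresh >= 0.0);
  return bits_spent < bits_thresh && spatial_var < var_thresh;
}
```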
2014-11-20 | Further AQ1 clean up. | Paul Wilkins
Further patch to restructure AQ mode 1. Change-Id: I566452a033d047a49a40441a7be24690ea69412d
2014-11-20 | Initial AQ1 restructuring. | Paul Wilkins
This is the first of a series of patches to restructure and improve AQ mode 1 (variance based AQ). Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a
2014-11-20 | Merge "Fix bug in calculating number of mbs with scaling." | Paul Wilkins
2014-11-20 | Merge "vp9_ethread: move max/min partition size to mb struct" | Yunqing Wang
2014-11-20 | vp9_ethread: move filter_cache out of RD_OPT struct | Yunqing Wang
Similar to mask_filter, the filter_cache in RD_OPT struct can be moved out, and declared as a local variable since it is only used in pick_inter_mode functions. Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
2014-11-20 | Merge "vp9_ethread: change mask_filter to a local variable" | Yunqing Wang
2014-11-20 | Merge "Revert "vp9_ethread: include a pointer to mb in VP9_COMP"" | Yunqing Wang
2014-11-20 | Fix bug in calculating number of mbs with scaling. | Paul Wilkins
Correct calculation of number of mbs in two pass code when frame resizing is enabled. Always use initial number of mbs if scaling is enabled, as this is what was used in the first pass. Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
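The rule stated above can be sketched in one small function. The name `twopass_num_mbs` and its parameters are hypothetical and do not correspond to a known libvpx function; the sketch only encodes the rule that the initial macroblock count is used whenever scaling is enabled, because that is the count the first pass worked with.

```c
#include <assert.h>

/* Hypothetical sketch: choose the macroblock count used by two-pass code.
 * With frame scaling enabled, always return the initial (first pass) count;
 * otherwise return the count for the current frame size. */
static int twopass_num_mbs(int scaling_enabled, int initial_mbs,
                           int current_mbs) {
  assert(initial_mbs >= 0 && current_mbs >= 0);
  return scaling_enabled ? initial_mbs : current_mbs;
}
```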
2014-11-20 | vp9_ethread: change mask_filter to a local variable | Yunqing Wang
The mask_filter in RD_OPT struct is used to record the rd result in the filter decision. Since it is only used in pick_inter_mode functions, it is removed from the struct and declared as a local variable. Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
2014-11-20 | vp9_ethread: move max/min partition size to mb struct | Yunqing Wang
The max_partition_size and min_partition_size are set at the beginning while setting speed features, and then adjusted at SB level. Moving them to mb struct ensures there is a local copy for each thread. Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
2014-11-20 | Revert "vp9_ethread: include a pointer to mb in VP9_COMP" | Yunqing Wang
This reverts commit 6906d218ddd1af97228a797f4558e402231d94f1. Another way will be used to handle mb struct. Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4
2014-11-19 | Added highbitdepth sse2 acceleration for quantize | Peter de Rivaz
Also includes block error. (This patch is mostly cherry picked from commit db7192e0b014a331a1dcb102c8a1148e9f0e1081) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19 | Enable ssse3 version of vp9_fdct8x8_quant | Jingning Han
It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097