Age | Commit message (Collapse) | Author |
|
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.
The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope their
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.
Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.
Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9
Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
|
|
|
|
|
|
The two arrays are typically initialized to INT64_MAX, if they are not
filled with valid values before the addition, the values can overflow
and lead to wrong results.
Change-Id: I515de22cf3e8f55af4b74bdb2c8eb821a02d3059
|
|
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.
Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
|
|
Change-Id: I752e374867d459960995b24d197301d65ad535e3
|
|
|
|
Change-Id: I62bb07c377f947cb72fac68add7a6b199e42c6b9
|
|
|
|
|
|
This commit resolved a mis-alignment issue in compound inter-inter
prediction of sub8x8. This patch follows solution from dkovalev@.
Change-Id: I3cc0cf7e55b84110e0c42ef4b2e6ca7ac3f8f932
|
|
Removing references to plane_block_width and plane_block_height (we are
going to delete the latter ones).
Change-Id: I7982da4d373aebb54d2209dc8886f6192df4d287
|
|
|
|
|
|
Change-Id: I481d9bb2fa3ec72b6a83d5f04d545ad8013f295c
|
|
I've already renamed d27_predictor to d207_predictor but forgot about the
corresponding constant.
Change-Id: Id312aa80fc5b5a1ab8a709a33418a029552a6857
|
|
Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.
Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
|
|
Values now carried over frame to frame.
Change to algorithm for decreasing threshold after
a hit and to max threshold (now based on speed)
Removed some old commented out code relating to
VP8 adaptive thresholds.
The impact of these changes tested on Akiyo (50 frames)
and measured in terms of unit rd hits is as follows:
Speed 0 84.36 -> 84.67
Speed 1 29.48 -> 22.22
Speed 2 11.76 -> 8.21
Speed 3 12.32 -> 7.21
Encode speed impact is broadly in line with these.
Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
|
|
Most of the focus so far has been on inter frames.
At high speed settings the key frame is now taking a high %
of the cycles.
This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings
TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.
Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.
Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
|
|
Change-Id: Ieb7077ca3586b9491912027eed450a4f6fd38d30
|
|
|
|
We can determine plane_type for another function arguments.
Change-Id: I85331877aedb357632ae916a37b5b15f22c0bb1f
|
|
Change-Id: I32e43474e8651ef2eb181d24860a8f118cfea7bf
|
|
|
|
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.
Change-Id: Ia4bc7700437d5e13d7cdad4cf9ae57ab036d3e97
|
|
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.
Results: derfraw300
threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)
Based on these results, the threshold is chosen as 16 for speed 1,
32 for speed 2, 64 for speed 3 and 96 for speed 4.
Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
|
|
This commit enables early termination in the rate-distortion
optimization search loop for chroma components. When the cumulative
rd cost is above the current best value, skip the rest per-block
transform/quantization/coeff_cost and continue to the next
prediction mode.
For bus_cif at 2000 kbps, the average run-time goes down from
168546ms -> 164678ms, (2% speed-up) at speed 0
36197ms -> 34465ms, (4% speed-up) at speed 1
Change-Id: I9d3043864126e62bd0166250d66b3170d520b3c0
|
|
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.
Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
|
|
|
|
|
|
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.
Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
|
|
Initialize the best mode and tx_size values in the rate-distortion
optimization search loop.
Change-Id: Ibfb5c0895691f172abcd4265c23aef4cb99fa8af
|
|
Return the distortion value in vp9_rd_pick_intra_mode_sb as sum of
dist_y and dist_uv. Remove the right shift operation on dist_uv,
and make it consistent with that of vp9_rd_pick_inter_mode_sb.
Change-Id: I9d564e242d9add38e32595d33b0e0dddb1d55e5b
|
|
Redundant dst, pre[2] from build_inter_predictors_args, unused cm from
encode_b_args.
Change-Id: I2c476cd328c5c0cca4c78ba451ca6ba2a2c37e2d
|
|
|
|
Change-Id: I3814984a624bc64147c57efa74fbdda8eda47262
|
|
Updating foreach_transformed_block_visitor and corresponding functions
to accept tx_size instead of ss_txfrm_size. List of functions per file:
vp9_decodframe.c
decode_block
decode_block_intra
vp9_detokenize.c
decode_block
vp9_encodemb.c
optimize_block
vp9_xform_quant
vp9_encode_block_intra
vp9_rdopt.c
dist_block
rate_block
block_yrd_txfm
vp9_tokenize.c
set_entropy_context_b
tokenize_b
is_skippable
Change-Id: I351bf563eb36cf34db71c3f06b9bbc9a61b55b73
|
|
|
|
|
|
|
|
This commit makes the rate-distortion optimization search of chroma
components consistent across all block sizes. It removes redundant
codes.
Change-Id: I7e76f54d045e8efdd41d84a164c71f55b484471b
|
|
|
|
Updated function signatures:
txfrm_block_to_raster_block
txfrm_block_to_raster_xy
extend_for_intra
vp9_optimize_b
Change-Id: I7213f4c4b1b9ec802f90621d5ba61d5e4dac5e0a
|
|
Change-Id: I4fad357465022d14bfc7e13b348c6da267587314
|
|
VP9_COMMON is the right place to segmentatation struct because it has
global segmentation parameters, not something specific to macroblock
processing.
Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
|
|
This commit unifies the rate-distortion cost calculation process of
luma and chroma components. It allows early termination to be enabled
later in the rd search loop of chroma components, in consistent with
luma pixels.
Change-Id: I2e52a7c6496176bf2a5e3ef338d34ceb8aad9b3d
|
|
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.
This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.
TODO clean up the nomenclature more widely in respect of
mbmi and bmi.
Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
|
|
Refactor choose_largest_txfm_size_ and make it find the largest
transform size via lookup table.
Change-Id: I685e0396d71111b599d5367ab1b9c934bd5490c8
|
|
|
|
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.
Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
|