summaryrefslogtreecommitdiff
path: root/vp9/encoder/vp9_dct.c
AgeCommit message (Collapse)Author
2014-11-24Refactored idct routines and headersPeter de Rivaz
This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
2014-11-18Combine fdct8x8 and quantization processJingning Han
This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
2014-11-05Fix visual studio 2013 compiler warningsYaowu Xu
For configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-10-09Rename highbitdepth functions to use highbd prefixDeb Mukherjee
Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-09-11Adds high bitdepth transform functions and testsDeb Mukherjee
Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform cofficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map tp int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-06-13Fix C versions of DC calculation functionsJingning Han
This commit fixes the scaling factors used in the C versions of the DC calculation functions. Change-Id: Iab41108c2bb93c2f2e78667214f3a772a2b707b5
2014-06-12Fast computation path for forward transform and quantizationJingning Han
This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-05-19Adjust the forward 16x16 DCT computation stepsJingning Han
This commit adjusts the forward 16x16 DCT computation steps to simplify the register level operations. It fixes the corresponding sse2 version accordingly. Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
2014-02-12minor spelling cleanup in commentsAndrew Russell
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-06Finally removing "short" from transform names.Dmitry Kovalev
Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d
2014-01-27Removing _1d suffix from transform names.Dmitry Kovalev
It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
2013-11-15Take out assertion from inverse transformsJingning Han
Separate the rounding and right shift operations of forward transform from those of inverse transform. Take out the assertion check from inverse transforms. If the transform coefficients were constructed to cause intermediate steps of inverse transform overflow, the codec will just let it overflow without breaking the decoding flow. Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
2013-10-25Adding fht{4x4, 8x8, 16x16} functions.Dmitry Kovalev
Adding these functions to encapsulate tx_type check. Changing TX_TYPE to int to match the declaration in vo9_rtch.h. Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
2013-10-24Making input pointer constant for all fdct/fht functions.Dmitry Kovalev
Change-Id: I78f7012f967a777ddd39bae6671eb501df6bbfe8
2013-10-23Renaming vp9_short_fdct4x4 and vp9_short_walsh4x4.Dmitry Kovalev
For consistency with idct function names. Renames: vp9_short_fdct4x4 -> vp9_fdct4x4 vp9_short_walsh4x4 -> vp9_fwht4x4 Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
2013-10-23Renaming vp9_short_fdct32x32 to vp9_fdct32x32.Dmitry Kovalev
For consistency with idct function names. Change-Id: Ie77b7178e0894c57cd5cb9243c949eb9224ece18
2013-10-23Merge "Renaming vp9_short_fdct16x16 to vp9_fdct16x16."Dmitry Kovalev
2013-10-23Renaming vp9_short_fdct16x16 to vp9_fdct16x16.Dmitry Kovalev
For consistency with idct function names. Change-Id: I5ca355ba99fdba04f09254be95cf79808b534f71
2013-10-23Renaming vp9_short_fdct8x8 to vp9_fdct8x8.Dmitry Kovalev
For consistency with idct function names. Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
2013-10-22Merge "Using stride (# of elements) instead of pitch (bytes) in fdct4x4."Dmitry Kovalev
2013-10-22Merge "Using stride (# of elements) instead of pitch (bytes) in fdct8x8."Dmitry Kovalev
2013-10-21Using stride (# of elements) instead of pitch (bytes) in fdct4x4.Dmitry Kovalev
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
2013-10-18Using stride (# of elements) instead of pitch (bytes) in fdct8x8.Dmitry Kovalev
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
2013-10-18Using stride (# of elements) instead of pitch (bytes) in fdct16x16.Dmitry Kovalev
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
2013-10-17Using stride (# of elements) instead of pitch (bytes) in fdct32x32.Dmitry Kovalev
Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
2013-10-15Removing unused 8x4 transform from the encoder.Dmitry Kovalev
Change-Id: Icbcf68b5b685a56f255ebc3859c9692accdadf9e
2013-10-11Adding const to the input argument of all 1D transforms.Dmitry Kovalev
Also adding static to iadst16_1d and fadst16 functions. Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf
2013-10-10Consistent names for FDCT functions.Dmitry Kovalev
Renames: fdct4_1d -> fdct4 fadst4_1d -> fadst4 fdct8_1d -> fdct8 fadst8_1d -> fadst8 fdct16_1d -> fdct16 fadst16_1d -> fadst16 "_1d" suffix is redundant, so removing it. The same will happen with idct in the next change sets. Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
2013-10-04cpplint vp9_dct.c issues resolvedJim Bankoski
Change-Id: Ia21653a447040f1b472d21ebd19103b0558c4b16
2013-09-24Rename defined constantsYaowu Xu
The change is to better reflect the nature of the constants. Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb
2013-09-19fix integer overflow errorsYaowu Xu
Change-Id: I76f440a917832c02d7a727697b225bac66b99f56
2013-08-31Fix 32x32 forward transform SSE2 versionJingning Han
This commit fixed the potential overflow issue in the SSE2 implementation of 32x32 forward DCT. It resolved the corrupted coded frames in the border of scenes. Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
2013-07-03Refactor SSE2 8x8 functional unitsJingning Han
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT hybrid transform coding. Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
2013-06-29SSE2 version of vp9_short_fdct32x32_rd.Christian Duvivier
43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-25Add 8x8 dct/adst unit testsJingning Han
This commit enables 8x8 DCT and hybrid transform unit tests. It also tunes the forward hybrid transform rounding opertions for more precise round-trip performance. Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-18Make fdct32 computation flow within 16bit rangeJingning Han
This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
2013-05-30Changed to use a new variant of WHTYaowu Xu
The commit changed to use a new variant of Walsh-Hadamard Transform by Tim Terriberry. This new variant has the best compression among a number of variants that developed by Tim. Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
2013-05-27Reduce WHT complexity.Timothy B. Terriberry
Saves 1 add, 3 shifts (and a shift bias) per 1-D transform. Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d
2013-04-16Faster vp9_short_fdct4x4 and vp9_short_fdct8x4.Christian Duvivier
Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
2013-04-16Merge branch 'experimental' into masterJohn Koleszar
VP9 preview bitstream 2, commit '868ecb55a1528ca3f19286e7d1551572bf89b642' Conflicts: vp9/vp9_common.mk Change-Id: I3f0f6e692c987ff24f98ceafbb86cb9cf64ad8d3
2013-03-18Merge "removed reference to "LLM" and "x8"" into experimentalYaowu Xu
2013-03-15Faster vp9_short_fdct16x16.Christian Duvivier
Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289
2013-03-13removed reference to "LLM" and "x8"Yaowu Xu
The commit changed the name of files and function to remove obselete reference to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
2013-02-27Faster vp9_short_fdct8x8.Christian Duvivier
Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
2013-02-27Code cleanup.Dmitry Kovalev
Fixing code style, using array lookup instead of switch statements for forward hybrid transforms (in the same way as for their inverses). Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places. Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
2013-02-27Merge "Improve 32x32 forward dct" into experimentalYaowu Xu
2013-02-26Improve 32x32 forward dctYaowu Xu
The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
2013-02-25Changing pitch value meaning for fht and iht transforms.Dmitry Kovalev
Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
2013-02-25Improving the forward 16x16 ADST/DCT accuracyJingning Han
Increase the first stage dynamic range by 4 times, and reduce it back with proper rounding before applying the second stage. Hence it still fits in the given dynamic range and slightly improves the key frame coding performance. Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa
2013-02-25clean up forward and inverse hybrid transformJingning Han
Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f