Age | Commit message | Author |
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
Vestigial. Replace instances with memset(), to which they were already
being defined.
Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
|
|
Also fixes a broken build with --enable-coefficient-range-checking
configuration option.
Change-Id: Icc536f53088e8cec59dfb8f635668555fdb9125e
|
|
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fall back to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e77dbb67f4a3c10f97a57a6fcb46d9a0)
|
|
Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58
(cherry picked from commit a1b726117f5470f227bc90cd030b7d25045dc510)
|
|
For builds configured with --enable-vp9-highbitdepth
Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
|
|
Uses highbd_ prefix convention consistently.
Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
|
|
Bit-stream clarification related to Issue 868.
Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514
|
|
Resolves a visual studio warning, and includes some cleanups.
Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d
|
|
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation through the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also includes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
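The wrapping idea behind WRAPLOW can be sketched as follows. This is a minimal illustration, assuming the target hardware keeps intermediate values in 16-bit signed registers. The helper name wrap_low16 and the EMULATE_HARDWARE switch are hypothetical stand-ins, not the macro's actual definition in the tree.

```c
#include <stdint.h>

/* Hypothetical build switch standing in for the emulate-hardware option. */
#ifndef EMULATE_HARDWARE
#define EMULATE_HARDWARE 1
#endif

/* Wrap a wide intermediate value to the range of a 16-bit signed register,
 * the way fixed-width hardware would on overflow. Unsigned arithmetic is
 * used so the result is well defined in portable C. */
static int32_t wrap_low16(int64_t x) {
#if EMULATE_HARDWARE
  const uint32_t low = (uint32_t)x & 0xFFFFu; /* keep the 16 low bits */
  /* Reinterpret bit 15 as the sign bit. */
  return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
#else
  return (int32_t)x; /* plain software model: no wrapping */
#endif
}
```

With wrapping enabled, a value one past the 16-bit maximum comes back as the 16-bit minimum, which is the behaviour a C model would need to reproduce to match such hardware bit for bit.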
|
|
Some header files included in vp9_idct.c are already included by vp9_idct.h.
This commit removes the redundant includes.
Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445
|
|
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform coefficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map to int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
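The typedef mapping described above can be sketched like this. The HIGHBITDEPTH macro and the scale_coeff helper are hypothetical names used only for illustration; the real flag is generated by configure.

```c
#include <stdint.h>

/* Hypothetical stand-in for the configure-generated high bitdepth flag. */
#ifndef HIGHBITDEPTH
#define HIGHBITDEPTH 0
#endif

#if HIGHBITDEPTH
typedef int32_t tran_low_t;  /* final transform coefficients */
typedef int64_t tran_high_t; /* intermediate stages */
#else
typedef int16_t tran_low_t;
typedef int32_t tran_high_t;
#endif

/* Multiply a coefficient by a fixed-point constant in the wide type, then
 * round and shift back down to the narrow type. Negative inputs rely on an
 * arithmetic right shift, which common compilers provide. */
static tran_low_t scale_coeff(tran_low_t c, tran_high_t k, int shift) {
  const tran_high_t wide = (tran_high_t)c * k;
  return (tran_low_t)((wide + ((tran_high_t)1 << (shift - 1))) >> shift);
}
```

Because the transform code is written against the two typedefs rather than fixed types, flipping the flag widens every intermediate without touching the arithmetic.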
|
|
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
|
|
It is enough to specify (e.g.) idct16; it is obviously different from
idct16x16.
Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
|
|
Change-Id: Ia568f70bddc1a2b62141a0197459119ca74c22b5
|
|
Change-Id: If97ae16a4478717933345b6b9d5bc1b417b8dd84
|
|
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32nd rows to save operations. This
function is called when eob <= 34.
Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
|
|
Also renaming dest_stride to stride in some places.
Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
|
|
Renames:
vp9_iht_add -> vp9_iht4x4_add
vp9_iht_add_8x8 -> vp9_iht8x8_add
vp9_iht_add_16x16 -> vp9_iht16x16_add
Change-Id: I8f1a2913e02d90d41f174f27e4ee2fad0dbd4a21
|
|
Renames:
vp9_short_iht4x4_add -> vp9_iht4x4_16_add
vp9_short_iht8x8_add -> vp9_iht8x8_64_add
vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add
Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
|
|
Also adding static to iadst16_1d and fadst16 functions.
Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf
|
|
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_sse2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_sse2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
|
|
Renames:
vp9_short_idct32x32_add -> vp9_idct32x32_1024_add
vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
vp9_idct_add_32x32 -> vp9_idct32x32_add
Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
|
|
|
|
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.
Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
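The zero-skipping idea can be sketched as below. This is a simplified illustration, not the code from this commit: it checks one row at a time, and toy_transform_1d is a hypothetical stand-in for a real 1-D inverse transform. The shortcut is valid for any linear transform, since an all-zero input then yields an all-zero output.

```c
#include <stdint.h>
#include <string.h>

typedef void (*transform_1d_fn)(const int16_t *in, int16_t *out, int n);

/* Returns 1 when every coefficient in the row is zero. */
static int row_is_zero(const int16_t *row, int n) {
  int acc = 0;
  for (int i = 0; i < n; ++i) acc |= row[i];
  return acc == 0;
}

/* Row pass of an n x n inverse transform. All-zero rows are written as
 * zeros directly instead of calling the 1-D transform.
 * Returns the number of 1-D transforms actually executed. */
static int inverse_rows_skip_zero(const int16_t *input, int16_t *output,
                                  int n, transform_1d_fn fn) {
  int calls = 0;
  for (int r = 0; r < n; ++r) {
    const int16_t *in = input + r * n;
    int16_t *out = output + r * n;
    if (row_is_zero(in, n)) {
      memset(out, 0, (size_t)n * sizeof(*out));
    } else {
      fn(in, out, n);
      ++calls;
    }
  }
  return calls;
}

/* Hypothetical linear stand-in for a real 1-D inverse transform. */
static void toy_transform_1d(const int16_t *in, int16_t *out, int n) {
  for (int i = 0; i < n; ++i) out[i] = (int16_t)(2 * in[i]);
}
```

For a block whose only non-zero coefficient sits in the first row, only one of the n row transforms runs.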
|
|
Renames:
vp9_short_idct16x16_add -> vp9_idct16x16_256_add
vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add
vp9_idct_add_16x16 -> vp9_idct16x16_add
Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
|
|
Renames:
vp9_short_idct8x8_add -> vp9_idct8x8_64_add
vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add
vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
vp9_idct_add_8x8 -> vp9_idct8x8_add
Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
|
|
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
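One plausible reading of this naming scheme is that the unsuffixed function picks a specialized variant from the end-of-block (eob) position, and the numeric suffix bounds the coefficients a variant handles. The sketch below only illustrates that reading: the enum, the function name and the thresholds are assumptions, not code from the patch.

```c
/* Hypothetical labels for the specialized 8x8 variants. */
typedef enum { IDCT8X8_1, IDCT8X8_10, IDCT8X8_64 } idct8x8_variant;

/* Choose the cheapest variant that still covers all coded coefficients.
 * Thresholds are assumed from the numeric suffixes in the names. */
static idct8x8_variant pick_idct8x8_variant(int eob) {
  if (eob == 1) return IDCT8X8_1;   /* DC coefficient only */
  if (eob <= 10) return IDCT8X8_10; /* coefficients near the top-left */
  return IDCT8X8_64;                /* full 8x8 transform */
}
```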
|
|
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
|
|
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set,
right now it is only C implementation. Now we have all idct functions
implemented in a consistent manner.
Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
|
|
Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1.
Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
|
|
Making function name consistent with vp9_short_idct16x16 and
vp9_short_idct16x16_1.
Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
|
|
The change is to better reflect the nature of the constants.
Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb
|
|
The inverse 32x32 transform detects all zero entries and skips the
computations accordingly per 8 rows in the first 1-D operation. The
function vp9_short_idct10_32x32_add performs differently and is not
used anywhere, hence removed.
Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
|
|
This commit adds special handling for the 16x16 inverse 2D-DCT when
only the DC coefficient is quantized to a non-zero value.
Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
|
|
This commit enables special handling for the 8x8 inverse 2D-DCT when
only the DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
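The reason a DC-only block is cheap is that its inverse transform is a single constant added to every pixel. A minimal sketch follows. The function name, the fixed-point constants and the final shift of 5 are assumptions chosen for illustration, not values taken from this commit.

```c
#include <stdint.h>

/* Assumed fixed-point constants, for illustration only. */
#define DCT_CONST_BITS 14
#define COSPI_16_64 11585 /* round(cos(pi/4) * 2^14) */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rounded right shift. Negative inputs rely on an arithmetic right shift,
 * which common compilers provide. */
static int round_shift(int x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

/* DC-only 8x8 inverse DCT fused with the add to the prediction: scale the
 * DC term once per 1-D pass, normalize, then add the same value to all
 * 64 pixels with clipping. */
static void idct8x8_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  int v = round_shift(dc * COSPI_16_64, DCT_CONST_BITS); /* row pass */
  v = round_shift(v * COSPI_16_64, DCT_CONST_BITS);      /* column pass */
  v = round_shift(v, 5); /* assumed final 8x8 normalization */
  for (int r = 0; r < 8; ++r) {
    for (int c = 0; c < 8; ++c) {
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + v);
    }
  }
}
```

Compared with the full transform, this replaces two passes of eight 1-D transforms with three scalar operations and one add per pixel.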
|
|
They share the same functionality, so merging together.
Change-Id: I98a0386fcee052cb854f9ff90c283c1b844bcb79
|
|
Change-Id: I386066b9bcfb4bffb582e6827af36ca0181f6a83
|
|
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.
Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
|
|
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
|
|
This commit switches to a new variant of the Walsh-Hadamard Transform
by Tim Terriberry. This new variant has the best compression among a
number of variants that Tim developed.
Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
|
|
Saves 1 add, 3 shifts (and a shift bias) per 1-D transform.
Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d
|
|
No longer used.
Change-Id: Id28c9247cebba183c6fa786dff96824ae100132c
|
|
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
|
|
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
|
|
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
|
|
This patch eliminates the intermediate diff buffer usage by
combining the short idct and the add residual into one function.
The encoder can use the same code as well.
Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
|
|
This commit renames files and functions to remove obsolete
references to LLM and x8.
Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
|