Age | Commit message (Collapse) | Author |
|
Add vp9_copy_neon.c
- vp9_convolve_copy_neon
Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Add vp9_avg_neon.c
- vp9_convolve_avg_neon
Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
|
|
Change-Id: Ic9438031282e63e627550f7e4cdeda36e43e647b
|
|
Change-Id: I3b5a478d198868c2796366f0ac59d0e2036308b8
|
|
Uses highbd_ prefix convention consistently.
Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
|
|
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
|
|
warning: comparison between signed and unsigned integer expressions.
Change-Id: Ib6ee7500fe910983f290fc321ad89c0ab9989455
|
|
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
|
|
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
|
|
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works
https://code.google.com/p/webm/issues/detail?id=854
Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
|
|
A bug in Microsoft compiler was found in the function
vp9_filter_block1d16_v8_avx2 and a workaround applied.
the bug occur when there was 4 consecutive maddubs + min + adds
intrinsic instructions.
Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b
|
|
_t is reserved by posix
+ switch to camelcase
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Type_Names
Change-Id: I2a22ffc36e9f88781bc7db0d5a28a7ed924bab1a
|
|
used to wrap API functions to ensure full environment consistency as
opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with
assembly functions.
currently checks the FPU tag word in x86/x86_64 gcc builds to ensure
emms has been called.
Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490
|
|
tests failing under Win32/Win64
Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
|
|
This patch turned on unit tests for AVX2 convolve functions.
Change-Id: I51b8bfdaa290fb22862c68af61abf2394d00d47c
|
|
The intepolation filter functions can be better tested withe extreme
values, especially given the optimization functions are prone to
overflow signed 16 bit intermediate value when operation order is
wrong.
Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5
|
|
Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4
|
|
Allow selectively building just the intrinsics for armv8
Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
|
|
Change-Id: Ic24a371817c9dd5c4035a6fe01111bd9ab63f552
|
|
Change-Id: I826655a708010149de231ca31a2e3ba4f1842c0c
|
|
Change-Id: I23ed873a6c47b15491a2ffbcdd4f0fdeef1207a0
|
|
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
|
|
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
|
|
- Intermediate height was not correct i.e. when block size is 4 and
y_step_q4 is 6. In this case intermediate height was
(4*6) >> 4 = 1 and vertical interpolation needs two source pixels
plus 7 extra pixels for taps.
- Also if the current output block is 16x16 and we are using 4x upscaling
we need only 12 rows after horizontal filtering instead of 16.
Patch Set 2: Intermediate_height updated after CL 66723
"Fix bug in convolution functions (filter selection)"
Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589
|
|
(In response to Issue 604:
https://code.google.com/p/webm/issues/detail?id=604)
There were bugs in the convolution code for two cases:
1. Where the filter table was assumed to be aligned to a
256 byte boundary. The offset of the pixel in the
source buffer was computed incorrectly.
2. Where no such alignment assumption was made. An
incorrect address for the filter table base was used.
To fix both problems, I now assume that the filter table is
256-byte aligned and modify the pixel offset calculation to
match.
A later patch should remove the restriction that the filter
table is aligned to a 256-byte boundary.
There was also a bug in the ConvolveTest unit test
(convolve_test.cc).
(Bug & initial fix suggestion submitted by Tero Rintaluoma
and Sami Pietilä).
Change-Id: I71985551e62846e55e40de9e7e3959d4805baa82
|
|
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm
Consolidate these targets in vp9_rtcd_defs.sh
Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248
|
|
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.
This will be replaced with a custom optimized function.
Over 2x speedup.
Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
|
|
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over keep in sync.
Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
|
|
Independent horizontal and vertical implementations.
Requires that blocks be built from 4x4 and [xy]_step_q4 == 16
6-10% improvement. CIF improved the least.
Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
|
|
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
|
|
fixes issue #583
Change-Id: I4b855a5b5b168c8961410cef6ab5e6d86f14d301
|
|
No bitstream change.
Removes unused filters and the code for the case of 2 switchable filters;
also changes the 8tap-smooth filter coefficients for integer shifts to be
interpolating to be consistent with the way it is implemented currently.
Change-Id: I96c542fd8c06f4e0df507a645976f58e6de92aae
|
|
fix indent, whitespace, casts
Change-Id: Ifea8618a90f9da263a8955dd242bb3aa7fc59ae5
|
|
input_ is filled with random values just afterward.
the size was wrong anyway as input_ is allocated with memalign so
sizeof(input_)==sizeof(uint8_t*)
Change-Id: I014b832ac60960cd22b6f369dbc9fd648d4055b5
|
|
Updates the common convoloution code to support blocks larger than
16x16, and rectangular blocks. This uncovered a bug in the SSSE3
filtering routines due to the order of application of saturation.
This commit fixes that bug, adjusts the unit test to bias its
random values towards the extremes, and adds a test to ensure that
all filters conform to the expected pairwise addition structure.
Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0
|
|
Since the 8-tap lowpass filter is non-interpolating, the results are
different between applying it at whole-pel values and not. This
means that 1D-only versions are requried to be implemented, as
opposed to being an optimization of the 2D case. Calling the 2D
filter instead of the horizontal-only filter is not equivalent
in this case. Update the test to pass invalid filters to the
unused stage of the 1D-only calls, to verify they're unused.
Change-Id: Idc1c490f059adadd4cc80dbe770c1ccefe628b0a
|
|
Updates the convolve test to verify that all filters match the
reference implementation. This verifies commit 30f866f, which
fixed some problems with the SSE3 version of the filters for
the vp9_sub_pel_filters_8s and vp9_sub_pel_filters_8lp banks
due to overflow and order of operations.
Change-Id: I6b5fe1a41bc20062e2e64633b1355ae58c9c592c
|
|
This avoids duplicating all the filters twice. Includes fixups to the
convolve routines and associated tests to make this work.
Change-Id: I922f86021594e55072ddb63b42b2313605db6e00
|
|
Ensure that all inter prediction goes through a common code path
that takes scaling into account. Removes a bunch of duplicate
1st/2nd predictor code. Also introduces a 16x8 mode for 8x8
MVs, similar to the 8x4 trick we were doing before. This has an
unexpected effect with EIGHTTAP_SMOOTH, so it's disabled in that
case for now.
Change-Id: Ia053e823a8bc616a988a0af30452e1e75a739cba
|
|
This commit adds the 8 tap SSSE3 subpixel filters back into the code
underneath the convolve API. The C code is still called for 4x4
blocks, as well as compound prediction modes. This restores the
encode performance to be within about 8% of the baseline.
Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
|
|
This commit introduces a new convolution function which will be used to
replace the existing subpixel interpolation functions. It is much the
same as the existing functions, but allows for changing the filter
kernel on a per-pixel basis, and doesn't bake in knowledge of the
filter to be applied or the size of the resulting block into the
function name.
Replacing the existing subpel filters will come in a later commit.
Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91
|