Age | Commit message | Author |
|
Adds the initial ssse3 convolve avg functions, which brings the code one
step closer to using x86inc.asm. Decoder performance improved by 8% on
the test clip used. This should be revisited later to see whether
averaging outside the loop is better than having many similar
filter functions.
Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
|
|
A 16-bit overflow occurs when using the EIGHTTAP_SMOOTH filters
(vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem.
Also added ssse3 support for 4x4 subpixel filtering.
Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
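As an illustration of why the order of the adds can matter, here is a
minimal scalar sketch. It assumes the adds in question are saturating
16-bit adds (the behaviour of the SSE2 paddsw instruction); the commit
message only says "16 bit overflow", so that is an inference. The helper
names and the sample values are hypothetical and are not taken from the
libvpx sources.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit add: a scalar stand-in for paddsw. */
static int16_t sat_add16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Adds the two large positive partial sums first, so the
 * intermediate value clips at INT16_MAX before the negative
 * term can bring it back into range. */
static int16_t sum_large_first(int16_t big0, int16_t big1, int16_t neg) {
  return sat_add16(sat_add16(big0, big1), neg);
}

/* Folds the negative partial sum in early, so no intermediate
 * value leaves the 16-bit range. */
static int16_t sum_negative_early(int16_t big0, int16_t big1, int16_t neg) {
  return sat_add16(sat_add16(big0, neg), big1);
}

/* Illustrative values only: the exact sum is 20000 and fits in 16 bits. */
static void order_example(void) {
  assert(sum_large_first(30000, 10000, -20000) == 12767);    /* clipped */
  assert(sum_negative_early(30000, 10000, -20000) == 20000); /* exact */
}
```

With wrapping adds the order would not change the final value, so a fix
by reordering is consistent with saturating arithmetic; that remains an
assumption here.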
|
|
* changes:
Restore SSSE3 subpixel filters in new convolve framework
Convert subpixel filters to use convolve framework
Add 8-tap generic convolver
|
|
This commit adds the 8 tap SSSE3 subpixel filters back into the code
underneath the convolve API. The C code is still called for 4x4
blocks, as well as compound prediction modes. This restores the
encode performance to be within about 8% of the baseline.
Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
|
|
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
|
|
Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.
There is no consensus on whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.
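One way to picture a shorter filter "implemented in terms of the 8-tap
code" is to pad its kernel with zero taps. The sketch below does this
for a 2-tap (bilinear) kernel. It assumes taps scaled to sum to 128 and
a rounding shift of 7 bits; the function names are hypothetical and not
the ones used in the tree.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_BITS 7 /* assumption: taps are scaled so they sum to 128 */

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Applies an 8-tap kernel; src points at the sample under tap index 3. */
static uint8_t convolve8_pixel(const uint8_t *src, const int16_t taps[8]) {
  int sum = 0;
  for (int k = 0; k < 8; ++k) sum += src[k - 3] * taps[k];
  /* Round to nearest, then drop the filter scaling. */
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Expresses a 2-tap (bilinear) filter as an 8-tap kernel padded with
 * zeros. weight_next is the weight of the right-hand sample in 1/128
 * units. */
static void make_bilinear_as_8tap(int weight_next, int16_t taps[8]) {
  assert(weight_next >= 0 && weight_next <= (1 << FILTER_BITS));
  for (int k = 0; k < 8; ++k) taps[k] = 0;
  taps[3] = (int16_t)((1 << FILTER_BITS) - weight_next);
  taps[4] = (int16_t)weight_next;
}
```

The zero taps still cost multiplies, which is the complexity the
message refers to when it calls a dedicated lower-complexity version a
simple exercise.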
This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.
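The direction-dependent behaviour described above can be sketched as a
small dispatch on the fractional part of the motion vector. The enum,
the function name, and the subpel_bits parameter are hypothetical; the
actual precision and call structure in the codec may differ.

```c
#include <assert.h>

typedef enum {
  PRED_COPY,            /* full-pel in both directions: no filtering */
  PRED_HORIZ_ONLY,      /* subpel motion in x only */
  PRED_VERT_ONLY,       /* subpel motion in y only */
  PRED_HORIZ_THEN_VERT  /* subpel motion in both directions */
} pred_path;

/* Picks the filtering path from the fractional bits of the motion
 * vector. subpel_bits is the number of fractional bits per component
 * (for example 3 for 1/8-pel units). */
static pred_path choose_pred_path(int mv_row, int mv_col, int subpel_bits) {
  assert(subpel_bits > 0 && subpel_bits < 8);
  const int mask = (1 << subpel_bits) - 1;
  const int frac_x = mv_col & mask; /* two's complement: works for mv < 0 */
  const int frac_y = mv_row & mask;

  if (frac_x == 0 && frac_y == 0) return PRED_COPY;
  if (frac_y == 0) return PRED_HORIZ_ONLY;
  if (frac_x == 0) return PRED_VERT_ONLY;
  return PRED_HORIZ_THEN_VERT;
}
```

For example, with 3 fractional bits a vector of (row 16, col -8) is
full-pel in both components and takes the copy path, while (row 0,
col 3) is filtered horizontally only.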
Quality gains on the derf set are positive on almost all clips. The only
clip that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages: 0.375% PSNR,
0.347% SSIM.
Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
|
|
Updated the intrinsic code to match Yaowu's latest loopfilter change
(I584393906c4f5f948a581d6590959522572743bb).
The decoder performance improved by ~30% for the test clip used.
Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
|
|
During the master Jenkins verification process
Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89
|
|
Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86
|
|
experimental
|
|
and vp9_mb_lpf_vertical_edge_w_sse2. This was quickly done so we can
run some tests over the weekend. Future commits will optimize/refactor these
functions further.
The decoder performance improved by ~17% for the clip used.
Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
|
|
On block boundaries within an MB, when only the 8x8 block boundary is
filtered for Y.
Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1
|
|
Updated the rtcd_defs and used the sse2 uv version
of the loopfilter. The performance improved by ~8%
for the test clip used.
Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36
|
|
About 5% decoder speedup.
Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794
|
|
Incorporate vp9-preview changes by merging master branch into experimental.
Conflicts:
test/test.mk
vp9/common/vp9_filter.c
vp9/common/vp9_idctllm.c
vp9/common/vp9_invtrans.h
vp9/common/vp9_mbpitch.c
vp9/common/vp9_rtcd_defs.sh
vp9/common/vp9_systemdependent.h
vp9/common/vp9_type_aliases.h
vp9/common/x86/vp9_asm_stubs.c
vp9/common/x86/vp9_subpixel_mmx.asm
vp9/decoder/vp9_decodframe.c
vp9/decoder/vp9_dequantize.c
vp9/decoder/vp9_dequantize.h
vp9/decoder/vp9_onyxd_int.h
vp9/encoder/vp9_bitstream.c
vp9/encoder/vp9_encodeframe.c
vp9/encoder/vp9_rdopt.c
Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
|
|
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.
Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
|
|
These filters will not work with VP9.
Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85
|
|
Change-Id: Ibc077cf1c1da0c86063f88c6d3073c6876989119
|
|
Change-Id: If7822e6fcd0d3568b934032322b19ba3e401df26
|
|
Change-Id: I6e43ca73f35401a974ed8ee27738d4318f09fd37
|
|
Change-Id: I467bf0fdf3b35326bcce58d5459e6d2dbfd6c5e5
|
|
Change-Id: I2c252f3ddcc99e96c1f5d3dab8bcb25a2a3637ea
|
|
Change-Id: I20c426e91ee49666db42e20eb074095ab6b8ec5d
|
|
Change-Id: Ieefd76e164ca4aa87597da0412977614ddfbacb7
|
|
and some miscellaneous invoke leftovers
Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43
|
|
This change included:
1. Aligned reads in the vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to start
at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave a 2% decoder performance gain (tulip clip).
Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
|
|
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
|
|
vp9_sad3x16_sse2() is called heavily in the decoder, where its
unaligned reads consume a lot of CPU cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do
aligned reads. This reduced the reading time significantly. Tests
on a 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFMV off.
Change-Id: I953afe3ac5406107933ef49d0b695eafba9a6507
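The idea of stepping src_ptr back to an aligned address can be sketched
in portable C. This is not the SSE2 code from the commit: it assumes the
source pointer sits exactly 1 byte past a 4-byte boundary and that the
source stride is a multiple of 4, and both function names are
hypothetical. Each row is fetched as one 4-byte group starting at
src - 1, and the extra leading byte is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reference version: reads exactly the 3 bytes of each row. */
static unsigned sad3x16_plain(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride) {
  unsigned sad = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 3; ++c) sad += (unsigned)abs(src[c] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}

/* Aligned variant: src - 1 must be 4-byte aligned on every row. */
static unsigned sad3x16_from_aligned(const uint8_t *src, int src_stride,
                                     const uint8_t *ref, int ref_stride) {
  unsigned sad = 0;
  assert((((uintptr_t)(src - 1)) & 3) == 0 && (src_stride & 3) == 0);
  for (int r = 0; r < 16; ++r) {
    uint8_t quad[4];
    /* One 4-byte fetch from an aligned address; quad[0] is padding. */
    memcpy(quad, src - 1, sizeof(quad));
    for (int c = 0; c < 3; ++c) sad += (unsigned)abs(quad[c + 1] - ref[c]);
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

Both functions return the same sum; the second only changes where the
bytes are read from, which is the part the commit reports as the source
of the speedup.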
|
|
More cleanup to do after this, but this is a good chunk of removing rtcd.
Change-Id: I551db75e341a0a85c3ad650df1e9a60dc305681a
|
|
Removed the rtcd subpixel invoke functions.
Change-Id: I8b7618bd5813333fac66b2817bdf807616e0fb33
|
|
Used ref_stride.
Change-Id: I31f0a3bb935520f54d11a1d87315627f162ae845
|
|
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
|
|
Change-Id: Ib39ad47a7d188f3b45416937b7eeb28c3e79b74c
|
|
Use unsigned ints to extend filter values in
vp9_mbloop_filter_horizontal_edge_c_sse2
Change-Id: I55ec3ac2bcb9baf55626b0384d151b07fc8e087d
|
|
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
|