Age | Commit message | Author |
|
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a.
This causes a segfault when decoding vp8, in both 32- and 64-bit builds.
Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 8 to 4,
and reduce memory accesses to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
|
|
Relocate the function from SSSE3 to SSE2, unroll the loop from 4 to 2,
and reduce memory accesses to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
|
|
The 8x8 intra predictor implemented with MMX is replaced with an SSE2 version.
Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
|
|
The 4x4 intra predictor implemented with MMX is replaced with an SSE2 version.
Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
|
|
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
|
|
* changes:
add vp9_satd_neon
fix vp9_satd_sse2
vp9_satd: return an int
|
|
~60-65% faster at the function level across block sizes
Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
|
|
Change-Id: Ic0ec32c1d7f7c08c9f956592dccbfd9060b1f624
|
|
accumulate satd in 32 bits
+ add a unit test
Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
|
|
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 version, as the result will roll over; this will be
fixed in a future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
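
The 26-bit figure can be checked with a small sketch. The C below is only an illustration and not the libvpx code: `satd_sketch` and `satd_wrapped16_sketch` are hypothetical names, and the worst case used here (1024 coefficients, each with magnitude 2^15) is an assumption picked so that the sum needs 26 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute values, accumulated in an int
 * (assumed to be at least 32 bits wide). */
static int satd_sketch(const int16_t *coeff, int length) {
  int satd = 0;
  int i;
  assert(coeff != NULL && length >= 0);
  for (i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

/* Hypothetical helper: the same sum kept in a 16-bit accumulator, to show
 * how a narrow accumulator rolls over. Unsigned arithmetic is used so the
 * wraparound is well defined in C. */
static uint16_t satd_wrapped16_sketch(const int16_t *coeff, int length) {
  uint16_t acc = 0;
  int i;
  assert(coeff != NULL && length >= 0);
  for (i = 0; i < length; ++i) acc = (uint16_t)(acc + abs(coeff[i]));
  return acc;
}
```

Under the stated assumption the full sum is 1024 * 32768 = 2^25, which occupies 26 bits, while the 16-bit accumulator wraps all the way back to 0.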
|
|
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
|
|
Change-Id: I5a4f1f7b9de20fbfc28cb743dcd29c0eeca736f8
|
|
Temporary fix to make sure it always passes.
Change-Id: I56a0529986ad7049b6090f871c14e9e06d573d5f
|
|
Change-Id: I635e37f81237e9703d7d9a11ed76a043f4ec6eb0
|
|
Change-Id: I9bfa80de73847d9be88b6ce9865d7bb5fafaaa57
|
|
The unit test requires a longer clip which is already in the repo.
Change-Id: Ic42e8d83e636fafd20d485a7f5f8422835319245
|
|
For 1-pass CBR mode: increase the waiting time after a key frame
before we start sampling rate control behavior for determining
resize. This change needs to temporarily disable one internal resize
test (DownUp), since it requires a longer clip.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
|
|
this helps some toolchains (vs9) resolve the type of the parameter
Change-Id: I8c83b86da53b1783cd18c0f765b67ba33da91d72
|
|
this helps some toolchains (vs9) resolve the type of the parameter
Change-Id: Ic53b2ed5fbce05c5b5e633b4a4ef9ea75c55360a
|
|
this helps some toolchains (vs9) resolve the type of the parameter
Change-Id: I4acc8a844d1e55b766f66482bd6d32998174d70f
|
|
-l -> -sl, renamed in:
be3b08d [svc] Temporal svc with two pass rate control
Change-Id: I5a7b179b33d94e20e54825090659156dece928c0
|
|
The current threshold is a little too strict.
Change-Id: I99ec1409d095e0c2fd3b7ab398742cabcc05700b
|
|
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
|
|
Refer to doc "vp9-test-vectors".
BUG=https://code.google.com/p/webm/issues/detail?id=1086
Change-Id: I523d1f39141a3a86f113604cbdb9cd41cc2d6470
|
|
These videos change resolution every 10 frames versus every 3 frames in the
current test sets.
Change-Id: Ic33f449fc9b6d2f480825d4715b8f63e70801232
|
|
Change-Id: I70b1b8162a0c9b8501358ba7d32fecd1dc020ab5
|
|
Change-Id: Ic64b6928af7ae8ecc987f845b0bf0faecdacb072
|
|
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Two further optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
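
The speculative 32-bit strategy in point 2 can be sketched in scalar C. This is an illustration under stated assumptions and not the AVX routine from the commit: `block_error_sketch` is a hypothetical name, and the sketch checks for unsigned carry after every addition, whereas the commit describes a single check at the end of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: returns the sum of squared differences
 * between coeff and dqcoeff, and stores the sum of squared coeff in *ssz. */
static int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                                  size_t n, int64_t *ssz) {
  uint32_t err32 = 0, sqc32 = 0;
  int overflow = 0;
  size_t i;
  assert(coeff != NULL && dqcoeff != NULL && ssz != NULL);

  /* Fast path: all sums kept in 32 bits. */
  for (i = 0; i < n; ++i) {
    const int32_t d = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    const int32_t c = coeff[i];
    const uint32_t ad = (uint32_t)(d < 0 ? -d : d);
    const uint32_t ac = (uint32_t)(c < 0 ? -c : c);
    const uint32_t d2 = ad * ad; /* |d| <= 65535, so this fits in 32 bits */
    const uint32_t c2 = ac * ac;
    err32 += d2;
    overflow |= (err32 < d2); /* unsigned wraparound means a carry was lost */
    sqc32 += c2;
    overflow |= (sqc32 < c2);
  }
  if (!overflow) {
    *ssz = (int64_t)sqc32;
    return (int64_t)err32;
  }

  /* Rare slow path: redo the whole computation with 64-bit sums. */
  {
    uint64_t err64 = 0, sqc64 = 0;
    for (i = 0; i < n; ++i) {
      const int64_t d = (int64_t)coeff[i] - (int64_t)dqcoeff[i];
      const int64_t c = coeff[i];
      err64 += (uint64_t)(d * d);
      sqc64 += (uint64_t)(c * c);
    }
    *ssz = (int64_t)sqc64;
    return (int64_t)err64;
  }
}
```

The fast path pays only for two compares per element in this sketch; the point of the design is that typical blocks never reach the 64-bit loop.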
|
|
to make the meaning of color_range obvious.
Change-Id: I303582e448b82b3203b497e27b22601cc718dfff
|
|
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall
Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
|
|
Change-Id: Iccb4cdc23c1845cf9cb7d69101c9f4f43675d368
|
|
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
|
|
* changes:
vp9/tile_worker_hook: add multiple tile decoding
invalid_file_test: loosen error check w/tile-threading
|
|
some mingw32 configs define this. force this to be on to ensure the
build succeeds
Change-Id: I2cc490782b6a0736aa617e6a1457fc2bc984adbb
|
|
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
|
|
Change-Id: I936c2430c3c5b1e0ab5dec0a20110525e925b5e4
|
|
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
|
|
* changes:
vp9_thread_test: clarify test case names
vp9_thread_test: add non-frame-parallel files
|
|
define NOMINMAX to allow the std:: versions to be used; min/max will be
defined transitively via windows.h otherwise
Change-Id: I692b03fa3e70b7a53962d3fd209498f70f712fed
|
|
Change-Id: Iad73b490b171cdda5c368ada69fb8eab2a86c156
|
|
|