Age | Commit message | Author |
|
|
|
|
|
Change-Id: Idfd69e66e8982275eb00d8007a55efd1a4f86a98
|
|
|
|
- size_t vs int.
Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
|
|
This reverts commit 90a9900abb79fabfd44189a959d14ca677c2777a
Seems to break the Mac build:
src/include/gtest/internal/gtest-port.h:1208:: pthread_mutex_lock(&mutex_)failed with error 22
Abort trap: 6
Change-Id: Icbe31161d7c27f1b0a28d33409e7712430bbf0ae
|
|
|
|
|
|
|
|
|
|
Improves the rd modeling function and implements it using interpolation
from a table, which is a little faster. Also uses sse rather than var as
input to the modeling function: no dc prediction is used, so sse works
a little better.
derfraw300: +0.05%
Speedup: ~1%
Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
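The table-interpolation idea in the message above can be sketched roughly
as follows. This is a minimal illustration, not the code from this commit:
the table contents, its size, and the names model_tab and interp_model are
all hypothetical, and the real model may well use fixed-point arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table: samples of a monotone model curve f(x) taken at
 * x = 0, 1, ..., 8. The values are illustrative only. */
#define MODEL_TAB_SIZE 9
static const double model_tab[MODEL_TAB_SIZE] = {
  8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125
};

/* Evaluate the tabulated curve at x by linear interpolation between the
 * two nearest samples. Inputs outside the table range are clamped to the
 * first or last entry. */
static double interp_model(double x) {
  assert(x == x); /* reject NaN, which would make the index undefined */
  if (x <= 0.0) return model_tab[0];
  if (x >= (double)(MODEL_TAB_SIZE - 1)) return model_tab[MODEL_TAB_SIZE - 1];
  const size_t i = (size_t)x;          /* index of the sample at or below x */
  const double frac = x - (double)i;   /* position between sample i and i+1 */
  return model_tab[i] + frac * (model_tab[i + 1] - model_tab[i]);
}
```

A lookup plus one multiply replaces evaluating the model function directly,
which is where a small speedup of the kind quoted above could come from.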
|
|
dboolhuff.c(50) : warning C4267: 'initializing' : conversion from
'size_t' to 'int'
Change-Id: I6b85759efb2fa19f362f406623d8a7583a55c036
|
|
adds a new speed feature to force partitioning to be greater than
or less than a certain size
Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
|
|
this feature lets you set a partitioning size to be used by the entire
frame.
Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
|
|
This uses variance to decide how to split a partition. Variance is
calculated using the nearest mv, always from the last ref frame.
Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
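A variance-driven split decision of the kind described above might look
like the sketch below. It assumes the prediction block has already been
fetched from the last reference frame with the nearest mv; the function
names, the threshold parameter, and the simple "split if above threshold"
rule are hypothetical and not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared deviations of (src - pred) over a w x h block, i.e.
 * sse - sum^2 / n. This sketch leaves the result unnormalized rather
 * than dividing by n a second time. */
static uint32_t block_diff_variance(const uint8_t *src, int src_stride,
                                    const uint8_t *pred, int pred_stride,
                                    int w, int h) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * src_stride + c] - pred[r * pred_stride + c];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  return (uint32_t)(sse - (uint64_t)((sum * sum) / (w * h)));
}

/* Hypothetical decision rule: split the block when the prediction
 * residual varies more than the given threshold. */
static int should_split(uint32_t variance, uint32_t threshold) {
  return variance > threshold;
}
```

A block whose residual is a constant offset yields zero and is kept whole,
while a block with a strong edge in the residual exceeds the threshold and
is split.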
|
|
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
|
|
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
|
|
This uses the speed feature functionality for code.
Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
|
|
force us to go through slow partitioning for keyframes, altref and
overlays.
Change-Id: I1a286361bf74083e71973575a7296be46eb98742
|
|
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):
sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles
ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles
Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
|
|
need to rework these
Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
|
|
The new printout includes skips and has prefixed sections, so you can
grep to find things like the transforms chosen on each frame.
Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
|
|
|
|
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
|
|
Enable sign bias check and round-trip error unit tests for 4x4 hybrid
transform modules.
Change-Id: Icd3d839f098d4b92b00ff76eac146765b039d0d3
|
|
|
|
Since intra block decoding is handled by decode_sb_intra() separately.
Change-Id: I42d757884714084c92fc23ec5d35d4dc946f4b15
|
|
Change-Id: Iab96e6a50aec543c63e15cd134f9d5f01ca7ceff
|
|
currently threading is internal to libvpx so thread safety is unneeded
in libgtest -- visual studio builds already operate in this way as they
do not have pthread.h available by default.
this removes the unconditional link to libpthread; $(extralibs) is used
instead should libvpx require it.
Change-Id: Ieae1d693406653a54b54fba818c598836797d33b
|
|
|
|
Optimized the quantization function by making it a two-pass
process. The first pass quickly checks the transform
coefficients against the base ZBIN and keeps only the
coefficients that qualify for quantization. A skip
check is added: if all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which only processes the coefficient subset determined
in the first pass. This reduces the computation. Furthermore, an
alternative method is used for large transform sizes, which often
have sparse nonzero quantized coefficients.
Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.
Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
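The two-pass structure described above can be sketched as follows. This is
a simplified illustration under stated assumptions: a single zbin and a
plain integer division stand in for the encoder's actual per-coefficient
zbin, rounding, and multiply-shift arithmetic, and the names
quantize_two_pass and MAX_COEFFS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_COEFFS 1024 /* hypothetical cap, e.g. one 32x32 transform block */

/* Quantize n coefficients in two passes and return the number of nonzero
 * quantized coefficients. qcoeff must have room for n entries. */
static int quantize_two_pass(const int16_t *coeff, int n, int zbin,
                             int quant_step, int16_t *qcoeff) {
  int idx[MAX_COEFFS]; /* positions that survive the first pass */
  int kept = 0;
  int nonzero = 0;
  assert(n >= 0 && n <= MAX_COEFFS && quant_step > 0);

  /* Pass 1: cheap check against the base zbin. Everything is zeroed, and
   * only positions at or above the zbin are remembered. */
  for (int i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) idx[kept++] = i;
  }

  /* Skip check: all coefficients fell inside the zbin. */
  if (kept == 0) return 0;

  /* Pass 2: the actual quantization, restricted to the kept subset. */
  for (int k = 0; k < kept; ++k) {
    const int i = idx[k];
    const int q = abs(coeff[i]) / quant_step;
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -q : q);
    if (q != 0) ++nonzero;
  }
  return nonzero;
}
```

With sparse blocks most positions are rejected in the first loop, so the
more expensive second loop touches only a few entries, or none at all when
the skip check fires.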
|
|
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
|
|
Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
|
|
|
|
|
|
add ClearSystemState() to reset MMX registers, avoiding corruption of
subsequent tests.
Change-Id: I668deb09aa7aa467709776e5819f936910698bc0
|
|
|
|
|
|
This commit uses two fdct32x32 versions: one for the rate-distortion
optimization loop and one for the encoding process. The one for the
rd loop requires only 16 bits of precision for intermediate steps.
The original fdct32x32, which allows higher intermediate precision (18
bits), is retained for the encoding process only.
This speeds up fdct32x32 in the rd loop. No performance
loss observed.
Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
|
|
|
|
|
|
|
|
fixes issue #583
Change-Id: I4b855a5b5b168c8961410cef6ab5e6d86f14d301
|
|
Change-Id: I052647e13dd24354888c890f6b4a987d989552ae
|
|
Change-Id: I927c7223996cdeb44f46e0e6c2e2054d458c300b
|
|
This seems to be used only in the encoder. Also remove an empty wrapper
file that contained forward declarations for this function but didn't
define any functions itself.
Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b
|
|
Moving single function from vp9_invtrans.c to vp9_encodemb.c.
Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7
|
|
2.5% faster when encoding first 50 frames of bus @ 1500kbps.
Change-Id: I5a64703996cf7fd39b07e32c72311c4b125ec6d4
|
|
|
|
|