Age | Commit message (Collapse) | Author |
|
|
|
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
|
|
Use aligned store.
Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
|
|
Added SSSE3 function
Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
|
|
Removed some unused functions
Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
|
|
Rewrote these functions to process 16 pixels once instead of 8.
Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
|
|
Skip filter at zero offset.
Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
|
|
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
during motion search.
This change gave encoder 1%~3% performance gain.
Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
|
|
|
|
This improved encoding performance by 0.5% (good, speed 1) to
1.5% (good, speed 5).
Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
|
|
It is not used in realtime mode. Reduces memory footprint.
Change-Id: I7f163225762368df5457cfd413050161d3704a3f
|
|
This reverts commit f50f2fd2a73f2c5ee3f10ad077e780398df17cd7.
Change Ib7506e3e aligns the buffer
Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
|
|
|
|
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).
Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
|
|
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.
Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
|
|
|
|
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.
Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
|
|
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
|
|
about ~5% gain on 32bit. disabled for 64bit
unset executable bit on ssse3 version (cosmetic)
Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
|
|
source buffer is not guaranteed to be aligned for odd size buffers
Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
|
|
count can be reduced to short because the max number of filtered frames
is set to 15. the max value for any frame is 32 (modifier = 16,
filter_weight = 2). 15*32 = 480 which requires 9 bits
this function goes from about 7000 us / 1000 iterations for the C code
to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8
Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
|
|
Use the fast quantizer for inter mode selection and the
regular quantizer for the rest of the encode for good quality,
speed 1. Both performance and quality were improved. The
quality gains will make up for the quality loss mentioned in
I9dc089007ca08129fb6c11fe7692777ebb8647b0.
Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
|
|
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.
Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
|
|
Changed the end of block computation to use pmaxw. Removed
additional pushing and popping of registers that was not needed.
Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
|
|
x86-64 passes arguments in registers. There is no need to push
them to the stack before using them.
This fixes 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851 where ebx
was not getting preserved on x86.
Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
|
|
This reverts commit 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851.
Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
|
|
|
|
Fixed up the fdct for mmx and 8x4 sse2 to match them
most recent changes.
Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
|
|
(test clip: tulip)
For good quality mode with speed=1, this gave the encoder
a small (2 - 3%) performance boost.
Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
|
|
XMM6/7 are used in these functions, and need to be saved.
Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
|
|
Unlike GCC, Visual Studio compiler doesn't allocate SAD output
array 16-byte aligned, which causes crash in visual studio.
Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
|
|
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
|
|
This patch fixes the system dependent entries for the half-pixel
variance functions in both the RTCD and non-RTCD cases:
- The generic C versions of these functions are now correct.
Before all three cases called the hv code.
- Wire up the ARM functions in RTCD mode
- Created stubs for x86 to call the optimized subpixel functions
with the correct parameters, rather than falling back to C
code.
Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
|
|
These functions made global references but did not set up the GOT,
causing compilation failures in PIC mode.
Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
|
|
Most of the code that actually uses these matrices indexes them as
if they were a single contiguous array, and coverity produces
reports about the resulting accesses that overflow the static
bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
arrays should eliminate the report, and removes a good deal of
extraneous indexing and address operators from the code.
Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
|
|
Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
|
|
x86-64 passes most arguments in registers. There is no need to
push them to the stack before using them.
Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
|
|
This rewriting reflects changes made in commit "Improve the
accuracy of forward walsh-hadamard transform". Since this function
is not called much, only a small encoder performance gain (~0.5% )
is seen.
Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
|
|
Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring noticable gain since
these functions are not called frequently.
Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
|
|
Typo had function defined as _ssse2 and prototyped as _sse2.
Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
|
|
QWORD was being undefined because it was being used
incorrectly.
Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
|
|
|
|
These functions should never change their input, and there's no
reason not to declare that.
This allows them to be passed static const data.
Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
|
|
|
|
Remove vp8/encoder/x86/csystemdependent.c
Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
|
|
Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into
quantize_sse2.asm and renamed. Updated the assembly code to
match the C version.
Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
|
|
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
|
|
nasm requires the instruction length (movd/movq) to match to its
parameters. I find it more clear to really use 64bit instructions when
we use 64bit registers in the assembly.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
|
|
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
|
|
Labels should end by colon (':'), nasm requires it.
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f
|