Age | Commit message (Collapse) | Author |
|
The code was not checking for frame sizes smaller than 3 bytes, and the
partition size checks might have failed if the input buffer was within
16MB of the top of the heap.
In addition, the reference count on the current frame buffer was not
being decremented on error, so after a small number of errors, no new
frame buffer could be found and it would run off the list of them.
Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
|
|
|
|
Also, move with other ppc32 options
Change-Id: I0b97413c767909c5682afc9bdd954f3d43401f6c
|
|
|
|
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. Instead, the state is only
set during splitmv decoding, never cleared.
Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
|
|
|
|
|
|
|
|
|
|
Movdqu is more expensive (throughput, uops) than movq. Minimal
impact for newer big cores, but ~2.25% gain on Atom.
Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
|
|
|
|
|
|
|
|
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.
Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
|
|
The patch related with issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It actually was a trick to confuse compiler rather
than a fix.
This patch fixes it by creating a new macro used when needed just a high
limit check for an unsigned.
Change-Id: I94b322e0f7fb07604b3b1df1f9321185f48cfcb5
|
|
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
dowshift as well as workarounds for a function which only accepted
signed data
looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
|
|
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in multi-threaded code into one, which reduces
number of synchronizations to half.
The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.
Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.
Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
|
|
These files aren't currently used, and we can get them back if we
need them.
Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
|
|
This patch reduces the size of the global tables maintained by the
tokenizer to 16k from 80k-96k. See issue #177.
Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
|
|
GET_GOT was producing a zero length call. This resulted in
pipeline flushes occuring when returing from the assembly
functions. Masked on out of order cores, but evident on
Atom cores.
Change-Id: I8c375af313e8a169c77adbaf956693c0cfeb5ccd
|
|
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.
Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
|
|
|
|
Fixes issue 89. Thanks to josejx for the patch.
Change-Id: I7e664fed703b49f2fb3af4c5e6ce1173742000c2
|
|
Change-Id: I88ddb0afb56ef2be8184b56fe125ad938ead7a84
|
|
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.
Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
|
|
|
|
Improved the subset block search and fill. (about 3% improvement for
32 bit) Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.
Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
|
|
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost
Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
|
|
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.
Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
|
|
|
|
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
|
|
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.
Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
|
|
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.
Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
|
|
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.
Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
|
|
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK. Also reduced the size of other members
of the MB_MODE_INFO struct. For 1080p, the memory was reduced
by 1,209,516 bytes. The decoder performance appeared to improve
by 3% for the clip used.
Note: The main goal for this change is to improve the decoder
performance. The encoder will be revisited at a later date for
further structure cleanup.
Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
|
|
Change-Id: I184e927987544e9f34f890249b589ea13a93a330
|
|
Change-Id: I0395ffa107651a773fd11d12682ab9372f76a90b
|
|
Change-Id: I8b9fdf9875a8fcff4cb49a3357ce44f18108c2e7
|
|
Changed to use QueryPerformanceCounter on Windows rather than only
when building with MSVC, so that MSVC can link libs built with
MinGW.
Fixes issue #149.
Change-Id: Ie2dc7edc8f4d096cf95ec5ffb1ab00f2d67b3e7d
|
|
gcc -dumpmachine returns only 'mingw32'
Change-Id: I774d05a97c5131fc12009e436712c319e54490a5
|
|
Fixes http://code.google.com/p/webm/issues/detail?id=112
Thanks to Ramiro Polla for the issue/fix.
Change-Id: I7f7b547a4ea3270e183f59280510066cc29a619e
|
|
Remove the dependency on postproc.c for the encoder in general, the only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.
Additionally, when VP8_SET_POSTPROC is used with the encoder when post
processing has been disabled an error will be returned.
This addresses issue #153.
Change-Id: Ia6dfe20167f7077734a6058cbd1d794550346089
|
|
|
|
|
|
This allows experiments of using different rounding and
zerobin constants for 2nd order blocks.
Change-Id: Idd829adba3edd1f713c66151a8d29bb245e33a71
|
|
This is not the behavior that most users expect.
Change-Id: I226126ea400c22cf1f7918e80ea7fe0771c569cb
|
|
There was an extremely rare deadlock that happened when one thread
was waiting to start the loop filter on frame n while the other
threads were starting to work on frame n+1.
Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
|
|
|
|
This is a workaround for gLucid problem.
Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
|
|
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.
The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.
Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
|