summaryrefslogtreecommitdiff
path: root/vp8/decoder
AgeCommit message (Collapse)Author
2011-02-08clarify *_offsets.asm differencesJohann
it's difficult to mux the *_offsets.c files because of header conflicts. make three instead, name them consistently and partititon the contents to allow building them as required. Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
2011-02-07move one of the offset filesJohann
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with encoder/arm/vpx_vp8_enc_asm_offsets Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-04remove unused dboolhuff codeJohann
we were holding on to this "just in case." purge it instead Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a
2011-02-03Make vp8_adjust_mb_lf_value return the updated value rather thanGaute Strokkenes
manipulating it in situ via a pointer. Change-Id: If4a87a4eccd84f39577c0e91e171245f4954c5cf
2011-01-28Adds "armvX-none-rvct" targetsTero Rintaluoma
Adds following targets to configure script to support RVCT compilation without operating system support (for Profiler or bare metal images). - armv5te-none-rvct - armv6-none-rvct - armv7-none-rvct To strip OS specific parts from the code "os_support"-config was added to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS specific parts such as OS specific includes and function calls for timers and threads etc. This was done to enable RVCT compilation for profiling purposes or running the image on bare metal target with Lauterbach. Removed separate AREA directives for READONLY data in armv6 and neon assembly files to fix the RVCT compilation. Otherwise "ldr <reg>, =label" syntax would have been needed to prevent linker errors. This syntax is not supported by older gnu assemblers. Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
2011-01-19Merge "Implement error tracking in the decoder"John Koleszar
2011-01-19Implement error tracking in the decoderHenrik Lundin
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output from the function is non-zero if the last decoded frame contains corruption due to packet losses. The decoder is also modified to accept encoded frames of zero length. A zero length frame indicates to the decoder that one or more frames have been completely lost. This will mark the last decoded reference buffer as corrupted. The data pointer can be NULL if the length is zero. Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce
2011-01-18Merge "fix last frame buffer copy logic regression"John Koleszar
2011-01-11Remove unused local variablesHenrik Lundin
Removing unused local variables causing compiler warnings in Visual Studio. Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a
2011-01-06fix last frame buffer copy logic regressionJohn Koleszar
Commit 0ce3901 introduced a change in the frame buffer copy logic where the NEW frame could be copied to the ARF or GF buffer through the copy_buffer_to_{arf,gf}==1 flags, if the LAST frame was not being refreshed. This is not correct. The intent of the copy_buffer_to_{arf,gf}==1 flag is to copy the LAST buffer. To copy the NEW buffer, the refresh_{alt_ref,golden}_frame flag should be used. The original buffer copy logic is fairly convoluted. For example: if (cm->refresh_last_frame) { vp8_swap_yv12_buffer(&cm->last_frame, &cm->new_frame); cm->frame_to_show = &cm->last_frame; } else { cm->frame_to_show = &cm->new_frame; } ... if (cm->copy_buffer_to_arf) { if (cm->copy_buffer_to_arf == 1) { if (cm->refresh_last_frame) vp8_yv12_copy_frame_ptr(&cm->new_frame, &cm->alt_ref_frame); else vp8_yv12_copy_frame_ptr(&cm->last_frame, &cm->alt_ref_frame); } else if (cm->copy_buffer_to_arf == 2) vp8_yv12_copy_frame_ptr(&cm->golden_frame, &cm->alt_ref_frame); } Effectively, if refresh_last_frame, then new and last are swapped, so when "new" is copied to ARF, it's equivalent to copying LAST to ARF. If not refresh_last_frame, then LAST is copied to ARF. So LAST is copied to ARF in both cases. Commit 0ce3901 removed the first buffer swap but kept the refresh_last_frame?new:last behavior, changing the sense since the first swap wasn't done to the more readable refresh_last_frame?last:new, but this logic is not correct when !refresh_last_frame. This commit restores the correct behavior from v0.9.1 and prior. This case is missing from the test vector set. Change-Id: I8369fc13a37ae882e31a8a104da808a08bc8428f
2010-11-17vp8mt_alloc_temp_buffers: make prototype return voidJohn Koleszar
This function was never called in a context expecting a return value, the return value was always a constant, and the !CONFIG_MULTITHREAD path didn't have a return statement, which caused a compiler warning. This patch changes the function to return void instead. Fixes issue #231 Change-Id: I9ef7f56e54418b7265026c54fc4ed5660c1418d1
2010-11-10postproc : Re-work posproc calling to allow more flags.Fritz Koenig
Debugging in postproc needs more flags to allow for specific block types to be turned on or off in the visualizations. Must be enabled with --enable-postproc-visualizer during configuration time. Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
2010-11-05Merge commit 'fix integer promotion bug in partition size check'John Koleszar
Change-Id: I4081917b46013fa8f4218cade8bd12cb2d013aee
2010-11-05fix integer promotion bug in partition size checkJohn Koleszar
The check '(user_data_end - partition < partition_size)' must be evaluated as a signed comparison, but because partition_size was unsigned, the LHS was promoted to unsigned, causing an incorrect result on 32-bit. Instead, check the upper and lower bounds of the segment separately. Change-Id: I6266aba7fd7de084268712a3d2a81424ead7aa06
2010-10-27Eliminate more warnings.Timothy B. Terriberry
This eliminates a large set of warnings exposed by the Mozilla build system (Use of C++ comments in ISO C90 source, commas at the end of enum lists, a couple incomplete initializers, and signed/unsigned comparisons). It also eliminates many (but not all) of the warnings expose by newer GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite without checking the return values). There are a few spurious warnings left on my system: ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used uninitialized in this function gcc seems to be unable to figure out that the value shortcut doesn't change between the two if blocks that test it here. ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned expression >= 0 is always true ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned expression >= 0 is always true This is true, so far as it goes, but it's comparing against an enum, and the C standard does not mandate that enums be unsigned, so the checks can't be removed. Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-27fix implicit declarationsJohann
ARM used to explicitly remove this file from the build. With the RTCD changes, that's no longer possible. These errors also exist for x86 w/o RTCD, but that's not the default configuration Change-Id: I3e10e5553ddf3278e8d3c9365ca6fb84f52f5066
2010-10-25Add runtime CPU detection support for ARM.Timothy B. Terriberry
The primary goal is to allow a binary to be built which supports NEON, but can fall back to non-NEON routines, since some Android devices do not have NEON, even if they are otherwise ARMv7 (e.g., Tegra). The configure-generated flags HAVE_ARMV7, etc., are used to decide which versions of each function to build, and when CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen at run time. In order for this to work, the CFLAGS must be set to something appropriate (e.g., without -mfpu=neon for ARMv7, and with appropriate -march and -mcpu for even earlier configurations), or the native C code will not be able to run. The ASFLAGS must remain set for the most advanced instruction set required at build time, since the ARM assembler will refuse to emit them otherwise. I have not attempted to make any changes to configure to do this automatically. Doing so will probably require the addition of new configure options. Many of the hooks for RTCD on ARM were already there, but a lot of the code had bit-rotted, and a good deal of the ARM-specific code is not integrated into the RTCD structs at all. I did not try to resolve the latter, merely to add the minimal amount of protection around them to allow RTCD to work. Those functions that were called based on an ifdef at the calling site were expanded to check the RTCD flags at that site, but they should be added to an RTCD struct somewhere in the future. The functions invoked with global function pointers still are, but these should be moved into an RTCD struct for thread safety (I believe every platform currently supported has atomic pointer stores, but this is not guaranteed). The encoder's boolhuff functions did not even have _c and armv7 suffixes, and the correct version was resolved at link time. The token packing functions did have appropriate suffixes, but the version was selected with a define, with no associated RTCD struct. However, for both of these, the only armv7 instruction they actually used was rbit, and this was completely superfluous, so I reworked them to avoid it. The only non-ARMv4 instruction remaining in them is clz, which is ARMv5 (not even ARMv5TE is required). Considering that there are no ARM-specific configs which are not at least ARMv5TE, I did not try to detect these at runtime, and simply enable them for ARMv5 and above. Finally, the NEON register saving code was completely non-reentrant, since it saved the registers to a global, static variable. I moved the storage for this onto the stack. A single binary built with this code was tested on an ARM11 (ARMv6) and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder, and produced identical output, while using the correct accelerated functions on each. I did not test on any earlier processors. Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
2010-10-22Merge "Improve handling of invalid frames."John Koleszar
Change-Id: Icef5226a70260607c190126c1c0cc28b796e759c
2010-10-22Improve handling of invalid frames.Timothy B. Terriberry
The code was not checking for frame sizes smaller than 3 bytes, and the partition size checks might have failed if the input buffer was within 16MB of the top of the heap. In addition, the reference count on the current frame buffer was not being decremented on error, so after a small number of errors, no new frame buffer could be found and it would run off the list of them. Change-Id: I0c60dba6adb1e2a29df39754f72a56ab6c776b46
2010-10-21Convert [4][4] matrices to [16] arrays.Timothy B. Terriberry
Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-04nasm: address labels 'rel label' vice 'wrt rip'Jan Kratochvil
nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04nasm: match instruction length (movd/movq) to parametersJan Kratochvil
nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-27move reconintra_mt to decoder (fixup)John Koleszar
Missed the .h file in the move. Change-Id: Ib408183fbb4d019fd46394b362f89ca6ea9d10bc
2010-09-24Merge "move reconintra_mt to decoder (for now)"John Koleszar
2010-09-24move reconintra_mt to decoder (for now)John Koleszar
reconintra_mt.c is only required for building the decoder right now. It could definitely be used for the encoder in the future, but it currently depends on decoder only data structures. (onyxd_int.h, VP8D_COMP, etc). Move it from common/ to decoder/ until the necessary changes to the common multithread code are complete. This patch is needed to build with --disable-vp8-decoder. Change-Id: I568c52221a2b309234d269675cba97131ce35c86
2010-09-23Adjust multi-thread sync ranges according to image sizesYunqing Wang
In multi-threaded decoder, set different sync ranges for different video resolutions. Change-Id: Iea48fd36f51919e0152c8ed3b1f10e1b723c0ca7
2010-09-21unset execute bit on c sourceJohn Koleszar
Change-Id: I6625ee41f8872908cb015ce0729e1c7a105b5217
2010-09-21Merge "Don't reset mb clamping state during splitmv decoding"John Koleszar
2010-09-21Don't reset mb clamping state during splitmv decodingJohn Koleszar
The MV decoding changes in c5fb0eb introduced a bug where the macroblock clamping state was reset for each partition, so if an earlier partition needed clamping but a subsequent one didn't, the MB wouldn't receive clamping. Instead, the state is only set during splitmv decoding, never cleared. Change-Id: I224fe258493405ee0f6a04596acdb622c475e845
2010-09-21Merge "Restructure multi-threaded decoder"Yunqing Wang
2010-09-20Merge "reorder data to use wider instructions"Johann
2010-09-20Merge "Update NEON wide idcts"Johann
2010-09-17reorder data to use wider instructionsJohann
the previous commit laid the groundwork by doing two sets of idcts together. this moved that further by grouping the interesting data (q[0], q+16[0]) together to allow using wider instructions. also managed to drop a few instructions by recognizing that the constant for sinpi8sqrt2 could be downshifted all the time which avoided a dowshift as well as workarounds for a function which only accepted signed data looks like a modest gain for performance: at qcif, went from ~180 fps to ~183 Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
2010-09-17Restructure multi-threaded decoderYunqing Wang
On each MB, loopfiltering is done right after MB decoding. This combines two loops in multi-threaded code into one, which reduces number of synchronizations to half. The above-row/left-col data are saved in temp buffers for next-row/next MB decoding. Tests on 4-core gLucid machine showed 10% decoder performance gain with threads=4 (tulip clip). Testing on other platforms isn't done yet. Change-Id: Id18ea7c1e84965dabea65d4c01ca5bc056ddeac9
2010-09-16cleanup: remove unused xprintfJohn Koleszar
These files aren't currently used, and we can get them back if we need them. Change-Id: I62aa3bff828e491a80c80eeb84a7c44903df29b5
2010-09-09Improved subset block searchScott LaVarnway
Improved the subset block search and fill. (about 3% improvement for 32 bit) Modified/merged the code in order to create vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock level. This will allow the decode loop (in the future) to decode modes/mvs on a frame, row, or mb level. Change-Id: If637d994b508792f846d39b5d44a7bf9aa5cddf3
2010-09-09Update NEON wide idctsJohann
Expand 93c32a55 which used SSE2 instructions to do two idct/dequant/recons at a time to NEON. Initial working commit. More work needs to be put into rearranging and interlacing the data to take advantage of quadword operations, which is when we'll hopefully see a much better boost Change-Id: I86d59d96f15e0d0f9710253e2c098ac2ff2865d1
2010-09-09Use WebM in copyright notice for consistencyJohn Koleszar
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-03Reduced the size of MB_MODE_INFOScott LaVarnway
Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-09-01Fix rare deadlock before loop filterFrank Galligan
There was an extremely rare deadlock that happened when one thread was waiting to start the loop filter on frame n while the other threads were starting to work on frame n+1. Change-Id: Icc94f728b3b6663405435640d9a2996735ba19ef
2010-08-31Replace sleep(0) calls in multi-threaded decoderYunqing Wang
This is a workaround for gLucid problem. Change-Id: I188a016a07e4c2ea212444c5a6284ff3c48a5caa
2010-08-31followup arm patchJohann
make the arm asm detokenizer work with the new structures Change-Id: I7cd92c2a018ec24032bb1cfd1bb9739bc84b444a
2010-08-31Changed above and left context data layoutScott LaVarnway
The main reason for the change was to reduce cycles in the token decoder. (~1.5% gain for 32 bit) This layout should be more cache friendly. As a result of this change, the encoder had to be updated. Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837 Note: dixie uses a similar layout
2010-08-24clean up compiler warningsJohann
did a test compile with clang and got rid of some warnings that have been annoying me for a while: vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer': vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type Change-Id: I78ddab176fe47cbeed30379709dc7bab01c0c2e4
2010-08-23update structuresJohann
mbmi and eob moved in previous commits Change-Id: I30a2eba36addf89ee50b406ad4afdd059a832711
2010-08-23Rework idct calling structure.Fritz Koenig
Moving the eob structure allows for a non-struct based function to handle decoding an entire mb of idct/dequant/recon data. This allows for SIMD functions to idct/dequant/recon multiple blocks at once. SSE2 implementation gives 3% gain on Atom. Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
2010-08-12framework for assembly version of the detokenizerJohann
adds a compile time option: --enable-arm-asm-detok which pulls in vp8/decoder/arm/detokenize.asm currently about break even speed wise, but changes are pending to the fill code (branch and load 3 bytes versus conditionally always load one) and the error handling. Currently it doesn't handle zero runs or overrunning the buffer. this is really just so i don't have to rebase my changes all the time to run benchmarks - now just need to replace one file! Change-Id: I56d0e2354dc0ca3811bffd0e88fe1f952fa6c797
2010-08-12Removed unnecessary MB_MODE_INFO copiesScott LaVarnway
These copies occurred for each macroblock in the encoder and decoder. Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result, a large number compile errors had to be fixed. Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-11Moved gf_active code to encoder onlyScott LaVarnway
The gf_active code is only used by the encoder, so it was moved from common and decoder. Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
2010-08-10First modification of multi-thread decoderYunqing Wang
This is the first modification of VP8 multi-thread decoder, which uses same threads to decode macroblocks and then do loopfiltering for each frame. Inspired by Rob Clark, synchronization was done on every 8 macroblocks instead of every macroblock to reduce lock contention. Comparing with the original code, this implementation gave about 15%- 20% performance gain while decoding my test clips on a Core2 Quad platform (Linux). The work is not done yet. Test on other platforms are needed. Change-Id: Ice9ddb0b511af1359b9f71e65066143c04fef3b5