libvpx.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2010-10-25	Add runtime CPU detection support for ARM.	Timothy B. Terriberry
	The primary goal is to allow a binary to be built which supports NEON, but can fall back to non-NEON routines, since some Android devices do not have NEON, even if they are otherwise ARMv7 (e.g., Tegra). The configure-generated flags HAVE_ARMV7, etc., are used to decide which versions of each function to build, and when CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen at run time. In order for this to work, the CFLAGS must be set to something appropriate (e.g., without -mfpu=neon for ARMv7, and with appropriate -march and -mcpu for even earlier configurations), or the native C code will not be able to run. The ASFLAGS must remain set for the most advanced instruction set required at build time, since the ARM assembler will refuse to emit them otherwise. I have not attempted to make any changes to configure to do this automatically. Doing so will probably require the addition of new configure options. Many of the hooks for RTCD on ARM were already there, but a lot of the code had bit-rotted, and a good deal of the ARM-specific code is not integrated into the RTCD structs at all. I did not try to resolve the latter, merely to add the minimal amount of protection around them to allow RTCD to work. Those functions that were called based on an ifdef at the calling site were expanded to check the RTCD flags at that site, but they should be added to an RTCD struct somewhere in the future. The functions invoked with global function pointers still are, but these should be moved into an RTCD struct for thread safety (I believe every platform currently supported has atomic pointer stores, but this is not guaranteed). The encoder's boolhuff functions did not even have _c and armv7 suffixes, and the correct version was resolved at link time. The token packing functions did have appropriate suffixes, but the version was selected with a define, with no associated RTCD struct. However, for both of these, the only armv7 instruction they actually used was rbit, and this was completely superfluous, so I reworked them to avoid it. The only non-ARMv4 instruction remaining in them is clz, which is ARMv5 (not even ARMv5TE is required). Considering that there are no ARM-specific configs which are not at least ARMv5TE, I did not try to detect these at runtime, and simply enable them for ARMv5 and above. Finally, the NEON register saving code was completely non-reentrant, since it saved the registers to a global, static variable. I moved the storage for this onto the stack. A single binary built with this code was tested on an ARM11 (ARMv6) and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder, and produced identical output, while using the correct accelerated functions on each. I did not test on any earlier processors. Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
2010-10-25	isolate new temporal filtering code	Johann
	onyx_if is getting pretty big. split out the temporal code to make it easier to look at. Change-Id: I207c3a94c90e91b32e3ea5e1836a53b7a990fabd
2010-10-21	Convert [4][4] matrices to [16] arrays.	Timothy B. Terriberry
	Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-21	Merge "Move firstpass motion map to stats packet"	John Koleszar

2010-10-21	Move firstpass motion map to stats packet	John Koleszar
	The first implementation of the firstpass motion map for motion compensated temporal filtering created a file, fpmotionmap.stt, in the current working directory. This was not safe for multiple encoder instances. This patch merges this data into the first pass stats packet interface, so that it is handled like the other (numerical) firstpass stats. The new stats packet is defined as follows: Numerical Stats (16 doubles) -- 128 bytes Motion Map -- 1 byte / Macroblock Padding -- to align packet to 8 bytes The fpmotionmap.stt file can still be generated for debugging purposes in the same way that the textual version of the stats are available (defining OUTPUT_FPF in firstpass.c) Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
2010-10-21	Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm	Yunqing Wang
	Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
2010-10-21	Merge "Rewrite vp8_short_walsh4x4_sse2()"	Yunqing Wang

2010-10-21	Merge "Add SSE2 subtract functions"	Yunqing Wang

2010-10-21	Rewrite vp8_short_walsh4x4_sse2()	Yunqing Wang
	This rewriting reflects changes made in commit "Improve the accuracy of forward walsh-hadamard transform". Since this function is not called much, only a small encoder performance gain (~0.5% ) is seen. Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
2010-10-19	Merge "change to make use of more trellis quantization"	Yaowu Xu

2010-10-18	Add SSE2 subtract functions	Yunqing Wang
	Instead of doing 8-bit data unpack and 16-bit subtraction, use psubb to do 16 8-bit subtractions and pcmpgtb to preserve the sign information. This does not bring noticable gain since these functions are not called frequently. Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
2010-10-18	copy compiler warning fixes	Johann
	generic version got fixed, but not the arm version. fixes: vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3': vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing argument 5 of 'fn_ptr->sdx3f' differ in signedness vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int ' but argument is of type 'int ' and another unsigned change to keep the files similar Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
2010-10-15	remove dead code	Johann
	vp8_diamond_search_sadx4 isn't used in arm because there is no corrosponding sdx4df as in x86. rather than keep it in sync with ../mcomp.c, delete it vp8_hex_search had the original, more readable/understandable code if`d out. it's also available in ../mcomp.c, so remove the dead copy Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e
2010-10-15	change to make use of more trellis quantization	Yaowu Xu
	when a subsequent frame is encoded as an alt reference frame, it is unlikely that any mb in current frame will be used as reference for future frames, so we can enable quantization optimization even when the RD constant is slightly rate-biased. The change has an overall benefit between 0.1% to 0.2% bit savings on the test sets based on vpxssim scores. Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
2010-10-14	Merge "Improve bounds checking in vp8_diamond_search_sadx4()"	Yunqing Wang

2010-10-14	Improve bounds checking in vp8_diamond_search_sadx4()	Yunqing Wang
	In order to know if all 4/8 neighbor points are within the bounds, 4 bounds checking are enough instead of checking 4 bounds for each points (16/32 checkings). This improvement reduces cost of vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5% performance gain (test options: 1 pass, good, speed=4). Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
2010-10-13	Fix compiler warning about vp8_fast_quantize_b_impl_ssse2.	Fritz Koenig
	Typo had function defined as _ssse2 and prototyped as _sse2. Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
2010-10-13	Correct QWORD usage in assembly files	Fritz Koenig
	QWORD was being undefined because it was being used incorrectly. Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
2010-10-12	Centralize mb skip state calculation	John Koleszar
	This patch moves the scattered updates to the mb skip state (mode_info_context->mbmi.mb_skip_coeff) to vp8_tokenize_mb. Recent changes to the quantizer exposed a bug where if a macroblock could be coded as a skip but isn't, the encoder would run the loopfilter but the decoder wouldn't, causing a reference buffer mismatch. The loopfilter is controlled by a flag called dc_diff. The decoder looks at the number of decoded coefficients when setting this flag. The encoder sets this flag based on the skip state, since any skippable macroblock should be transmitted as a skip. The coefficient optimization pass (vp8_optimize_b()) could change the coefficients such that a block that was not a skip becomes one. The encoder was not updating the skip state in this situation for intra coded blocks. The underlying issue predates it, but this bug was recently triggered by enabling trellis quantization on the Y2 block in commit dcd29e3, and by changing the quantizer range control in commit 305be4e. Change-Id: I5cce5da0dbc2d22f7d79ee48149f01e868a64802
2010-10-12	Merge "Add const qualifiers to variance/SAD functions."	John Koleszar

2010-10-12	Add const qualifiers to variance/SAD functions.	Timothy B. Terriberry
	These functions should never change their input, and there's no reason not to declare that. This allows them to be passed static const data. Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
2010-10-12	Merge "Move vp8_strict_quantize_b inside EXACT_QUANT #define."	John Koleszar

2010-10-12	Merge "Remove INTRARDOPT #define and intra_rd_opt option."	John Koleszar

2010-10-11	Move vp8_strict_quantize_b inside EXACT_QUANT #define.	Timothy B. Terriberry
	There is currently no inexact version of this function, so do not even compile it without EXACT_QUANT. This will prevent someone from inadvertently trying to use it without the proper EXACT_QUANT setup. Change-Id: Ia13491e0128afb281c05c9222ee5987101e4010d
2010-10-11	Remove INTRARDOPT #define and intra_rd_opt option.	Timothy B. Terriberry
	This is just eliminating some cruft. Although a number of variables are declared only when INTRARDOPT is defined, they are used elsewhere without that protection, and no longer just for intra RDO. The intra_rd_opt flag was hard-coded to 1 and never checked. Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
2010-10-11	Merge "Added vp8_fast_quantize_b_sse2"	Scott LaVarnway

2010-10-07	Remove unused file in encoder	Yunqing Wang
	Remove vp8/encoder/x86/csystemdependent.c Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
2010-10-07	Added vp8_fast_quantize_b_sse2	Scott LaVarnway
	Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into quantize_sse2.asm and renamed. Updated the assembly code to match the C version. Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
2010-10-06	optimize fast_quantizer c version	Yaowu Xu
	As the zbin and rounding constants are normalized, rounding effectively does the zbinning, therefore the zbin operation can be removed. In addition, the memset on the two arrays are no longer necessary. Change-Id: If39c353c42d7e052296cb65322e5218810b5cc4c
2010-10-05	Merge "Tune effect of motion on KF/GF boost in two pass;"	Paul Wilkins

2010-10-04	nasm: address labels 'rel label' vice 'wrt rip'	Jan Kratochvil
	nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04	nasm: match instruction length (movd/movq) to parameters	Jan Kratochvil
	nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-10-04	Merge "enable trellis quantization for 2nd order blocks"	Yaowu Xu

2010-10-02	Tune effect of motion on KF/GF boost in two pass;	Paul Wilkins
	This code adjust the impact of the amount and speed of motion on GF and KF boost. Sections with lots of slow motion will tend to have a somewhat bigger boost and sections with fast motion may have less. There is a knock on effect to the selection of the active quantizer range. This will likely require further tuning but helps with a couple of particularly bad edge cases. Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
2010-10-02	enable trellis quantization for 2nd order blocks	Yaowu Xu
	Experimented with different value for Y2_RD_MULT ranging f[1, 32], without adapting the value to MB coding mode/frame type/Q value, 4 works out best among all values, providing overall 0.1% coding gain on the test set. Change-Id: I6b2583a8aa5db5e7e5c65c646301909c0c58f876
2010-10-01	Made temporal filter default to use centered mode	Adrian Grange
	If temporal filtering is enabled but a filter type is not specified centered filter mode is used by default. Change-Id: I87306f267c1390074c806c506a69b4ba914d92a2
2010-09-29	Rename mode_ref_lf_test_function	John Koleszar
	This function graduated from being a test func to something that's on by default. Rename it and remove some spurious comments that confuse its status. Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
2010-09-29	Fix loopfilter delta zero transitions	John Koleszar
	Loopfilter deltas are initialized to zero on keyframes in the decoder. The values then persist from the previous frame unless an update bit is set in the bitstream. This data is not included in the entropy data saved by the 'refresh entropy' bit in the bitstream, so it is effectively an additional contextual element beyond the 3 ref-frames and the entropy data. The encoder was treating this delta update bit as update-if-nonzero, meaning that the value would be refreshed even if it hadn't changed, and more significantly, if the correct value for the delta changed to zero, the update wouldn't be sent, and the decoder would preserve the last (presumably non-zero) value. This patch updates the encoder to send an update only if the value has changed from the previously transmitted value. It also forces the value to be transmitted in error resilient mode, to account for lost context in the event of lost frames. Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
2010-09-29	Change to coefficient optimization rules.	Paul Wilkins
	Allow coefficient optimization for good quality speed 0. Change-Id: Id0cb363df6823c6798671584fbba097916a7df2c
2010-09-29	Merge "Moved row-specific computation of MV bounds out of col loop"	Adrian Grange

2010-09-29	Moved row-specific computation of MV bounds out of col loop	Adrian Grange
	Moved the bounds computation on vertical MV component out of the loop that processes MBs within a MB row.
2010-09-29	Control of active min quantizer for two pass.	Paul Wilkins
	Create look up tables for controlling the active quantizer range. Some initial tuning to improve quality circa 0.5% on test set. Clean up of some stats output code Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
2010-09-28	Enabled AltRef motion map creation	Adrian Grange
	Enabled the first-pass encode to output the map of macroblock coding modes required by the AltRef filter.
2010-09-28	Made AltRef filter adaptive & added motion compensation	Adrian Grange
	Modified AltRef temporal filter to adapt filter length based on macroblock coding modes selected during first-pass encode. Also added sub-pixel motion compensation to the AltRef filter.
2010-09-27	Badly placed initialization of rolling rate monitors.	Paul Wilkins
	This affects control of the active quantizer range. Change-Id: I30511fc81ac9f75ff20d9f1372382423d56739da
2010-09-24	disable compilation of debugging code	John Koleszar
	This patch avoids compiling some debugging code in onyx_if.c. The most significant fix is to avoid generating code for vp8_write_yuv_frame, which is never called. Some other code was removed by the dead code elimination performed by the compiler, and this patch does it with the preprocessor instead. There are advantages both ways. Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
2010-09-16	Reduce size of tokenizer tables	John Koleszar
	This patch reduces the size of the global tables maintained by the tokenizer to 16k from 80k-96k. See issue #177. Change-Id: If0275d5f28389af11ac83c5d929d1157cde90fbe
2010-09-09	Fix GF interval for non-lagged ARFs	John Koleszar
	When ARFs are enabled in non-lagged compress modes, the GF interval was being reset to zero. Non-lagged ARF updates were enabled in commit 63ccfbd, but this incorrect GF interval caused a quality regression. Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
2010-09-09	Use WebM in copyright notice for consistency	John Koleszar
	Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-08	Skip unnecessary search of identical frames	Jim Bankoski
	vp8_get_compressed_data() was defeating logic in encode_frame_to_datarate() that determined the reference buffers to search and forcing all frames to be eligible to search. In cases where buffers have identical contents, this is unnecessary extra work. Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114