path: root/vp9/common
Age | Commit message | Author
2015-03-24 | Refactor fast loop filter code to handle 444. | Alex Converse
Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce
2015-03-23 | Merge "Optimize the intra frame decode to skip some unnecessary copy." | hkuang
2015-03-23 | Optimize the intra frame decode to skip some unnecessary copy. | hkuang
This speeds up a normal YT style 1080P clip decode by ~1% on nexus 7. Change-Id: Ied7fa0d8bc941b2adb4db9382f549ee4d5654f3a
2015-03-19 | Safely free all the frame buffers after all the workers finish the work. | hkuang
Issue: 978 Change-Id: Ia7aa809095008f6819a44d7ecb0329def79b1117
2015-03-12 | Fix a typo introduced in #94401aff | Yaowu Xu
This fixes all test vector failures Change-Id: Ie1a9fe0f023f7a0c7e89eb55df1b40ff65302adc
2015-03-11 | Merge "Refactor the block decode code to make it simpler." | hkuang
2015-03-11 | Refactor the block decode code to make it simpler. | hkuang
Change-Id: I0f983cb821ad7ec6fbefe7895cb8124a8fa39df6
2015-03-10 | Accumulate tx_totals counters in multi-threaded encoder | Yunqing Wang
The tx_totals counters weren't handled correctly in the multi-threaded case, which caused a mismatch when encoding with more than one thread. This patch fixes that. Change-Id: Ice9b0386f57175fb92a0bdcd5042686a3106246a
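A minimal sketch of the kind of accumulation described here, assuming a hypothetical `TxCounts` struct (not libvpx's actual counter layout): each worker fills its own counters, and the main thread sums them once every worker has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread counter block; the names and sizes are
 * illustrative, not libvpx's real data structures. */
#define TX_SIZES 4

typedef struct {
  unsigned int tx_totals[TX_SIZES];
} TxCounts;

/* Adds every worker's counters into `total`. Intended to be called only
 * after all workers have finished, so no locking is needed here. */
static void accumulate_tx_totals(TxCounts *total, const TxCounts *workers,
                                 size_t num_workers) {
  assert(total != NULL && workers != NULL);
  for (size_t w = 0; w < num_workers; ++w) {
    for (int i = 0; i < TX_SIZES; ++i) {
      total->tx_totals[i] += workers[w].tx_totals[i];
    }
  }
}
```

Keeping one counter block per worker avoids contention during encoding; the cost is a single summation pass at the end of the frame.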
2015-03-06 | Merge "Only wait for previous frame's motion vector if needed." | Hangyu Kuang
2015-03-05 | Only wait for previous frame's motion vector if needed. | Hangyu Kuang
Change-Id: Iecce685a33b64844446c0009f21bc85566d7469f
2015-03-04 | Declare function used by 'once' with 'void' parameters | Johann
Visual Studio is exceptionally picky about this: vp9_reconintra.c(900): warning C4113: 'void (__cdecl *)()' differs in parameter lists from 'void (__cdecl *)(void)' [.build-x86_64-win64-vs10\vpx.vcxproj] Change-Id: I564c7415f4608fd962be8c699d6133a996b545f7
2015-03-04 | Make encoder buffer allocation dynamic | Adrian Grange
Frame buffers are now allocated dynamically on-demand. Entries in the reference frame map, cm->ref_frame_map, may now be set to -1 (INVALID_IDX) to indicate that there is not a valid reference buffer in that "slot". All slots in the reference frame map are now initialized to the empty state (-1) and each buffer is initialized to have a reference count of 0. Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
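The slot convention above can be sketched roughly as follows. Only `ref_frame_map` and `INVALID_IDX` come from the commit message; the `Pool` and `FrameBuf` types, the helper names, and the sizes are assumptions made for illustration.

```c
#include <assert.h>

#define INVALID_IDX (-1)  /* value named in the commit message */
#define REF_FRAMES 8      /* assumed number of reference slots */
#define FRAME_BUFFERS 12  /* assumed size of the buffer pool */

typedef struct {
  int ref_count; /* how many users currently hold this buffer */
} FrameBuf;

typedef struct {
  int ref_frame_map[REF_FRAMES]; /* slot -> buffer index, or INVALID_IDX */
  FrameBuf frame_bufs[FRAME_BUFFERS];
} Pool;

/* Every slot starts empty and every buffer starts unreferenced. */
static void init_pool(Pool *p) {
  for (int i = 0; i < REF_FRAMES; ++i) p->ref_frame_map[i] = INVALID_IDX;
  for (int i = 0; i < FRAME_BUFFERS; ++i) p->frame_bufs[i].ref_count = 0;
}

/* Returns the index of an unreferenced buffer and takes one reference on
 * it, or INVALID_IDX if the pool is exhausted. */
static int get_free_fb(Pool *p) {
  for (int i = 0; i < FRAME_BUFFERS; ++i) {
    if (p->frame_bufs[i].ref_count == 0) {
      p->frame_bufs[i].ref_count = 1;
      return i;
    }
  }
  return INVALID_IDX;
}

/* Points a slot at a buffer (or empties it with INVALID_IDX), releasing
 * the reference held by the slot's previous occupant, if any. */
static void set_ref_slot(Pool *p, int slot, int buf_idx) {
  assert(slot >= 0 && slot < REF_FRAMES);
  const int old = p->ref_frame_map[slot];
  if (old != INVALID_IDX) --p->frame_bufs[old].ref_count;
  p->ref_frame_map[slot] = buf_idx;
  if (buf_idx != INVALID_IDX) ++p->frame_bufs[buf_idx].ref_count;
}
```

Any code that reads a slot has to check for `INVALID_IDX` before indexing into the buffer array, which is the main behavioural change a convention like this implies.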
2015-03-03 | fix a race condition caused by intra function pointer initialization | Yunqing Wang
This patch fixed webm issue 962. (https://code.google.com/p/webm/issues/detail?id=962) The data races occurred when an encoder and a decoder were created at the same time, and the function pointers were initialized twice. Change-Id: I8851b753c4b4ad4767d6eea781b61f0ac9abb44b
2015-03-01 | Use variance metric for integral projection vector match | Jingning Han
This commit replaces the SAD with variance as metric for the integral projection vector match. It improves the search accuracy in the presence of slight light change. The average speed -6 compression performance for rtc set is improved by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
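To see why a variance metric tolerates slight light change, compare it with SAD on two 1-D vectors that differ only by a constant brightness offset. This is an illustrative sketch with hypothetical function names, not the libvpx implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences between two 1-D vectors of length n. */
static int vector_sad(const int *a, const int *b, int n) {
  int sad = 0;
  for (int i = 0; i < n; ++i) sad += abs(a[i] - b[i]);
  return sad;
}

/* Variance of the difference vector, scaled by n:
 *   sum(d^2) - (sum(d))^2 / n, where d[i] = a[i] - b[i].
 * A constant offset between a and b contributes nothing to this value. */
static int vector_var(const int *a, const int *b, int n) {
  int sum = 0;
  int sse = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sum += d;
    sse += d * d;
  }
  return sse - (sum * sum) / n;
}
```

For `{10, 20, 30, 40}` against `{15, 25, 35, 45}`, SAD reports a mismatch of 20 while the variance metric reports 0, so a uniformly brighter copy of the same content still counts as a perfect match.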
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4" | Jingning Han
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3" | Jingning Han
2015-02-27 | Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2" | Jingning Han
2015-02-26 | Fix high bit-depth loop-filter sse2 compiling issue - part 3 | Jingning Han
Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69
2015-02-26 | Fix high bit-depth loop-filter sse2 compiling issue - part 2 | Jingning Han
Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4
2015-02-25 | Fix high bit-depth loop-filter sse2 compiling issue - part 1 | Jingning Han
The intrinsic statement _mm_subs_epi16() should take an immediate. Passing a variable as its input argument causes a compile failure with older versions of gcc. Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4
2015-02-24 | Merge "vp9_loopfilter: quiet integer constant size warnings" | James Zern
2015-02-24 | Fix high bit-depth loop-filter sse2 compiling issue - part 4 | Jingning Han
Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb
2015-02-24 | vp9_loopfilter: quiet integer constant size warnings | James Zern
mark uint64_t constants with 'ULL' Change-Id: I7648e161b4004fba35e1fa7ab79e34cc19e39716
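A short illustration of the suffix, using a made-up mask rather than the actual loop-filter constants: without `ULL`, some compilers warn that a wide literal does not fit in `long`, and a shift such as `1 << 40` is evaluated in `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit mask with the top bit of every byte set. The ULL
 * suffix gives the literal type unsigned long long regardless of how
 * wide `long` is on the target. */
static const uint64_t kHighBits = 0x8080808080808080ULL;

/* Returns a mask with the low n bits set, for 0 <= n < 64. The shift is
 * done on 1ULL; shifting a plain int 1 by 32 or more is undefined. */
static uint64_t low_n_bits(int n) {
  assert(n >= 0 && n < 64);
  return (1ULL << n) - 1;
}
```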
2015-02-20 | Move dequant table from VP9_COMMON to VP9_COMP as decoder does not need it any more. | Hangyu Kuang
This reduces VP9_COMMON size from 25776 bytes to 17584 bytes (~31%). Change-Id: Ic5daea732ccefb6d512b048af7983f0efe08589b
2015-02-19 | Merge "Integral projection based motion estimation" | Jingning Han
2015-02-19 | Integral projection based motion estimation | Jingning Han
This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region are projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The additional 64x64 block match (SSE2 version) costs around 2% of the overall CPU cycles at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles.
The experiments were run in the RTC speed -6 setting:
cloud 1080p 500 kbps: 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps: 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
blue_sky 1080p 500 kbps: 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
jimred 400 kbps: 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
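The projection-then-match idea can be sketched in plain C as follows. This is a simplified illustration, not the SSE2 code from the commit: the function names are hypothetical, the block size is arbitrary, and the match cost is a plain sum of absolute differences.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Sums each column of a w x h block: out[x] = sum over y of src[y][x]. */
static void project_columns(const unsigned char *src, int stride, int w,
                            int h, int *out) {
  for (int x = 0; x < w; ++x) {
    out[x] = 0;
    for (int y = 0; y < h; ++y) out[x] += src[y * stride + x];
  }
}

/* Sums each row of a w x h block: out[y] = sum over x of src[y][x]. */
static void project_rows(const unsigned char *src, int stride, int w,
                         int h, int *out) {
  for (int y = 0; y < h; ++y) {
    out[y] = 0;
    for (int x = 0; x < w; ++x) out[y] += src[y * stride + x];
  }
}

/* Slides `cur` (n entries) over `ref` (n + 2 * range entries) and returns
 * the offset in [-range, range] with the smallest sum of absolute
 * differences. Offset 0 means cur lines up with ref[range]. */
static int vector_match(const int *ref, const int *cur, int n, int range) {
  int best_offset = 0;
  int best_cost = INT_MAX;
  assert(n > 0 && range >= 0);
  for (int d = -range; d <= range; ++d) {
    int cost = 0;
    for (int i = 0; i < n; ++i) cost += abs(ref[d + range + i] - cur[i]);
    if (cost < best_cost) {
      best_cost = cost;
      best_offset = d;
    }
  }
  return best_offset;
}
```

Matching the column projections gives a horizontal offset and matching the row projections gives a vertical one, so a 2-D search over (2 * range + 1)^2 candidate positions is replaced by two 1-D searches of 2 * range + 1 positions each.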
2015-02-13 | loop_filter_rows_mt: remove dependency on 'last_height' | James Zern
using this to control reallocation would miss a change if the function were not called for every frame. fixes potential memory corruption by the subsequent memset Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33
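A rough sketch of the safer pattern, under the assumption that the synchronization buffer has one entry per row: the allocated size is stored next to the buffer and compared on every call, so a size change is noticed even if some frames skipped the function. `LfSync` and `lf_sync_prepare` are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-row synchronization state. */
typedef struct {
  int *cur_sb_col; /* one entry per row */
  int rows;        /* number of entries currently allocated */
} LfSync;

/* Ensures the buffer holds `rows` entries, then resets every entry to -1.
 * Reallocation depends on the size recorded with the buffer itself, not
 * on a remembered frame height. Returns 0 on success, -1 on failure. */
static int lf_sync_prepare(LfSync *s, int rows) {
  assert(s != NULL && rows > 0);
  if (s->cur_sb_col == NULL || s->rows != rows) {
    free(s->cur_sb_col);
    s->cur_sb_col = malloc(sizeof(*s->cur_sb_col) * rows);
    if (s->cur_sb_col == NULL) {
      s->rows = 0;
      return -1;
    }
    s->rows = rows;
  }
  /* Safe: the buffer is guaranteed to hold exactly `rows` entries here. */
  memset(s->cur_sb_col, -1, sizeof(*s->cur_sb_col) * rows);
  return 0;
}
```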
2015-02-11 | Merge "Make vp9_print_modes_and_motion_vectors() work" | Yunqing Wang
2015-02-11 | Merge "vp9_thread: prefer pthread.h if available" | James Zern
2015-02-10 | vp9_highbd_tm_predictor_16x16: fix win64 | James Zern
by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb
2015-02-10 | Make vp9_print_modes_and_motion_vectors() work | Yunqing Wang
MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors() didn't work anymore. This patch modified vp9_debugmodes.c so that this function works again for debug usage. Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a
2015-02-10 | vp9_thread: prefer pthread.h if available | James Zern
this avoids conflicts with recent versions of mingw-w64 (tested g++ 4.8.2) and the unit tests Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088
2015-02-10 | Merge "Make encoder and decoder share common thread function" | Yunqing Wang
2015-02-10 | Merge "Rename loopfilter_thread files to thread_common files" | Yunqing Wang
2015-02-09 | Set the maximum decode threads to be 8. | hkuang
This will fix the frame parallel decode hang on windows due to not enough semaphores. This will also make the frame parallel decode safer as the number of frame buffers could only support maximum 8 threads. Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309
2015-02-06 | Make encoder and decoder share common thread function | Yunqing Wang
Moved vp9_accumulate_frame_counts to vp9_thread_common.c to eliminate the duplicate code. Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a
2015-02-06 | Rename loopfilter_thread files to thread_common files | Yunqing Wang
Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b
2015-02-04 | Account for chroma component costs in RTC mode decision | Jingning Han
This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
2015-02-03 | Merge "Remove duplicate code." | hkuang
2015-02-03 | make low bitrates a lot less blocky | Jim Bankoski
Remove loop filter skip at speed 7+ because of bad visual artifacts and up the postprocessing. Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d
2015-02-01 | Merge "Optimize coef update" | Yaowu Xu
2015-01-30 | Try again to merge branch 'frame-parallel' into master branch. | hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it returns a frame only when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls. Currently, frame parallel decode only speeds up decoding of frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode because it lacks a loopfilter worker thread. There are still some known issues that need to be addressed. For example, decoding frame parallel videos with segmentation enabled is sometimes incorrect.
* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
  test/codec_factory.h
  test/decode_test_driver.cc
  test/decode_test_driver.h
  test/invalid_file_test.cc
  test/test-data.sha1
  test/test.mk
  test/test_vectors.cc
  vp8/vp8_dx_iface.c
  vp9/common/vp9_alloccommon.c
  vp9/common/vp9_entropymode.c
  vp9/common/vp9_loopfilter_thread.c
  vp9/common/vp9_loopfilter_thread.h
  vp9/common/vp9_mvref_common.c
  vp9/common/vp9_onyxc_int.h
  vp9/common/vp9_reconinter.c
  vp9/decoder/vp9_decodeframe.c
  vp9/decoder/vp9_decodeframe.h
  vp9/decoder/vp9_decodemv.c
  vp9/decoder/vp9_decoder.c
  vp9/decoder/vp9_decoder.h
  vp9/encoder/vp9_encoder.c
  vp9/encoder/vp9_pickmode.c
  vp9/encoder/vp9_rdopt.c
  vp9/vp9_cx_iface.c
  vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-30 | vp9: rename 'near' parameters | James Zern
+ nearest for consistency. 'near' is a reserved word in Windows builds, so using it as a parameter name may cause build failures with some configurations. Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403
2015-01-30 | Optimize coef update | Yaowu Xu
1. move the check of search method of USE_TX_8X8 up one level to avoid operations of build_tree_distributions()
2. count tx used and avoid computation for coef update when one size is not used at all.
Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
2015-01-28 | Remove duplicate code. | hkuang
(issue #934). Change-Id: Ic8adaaff87aae0b33d9b508f160b48e0ccdaaf4c
2015-01-27 | Add vp9_sad32x32x4d_neon Neon intrinsic function. | Frank Galligan
On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
2015-01-27 | Add vp9_sad16x16x4d_neon Neon intrinsic function. | Frank Galligan
On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
2015-01-27 | Add vp9_sad64x64x4d_neon Neon intrinsic function. | Frank Galligan
On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
2015-01-24 | Add Neon intrinsic vp9_fdct8x8_quant_neon | Frank Galligan
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
2015-01-23 | Merge "Replace divide with look-up" | Yaowu Xu