path: root/vp9
Age  Commit message  Author
2015-01-23Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."Johann
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7 Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-22Merge branch 'frame-parallel' to enable frame parallel decode in master branch.hkuang
In frame parallel decode, the libvpx decoder decodes several frames on all CPUs in parallel. If it is not flushed, it only returns a frame when all the CPUs are busy. If it is flushed, it returns all the frames in the decoder. Compared with the current serial decode mode, in which the libvpx decoder is idle between decode calls, the frame parallel decoder stays busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading, which makes it easier for devices to play 1080P VP9 videos.
* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Separate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
  test/codec_factory.h
  test/decode_test_driver.cc
  test/decode_test_driver.h
  test/invalid_file_test.cc
  test/test-data.sha1
  test/test.mk
  test/test_vectors.cc
  vp8/vp8_dx_iface.c
  vp9/common/vp9_alloccommon.c
  vp9/common/vp9_entropymode.c
  vp9/common/vp9_loopfilter_thread.c
  vp9/common/vp9_loopfilter_thread.h
  vp9/common/vp9_mvref_common.c
  vp9/common/vp9_onyxc_int.h
  vp9/common/vp9_reconinter.c
  vp9/decoder/vp9_decodeframe.c
  vp9/decoder/vp9_decodeframe.h
  vp9/decoder/vp9_decodemv.c
  vp9/decoder/vp9_decoder.c
  vp9/decoder/vp9_decoder.h
  vp9/encoder/vp9_encoder.c
  vp9/encoder/vp9_pickmode.c
  vp9/encoder/vp9_rdopt.c
  vp9/vp9_cx_iface.c
  vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
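The flush and non-flush behaviour described in the merge message above can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyDecoder`, `toy_decode`, `toy_flush` and `toy_get_frame` are hypothetical names, not libvpx API, and frames are assumed to come back in submission order.

```c
enum { MAX_WORKERS = 8, NO_FRAME = -1 };

/* Toy model (hypothetical, not libvpx code): a FIFO of frame ids that
 * are currently "being decoded" by workers. */
typedef struct {
  int in_flight[MAX_WORKERS];
  int count;       /* frames currently held by workers */
  int num_workers; /* must be <= MAX_WORKERS */
} ToyDecoder;

/* Removes and returns the oldest in-flight frame, or NO_FRAME if none. */
static int toy_get_frame(ToyDecoder *d) {
  int i, oldest;
  if (d->count == 0) return NO_FRAME;
  oldest = d->in_flight[0];
  for (i = 1; i < d->count; ++i) d->in_flight[i - 1] = d->in_flight[i];
  --d->count;
  return oldest;
}

/* Submits a frame. A finished frame is returned only if every worker
 * was already busy; otherwise the call returns NO_FRAME. */
static int toy_decode(ToyDecoder *d, int frame_id) {
  int out = NO_FRAME;
  if (d->count == d->num_workers) out = toy_get_frame(d);
  d->in_flight[d->count++] = frame_id;
  return out;
}

/* Flush: drains one frame per call until the decoder is empty. */
static int toy_flush(ToyDecoder *d) { return toy_get_frame(d); }
```

With two workers, the first two `toy_decode` calls return nothing, the third returns the first frame, and repeated `toy_flush` calls hand back the remaining frames.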
2015-01-22Modify variance partition selection for low resolutions.Marco
For low spatial resolutions: bias partition selection to smaller block sizes, and base the variance computation on 4x4 down-sampling. Also move the threshold computations into choose_partitioning, so they are computed once for each superblock. On low-res clips (RTC_derf) PSNR/SSIM metrics increase by about 4-5%. No change for resolutions above CIF. Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0
2015-01-22Merge "Bug when last group before forced key frame is short."Paul Wilkins
2015-01-21Bug when last group before forced key frame is short.Paul Wilkins
Just before a forced key frame we often get a foreshortened arf/gf group. In such a case, we do not want to update rc->last_boosted_qindex, which is used to define the Q range for the forced key frame itself. This gives a small average metrics gain for the YT and YT-HD sets (eg. YT SSIM +0.141%). Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9
2015-01-21Merge "Fix compile error in Chromium building."JackyChen
2015-01-21Fix compile error in Chromium building.JackyChen
The address comparisons in the condition are unnecessary, since the addresses are always non-null. Change-Id: Id0b0075283f5af65215d5761a8160a4cb2a15c9b
2015-01-21Allow external resize via vpx_codec_enc_config_setAlex Converse
Change-Id: I3d324e2baa4de2d266c5f7ca7b635b62372e90a7
2015-01-21Merge "Replace "colorspace" with "color_space""Yaowu Xu
2015-01-20Merge "Add Neon intrinsics for vp9_avg_8x8_neon"Frank Galligan
2015-01-20Add non420 code in multi-threaded loopfilterYunqing Wang
Added the non420 part back to make it consistent with the single-thread code in vp9_loopfilter.c. Change-Id: I8ca255d73bffebae294d2627d6655eafe535cb90
2015-01-20Merge "vp9_ethread: add parallel loopfilter"Yunqing Wang
2015-01-17Fix variance Neon intrinsics > 32x32Frank Galligan
The 16 bit sum vector was overflowing. Change-Id: I0fdf38e832ee99457ec8680a92691a6175ff8c3f
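A back-of-the-envelope check shows why a 16-bit accumulator is only safe up to 32x32. The sketch below assumes (this is not stated in the commit) that the sum of pixel differences is kept in 8 lanes of `int16_t`, as one 128-bit Neon register would hold, and that each difference lies in [-255, 255] for 8-bit video. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumptions: 8 int16_t lanes per accumulator register, and a maximum
 * absolute pixel difference of 255 (8-bit samples). */
enum { kLanes = 8, kMaxAbsDiff = 255 };

/* Worst-case magnitude a single lane can reach for a w x h block. */
static int32_t worst_case_lane_sum(int w, int h) {
  return (int32_t)(w * h / kLanes) * kMaxAbsDiff;
}

/* Returns 1 when an int16_t lane can hold the worst case, else 0. */
static int fits_in_int16(int w, int h) {
  return worst_case_lane_sum(w, h) <= INT16_MAX;
}
```

Under these assumptions a 32x32 block peaks at 32640 per lane, just below `INT16_MAX`, while anything larger can overflow, so the sum has to be widened or the block processed in smaller pieces.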
2015-01-16vp9_ethread: add parallel loopfilterYunqing Wang
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder to common;
3. Merged multi-threaded loopfilter code, and made encoder/decoder call the same function to reduce code duplication.
Encoder tests showed a 1% - 2% speedup for good-quality 2-pass mode (at speed 3). For real-time mode (at speed 7), they showed a 1% - 3% speedup using 2 threads and a 4% - 6% speedup using 4 threads.
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
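One way to picture a row-based split across workers is a round-robin assignment of superblock rows. This is a minimal sketch under that assumption; `rows_for_worker` is a hypothetical helper, and the synchronization that a real loop filter needs between neighbouring rows is deliberately left out.

```c
/* Assumption: rows are handed out round-robin, so worker `id` filters
 * rows id, id + num_workers, id + 2 * num_workers, ...
 * Writes at most max_out row indices to rows_out and returns how many
 * were written. */
static int rows_for_worker(int id, int num_workers, int total_rows,
                           int *rows_out, int max_out) {
  int n = 0;
  int row;
  for (row = id; row < total_rows && n < max_out; row += num_workers) {
    rows_out[n++] = row;
  }
  return n;
}
```

Interleaving rows keeps the workers close together in the frame, which tends to balance the load better than giving each worker one contiguous band.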
2015-01-16Merge "Fix frame buffer swap in denoiser"Jingning Han
2015-01-16Fix frame buffer swap in denoiserJingning Han
This commit fixes a bug in the denoiser reference frame buffer swap that disabled the frame buffer update. Change-Id: I39a9427180fd18f9692602064ad821f7af4714c0
2015-01-15Replace "colorspace" with "color_space"Yaowu Xu
This is to make the usage of the variable name consistent across the code base. Change-Id: I698739e55841c59358d1c6e5cc97c96088772943
2015-01-15[two pass temporal svc]Fix crash issue in transcoder app caused by last fix.Minghai Shang
Change-Id: I78ecc8ec3fa3ba5f69bb23813e68a5255d0534e1
2015-01-15Add Neon intrinsics for vp9_avg_8x8_neonFrank Galligan
On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
2015-01-14Align thread data in vp9_ethreadYunqing Wang
On some platforms, such as 32bit Windows and 32bit Mac, the allocated memory isn't aligned automatically. The thread data is aligned to ensure the correct access in SIMD code. Change-Id: I1108c145fe982ddbd3d9324952758297120e4806
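When the platform allocator gives no alignment guarantee, a common remedy is to over-allocate and round the pointer up. The sketch below illustrates that general technique; `aligned_alloc_sketch` and `aligned_free_sketch` are hypothetical names, and the commit does not say which mechanism it actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes whose address is a multiple
 * of `align`. `align` must be a power of two and at least
 * sizeof(void *). The pointer obtained from malloc is stored just
 * before the aligned block so it can be freed later. */
static void *aligned_alloc_sketch(size_t align, size_t size) {
  void *raw = malloc(size + align - 1 + sizeof(void *));
  uintptr_t addr;
  if (raw == NULL) return NULL;
  addr = ((uintptr_t)raw + sizeof(void *) + align - 1) &
         ~(uintptr_t)(align - 1);
  ((void **)addr)[-1] = raw;
  return (void *)addr;
}

/* Frees a block obtained from aligned_alloc_sketch. */
static void aligned_free_sketch(void *ptr) {
  if (ptr != NULL) free(((void **)ptr)[-1]);
}
```

SIMD loads and stores that require aligned addresses can then operate on per-thread data allocated this way, regardless of what the underlying `malloc` returns.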
2015-01-14Merge "Add encoder control for setting color space"Yaowu Xu
2015-01-14Merge "Switch remaining Neon variance functions to shifts"Frank Galligan
2015-01-14Merge "Add 64x64 sub_pel_variance Neon function"Frank Galligan
2015-01-14Add encoder control for setting color spaceYaowu Xu
This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
2015-01-14Merge "Enable decoder to pass through color space info"Yaowu Xu
2015-01-14Add 64x64 sub_pel_variance Neon functionFrank Galligan
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
2015-01-14Switch remaining Neon variance functions to shiftsFrank Galligan
Saves 5 instructions on 8x8 and 16x16 and 8 instructions on 32x32, when compiled with gcc 4.9. Change-Id: Id3da613a36a9d27d8c5169c59ba45d247c920c6c
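The idea behind using shifts, presumably in place of a division by the pixel count, can be shown in scalar form. The sketch assumes the usual block variance formula, sse - sum^2 / N, where N = width * height is a power of two; `variance_from_sums` is a hypothetical name.

```c
#include <stdint.h>

/* variance = sse - sum^2 / N. Because N is a power of two, the division
 * can be written as a right shift by log2(N), e.g. 6 for 8x8. */
static uint32_t variance_from_sums(int64_t sum, uint32_t sse,
                                   int log2_count) {
  return sse - (uint32_t)((sum * sum) >> log2_count);
}
```

For an 8x8 block in which half the differences are 0 and half are 4, `sum` is 128 and `sse` is 512, so the result is 512 - (16384 >> 6) = 256.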
2015-01-13Merge "Add 64x variance Neon functions"Frank Galligan
2015-01-13[twopass temporal svc] Fix decoding error on seek.Minghai Shang
Don't put a small empty frame in front of a key frame. We set the key frame flag in the webm container if there is a visible key frame. But seeking to that point causes a decoding error if the small empty frame, which is an inter frame, has been put in front of the key frame. Change-Id: Id50c2c1fd31da0405ff6faa7375cc2f49c55402d
2015-01-13Enable decoder to pass through color space infoYaowu Xu
This commit added a field to vpx_image_t for indicating color space; the field is also added to YUV_BUFFER_CONFIG. This allows the color space information to pass through the decoder from the input stream to the output buffer. The commit also updated the compare_img() function to verify matching color space, ensuring that the color space information is correctly passed from encoder to decoder in compressed vp9 streams. Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
2015-01-13Add 64x variance Neon functionsFrank Galligan
Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
2015-01-13Merge "Added plumbing for setting color space"Yaowu Xu
2015-01-11Merge "Fix comments and color format"Yaowu Xu
2015-01-09Added plumbing for setting color spaceYaowu Xu
Change-Id: If64052cc6e404abc8a64a889f42930d14fad21d3
2015-01-09Fix comments and color formatYaowu Xu
Replaced "color space" with "color format" in comments where the color sampling format is concerned, so as to differentiate it from the concept defined in COLOR_SPACE. Change-Id: I8c935034c166b24307a99352dab1686531276bb8
2015-01-09Merge "Use 64 bit to accumulate frame sse."Paul Wilkins
2015-01-08Merge "Refactor mc reference block fetch in denoiser"Jingning Han
2015-01-08Merge "vp9: add per-tile longjmp error handling"James Zern
2015-01-08Merge "vp9: fix -Wclobbered (longjmp + local variables)"James Zern
2015-01-08Merge "Use lookup table to find pixel numbers in block"Jingning Han
2015-01-08Merge "Disable vp9 _8_ loopfilters"Johann
2015-01-08Refactor mc reference block fetch in denoiserJingning Han
This commit refactors the motion compensated reference block fetch process in denoiser. It skips the stage that generates motion compensated reference block if denoiser decides to use copy block mode. For high motion clips, this could speed up the denoising process by about 10%. Change-Id: I8ef4fa5fe766a8c4529119b9ec01faefb3d4ef53
2015-01-08Use lookup table to find pixel numbers in blockJingning Han
This could save one multiplication in each threshold function called by the denoiser per block. Change-Id: I35f437e09999f0a087180878ef7805f0d86e5819
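The trade described above, a table read instead of a multiplication, can be sketched as follows. The `BlockSize` enum and the table names are hypothetical and cover only a few square sizes; the real codec defines more block sizes.

```c
/* Hypothetical block-size enum for illustration only. */
typedef enum { BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64, BLK_COUNT } BlockSize;

static const int kWidth[BLK_COUNT] = { 8, 16, 32, 64 };
static const int kHeight[BLK_COUNT] = { 8, 16, 32, 64 };

/* Precomputed width * height for each block size. */
static const int kNumPixels[BLK_COUNT] = { 64, 256, 1024, 4096 };

/* Computes the pixel count with one multiplication per call. */
static int num_pixels_multiply(BlockSize bs) {
  return kWidth[bs] * kHeight[bs];
}

/* Reads the same value from the table, with no multiplication. */
static int num_pixels_lookup(BlockSize bs) { return kNumPixels[bs]; }
```

Both functions return the same value; the lookup simply moves the multiplication to table-construction time.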
2015-01-08Merge "Refactor denoiser frame buffer update"Jingning Han
2015-01-08Merge "Initialize zeromv_sse and newmv_sse in vp9_pick_inter_mode"Jingning Han
2015-01-08Merge "Use vp9_convolve_copy in denoiser output"Jingning Han
2015-01-08Merge "Remove unnecessary init_macroblockd."hkuang
2015-01-07Refactor denoiser frame buffer updateJingning Han
Use frame buffer pointer swap instead of memcpy when possible. These two CLs make the denoiser over 10% faster when running on vidyo1 720p at speed -6. Change-Id: I64fe8a2422cafca6787a50c7f4dfb961191c0a9d
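The difference between copying a frame and swapping buffer pointers can be sketched as below. `FrameBuf`, `update_by_copy` and `update_by_swap` are hypothetical stand-ins, not the denoiser's actual types or functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame buffer: a pointer to pixel storage plus its size. */
typedef struct {
  uint8_t *data;
  size_t size;
} FrameBuf;

/* Copy-based update: touches every byte of the frame. */
static void update_by_copy(FrameBuf *dst, const FrameBuf *src) {
  memcpy(dst->data, src->data, src->size);
}

/* Swap-based update: exchanges the two descriptors in constant time.
 * Only valid when the caller no longer needs the old contents of the
 * buffer it is giving away. */
static void update_by_swap(FrameBuf *a, FrameBuf *b) {
  const FrameBuf tmp = *a;
  *a = *b;
  *b = tmp;
}
```

The swap costs the same for any frame size, which is why it pays off for large frames, but it is only usable "when possible", that is, when nothing else still relies on the contents that were swapped away.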
2015-01-07Use vp9_convolve_copy in denoiser outputJingning Han
Replace copy_block with vp9_convolve_copy to improve speed. Change-Id: I3a08c4d01dff2253b6ee573efd02f65ccdc1b5a5
2015-01-07Removed redundant local variables in the forward hybrid transforms.Zoe Liu
Change-Id: I60f7ccbbc8dc624134e325bdce6042bc183075b6