Age  Commit message  Author
2017-06-08Merge "fdct16x16 neon optimization"Johann Koenig
2017-06-07fdct16x16 neon optimizationJohann
Roughly 2x speedup. Since the only change for HBD is to store(), the improvement appears to hold there as well. BUG=webm:1424 Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07Merge "vp9: SVC: Enable simple_block_yrd for temporal layers."Marco Paniconi
2017-06-07Merge changes Iade45f69,I18d90658,Ieca3f1efJohann Koenig
* changes:
  buffer.h: add num_elements_
  buffer.h: zero-init all values
  buffer.h: use size_t
2017-06-07vp9: SVC: Enable simple_block_yrd for temporal layers.Marco
Enable simple_block_yrd for temporal enhancement layers (TL > 0), and remove the block size condition for SVC mode. Only affects speed >= 7 SVC. Speedup ~3-4%. avgPSNR regression on RTC for (3 spatial, 3 temporal) layers: ~1%. Change-Id: Iff4fc191623b71c69cd373e7c0823385e7ac67ed
2017-06-07buffer.h: add num_elements_Johann
raw_size_ was being incorrectly computed and used Change-Id: Iade45f69964c567ffb258880f26006a96ae5a30d
2017-06-07buffer.h: zero-init all valuesJohann
Change-Id: I18d90658bcd4365d49adcadd6954090b3b399aa8
2017-06-07buffer.h: use size_tJohann
Change-Id: Ieca3f1ef23cd1d7b844ea3ecb054007ed280b04f
2017-06-07vp9: SVC: Enable row-mt in sample encoder.Marco
Change-Id: I4b51043cb3f5955efe947fe4685aed4a21adb8bd
2017-06-06Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}"James Zern
2017-06-06Merge "vp9: SVC: Adjust some speed settings for SVC speed >= 7."Marco Paniconi
2017-06-06vp9: SVC: Adjust some speed settings for SVC speed >= 7.Marco
Keep the 1/4subpel for all frames, use SUBPEL_TREE_PRUNED_EVENMORE for all temporal enhancement layer frames. Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a
2017-06-06buffer.h: split out initJohann
Change-Id: Idfbd2e01714ca9d00525c5aeba78678b43fb0287
2017-06-06buffer.h: Use T for valuesJohann
Change-Id: I2da4110e843b6e361028b921c24b6ca2ea9077d9
2017-06-06Initialize cost_list all to INT_MAX.Jerome Jiang
It is initialized to be { INT_MAX, 0, ... } in ffe0f9b. No effect on encoders. Make it consistent with other initializations. BUG=webm:1440 Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922
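The difference between the two initializations comes from a C rule: a partial initializer list sets only the listed elements, and the rest are zero-initialized. A minimal sketch of that rule follows. The array length and the helper names are assumptions made for illustration and are not taken from libvpx.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical length, chosen for illustration only. */
#define COST_LIST_LEN 5

/* Partial initializer: element 0 is INT_MAX, the rest become 0. */
static void init_partial(int out[COST_LIST_LEN]) {
  const int cost_list[COST_LIST_LEN] = { INT_MAX };
  int i;
  assert(out != NULL);
  for (i = 0; i < COST_LIST_LEN; ++i) out[i] = cost_list[i];
}

/* Explicit loop: every element is INT_MAX. */
static void init_all(int out[COST_LIST_LEN]) {
  int i;
  assert(out != NULL);
  for (i = 0; i < COST_LIST_LEN; ++i) out[i] = INT_MAX;
}

/* Count how many entries hold INT_MAX. */
static int count_int_max(const int *v, size_t n) {
  int count = 0;
  size_t i;
  assert(v != NULL);
  for (i = 0; i < n; ++i) count += (v[i] == INT_MAX);
  return count;
}
```

With these helpers, init_partial() leaves one INT_MAX entry followed by zeros, while init_all() fills the whole array.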
2017-06-05vp9_mcomp,get_cost_surf_min: quiet conversion warningJames Zern
visual studio will warn if a 32-bit shift is implicitly converted to 64. in this case integer storage is enough for the result. since: f3a9ae5ba Fix ubsan failure in vp9_mcomp.c. Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc
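The warning described above fires when a shift is evaluated in 32-bit int and the result is then stored in a 64-bit variable. Two usual ways to avoid it are sketched below: keep the result in int when it fits (the route this commit describes), or do the shift in 64 bits when it does not. The function names are hypothetical and this is not the code from vp9_mcomp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Result fits in int: shift in int and store in int, so nothing is widened. */
static int pow2_int(int bits) {
  assert(bits >= 0 && bits < 31);
  return 1 << bits;
}

/* Result may need 64 bits: widen the operand first so the shift itself is
 * performed in 64-bit arithmetic. */
static int64_t pow2_i64(int bits) {
  assert(bits >= 0 && bits < 63);
  return (int64_t)1 << bits;
}
```

The pattern that draws the warning is the mixed form, for example `int64_t x = 1 << bits;`, where the shift is 32-bit and only the result is widened.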
2017-06-06Merge "Fix valgrind failure on uninitialized variables."Jerome Jiang
2017-06-06Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx"James Zern
2017-06-05Fix valgrind failure on uninitialized variables.Jerome Jiang
BUG=webm:1440 Change-Id: I7074e42bdfa8dd25f11bbb3f2ab1b41d6f4c12e4
2017-06-02Fix ubsan failure in vp9_mcomp.c.Jerome Jiang
Change-Id: Iff1dea1fe9d4ea1d3fc95ea736ddf12f30e6f48d
2017-06-01vp9: SVC: Force subpel search off under certain conditions.Marco
For SVC 1 pass non-rd mode: Force subpel search off for SVC for non-reference frames under motion threshold. Add flag to svc context to indicate if the frame is not used as a reference. Little/no quality loss, ~2% speedup. Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9
2017-06-01Merge "vp9: Speed >8: Set subpel_search_method for low motion."Marco Paniconi
2017-06-01vp9: Speed >8: Set subpel_search_method for low motion.Marco
Speed >=8: for resolutions above CIF, and for low motion content, set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE. Small speed gain (~2%) on vga clips, RTC metrics up by ~2-3% on average. Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc
2017-06-01vp8 skin detection: Fix visual studio build failure.Jerome Jiang
Change-Id: I510b755550ebbfa2aaf9b974920d7f1c6454a845
2017-06-01Fix corruption in skin map debugging output yuv.Jerome Jiang
For both vp8 and vp9. BUG=webm:1437 Change-Id: Ifd06f68a876ade91cc2cc27c574c4641b77cce28
2017-05-31vp8: Clean up skin detection.Jerome Jiang
Use only the average of center 2x2 pixels in vp8. Change-Id: I2b23ff19a90827226273e0fca49e90c734eda59b
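Averaging the center 2x2 pixels of a block can be sketched as follows. The block size parameter, the rounding, and the function name are assumptions made for illustration. The actual vp8 skin detection code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Average of the four pixels at the center of a bsize x bsize block.
 * src points at the top-left pixel of the block, stride is the row pitch.
 * Rounds to nearest (an assumption, not confirmed by the commit). */
static int center_2x2_avg(const uint8_t *src, int stride, int bsize) {
  const int c = bsize / 2 - 1; /* row/column of the top-left center pixel */
  const uint8_t *p;
  assert(src != 0 && bsize >= 2 && stride >= bsize);
  p = src + c * stride + c;
  return (p[0] + p[1] + p[stride] + p[stride + 1] + 2) >> 2;
}
```

For an 8x8 block this reads the pixels at rows 3-4 and columns 3-4, so only four loads are needed per block instead of a full-block average.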
2017-05-31Merge "comp_avg_pred neon: used by sub pixel avg variance"Johann Koenig
2017-05-31Merge "Write skin map of vp8 skin detection for debug."Jerome Jiang
2017-05-31Merge "Update vpx_highbd_idct4x4_16_add_sse2()"Linfeng Zhang
2017-05-30comp_avg_pred neon: used by sub pixel avg varianceJohann
BUG=webm:1423 Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30Write skin map of vp8 skin detection for debug.Jerome Jiang
Change-Id: Ica1b4e918aa759cd0ce65920f9d88452bbf9e3b4
2017-05-30Update vpx_highbd_idct4x4_16_add_sse2()Linfeng Zhang
BUG=webm:1412 Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30Merge "comp_avg_pred: alignment"Johann Koenig
2017-05-30Merge "remove DECLARE_ALIGNED from neon code"Johann Koenig
2017-05-30comp_avg_pred: alignmentJohann
x86 requires 16 byte alignment for some vector loads/stores. arm does not have the same requirement. The asserts are still in avg_pred_sse2.c. This just removes them from the common code. Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
2017-05-30Merge "Fix vp8 race when build --enable-vp9-highbitdepth."Jerome Jiang
2017-05-26remove DECLARE_ALIGNED from neon codeJohann
Unlike x86, neon only requires type alignment when loading into vectors. Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f
2017-05-26Merge "subpel variance neon: reduce stack usage"Johann Koenig
2017-05-26Merge "Use vdup instead of vmov"Johann Koenig
2017-05-26Fix vp8 race when build --enable-vp9-highbitdepth.Jerome Jiang
Split vp8/vp9 implementations on yv12_copy_frame_c. Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c. Clean up vp8 codes usage in vp9. BUG=webm:1435 Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
2017-05-26vp9: SVC: Fix to condition on using source_sad.Marco
Fix the condition on usage of source_sad for temporal layers. Fix allows it to be used for the case of 1 temporal layer. Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
2017-05-26Merge "vp9: Use source_sad only on top temporal enhancement layer."Marco Paniconi
2017-05-26Merge "vp9: SVC: Enable copy partition for SVC speed >= 7."Marco Paniconi
2017-05-25vp9: Use source_sad only on top temporal enhancement layer.Marco
For 1 pass CBR SVC mode. Change-Id: Ic026740f9d0ec5eee7c5845be9c5b15884fec48d
2017-05-25Refactor: Move vp8 skin detection to new files.Jerome Jiang
Change-Id: If760f28cbbf22beac1cc9bd1546f13831e9dd3f0
2017-05-25vp9: SVC: Enable copy partition for SVC speed >= 7.Marco
Adjust the max_copied_frame setting for temporal layers. Keep the same setting for non-SVC at speed 8. This change also enables copy_partition for non-SVC at speed 7, but with smaller value of max_copied_frame (=2). ~2% speedup for SVC speed 7, 3 layers, with little/no quality loss. Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
2017-05-24subpel variance neon: reduce stack usageJohann
Unlike x86, arm does not impose additional alignment restrictions on vector loads. For incoming values to the first pass, it uses vld1_u32() which typically does impose a 4 byte alignment. However, as the first pass operates on user-supplied values we must prepare for unaligned values anyway (and have, see mem_neon.h). But for the local temporary values there is no stride and the load will use vld1_u8 which does not require 4 byte alignment. There are 3 temporary structures. In the C, one is uint16_t. The arm saturates between passes but still passes tests. If this becomes an issue new functions will be needed. Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
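One common way to prepare for unaligned user-supplied values is to move the bytes through memcpy into a properly aligned local before using them. The sketch below shows that idea in portable C. It is not the NEON code from mem_neon.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from an address with no alignment guarantee. memcpy avoids
 * dereferencing a misaligned uint32_t pointer, which is undefined in C. */
static uint32_t load_unaligned_u32(const void *buf) {
  uint32_t v;
  assert(buf != NULL);
  memcpy(&v, buf, sizeof(v));
  return v;
}

/* Write 4 bytes to an address with no alignment guarantee. */
static void store_unaligned_u32(void *buf, uint32_t v) {
  assert(buf != NULL);
  memcpy(buf, &v, sizeof(v));
}
```

Compilers generally turn a fixed-size memcpy like this into a single load or store where the target allows it, so the safe form usually costs nothing.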
2017-05-24Use vdup instead of vmovJohann
Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a
2017-05-24Merge changes Iaab2b9a1,Idfb458d3Johann Koenig
* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes
2017-05-24Merge changes I31fa6ef8,I228c6f29Johann Koenig
* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names