path: root/vp9/vp9cx.mk
Age | Commit message | Author
2020-04-24 | Revert "Revert "Remove RD code for CONFIG_REALTIME_ONLY in vp9."" | Jerome Jiang
Under CONFIG_REALTIME_ONLY flag, map speed < 5 to speed 5. Bug: webm:1684 This reverts commit 85cb983682fe9ca14fd302b50d27d762da05d665. Change-Id: I67b7ed37e8b74417db310ea0c817d3c5a5de9e44
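The speed remapping described in this commit message could look roughly like the following. This is a minimal sketch, not the libvpx source: `effective_speed` is a hypothetical helper, and defaulting `CONFIG_REALTIME_ONLY` to 1 locally is an assumption made so the snippet compiles on its own.

```c
/* Sketch only: in a real build CONFIG_REALTIME_ONLY comes from the
 * generated configuration; defaulting it to 1 here is an assumption. */
#ifndef CONFIG_REALTIME_ONLY
#define CONFIG_REALTIME_ONLY 1
#endif

/* Hypothetical helper: returns the speed the encoder would actually use. */
int effective_speed(int requested_speed) {
#if CONFIG_REALTIME_ONLY
  /* RD code is compiled out, so speeds below 5 are mapped to 5. */
  if (requested_speed < 5) return 5;
#endif
  return requested_speed;
}
```

With the flag on, a request for speed 2 would be served as speed 5, while speed 7 passes through unchanged.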
2020-04-20 | Revert "Remove RD code for CONFIG_REALTIME_ONLY in vp9." | Jerome Jiang
This reverts commit da24d35132e80422dc2c33e7c92462f4db7cd83d. BUG=webm:1684 Change-Id: I552c37c7bdc844610879a65cc02038d76a5d32b1
2019-11-15 | Add simple_encode.cc/h | angiebird
Change-Id: I6dff1bda4bea760a32c2f8e38773e5913c830204
2019-10-29 | Add vp9_get_encoder_config / vp9_get_frame_info | angiebird
Change-Id: Id5c8b2d69a36d218ec04cd504868ce0efebf6b69
2019-09-30 | namespace ARCH_* defines | James Zern
this prevents redefinition warnings if a toolchain sets one BUG=b/117240165 Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
2019-07-18 | Add vp9_non_greedy_mv.c/h | Angie Chiang
Move vp9_nb_mvs_inconsistency to vp9_non_greedy_mv.c This is to facilitate following SIMD optimizations. Change-Id: I8eb8f820368928e0c4fb287e557cddf0bd2c763e
2019-05-31 | Remove RD code for CONFIG_REALTIME_ONLY in vp9. | Jerome Jiang
This reduces vp9 only binary size by ~5.7%. Change-Id: I57e46baf591d68b0a0cecbc9319a1190df8b0457
2019-03-25 | Remove deprecated code for vp9_fdct8x8_quant() | Jingning Han
Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
2019-03-12 | Remove highbd_temporal_filter_sse4.c when REAL_TIME_ONLY is on | chiyotsai
Change-Id: I3ca7442b10cbea4dd5dbabe147687d1cb3cce4d8
2019-03-04 | Add SSE4_1 highbd version of temporal filter | chiyotsai
The SSE4_1 version of temporal filter does not distinguish between bd 10 and bd 12.
Speed up:
Function Level:
      | !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
SS_Y  | 6.56X | 6.63X
Video Level: 2.5% speed up on basketballpass_240p over 150 frames on speed 1, bitdepth 10, auto-alt-ref=1
BUG=webm:1591 Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
2019-01-25 | Merge "Add SSE4 version of new apply_temporal_filter" | Chi Yo Tsai
2019-01-24 | Add SSE4 version of new apply_temporal_filter | chiyotsai
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1. This patch merely adds the function and does not enable it yet.
Speed Up:
       | ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591 Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
2019-01-23 | mips: resolve missing declarations | Johann
Exclude low bit depth optimizations from high bit depth builds. BUG=webm:1584 Change-Id: I86a7ebafa557d262257358e1e055a06d52659977
2019-01-07 | vp9_get_blockiness: resolve missing declaration | Johann
BUG=webm:1584 Change-Id: I719c64734f4eae07def2d700006834a2420891a7
2018-09-04 | Move partition search ML models to a seperate file | Hui Su
Clean up vp9_encodeframe.c. Change-Id: I4035fee94da746c74d72f71ca8334f91c5d10116
2018-06-11 | VSX Version of vp9_quantize_fp | Luc Trudeau
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1):
4x4   C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8   C time = 57.7 ms (±0.3 ms), VSX time =  7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time =  4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
2018-01-18 | vp9_quantize_fp_avx2() | Scott LaVarnway
Started from vp9_quantize_fp_sse2 and tweaked to use avx2. Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
2017-09-07 | Add 2 to 1 scaling NEON optimization | Linfeng Zhang
BUG=webm:1419 Change-Id: I99c954ffa50a62ccff2c4ab54162916141826d9b
2017-08-14 | vp9: strip temporal filter code | Scott LaVarnway
when CONFIG_REALTIME_ONLY is enabled. BUG=webm:1446 Change-Id: Id547783ec75383966c40ab5cf6abb4a0f7984f52
2017-08-11 | vp9: strip mb graph code | Scott LaVarnway
when CONFIG_REALTIME_ONLY is enabled. BUG=webm:1446 Change-Id: I4b1b8e9a456830ba1b1bd3a8882e038d37ee7903
2017-07-11 | remove vp9_firstpass.c w/CONFIG_REALTIME_ONLY | James Zern
BUG=webm:1446 Change-Id: I6e0ea9342c715d354c641109737172afa649b85b
2017-05-06 | Merge "vp9: Neon optimization for denoiser. Add unit tests." | Jerome Jiang
2017-05-05 | vp9: Neon optimization for denoiser. Add unit tests. | Jerome Jiang
Denoiser on Neon is 5x faster than C code. BUG=webm:1420 Change-Id: I805ab64f809ff2137354116be6213e7ec29c1dcb
2017-05-01 | move vp9_error_intrin_avx2.c | Johann
There is only one avx2 implementation. Drop '_intrin' Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
2017-04-26 | vp9 temporal filter: sse4 implementation | Johann
Approximates division using multiply and shift. Speeds up both sizes (8x8 and 16x16) by 30 times. Fix the call sites to use the RTCD function. Delete sse2 and mips implementation. They were based on a previous implementation of the filter. It was changed in Dec 2015: ece4fd5d2247c9512b31a93dd593de567beaf928 BUG=webm:1378 Change-Id: I0818e767a802966520b5c6e7999584ad13159276
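The "multiply and shift" idea mentioned in this commit message can be illustrated with a generic reciprocal trick. The sketch below is not the SSE4 code from the commit: `approx_div_u16`, the 16-bit shift, and the rounding choice are all assumptions picked to keep the example small.

```c
#include <stdint.h>

/* Hypothetical helper: approximates x / d as (x * m) >> 16, where
 * m = ceil(2^16 / d) is a reciprocal that can be precomputed per divisor.
 * Requires d >= 1. The result is floor(x / d) or floor(x / d) + 1. */
uint32_t approx_div_u16(uint16_t x, uint16_t d) {
  const uint32_t m = (65536u + d - 1u) / d; /* ceil(2^16 / d) */
  return ((uint32_t)x * m) >> 16;           /* x * m < 2^32, no overflow */
}
```

In a vectorized filter the reciprocal `m` would typically come from a small lookup table indexed by the divisor, so the per-pixel work is one multiply and one shift rather than a division.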
2017-03-08 | move vp9_scale_and_extend_frame_c to vp9_frame_scale.c | James Zern
this is similar to the x86 configuration and helps mitigate an issue with a circular dependency between this function and the ssse3 variant causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956 Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-02-24 | Merge "Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8." | Jerome Jiang
2017-02-24 | consolidate block_error functions | Johann
vp9_highbd_block_error_8bit_c was a very simple wrapper around vp9_block_error_c. The SSE2 implementation was practically identical to the non-HBD one. It was missing some minor improvements which only went into the original version. In quick speed tests, the AVX implementation showed minimal improvement over SSE2 when it does not detect overflow. However, when overflow is detected the function is run a second time. The OperationCheck test seems to trigger this case and reverses any speed benefits by running ~60% slower. AVX2 on the other hand is always 30-40% faster. Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
2017-02-23 | Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8. | Jerome Jiang
Only works for bitdepth = 8 when compiled with high bitdepth flag. 4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the scaling was verified.
BUG=webm:1371 Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
2017-02-14 | vp9 fdct higbd neon: connect existing highbd calls | Johann
Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c
2017-02-07 | block_error_fp highbd sse2: use tran_low_t for coeff | Johann
BUG=webm:1365 Change-Id: Id2ed3ebaaaa6a4b68628c23e08b64ea5f1341761
2017-01-24 | Multi-threading of first pass stats collection | Ranjit Kumar Tulabandu
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
2016-08-27 | Move vp9_alt_ref_aq_private.h to vp9_alt_ref_aq.c | Yury Gitman
+ add a temporary dummy element to ALT_REF_AQ to avoid a warning about an empty struct Change-Id: Ib6e5c39ff62ad96eb4e3686d4882228a42b3843f
2016-08-25 | Create interface for the ALT_REF_AQ class | Yury Gitman
Current commit is just an API template for the rest of the code, and I will add inner logic later. Altref frames generate a lot of bitrate and at the same time other frames refer to them a lot, so it makes sense to apply a special compensation-based adaptive quantization scheme for altref frames. E.g., for blocks that are good predictors for the future, apply the quantizer chosen by rate control, while for bad predictors apply a worse one. Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
2016-06-29 | vp9: remove x86inc.asm distinction | Johann
BUG=b:29583530 Change-Id: I952da3fc0d4716dec897be0d2e9806af6612722b
2016-06-07 | Revert "remove vp9_diamond_search_sad_avx.c" | Scott LaVarnway
This reverts commit be12fefa4b7d224e9f39275a6bb4fab01b8bae3b and commit 057c1c4034ba5b9bf360c5c1f600ebc6d0718c3a. Also, the mismatch between the avx version and the c version has been fixed. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168 For a rt encode using 1080p@60fps material, up to 11% performance improvement overall was seen. Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
2016-05-27 | Merge "Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10." | Linfeng Zhang
2016-05-27 | Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10. | Linfeng Zhang
Function level timing test shows about 27% time saving on a Xeon E5-2680 v2 desktop. Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid duplicate basenames. Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2() are identical. TODO: They should be unified later if there is no intention to keep a duplicate. Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-24 | remove vp9_diamond_search_sad_avx.c | James Zern
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer included in vp9_rtcd.h. the file can be restored if someone gets around to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168 Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
2016-02-08 | BUG FIX: undefined reference to `vp9_scale_and_extend_frame_c' | Scott LaVarnway
See https://bugs.chromium.org/p/webm/issues/detail?id=1145 Change-Id: I778ee07dc39a524e3f729bef47a7abeed51e0cee
2016-02-04 | Vidyo patch: Optimization for 1-to-2 downsampling and upsampling. | Scott LaVarnway
Change-Id: I9cc9780f506e025aea57485a9e21f0835faf173c
2016-01-13 | Adding an aq mode for 360 videos | Debargha Mukherjee
Different quality levels are used for different regions in the frame depending on how far they are vertically from the center. Specifically, three segments are used based on the mi_row index with respect to the number of mi_rows in the frame. Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
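One way to picture the three-segment split by vertical position is the sketch below. It is an illustration under assumptions, not the code from this commit: the function name `aq360_segment_id` and the 1/8 and 1/4 band thresholds are hypothetical.

```c
/* Hypothetical sketch: segment 0 covers the middle rows, segment 1 the rows
 * further out, and segment 2 the rows nearest the top and bottom edges.
 * The 1/8 and 1/4 band thresholds are assumptions, not taken from the commit. */
unsigned int aq360_segment_id(int mi_row, int mi_rows) {
  if (mi_row < mi_rows / 8 || mi_row > mi_rows - mi_rows / 8) return 2;
  if (mi_row < mi_rows / 4 || mi_row > mi_rows - mi_rows / 4) return 1;
  return 0;
}
```

An encoder using such a mapping would then attach a different quantizer offset to each segment id, so rows far from the vertical center are coded at a different quality level than the middle of the frame.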
2015-12-14 | move vp9_avg to vpx_dsp | James Zern
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-11-11 | Add AVX vectorized vp9_diamond_search_sad | Geza Lore
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters.
The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
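The three table properties listed in this commit message can be expressed as a simple check. The sketch below is hypothetical: `sad_cost_tables_allow_avx` and its parameter layout (pointers to the centre of each component table, valid for indices -max_mv..max_mv) are assumptions for illustration and do not come from the commit.

```c
#include <stdbool.h>

/* Hypothetical check of the three table properties from the commit message.
 * joint_cost has 4 entries; comp_cost[0] and comp_cost[1] each point at the
 * centre of a table that is valid for indices -max_mv..max_mv. */
bool sad_cost_tables_allow_avx(const int *joint_cost,
                               const int *const *comp_cost, int max_mv) {
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return false;
  for (int i = 0; i <= max_mv; ++i) {
    /* Both components must share the same cost at +i and -i. */
    if (comp_cost[0][i] != comp_cost[1][i]) return false;
    if (comp_cost[0][-i] != comp_cost[1][-i]) return false;
    /* The cost function must be even. */
    if (comp_cost[0][i] != comp_cost[0][-i]) return false;
  }
  return true;
}
```

A caller could run a check like this once after the tables are built and fall back to the C search routine when it returns false.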
2015-11-06 | Revert "Add AVX vectorized vp9_diamond_search_sad" | James Zern
This reverts commit f1342a7b070ef61b9fbdf03e899ac2107cfcb6bd.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const __m128i' (vector of 2 'long long' values), which requires 16 byte alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
2015-11-05 | Merge "Add AVX vectorized vp9_diamond_search_sad" | Yunqing Wang
2015-11-05 | Add AVX vectorized vp9_diamond_search_sad | Geza Lore
This function now has an AVX intrinsics version which is about 80% faster compared to the C implementation. This provides a 2-4% total speed-up for encode, depending on encoding parameters.
The function utilizes 3 properties of the cost function lookup table, constructed in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i] (equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i] (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
2015-11-02 | Move noise level estimate outside denoiser. | Marco
Source noise level estimate is also useful for setting variance encoder parameters (variance thresholds, qp-delta, mode selection, etc), so allow it to be used also if denoising is not on. Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
2015-10-21 | Optimize vp9_highbd_block_error_8bit assembly. | Geza Lore
A new version of vp9_highbd_block_error_8bit is now available which is optimized with AVX assembly. AVX itself does not buy us too much, but the non-destructive 3 operand format encoding of the 128bit SSEn integer instructions helps to eliminate move instructions. The Sandy Bridge micro-architecture cannot eliminate move instructions in the processor front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of the loop, the code detects if overflow might have happened and if so, the whole computation is re-executed using higher precision arithmetic. This case however is extremely rare in real use, so we can achieve a large net gain here.
The optimizations rely on the fact that the coefficients are in the range [-(2^15-1), 2^15-1], and that the quantized coefficients always have the same sign as the input coefficients (in the worst case they are 0). These are the same assumptions that the old SSE2 assembly code for the non high bitdepth configuration relied on. The unit tests have been updated to take this constraint into consideration when generating test input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
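The "speculate in 32 bits, re-run in higher precision if overflow might have happened" idea can be modelled in scalar C. The sketch below is not the AVX assembly from this commit: the function names are hypothetical, and the OR-based bound used to detect a possible overflow is an assumption chosen because it is easy to reason about.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared difference of two 16-bit values; at most 65535^2, which fits in
 * 32 bits. */
static uint32_t sq_diff(int16_t a, int16_t b) {
  const int32_t d = (int32_t)a - (int32_t)b;
  const uint32_t ad = (uint32_t)(d < 0 ? -d : d);
  return ad * ad;
}

/* Hypothetical scalar model of the speculative scheme. */
uint64_t block_error_speculative(const int16_t *coeff, const int16_t *dqcoeff,
                                 size_t n) {
  uint32_t acc = 0;       /* fast 32-bit accumulator, may wrap */
  uint32_t seen_bits = 0; /* OR of all terms, an upper bound on each term */
  for (size_t i = 0; i < n; ++i) {
    const uint32_t t = sq_diff(coeff[i], dqcoeff[i]);
    acc += t;
    seen_bits |= t;
  }
  /* Every term is <= seen_bits, so the true sum is <= n * seen_bits.
   * If that bound fits in 32 bits, acc cannot have wrapped. */
  if ((uint64_t)n * seen_bits <= UINT32_MAX) return acc;

  /* Rare slow path: overflow might have happened, so redo in 64 bits. */
  uint64_t wide = 0;
  for (size_t i = 0; i < n; ++i) wide += sq_diff(coeff[i], dqcoeff[i]);
  return wide;
}
```

The check is deliberately conservative: it may take the slow path when no wrap actually occurred, but it never returns a wrapped 32-bit sum. That trade-off only pays off when the slow path is rare, which is the situation the commit message describes.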
2015-10-08 | Optimization of 8bit block error for high bitdepth | Geza Lore
If high bit depth configuration is enabled, but encoding in profile 0, the code now falls back on optimized SSE2 assembler to compute the block errors, similar to when high bit depth is not enabled. Change-Id: I471d1494e541de61a4008f852dbc0d548856484f