path: root/vp9/encoder/x86
Age  Commit message  Author
2015-08-06  Move VP9 SSIM metrics to vpx_dsp.  Alex Converse
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-07-31  Factor inverse transform functions into vpx_dsp  Jingning Han
This commit moves the module inverse transform functions from vp9 to vpx_dsp folder. The hybrid transform wrapper functions stay in the vp9 folder, since they involve codec-specific data structures. Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-28  Replace vp9_ prefix in 2D-DCT functions with vpx_  Jingning Han
Clean up the forward 2D-DCT function names in vpx_dsp. Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
2015-07-28  Move DC only forward 2D-DCT functions to vpx_dsp  Jingning Han
This completes the forward transform functions layout refactoring. Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28  Factor 32x32 fwd DCT to vpx_dsp folder  Jingning Han
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/. Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
2015-07-27  Move forward dct sse2 header file to vpx_dsp  Jingning Han
Change-Id: Iba03852ce778c956200818e3473cfb2b48cf8d8e
2015-07-27  Replace vp9_idct.h for precise dependency  Jingning Han
This commit replaces vp9_idct.h with txfm_common.h in many SIMD implementation files for precise file dependency. Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
2015-07-26  Refactor vp9_idct.h file  Jingning Han
Separate the common coefficient constant into vpx_dsp/txfm_common.h. Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h. This clears the use case of vp9_idct.h in vpx_dsp folder. Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
2015-07-24  Remove redundant function definitions in vp9_dct_sse2.h  Jingning Han
Change-Id: I283d364a4e65ca9bf6ff581da1d0b498433c5402
2015-07-22  Factor forward 2D-DCT transforms into vpx_dsp  Jingning Han
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward transform operations into vpx_dsp folder. Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
2015-07-20  Clean up vp9_dct32x32_sse2_impl.h header files  Jingning Han
Remove redundant file dependency. Change-Id: I4708218157617dabe00e2e33e237be2838c16603
2015-07-20  Unify the high bit-depth forward hybrid transforms  Jingning Han
The SSE2 version high bit-depth forward hybrid transforms are essentially using the C functions via cross referencing to 1-D functions in vp9_dct.c. This commit unifies the two versions and removes the unnecessary dependency. Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
2015-07-17  Migrate quantization functions from vp9/ to vpx_dsp/  Yunqing Wang
The following quantization functions were moved: vp9_quantize_b, vp9_quantize_b_32x32, vp9_highbd_quantize_b, vp9_highbd_quantize_b_32x32, vp9_quantize_dc, vp9_quantize_dc_32x32, vp9_highbd_quantize_dc, vp9_highbd_quantize_dc_32x32. The purpose was to allow these functions to be shared by multiple codecs. Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-08  Remove clamp operations.  Yaowu Xu
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on int values passed in, therefore this commit removes those effectless clamps and also adds more const intermediate results to make the code more readable. Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
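As a minimal sketch of why such clamps are no-ops: the helper `clamp_int` below is a hypothetical stand-in for illustration, not the libvpx function, and the example assumes `int` is 32 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical clamp helper, for illustration only. */
static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Assuming a 32-bit int, every int value already lies inside
 * [INT32_MIN, INT32_MAX], so this returns its argument unchanged. */
static int clamp_to_int32_range(int value) {
  return clamp_int(value, INT32_MIN, INT32_MAX);
}
```

Because neither comparison in `clamp_int` can ever be true for these bounds, dropping the call does not change any result.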
2015-07-08  Clean out more MSVC warnings  Yaowu Xu
Change-Id: I1bab0c104df2ec4825d050cd516e26ab635a7b3e
2015-07-07  Move sub pixel variance to vpx_dsp  Johann
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-06  Move subtract functions from vp9 to vpx_dsp  Jingning Han
Factor out the subtraction operator as common function. Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-06-12  Fix potential overflow issue in hadamard_16x16()  Jingning Han
This commit fixes a potential integer overflow issue in function hadamard_16x16. It adds a corresponding dynamic range comment. Change-Id: Iec22f3be345fb920ec79178e016378e2f65b20be
2015-06-03  Make vp9 subpixel match vp8  Johann
The only difference between the two was that the vp9 function allowed for every step in the bilinear filter (16 steps) while vp8 only allowed for half of those. Since all the call sites in vp9 shift the input left by one (<< 1), vp9 only ever used the same steps as vp8. This will allow moving the subpel variance to vpx_dsp with the rest of the variance functions. Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
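The equivalence can be sketched as follows. The tap formulas and the weight of 128 are assumptions made for illustration (taps that sum to a fixed weight and vary linearly with the offset); the function names are hypothetical and are not the libvpx tables.

```c
#include <stdint.h>

#define FILTER_WEIGHT 128 /* assumed: the two taps always sum to this */

/* Taps for a bilinear filter with 16 sub-pixel steps (step16 in 0..15). */
static void bilinear_taps_16(int step16, int16_t taps[2]) {
  taps[1] = (int16_t)(step16 * (FILTER_WEIGHT / 16));
  taps[0] = (int16_t)(FILTER_WEIGHT - taps[1]);
}

/* Taps for a bilinear filter with 8 sub-pixel steps (step8 in 0..7). */
static void bilinear_taps_8(int step8, int16_t taps[2]) {
  taps[1] = (int16_t)(step8 * (FILTER_WEIGHT / 8));
  taps[0] = (int16_t)(FILTER_WEIGHT - taps[1]);
}

/* Returns 1 when indexing the 16-step filter with (step8 << 1) yields
 * exactly the taps of the 8-step filter for every step8, else 0. */
static int shifted_steps_match(void) {
  for (int step8 = 0; step8 < 8; ++step8) {
    int16_t fine[2], coarse[2];
    bilinear_taps_16(step8 << 1, fine);
    bilinear_taps_8(step8, coarse);
    if (fine[0] != coarse[0] || fine[1] != coarse[1]) return 0;
  }
  return 1;
}
```

Under these assumptions, a caller that always doubles its offset never reaches the odd entries of the 16-step filter, so the 8-step filter gives the same output.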
2015-05-26  Move variance functions to vpx_dsp  Johann
subpel functions will be moved in another patch. Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-15  rename vp9_dct_impl_sse2.c to vp9_dct_sse2_impl.h  James Zern
This file shouldn't be built directly; it is included in vp9_dct_sse2.c to create a non-high-bitdepth and a high-bitdepth version. Silences missing prototype warnings for the unused FDCT* functions. Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
2015-05-15  rename vp9_dct32x32_sse2.c to vp9_dct32x32_sse2_impl.h  James Zern
This file shouldn't be built directly; it is included in vp9_dct_sse2.c to create a non-high-bitdepth and a high-bitdepth version. Silences missing prototype warnings for the unused FDCT32x32* functions. Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
2015-05-15  rename vp9_dct32x32_avx2.c to vp9_dct32x32_avx2_impl.h  James Zern
This file shouldn't be built directly; it is included in vp9_dct_avx2.c to create a non-high-bitdepth and a high-bitdepth version. Silences missing prototype warnings for the unused FDCT32x32* functions. Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
2015-05-15  vp9 intrinsics: add vp9_rtcd include  James Zern
silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
2015-05-15  vp9_variance_sse2: sync function signatures  James Zern
+ include vp9_rtcd.h. Silences missing prototype warnings. Change-Id: I77902f07a454029baad4fe5fe6fc37c65644e6f7
2015-05-15  vp9_dct_sse2: make some functions static  James Zern
silences missing prototype warnings Change-Id: I773b6a6b5bd7c57db18c3b17c519534f80e131de
2015-05-13  Relocate memory operations for common code  Johann
With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-07  replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED  James Zern
this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
2015-05-06  Move shared SAD code to vpx_dsp  Johann
Create a new component, vpx_dsp, for code that can be shared between codecs. Move the SAD code into the component. This reduces the size of vpxenc/dec by 36k on x86_64 builds. Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-04-28  vpx_mem: remove vpx_memset  James Zern
Vestigial. Replace instances with memset(), which vpx_memset was already defined to. Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
2015-04-28  vpx_mem: remove vpx_memcpy  James Zern
Vestigial. Replace instances with memcpy(), which vpx_memcpy was already defined to. Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
2015-04-16  Revert "Revert "Force_split on 16x16 blocks in variance partition.""  Marco Paniconi
This reverts commit 004b9d83e37d355f590a6976a27b7b845d19a869 Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
2015-04-14  Revert "Force_split on 16x16 blocks in variance partition."  Yunqing Wang
This reverts commit eb8c667570aa83134c7db0690de9dbdde4d90291. The patch caused mismatch while using multi-threads. Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
2015-04-13  Force_split on 16x16 blocks in variance partition.  Marco
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks. Also increase variance threshold for 32x32, and add exit condition in choose_partition (with very safe threshold) based on sad used to select reference frame. Some visual improvement near moving boundaries. Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%. Encoding time increases by ~1-4% (due to more 8x8 blocks), depending on clip. Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
2015-04-09  Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"  Jingning Han
2015-04-04  SSSE3 assembly implementation of 8x8 Hadamard transform  Jingning Han
It uses about 10% fewer CPU cycles than the SSE2 intrinsic implementation. Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499
2015-04-03  Merge "Tune SSSE3 assembly implementation to improve quantization speed"  Jingning Han
2015-04-01  Merge "Reduce required xmm number by one in block_error_fp"  Jingning Han
2015-04-01  Tune SSSE3 assembly implementation to improve quantization speed  Jingning Han
Change-Id: If0ca8b25b4800d4336e6cbc97194cd9b01c5b5a3
2015-04-01  Merge "Optimize quantization simd implementation"  Jingning Han
2015-04-01  Reduce required xmm number by one in block_error_fp  Jingning Han
Use 6 xmms instead of 8. Change-Id: If976ad85d09191d2fb0565399d690f2869dbbcc7
2015-04-01  Refactor block_yrd function for RTC coding mode  Jingning Han
This commit separates Hadamard transform/quantization operations from rate and distortion computation in block_yrd. This allows one to skip SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips repeated computation of sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
2015-04-01  Optimize quantization simd implementation  Jingning Han
This commit allows the quantizer to compare the AC coefficients to the quantization step size to determine if further multiplication operations are needed. It makes the quantization process 20% faster without coding statistics change. Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
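A scalar sketch of the idea, under stated assumptions: the function name, the plain integer division, and the per-coefficient test are all hypothetical simplifications. The real code is SIMD and uses fixed-point multiplication, and its exact threshold may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar quantizer. Coefficients whose magnitude is below the
 * step size quantize to zero, so the arithmetic for them is skipped.
 * Returns the end-of-block index (one past the last nonzero output). */
static int quantize_ac_with_skip(const int16_t *coeff, int n, int16_t step,
                                 int16_t *qcoeff) {
  int eob = 0;
  assert(step > 0);
  for (int i = 0; i < n; ++i) {
    const int abs_coeff = abs(coeff[i]);
    if (abs_coeff < step) {
      /* Cheap comparison only: the result is known to be zero. */
      qcoeff[i] = 0;
      continue;
    }
    /* Division stands in for the fixed-point multiply of a real quantizer. */
    const int q = abs_coeff / step;
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -q : q);
    eob = i + 1;
  }
  return eob;
}
```

The early exit produces the same output as always running the arithmetic, which is consistent with the commit's note that coding statistics do not change.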
2015-03-31  Use aligned copy in 8x8 Hadamard transform SSE2  Jingning Han
This reduces the 8x8 Hadamard transform cycles by 20%. Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b
2015-03-30  Fix 8x8 Hadamard SSE2 implementation  Jingning Han
This commit fixes the SSE2 version 8x8 Hadamard transform alignment and makes it consistent with the C version. Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
2015-03-30  Enable 16x16 Hadamard transform in SATD based mode decision  Jingning Han
This commit replaces the 16x16 2D-DCT transform with Hadamard transform for RTC coding mode. It reduces the CPU cycle cost of the 16x16 transform by 5X. Overall it makes speed -6 encoding 1.5% faster without compromising compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30  Hadamard transform based coding mode decision process  Jingning Han
This commit uses Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates. For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for niklas720p. This will introduce extra encoding cycle costs at this point. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
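For readers unfamiliar with the technique, here is a minimal scalar sketch of an 8x8 Hadamard transform and the SATD (sum of absolute transformed differences) derived from it. The function names are hypothetical, and the coefficient ordering and scaling are assumptions that may differ from the libvpx implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* In-place 8-point Walsh-Hadamard transform over elements spaced `stride`
 * apart, using only additions and subtractions (butterfly network). */
static void hadamard_1d_8(int32_t *data, int stride) {
  for (int half = 1; half < 8; half <<= 1) {
    for (int base = 0; base < 8; base += half << 1) {
      for (int k = base; k < base + half; ++k) {
        const int32_t a = data[k * stride];
        const int32_t b = data[(k + half) * stride];
        data[k * stride] = a + b;
        data[(k + half) * stride] = a - b;
      }
    }
  }
}

/* 2-D transform of an 8x8 residual block: rows first, then columns. */
static void hadamard_8x8(const int16_t *residual, int32_t *coeff) {
  for (int i = 0; i < 64; ++i) coeff[i] = residual[i];
  for (int r = 0; r < 8; ++r) hadamard_1d_8(coeff + r * 8, 1);
  for (int c = 0; c < 8; ++c) hadamard_1d_8(coeff + c, 8);
}

/* SATD: sum of absolute Hadamard coefficients, usable as a cheap
 * stand-in for a transform-domain cost estimate. */
static int64_t satd_8x8(const int16_t *residual) {
  int32_t coeff[64];
  int64_t sum = 0;
  hadamard_8x8(residual, coeff);
  for (int i = 0; i < 64; ++i) sum += abs(coeff[i]);
  return sum;
}
```

Since the transform needs no multiplications, it is much cheaper than a DCT while still concentrating the energy of smooth residuals into a few coefficients.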
2015-03-18  vp9_fdct8x8_quant_ssse3: quiet a static analysis warning  James Zern
add an assert to validate 'in' array size Change-Id: Ie5a24275c066d9dd59714f6104510abbd4850dc5
2015-03-18  vp9_fdct8x8_quant_sse2: quiet a static analysis warning  James Zern
add an assert to validate 'in' array size Change-Id: Ib72946a86f34e1ce8a69954e8e3e4fe1a0f18a91
2015-03-16  Refactor column integral projection computation  Jingning Han
Move the scaling factor outside column projection. This avoids repeated calculation of the same scaling factor. Profiling shows that the percentage of vp9_int_pro_col_sse2 of overall cycles goes from 2.29% down to 1.88%. Change-Id: I5ac4e324ab2d7f33ba2de66dd2a12e04e04dfd66
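The general shape of this refactoring can be sketched as below. All names are hypothetical, and normalizing by half the width (with the width a power of two) is an assumption for illustration, not a statement of what vp9_int_pro_col_sse2 does.

```c
#include <stdint.h>

/* Sum of one row of pixels, with no scaling applied (hypothetical helper). */
static int row_pixel_sum(const uint8_t *row, int width) {
  int sum = 0;
  for (int i = 0; i < width; ++i) sum += row[i];
  return sum;
}

/* floor(log2(n)) for n >= 1. */
static int floor_log2(int n) {
  int shift = 0;
  while (n > 1) {
    n >>= 1;
    ++shift;
  }
  return shift;
}

/* Projects each row onto one value. The normalization shift depends only
 * on `width`, so it is derived once here rather than inside every
 * per-row call. Assumes width is a power of two and at least 2. */
static void project_rows(const uint8_t *ref, int stride, int width,
                         int height, int16_t *out) {
  const int norm_shift = floor_log2(width >> 1);
  for (int r = 0; r < height; ++r) {
    out[r] = (int16_t)(row_pixel_sum(ref + r * stride, width) >> norm_shift);
  }
}
```

Hoisting the loop-invariant factor leaves the output unchanged and only removes repeated work from the inner routine.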