summaryrefslogtreecommitdiff
path: root/vp8/encoder/x86
AgeCommit message (Collapse)Author
2012-08-31Encoder denoiser performance improvementYunqing Wang
The denoiser function was modified to reduce the computational complexity. 1. The denoiser c function modification: The original implementation calculated pixel's filter_coefficient based on the pixel value difference between current raw frame and last denoised raw frame, and stored them in lookup tables. For each pixel c, find its coefficient using filter_coefficient[c] = LUT[abs_diff[c]]; and then apply filtering operation for the pixel. The denoising filter costed about 12% of encoding time when it was turned on, and half of the time was spent on finding coefficients in lookup tables. In order to simplify the process, a short cut was taken. The pixel adjustments vs. pixel diff value were calculated ahead of time. adjustment = filtered_value - current_raw = (filter_coefficient * diff + 128) >> 8 The adjustment vs. diff curve becomes flat very quick when diff increases. This allowed us to use only several levels to get a close approximation of the curve. Following the denoiser algorithm, the adjustments are further modified according to how big the motion magnitude is. 2. The sse2 function was rewritten. This change made denoiser filter function 3x faster, and improved the encoder performance by 7% ~ 10% with the denoiser on. Change-Id: I93a4308963b8e80c7307f96ffa8b8c667425bf50
2012-06-11Fix pedantic compiler warningsJohn Koleszar
Allows building the library with the gcc -pedantic option, for improved portabilty. In particular, this commit removes usage of C99/C++ style single-line comments and dynamic struct initializers. This is a continuation of the work done in commit 97b766a46, which removed most of these warnings for decode only builds. Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
2012-05-31Fixes a win build issue related to denoising.Stefan Holmer
Change-Id: I912384f526865089aa03ca8875591324e5c1c449
2012-05-31Fixes a clang linking error.Stefan Holmer
Change-Id: I1d2db53129dc6ec068093ad1e5fc0d94110473b3
2012-05-30Added another denoising threshold for finding DC shifts.Stefan Holmer
Compares the sum of differences between the input block and the averaged block. If they differ too much the block will not be filtered. Negligible perfomance hit. Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
2012-05-23Make libvpx Chromium build friendlyAlpha Lam
Add PRIVATE macro for adding private_extern directive for yasm to hide global symbols. This is only enabled if -DCHROMIUM is used with YASM. Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING. Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
2012-05-21Inline Intrinsic optimized DenoiserChristian Duvivier
Faster version of denoiser, cut cost by 1.7x for C path, by 3.3x for SSE2 path. Change-Id: I154786308550763bc0e3497e5fa5bfd1ce651beb
2012-04-17Makes all global data in entropy.c constAttila Nagy
Removes all runtime initialization of global data in entropy.c. Precalculated values are used for initializing all entropy related tabels. First patch in a series to make sure code is reentrant. Change-Id: I9aac91a2a26f96d73c6470d772a343df63bfe633
2012-03-05Move SAD and variance functions to commonJohann
The MFQE function of the postprocessor depends on these Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
2012-02-16Clarify 'max_sad' usageJohann
Depending on implementation the optimized SAD functions may return early when the calculated SAD exceeds max_sad. Change-Id: I05ce5b2d34e6d45fb3ec2a450aa99c4f3343bf3a
2012-02-10Missed some variance castsJohann
Change-Id: I9fb510f9421fb3c317a8e32e3058cee977ddf9fa
2012-02-09Fix variance overflowJohann
In the variance calculations the difference is summed and later squared. When the sum exceeds sqrt(2^31) the value is treated as a negative when it is shifted which gives incorrect results. To fix this we cast the result of the multiplication as unsigned. The alternative fix is to shift sum down by 4 before multiplying. However that will reduce precision. For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and change). PPC change is untested. Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
2012-01-30RTCD: finalize removal of old RTCD systemJohn Koleszar
This is the final commit in the series converting to the new RTCD system. It removes the encoder csystemdependent files and the remaining global function pointers that didn't conform to the old RTCD system. Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
2012-01-30RTCD: add arnr functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. It removes the last of the VP8_ENCODER_RTCD struct references. Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
2012-01-30RTCD: add motion search functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: Ia5828b7ecc80db55b21916704aa3d54cbb98f625
2012-01-30RTCD: add block subtraction functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: Id8a287fdd4bd050ea4452e1582ad85520f3081be
2012-01-30RTCD: add quantizer functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: Iba9df4c03a508e51c37201c621be43523fae87d9
2012-01-30RTCD: add FDCT functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: I3f9c07db65eb206f6363d21bdb80e871570da767
2012-01-30RTCD: add variance functionsJohn Koleszar
This commit continues the process of converting to the new RTCD system. Change-Id: Ie5c1aa480637e98dc3918fb562ff45c37a66c538
2011-12-29Improve SSSE3 fast quantizer functionYunqing Wang
Simplified the EOB calculation in the function. Change-Id: I7422f18be40ae270358f5cb0811d66e64436b56f
2011-11-18Move shared data to shared locationJohann
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2 file is bad Moving towards allowing --disable-mmx Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
2011-11-15Added predictor stride argument(s) to subtract functionsScott LaVarnway
Patch set 2: 64 bit build fix Patch set 3: 64 bit crash fix [Tero] Patch set 4: Updated ARMv6 and NEON assembly. Added also minor NEON optimizations to subtract functions. Patch set 5: x86 stride bug fix Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
2011-11-03Change use of eob in the encoderTero Rintaluoma
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and decoder will use eobs[25] array from MACROBLOCKD structure. In future, this will enable use of the decoder side IDCT in the encoder. Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
2011-09-22Replace vpx_ports/config.h with vpx_config.hAttila Nagy
Just a clean-up. Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-08-23Use local labels for jumps/loops in x86 assembly.Fritz Koenig
Prepend . to local labels in assembly code. This allows non unique labels within a file. Also makes profiling information more informative by keeping the function name with the loop name. Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-08-22Reclassify optimized ssim calculations as SSE2.Fritz Koenig
Calculations were incorrectly classified as either SSE3 or SSSE3. Only using SSE2 instructions. Cleanup function names and make non-RTCD code work as well. Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
2011-08-22Revert "Reclasify optimized ssim calculations as SSE2."Fritz Koenig
This reverts commit 01376858cd184d820ff4c2d8390361a8679c0e87
2011-08-19Reclasify optimized ssim calculations as SSE2.Fritz Koenig
Calculations were incorrectly classified as either SSE3 or SSSE3. Only using SSE2 instructions. Cleanup function names and make non-RTCD code work as well. Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
2011-07-25Specify size for argument pushed to stackYunqing Wang
The change fixes building error on Win64. Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b
2011-07-22Preload reference area to an intermediate buffer in sub-pixel motion searchYunqing Wang
In sub-pixel motion search, the search range is small(+/- 3 pixels). Preload whole search area from reference buffer into a 32-byte aligned buffer. Then in search, load reference data from this buffer instead. This keeps data in cache, and reduces the crossing cache- line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux) showed encoder speed improvement: 3.4% at --rt --cpu-used =-4 2.8% at --rt --cpu-used =-3 2.3% at --rt --cpu-used =-2 2.2% at --rt --cpu-used =-1 Test on Atom notebook showed only 1.1% speed improvement(speed=-4). Test on Xeon machine also showed less improvement, since unaligned data access latency is greatly reduced in newer cores. Next, I will apply similar idea to other 2 sub-pixel search functions for encoding speed > 4. Make this change exclusively for x86 platforms. Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-06-14fix --disable-runtime-cpu-detect on x86Johann
Change-Id: Ib8e429152c9a8b6032be22b5faac802aa8224caa
2011-06-09remove one set of 16x16 variance funcationsYaowu Xu
call to this set of functions are replaced by var16x16. Change-Id: I5ff1effc6c1358ea06cda1517b88ec28ef551b0d
2011-06-06remove redundant functionsYaowu Xu
The encoder defined about 4 set of similar functions to calculate sum, variance or sse or a combination of them. This commit removed one set of these functions, get8x8var and get16x16var, where calls to the later function are replaced with var16x16 by using the fact on a 16x16 MB: variance == sse - sum*sum/256 Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
2011-05-25Return sse value in vp8_variance SSE2 functionsYunqing Wang
Minor modification. Change-Id: I09511d38fd1451d5c4106a48acdb3f766ce59cb7
2011-05-19Merge "changed configure option name to reduce confusion"John Koleszar
2011-05-10Merge "Use diamond search to replace full search in full-pixel refining search"Yunqing Wang
2011-05-09Use diamond search to replace full search in full-pixel refining searchYunqing Wang
In NEWMV mode, currently, full search is used as the refining search after n-step search. By replacing it with an iterative diamond search of radius 1 largely reduced the computation complexity, but still maintained the same encoding quality since the refining search is done for every macroblock instead of only a small precentage of macroblocks while using full search. Tests on the test set showed a 3.4% encoding speed increase with none psnr & ssim loss. Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
2011-05-09clean up unused variable warningsJohann
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-04-29changed configure option name to reduce confusionYaowu Xu
Renamed configure option "enable-psnr" to "enable-internal-stats" to better reflect the purpose of the option and eliminate the confusion reported in http://code.google.com/p/webm/issues/detail?id=35 Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-25Merge "keep values in registers during quantization"Johann
2011-04-22Fix overflow in temporal_filter_apply_sse2().Ronald S. Bultje
The accumulator array is an integer array, so use paddd instead of paddw to add values to it. Fixes overflows when using large --arnr-maxframes (>8) values. Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
2011-04-21keep values in registers during quantizationJohann
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm registers instead of proxying through the stack. and as long as we're bumping up, use some ssse3 instructions in the EOB detection (see ssse3 fast quantizer) pick up about a percent on 32bit and about two on 64bit. Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-19modify SAVE_XMM for potential 64bit useJohann
the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and if the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19Merge "Add save/restore xmm registers in x86 assembly code"Johann
2011-04-18Add save/restore xmm registers in x86 assembly codeJohann
Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18Merge "store quant_shift as an unsigned char"Johann
2011-04-18Merge "fixed an overflow in ssim calculation"Yaowu Xu
2011-04-13store quant_shift as an unsigned charJohann
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant only use 8 bits to store this, instead of 16. will allow saving an xmm register in an updated version of the regular quantize Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-11Set cpu_used range to [-16, 16] in real-time modeYunqing Wang
Remove encoding speed limitation in real-time mode. Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-07fixed an overflow in ssim calculationJim Bankoski
This commit fixed an overflow in ssim calculation, added register save and restore to make sure assembly code working for x64 platform. It also changed the sampling points to every 4x4 instead of 8x8 and adjusted the constants in SSIM calculation to match the scale of previous VPXSSIM. Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b