summaryrefslogtreecommitdiff
path: root/vp8/encoder/ethreading.c
AgeCommit message (Collapse)Author
2011-06-08Further activity masking changes:Paul Wilkins
Some further re-structuring of activity masking code. Still has various experimental switches. Supports a metric based on intra encode. Experimental comparison against a fixed activity target rather than a frame average, for altering rd and zbin. Overall the SSIM performance is similar to TT's original code but there is a much smaller PSNR hit of circa 0.5% instead of 3.2% Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
2011-06-06Merge "neon fast quantize block pair"Johann
2011-06-02Removed B_MODE_INFOScott LaVarnway
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO. Then removed B_MODE_INFO completely. Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
2011-06-01neon fast quantize block pairTero Rintaluoma
vp8_fast_quantize_b_pair_neon function added to quantize two adjacent blocks at the same time to improve performance. - Additional 3-6% speedup compared to neon optimized fast quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16) Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
2011-05-24Removed unused variable warningsScott LaVarnway
Change-Id: I6e5e921f03dc15a72da89a457848d519647677a3
2011-05-24MODE_INFO size reductionScott LaVarnway
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO. This reduced the memory footprint by 518,400 bytes for 1080 resolutions. The decoder performance improved by ~4% for the clip used and the encoder showed very small improvements. (0.5%) This reduction was first mentioned to me by John K. and in a later discussion by Yaowu. This is WIP. Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
2011-05-19Remove unused members of VP8_COMPJohn Koleszar
Various members that were either completely unreferenced or written and not read. Change-Id: Ie41ebac0ff0364a76f287586e4fe09a68907806e
2011-05-13Restructure of activity masking code.Paul Wilkins
This commit restructures the mb activity masking code to better facilitate experimentation using different metrics etc. and also allows for adjustment of the zero bin either for encode only or both the encode and mode selection stages It also uses information from the current frame rather than the previous frame and the default strength has been reduced. Change-Id: Id39b19eace37574dc429f25aae810c203709629b
2011-05-10Merge "fix a bug related to gf_active_flags in multi-threaded encoder"Yaowu Xu
2011-05-06fix a bug related to gf_active_flags in multi-threaded encoderYaowu Xu
Paul pointed out that the pointer to the gf_active_flags is not being properly incremented in multithreaded encoder. This commit fixes the issue by making sure the gf_active_ptr points to the starting of next group of mb rows. Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
2011-05-06Fix semaphore emulation on WindowsAron Rosenberg
The existing emulation of posix semaphores on Windows uses SetEvent() and WaitForSingleObject(), which implements a binary semaphore, not a counting semaphore as implemented by posix. This causes deadlock when used with the expected posix semantics. Instead, this patch uses the CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows 2000) which have the expected behavior. This patch also reverts commit eb16f00, which split a semaphore that was being used with counting semantics into two binary semaphores. That commit is unnecessary with corrected emulation. Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
2011-05-05Fix rare hang in multi-thread encoder on WindowsYunqing Wang
This patch is to fix a rare hang in multi-thread encoder that was only seen on Windows. Thanks for John's help in debugging the problem. More test is needed. Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
2011-05-05Merge "Runtime detection of available processor cores."Yunqing Wang
2011-04-01Use full-pixel MV in mvsadcost calculationYunqing Wang
MV sad cost error is only used in full-pixel motion search, which only need full-pixel resolution instead of quarter-pixel resolution. This change reduced mvsadcost table size, and removed unneccessary pamameter passing since this table is constant once it is generated. Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
2011-03-31Runtime detection of available processor cores.Attila Nagy
Detect the number of available cores and limit the thread allocation accordingly. On decoder side limit the number of threads to the max number of token partition. Core detetction works on Windows and Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN. Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-18Fix multithreaded encoding for 1 MB wide frameAttila Nagy
Thread synchronization was not correct when frame width was 1 MB. Number of allocated encoding threads is limited by the sync_range. There is no point having more because each thread lags sync_range MBs behind the thread processing the row above. http://code.google.com/p/webm/issues/detail?id=302 Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
2011-03-11Encoder loopfilter running in its own threadAttila Nagy
In multithreaded mode the loopfilter is running in its own thread (filter level calculation and frame filtering). Filtering is mostly done in parallel with the bitstream packing. Before starting the packing the loopfilter level has to be calculated. Also any needed reference frame copying is done in the filter thread. Currently the encoder will create n+1 threads, where n > 1 is the number of threads specified by application and 1 is the extra filter thread. With n = 1 the encoder runs in single thread mode. There will never be more than n threads running concurrently. Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
2011-02-10Fix relative include pathsJohn Koleszar
Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-09Put more code under #if CONFIG_MULTITHREAD.Gaute Strokkenes
Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b
2011-02-01Improved encoder threadingAttila Nagy
Reduce the number of sync points by letting each thread continue imediatly with a new MB row. Better multicore scaling, improves performance by 5-20% on ARM multicore. Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
2010-12-29Fixed encoder crash when mult-threading is enabled.Scott LaVarnway
Happens in real-time mode. Will happen in good quality, speed 1. Change-Id: I3e5b68827b1a5798d0431b088a709256d1ce2c95
2010-12-17Add psnr/ssim tuning optionJohn Koleszar
Add a new encoder control, VP8E_SET_TUNING, to allow the application to inform the encoder that the material will benefit from certain tuning. Expose this control as the --tune option to vpxenc. The args helper is expanded to support enumerated arguments by name or value. Two tunings are provided by this patch, PSNR (default) and SSIM. Activity masking is made dependent on setting --tune=ssim, as the current implementation hurts speed (10%) and PSNR (2.7% avg, 10% peak) too much for it to be a default yet. Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257
2010-12-14fix a bug that "optimize" flag is not set for sub-threadsYaowu Xu
The flag for quantization optimization was not properly propagated to mb row encoding threads. Change-Id: Ic561599c35acd94cd5698c9b314bccd596ac2deb
2010-12-10fix a bug in multithreaded encoding with active_map enabledYaowu Xu
Added the initialization of the pointer to active map. Also added the same logic for cyclic refresh in mbrow encoding threads. Change-Id: Ic48d0849dc706b27fba72d07dcc498075725663d
2010-10-12Add simple version of activity masking.Timothy B. Terriberry
This uses MB variance to change the RDO weight for mode decision and quantization. Activity is normalized against the average for the frame, which is currently tracked using feed-forward statistics. This could also be used to adjust the quantizer for the entire frame, but that requires more extensive rate control changes. This does not yet attempt to adapt the quantizer within the frame, but the signaling cost means that will likely only be useful at very high rates. Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
2010-09-09Use WebM in copyright notice for consistencyJohn Koleszar
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-03Reduced the size of MB_MODE_INFOScott LaVarnway
Moved partition_bmi and partition_count out of MB_MODE_INFO and placed into MACROBLOCK. Also reduced the size of other members of the MB_MODE_INFO struct. For 1080p, the memory was reduced by 1,209,516 bytes. The decoder performance appeared to improve by 3% for the clip used. Note: The main goal for this change is to improve the decoder performance. The encoder will be revisited at a later date for further structure cleanup. Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
2010-08-31Changed above and left context data layoutScott LaVarnway
The main reason for the change was to reduce cycles in the token decoder. (~1.5% gain for 32 bit) This layout should be more cache friendly. As a result of this change, the encoder had to be updated. Change-Id: Id5e804169d8889da0378b3a519ac04dabd28c837 Note: dixie uses a similar layout
2010-08-12Removed unnecessary MB_MODE_INFO copiesScott LaVarnway
These copies occurred for each macroblock in the encoder and decoder. Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result, a large number compile errors had to be fixed. Change-Id: I4cf0ffae3ce244f6db04a4c217d52dd256382cf3
2010-08-11Moved gf_active code to encoder onlyScott LaVarnway
The gf_active code is only used by the encoder, so it was moved from common and decoder. Change-Id: Iada15acd5b2b33ff70c34668ca87d4cfd0d05025
2010-07-23Swap alt/gold/new/last frame buffer ptrs instead of copying.Fritz Koenig
At the end of the decode, frame buffers were being copied. The frames are not updated after the copy, they are just for reference on later frames. This change allows multiple references to the same frame buffer instead of copying it. Changes needed to be made to the encoder to handle this. The encoder is still doing frame buffer copies in similar places where pointer reference could be done. Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66
2010-07-23Make the quantizer exact.Timothy B. Terriberry
This replaces the approximate division-by-multiplication in the quantizer with an exact one that costs just one add and one shift extra. The asm versions have not been updated in this patch, and thus have been disabled, since the new method requires different multipliers which are not compatible with the old method. Change-Id: I53ac887af0f969d906e464c88b1f4be69c6b1206
2010-06-24Redo the forward 4x4 dctYaowu Xu
The new fdct lowers the round trip sum squared error for a 4x4 block ~0.12. or ~0.008/pixel. For reference, the old matrix multiply version has average round trip error 1.46 for a 4x4 block. Thanks to "derf" for his suggestions and references. Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
2010-06-18cosmetics: trim trailing whitespaceJohn Koleszar
When the license headers were updated, they accidentally contained trailing whitespace, so unfortunately we have to touch all the files again. Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-08minor cleanup of quantizer and fdct codeYaowu Xu
Change-Id: I7ccc580410bea096a70dce0cc3d455348d4287c5
2010-06-07Remove duplicate and unused functionsYaowu Xu
Change-Id: I944035e720ef834561a9da0d723879a4f787312c
2010-06-04LICENSE: update with latest textJohn Koleszar
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-05-18Initial WebM releaseJohn Koleszar