Age | Commit message (Collapse) | Author |
|
There were multiple implementations of CHECK_MEM_ERROR across the
library that take different arguments and used in different places.
This CL will unify them and have only one implementation that takes
vpx_internal_error_info.
Change-Id: I2c568639473815bc00b1fc2b72be56e5ccba1a35
|
|
Added AVX2 intrinsic optimization for the following functions
1. vpx_idct16x16_256_add
2. vpx_idct32x32_1024_add
3. vpx_idct32x32_135_add
The module level scaling w.r.t C function (timer based) for
existing (SSE2) and new AVX2 intrinsics:
Scaling
Function Name SSE2 AVX2
vpx_idct32x32_1024_add 3.62x 7.49x
vpx_idct32x32_135_add 4.85x 9.41x
vpx_idct16x16_256_add 4.82x 7.70x
This is a bit-exact change.
Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2
|
|
this fixes a leak when using MFQE
BUG=webm:1692
Change-Id: I19fb2f07155769f59924e0843989b3d3f8899bf6
|
|
We will init and update current_video_frame and
current_frame_coding_index in the functions.
So it's easier to keep track of when the frame indexes are updated.
Change-Id: Id6ba46643f8923348bb4f81c5dd9ace553244057
|
|
Adds conditional wait/signal instead of sched_yield.
Change-Id: I49a760eacdd6b6ac690e797ea5f10febf6a1a084
|
|
This reverts commit 06983668cf41f66765528db044419f954e5a5d64.
Fixes Visual Studio build errors introduced by earlier row mt commit
BUG=webm:1587
Change-Id: I792df86e8254cd6b2a511955b691af619a569cd0
|
|
This reverts commit 02b3ef7faee5be5ee519856fbb3523d3ab49f6e7.
Reason for revert: fails to build under visual studio
Original change's description:
> Add Tile-SB-Row based Multi-threading in Decoder
>
> Add the multi-thread function that decodes a video row by row instead
> of a tile at a time. Create a job queue for queueing all parse and recon jobs.
> Each SB row of a tile is a job.
>
> Performance Improvement:
>
> Platform Resolution 3 Threads 4 Threads
> ARM 720p 36.81% 18.37%
> 1080p 32.27% 14.76%
>
> ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
>
> Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
BUG=webm:1587
Change-Id: Ia4c8f5128922a205cd9fd83aaef8a2e73764d4a7
|
|
|
|
Add the multi-thread function that decodes a video row by row instead
of a tile at a time. Create a job queue for queueing all parse and recon jobs.
Each SB row of a tile is a job.
Performance Improvement:
Platform Resolution 3 Threads 4 Threads
ARM 720p 36.81% 18.37%
1080p 32.27% 14.76%
ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
|
|
There was no setjmp on vpx_internal_error when there is no available
frame buffer, ready_for_new_data is not reset to 1.
BUG=webm:1571
Change-Id: I4f8efffb7d6fed3085b1f0229d0d1071a056b6c6
|
|
Row based multi-thread needs extra memory to store the parsed
co-efficients, partitions and eob. This commit adds memory for the same.
Change-Id: I13fa4a6ada2ec3048bc973e465055b832429388f
|
|
Always allocate cpi->common.postproc_state.limits using unscaled width.
With ./configure --enable-pic --enable-decode-perf-tests
--enable-encode-perf-tests --enable-encode-perf-tests
--enable-vp9-highbitdepth --enable-better-hw-compatibility
--enable-internal-stats --enable-postproc --enable-vp9-postproc
--enable-error-concealment --enable-coefficient-range-checking
--enable-postproc-visualizer --enable-multi-res-encodin
--enable-vp9-temporal-denoising --enable-webm-io --enable-libyuv
segfaults tend to occur in VP9/DatarateOnePassCbrSvcSingleBR.* tests.
This is an analogue to issue
https://bugs.chromium.org/p/webm/issues/detail?id=1374
where a buffer allocated using a scaled width is reused after scaling
back to the original size. Unfortunately, in this case the unscaled
width doesn't appear to be known in the immediated context of the
allocation, so the the signature of vp9_post_proc_frame needs to be
changed to provide that information in order to provide a similar fix
as in #1374.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I6f943aafbb3484ee94c5b38d7fcdd9d53fce3e5f
|
|
The previous enc/dec mismatch detection assumes the previously
reconstructed frame would always stay at frame buffer pool index
at 0. It could hence cause certain delay in enc/dec mismatch
detection when the immediate reconstruction frame is not yet
propagated to index 0 in the buffer map pool.
This change always keeps the latest decoded show frame buffer
index and directly gets the reconstructed frame from encoder and
decoder buffer pools to check for mismatch.
Change-Id: If53092cbc42ab78d55af5b83f12a489fc362f3ae
|
|
libaom commit 80a5b09337a80093e1e7ae5eb540020a22949805:
dec_free_mi: Reset cm->mi_alloc_size.
libaom commit fb0dd0bb80fc95ef016f1421b105a52fffa32816:
Clear cm->width and cm->height on alloc failure.
libaom commit ccb27264089a8cfa1334391ebbcb6a11b8dff442:
Misc. resize fixes along with the resize test
Note: only the change to enc_free_mi in av1/encoder/encoder.c
is merged.
Change-Id: I602813230d40125e59608fa013085dca3e160c33
|
|
Release frame buffers for non-ref when the decoder is destroyed.
Enable the non ref test.
BUG=b/68819248
Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
|
|
the file was empty after the struct removal. the only remaining use was
within vp9_dx_iface, but the wrapper became unnecessary after the
removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd
|
|
there is no threaded access to this pool after the removal of
frame_parallel_decode
BUG=webm:1395
Change-Id: I710769b87102edc898c59eb9a2e7a91d8c49107f
|
|
this has been 0 since the removal of frame_parallel_decode in
vp9_dx_iface.
BUG=webm:1395
Change-Id: I3f579766ecfa4777395b99686738e1c5610f86ef
|
|
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c.
Clean up vp8 codes usage in vp9.
BUG=webm:1435
Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
|
|
Change-Id: Ic38ea06c7b2fb3e8e94a4c0910e82672a1acaea7
|
|
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Change-Id: I4cf8f23b86d7ebd85ffd2630dcfbd799c0b88101
|
|
This commit change to promote uint8_t explicitly to uint32_t before
left shift operation.
BUG=https://bugs.chromium.org/p/chromium/issues/detail?id=612021
Change-Id: Id7059154efb5bdfa45889dabe72aaafd46d79f23
|
|
|
|
Change-Id: I83536734a54ef7b85f90f56a51878d94fac7ff22
|
|
It would otherwise result in an infinite loop.
Change-Id: Ic03fb220cc048538bd62dee599653187f2093079
|
|
Change-Id: Ieec4a7be5945dc6de192e2d8292ab978baf47f53
(cherry picked from commit 2096296421c7fa56abb49470c0fbe7c4337b8a71)
|
|
fixes crash on error
Change-Id: Ibb1ef5565fb833cdee1a49335473d98f1187ef43
|
|
unnecessary since: 86f4a3d Remove tile param
Change-Id: Iff75d3acf6c5aade833ea0a214c919279403cf97
|
|
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.
Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
|
|
Change the dir name to include more util tools.
Change-Id: Id5b16062803ce5eed872fe2edb36d7e56b32eed8
|
|
Replace vp9_ prefix with vpx_ for common multi-threading functions.
Change-Id: I941a5ead9bfe8213fdad345511d2061b07797b55
|
|
This commit moves the primitive multi-threading files from vp9
folder to vpx_thread, which will be accessible by all vpx codec.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
|
|
|
|
Change-Id: If28b59b9521204a6e3aecedcf75932d76a752567
|
|
1. Check existing buffer sizes when re-allocate context buffers.
2. Don't need to set mi buffers to 0 during setup_mi.
Change-Id: I6b48b0e077a4d804312b605ad0dc34aec5795a6d
|
|
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
|
|
vestigial. replace instances with memset() which they already were being
defined to.
Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
|
|
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)
For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.
Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
|
|
Change-Id: Ib1e17d8aae9b713b87f560ab5e49952ee2bfdcc2
|
|
Change-Id: I00621ff7165bbe86a18794b4a816976c9effaf78
|
|
cm->frame_bufs[].idx values were made consistent in:
61c5e94 Use -1 consistently as invalid buffer idx
update the initialization in swap_frame_buffers() to match.
additionally:
- remove some shadowed variables in the former and marked them volatile
Change-Id: Ie3f9636c405bd822112bb56bd22d28024ae98909
|
|
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
|
|
master branch."
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7
Change-Id: I053dae04c761b04a36dc239558503905a14d2470
|
|
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
|
|
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
|
|
|
|
Instead of mixed use of both -1 and INT_MAX.
This also fixes a vp9 fuzzing test failure.
Change-Id: I950ea94b44ec7cdb5232773bee30b104e342f52a
|
|
Local variables used at the setjmp() site need to be marked volatile.
Relevant excerpt from the 'man longjmp':
===============
The values of automatic variables are unspecified after a call to
longjmp() if they meet all the following criteria:
· they are local to the function that made the corresponding setjmp(3) call;
· their values are changed between the calls to setjmp(3) and longjmp(); and
· they are not declared as volatile.
===============
Change-Id: I093e6eeeedbf5f781d202248ca701ba2c29d3064
|
|
Change-Id: Id2ad4fb24242f7ca8fa7a152f0889fded4113613
|
|
Change-Id: If2d0888d13ebe52bc7c3b16f16319408a86ab6de
|