Published May 29, 2019
| Version v1
Publication
Sample-Parallel Execution of EBCOT in Fast Mode
Description
JPEG 2000's most computationally expensive building
block is the Embedded Block Coder with Optimized Truncation
(EBCOT). This paper evaluates how encoders targeting a parallel
architecture such as a GPU can increase their throughput in use
cases where very high data rates are used. The compression
efficiency in the less significant bit-planes is then often poor and
it is beneficial to enable the Selective Arithmetic Coding Bypass
style (fast mode) in order to trade a small loss in compression
efficiency for a reduction of the computational complexity. More
importantly, this style exposes a more finely grained parallelism
that can be exploited to execute the raw coding passes, including
bit-stuffing, in a sample-parallel fashion. For a latency- or
memory critical application that encodes one frame at a time,
EBCOT's tier-1 is sped up between 1.1x and 2.4x compared to an
optimized GPU-based implementation. When a low GPU
occupancy has already been addressed by encoding multiple
frames in parallel, the throughput can still be improved by 5%
for high-entropy images and 27% for low-entropy images. Best
results are obtained when enabling the fast mode after the fourth
significant bit-plane. For most of the test images the compression
rate is within 1% of the original.
Additional details
Identifiers
- URL
- https://idus.us.es/handle//11441/86950
- URN
- urn:oai:idus.us.es:11441/86950
Origin repository
- Origin repository
- USE