references_to_non_static_model-entropy-arithmetic-coders

Developments in various compression techniques, mostly non-static-model coders: entropy coders, arithmetic coders, and related methods.

These are unsorted references, quotes, and links from archives, websites, and papers on a variety of related theories; nothing here advocates or advertises any particular technique.

Many quotes and references are from articles posted on the excellent http://fastcompression.blogspot.nl/

Understanding ANS: http://cbloomrants.blogspot.nl/2014/02/02-02-14-understanding-ans-4.html

Paper on arithmetic coding: http://read.pudn.com/downloads75/sourcecode/zip/276150/image_compresion_by_sfj/reference/arithmetic/ibmrd2302G.pdf

ANS model - Jarek Duda's paper on ANS/entropy encoding (local copy of the arXiv paper, https://dl.dropboxusercontent.com/u/12405967/ans.pdf): Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding.

- combining tANS with encryption using a PRNG

(http://arxiv.org/abs/1311.2540 | http://arxiv.org/pdf/1311.2540v2.pdf)

Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

Jarek Duda

(Submitted on 11 Nov 2013 (v1), last revised 6 Jan 2014 (this version, v2))

Modern data compression is mainly based on two approaches to entropy coding: Huffman coding (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively low compression rates. The latter uses nearly exact probabilities, easily approaching the theoretical compression rate limit (Shannon entropy), but at a much larger computational cost.

Asymmetric numeral systems (ANS) is a new approach to accurate entropy coding which makes it possible to end this trade-off between speed and rate: the recent implementation [1] provides about 50% faster decoding than HC for a 256-symbol alphabet, with a compression rate similar to that provided by AC. This advantage comes from being simpler than AC: it uses a single natural number as the state, instead of two numbers representing a range. Besides simplifying renormalization, this allows the entire behavior for a given probability distribution to be put into a relatively small table, defining an entropy coding automaton. The memory cost of such a table for a 256-symbol alphabet is a few kilobytes. There is large freedom in choosing a specific table; using a pseudorandom number generator initialized with a cryptographic key for this purpose allows the data to be simultaneously encrypted.

This article also introduces and discusses many other variants of this new entropy coding approach, which can provide direct alternatives to standard AC, to large-alphabet range coding, or to approximated quasi-arithmetic coding.
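To make the "single natural number as the state" idea concrete, here is a minimal rANS sketch in Python. This is an illustration of the general technique, not Duda's reference implementation: the whole state lives in one arbitrary-precision integer, renormalization is skipped, and the toy alphabet is made up for the example.

    import math

    def rans_encode(symbols, freq, cum, total):
        x = 1                                # the entire coder state: one natural number
        for s in reversed(symbols):          # ANS encodes in reverse order
            x = (x // freq[s]) * total + cum[s] + x % freq[s]
        return x

    def rans_decode(x, n, freq, cum, total, sym_of_slot):
        out = []
        for _ in range(n):
            slot = x % total                 # which slot of the table the state points at
            s = sym_of_slot[slot]
            x = freq[s] * (x // total) + slot - cum[s]
            out.append(s)
        return out

    # Toy distribution: P('a') = 3/4, P('b') = 1/4, frequencies sum to total = 4.
    freq = {'a': 3, 'b': 1}
    cum = {'a': 0, 'b': 3}
    sym_of_slot = ['a', 'a', 'a', 'b']
    msg = list("abaab")
    x = rans_encode(msg, freq, cum, 4)
    assert rans_decode(x, len(msg), freq, cum, 4, sym_of_slot) == msg
    print(x, math.log2(x))                   # final state and its size in bits

Each encode step multiplies the state by roughly 1/p(s), so the state grows by about log2(1/p) bits per symbol, which is where the near-Shannon compression rate comes from.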

Finite State Entropy

http://fastcompression.blogspot.fr/2013/12/finite-state-entropy-new-breed-of.html (development blog on compression algorithms)

"In compression theory, the entropy encoding stage is typically the last stage of a compression algorithm, the one where the gains from the model are realized.

The purpose of the entropy stage is to reduce a set of flags/symbols to their optimal space given their probability. As a simple example, if a flag has a 50% chance of being set, you want to encode it using 1 bit. If a symbol has a 25% probability of having value X, you want to encode it using 2 bits, and so on.

The optimal size to encode a probability is proven, and known as the Shannon limit. You can't beat that limit, you can only get close to it."
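A quick numeric check of those figures (a small illustration; the 60% case is added to show the fractional costs that Huffman cannot hit):

    import math

    # Shannon: a symbol of probability p needs -log2(p) bits at the optimum.
    for p in (0.50, 0.25, 0.60):
        print(f"p = {p:.2f} -> {-math.log2(p):.3f} bits")

    # p = 0.50 -> 1.000 bits
    # p = 0.25 -> 2.000 bits
    # p = 0.60 -> 0.737 bits (fractional: Huffman must round up to a whole
    #                         bit; arithmetic coding and ANS can get close)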

A new method to perform entropy compression is presented in this article. It features the speed of Huffman compression but the accuracy of arithmetic compression, and is suitable for low-power CPUs since it only uses additions, shifts, and masks.

http://fastcompression.blogspot.fr/p/huff0-range0-entropy-coders.html

"ANS theory is a bit special, since it requires encoding and decoding to be performed in reverse directions.

The way FSE deals with it is that it encodes in backward direction and output in forward direction,

while decoding is performed in forward direction while reading the compressed input in backward direction."

https://github.com/Cyan4973/FiniteStateEntropy
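The direction asymmetry described in that quote is easiest to see in code. Below is a sketch of a byte-renormalized rANS coder in Python; the parameters (PROB_BITS, RANS_L, the 32-bit state) and all names are assumptions for illustration, not FSE's actual tANS implementation. The encoder walks the message backward and appends bytes; the decoder produces symbols forward while popping those bytes from the end.

    PROB_BITS = 12
    TOTAL = 1 << PROB_BITS        # symbol frequencies must sum to TOTAL
    RANS_L = 1 << 23              # lower bound of the normalized state

    def encode(symbols, freq, cum):
        x, out = RANS_L, bytearray()
        for s in reversed(symbols):                    # backward over the message
            x_max = ((RANS_L >> PROB_BITS) << 8) * freq[s]
            while x >= x_max:                          # renormalize: shift bytes out
                out.append(x & 0xFF)
                x >>= 8
            x = (x // freq[s]) * TOTAL + cum[s] + x % freq[s]
        for _ in range(4):                             # flush the final 32-bit state
            out.append(x & 0xFF)
            x >>= 8
        return out

    def decode(data, n, freq, cum, sym_of_slot):
        data = bytearray(data)
        x = 0
        for _ in range(4):                             # reload the flushed state
            x = (x << 8) | data.pop()
        out = []
        for _ in range(n):                             # forward over the symbols
            slot = x & (TOTAL - 1)                     # works because TOTAL is a power of 2
            s = sym_of_slot[slot]
            x = freq[s] * (x >> PROB_BITS) + slot - cum[s]
            while x < RANS_L and data:                 # refill, reading from the back
                x = (x << 8) | data.pop()
            out.append(s)
        return out

Because encoding and decoding are exact inverses, the decoder consumes bytes in precisely the reverse of the order they were appended, which is the same bookkeeping the quote above describes.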

- The advantage is that ANS no longer approximates probabilities with powers of 2, but with "old state/new state", a rational number very close to the expected probability: encoding a symbol of probability p takes the state from x to roughly x/p, so the per-symbol cost approaches log2(1/p) bits rather than a whole number of bits.
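That rational approximation can be checked numerically (a toy measurement with made-up numbers, reusing the encode step from the sketches above):

    import math

    TOTAL, f = 4096, 1638           # p = f/TOTAL, roughly 0.4
    x0 = x = 1 << 40
    for _ in range(100):            # encode 100 identical symbols, no renormalization
        x = (x // f) * TOTAL + x % f
    print(math.log2(x / x0) / 100)  # about 1.322 bits/symbol, vs -log2(0.4) = 1.3219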

Models for adaptive arithmetic coding

https://fgiesen.wordpress.com/2015/05/26/models-for-adaptive-arithmetic-coding/

http://encode.ru/threads/2206-new-compressor-LZNA-quot-LZ-nibbled-ANS-quot-quot-Oodle-1-45-quot

new compressor LZNA = "LZ-nibbled-ANS" - "Oodle 1.45"

http://cbloomrants.blogspot.de/2015/05/05-09-15-oodle-lzna.html (05-09-15 - Oodle LZNA)

---

LZNA is a high compression LZ (usually a bit more than 7z/LZMA) with better decode speed. Around 2.5X faster to decode than LZMA.

Anyone who needs LZMA-level compression and higher decode speeds should consider LZNA. Currently LZNA requires SSE2 to be fast, so it only runs full speed on modern platforms with x86 chips.

LZNA gets its speed from two primary changes. 1. It uses RANS instead of arithmetic coding. 2. It uses nibble-wise coding instead of bit-wise coding, so it can do 4x fewer coding operations in some cases. The magic sauce that makes these possible is Ryg's realization about mixing cumulative probability distributions. That lets you do the bitwise-style shift update of probabilities (keeping a power of two total), but on larger alphabets.

LZNA usually beats LZMA compression on binary, slightly worse on text. LZNA is closer to LZHAM decompress speeds.

---
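The "mixing cumulative probability distributions" idea (see also the fgiesen post linked above) can be sketched in a few lines. This is a hedged illustration with assumed parameters, not LZNA's actual model: a 16-symbol (nibble) cumulative distribution is blended toward the observed symbol with a shift, so the total stays a power of two.

    PROB_BITS = 15
    ONE = 1 << PROB_BITS            # total frequency, kept a power of two
    RATE = 5                        # adaptation speed (assumed value)

    def init_cdf():
        # cdf[i] = cumulative frequency of nibbles < i, for a 16-symbol alphabet.
        return [(i * ONE) >> 4 for i in range(17)]

    def adapt(cdf, s):
        # Blend the CDF toward one concentrated on symbol s. The targets keep
        # a gap of at least 1 between entries, so no frequency decays to zero,
        # and cdf[0] = 0 and cdf[16] = ONE never move.
        for i in range(1, 16):
            target = i if i <= s else ONE - 16 + i
            cdf[i] += (target - cdf[i]) >> RATE
        return cdf

    cdf = init_cdf()
    for nib in (7, 7, 7, 2):        # feed a few nibbles
        adapt(cdf, nib)
    print((cdf[8] - cdf[7]) / ONE)  # P(nibble 7) has grown well past 1/16

In LZNA-style coding this per-nibble CDF would feed the rANS coder directly; the sketch only shows the model update.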


https://github.com/richgel999/lzham_codec_devel

- variants using compression techniques in the spatial domain

- Time-frequency analysis in data compression

---

From: Tadepalli, Hari K <hari.k.tadepalli@intel.com> via nist.gov

On the NIST website (http://csrc.nist.gov/groups/ST/hash/sha-3/sha-3_standardization.html), the following announcement was placed in April 2014 concerning the standardization of the SHA-3 algorithm.

" Draft FIPS 202, SHA-3 Standard: Permutation-Based Hash And Extendable-Output Functions

NIST published a Federal Register Notice, FRN 2014-12336, on May 28, 2014 to announce the publication of Draft FIPS 202, SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions, and Draft Revision of the Applicability Clause of FIPS 180-4, Secure Hash Standard, and request for public comments. A 90-day public comment period is provided. Comments must be received by NIST on or before August 26, 2014 to be considered. Details for how to submit public comments are available in the FRN."

What is the current status of FIPS 202, now that the public comment phase is over? When can we expect the final version?

Thanks,

Hari Tadepalli

............

Intel Corporation

Chandler, AZ

Chang, Shu-jen H. <shu-jen.chang@nist.gov>

The SHA-3 package (FIPS 202 and the required paperwork) is currently at the Commerce Secretary's office waiting for the Secretary's signature. Upon her approval, FIPS 202 will be announced in the Federal Register and made publicly available. NIST will notify the hash forum when that happens.

Thanks,

Shu-jen