view doc/PCM8-conversions @ 235:0ee1a66c1846

doc/PCM8-conversions: beginning of document
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 08 May 2023 00:45:26 +0000

What is the authoritatively correct, officially endorsed bidirectional mapping
between G.711 A-law and mu-law encodings on one side and 16-bit 2's complement
linear PCM on the other side?  Surprisingly, there is no official answer to this
problem anywhere in the specs!  Instead the specs provide the following partial
answers:

* The G.711 spec itself provides one mapping from A-law code octets to linear
  numeric values in range [-4032,4032] and another mapping from mu-law code
  octets to linear numeric values in range [-8031,8031].  The output from each
  of these mappings is given in "pure mathematical" form, without specifying any
  bit-level encoding, and furthermore, mu-law decoder output in its pure
  "conceptual" form has both +0 and -0 values.  (The same signed zero problem
  does not occur in A-law because it's a mid-riser code rather than mid-tread,
  and thus has no quantized values equal to 0.)

* If one takes the "pure mathematical" output from the spec-prescribed G.711
  decoder and represents it in 2's complement form, squashing +0 and -0 outputs
  from the canonical mu-law decoder into "plain 0" at this step, the result is
  a 13-bit-wide 2's complement value for A-law decoding and a 14-bit-wide 2's
  complement value for mu-law.

* All GSM speech encoders take 13-bit 2's complement linear PCM samples as their
  input.  How should this 13-bit GSM codec input be derived from A-law or mu-law
  code octets?  GSM specs refer to ITU's G.726 spec for ADPCM - it just so
  happens that inside the ADPCM algorithm of G.726 (a totally unrelated codec of
  no relevance to GSM codec work outside of this reference) there is a pair of
  functions for expanding A-law and mu-law to linear PCM and compressing linear
  PCM back to A-law or mu-law.

* Following this obscure G.726 reference, we eventually conclude that in the
  case of A-law, GSM specs call for the obvious treatment: take the "natural"
  output from the canonical A-law decoder, represent it in 2's complement form
  (the result is 13 bits wide), and just feed that 13-bit 2's complement form to
  the input of GSM speech encoders.  However, in the case of mu-law the
  "natural" G.711 decoder output is one sign bit plus 13 bits of magnitude,
  requiring 14 bits in 2's complement representation - and none of the specs I
  could find says anything about exactly how this 14-bit input should be reduced
  to 13 bits for feeding to GSM speech encoders.  Canonical C implementations
  of all GSM speech encoders take their input in 16-bit words and clear the 3
  least significant bits as their first step; if the 14-bit mu-law decoder
  output is represented in 16-bit words by padding 2 zero bits on the right and
  this output is then fed to GSM speech encoder functions, the end effect is
  that the least-significant bit of the 14-bit decoder output is simply cut off.
  This form of mu-law-to-GSM transcoder implementation is consistent with
  TESTx-U.INP and TESTx-U.COD sequences provided in the GSM 06.54 package for
  EFR.
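
The bit arithmetic behind the last point can be sketched in C as follows.  This
is only an illustration: clear_low3() is a hypothetical helper name standing in
for the "clear the 3 least significant bits" first step of the canonical
encoders, and it is not copied from any of those sources.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the first step of the canonical GSM speech
 * encoders: clear the 3 least significant bits of a 16-bit input word,
 * leaving a 13-bit sample in the upper bits.
 * Assumes the usual 2's complement representation of int.
 */
static int16_t clear_low3(int16_t s16)
{
    return (int16_t)(s16 & ~7);
}
```

An A-law sample such as 4032 * 8 = 32256 passes through this step unchanged,
whereas the mu-law extreme 8031 * 4 = 32124 comes out as 32120 = 8030 * 4: the
least-significant bit of the 14-bit value is gone.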

Based on the above considerations, we have our answer for how we should convert
from G.711 to 16-bit 2's complement linear PCM:

* For A-law, we emit the "natural" output in 13-bit 2's complement form and
  append 3 zero bits on the right; this transformation is fully lossless.

* For mu-law, we emit the "natural" output in 14-bit 2's complement form and
  append 2 zero bits on the right.  This transformation is almost lossless,
  with just one exception: the "pure" decoder's -0 output (resulting from PCMU
  octet 0x7F) is squashed to "plain 0", and will be re-emitted as PCMU octet
  0xFF rather than 0x7F on subsequent re-encoding to G.711 PCMU.

For anyone needing a G.711 to 16-bit linear PCM decoder, the present package
provides ready-made decoding tables (following the above rules) in
dev/a2s-regen.out and dev/u2s-regen.out, generated by dev/a2s-regen.c and
dev/u2s-regen.c programs.
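
For readers who prefer to see the rules as code rather than as tables, here is
a minimal sketch of both decoders.  It is not the code from dev/a2s-regen.c or
dev/u2s-regen.c: the function names alaw_to_s16() and ulaw_to_s16() are made up
for this illustration, and the arithmetic is my own restatement of the G.711
decoder values followed by the zero-bit padding described above.

```c
#include <stdint.h>

/*
 * A-law octet to 16-bit linear PCM (hypothetical name).
 * The G.711 decoder value (magnitude 1..4032) is taken as a 13-bit
 * 2's complement number and 3 zero bits are appended on the right.
 */
static int16_t alaw_to_s16(uint8_t octet)
{
    unsigned ix = (octet ^ 0x55u) & 0x7Fu;  /* undo even-bit inversion, drop sign */
    unsigned exp = ix >> 4;                 /* 3-bit segment number */
    unsigned mant = ix & 0x0Fu;             /* 4-bit position within segment */
    int mag13, lin13;

    if (exp == 0)
        mag13 = (int)((mant << 1) + 1);
    else
        mag13 = (int)(((mant << 1) + 33) << (exp - 1));
    lin13 = (octet & 0x80) ? mag13 : -mag13;   /* sign bit set means positive */
    /* multiply rather than shift: left-shifting a negative int is not portable */
    return (int16_t)(lin13 * 8);
}

/*
 * mu-law octet to 16-bit linear PCM (hypothetical name).
 * The G.711 decoder value (magnitude 0..8031) is taken as a 14-bit
 * 2's complement number and 2 zero bits are appended on the right;
 * +0 and -0 both come out as plain 0.
 */
static int16_t ulaw_to_s16(uint8_t octet)
{
    unsigned inv = (uint8_t)~octet;         /* mu-law octets are sent inverted */
    unsigned exp = (inv >> 4) & 7u;
    unsigned mant = inv & 0x0Fu;
    int mag14 = (int)((((mant << 1) + 33) << exp) - 33);
    int lin14 = (octet & 0x80) ? mag14 : -mag14;

    return (int16_t)(lin14 * 4);
}
```

With these definitions, A-law octets 0xD5 and 0x55 decode to +8 and -8 (there
is no zero), 0xAA and 0x2A decode to +32256 and -32256, mu-law octets 0x80 and
0x00 decode to +32124 and -32124, and both 0xFF and 0x7F decode to 0.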

Now for the opposite problem: what is the most correct way to compress 16-bit
2's complement linear PCM to A-law or mu-law?  In this direction the official
specs leave even more ambiguity than in the G.711 decoding direction:

* The G.711 spec itself says: "The conversion to A-law or mu-law values from
  uniform PCM values corresponding to the decision values, is left to the
  individual equipment specification."  The specific implementation used in the
  guts of G.726 ADPCM codec is referred to only as a non-normative example.

* GSM specs likewise refer to this G.726 section 4.2.8 (for compression of
  13-bit speech decoder output to G.711) with language that suggests a
  non-normative example.

After painstakingly comparing the C implementation of G.726 in the ITU-T G.191
STL against the language of G.726 spec itself and convincing myself that they
really do match, and then painstakingly comparing this approach against the one
implemented in the same G.191 STL for G.711 in alaw_compress() and
ulaw_compress() and against the table lookup method implemented in libgsm/toast
(my first reference, before I went down the rabbit hole of tracking down
official specs), I reached the following conclusions:

* For A-law encoding all 3 parties (G.191 STL alaw_compress() function, G.726
  "compress" block and toast_alaw.c) agree on the same mapping.  In this
  mapping only the most significant 12 bits of the 2's complement input word
  (equivalent to one sign bit and 11 bits of magnitude) are relevant, leading
  to the following two interesting properties:

  - the least-significant bit of GSM speech decoder output is always discarded
    when converting to A-law;

  - conversion can be easily implemented with a 4096-byte look-up table based
    on the upper 12 bits of input, exactly as was done in toast_alaw.c in the
    venerable libgsm source.

* Mu-law encoding is the real hair-raiser: if the input to the to-be-implemented
  encoder has 14 or more bits (including the most practical problem of 16-bit
  2's complement input), there are no fewer than 3 different ways to implement
  this encoder!