view doc/HR-codec-library @ 632:7fc57e2a6784

beginning of GSM-HR documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 19 Mar 2026 04:13:45 +0000

Themyscira libgsmhr1: library for GSM-HR codec
==============================================

The present library provides the following functionality related to the GSM-HR
speech codec, also known as HRv1:

* Stateful speech encoder and decoder engines based on the GSM 06.06 reference
  code from ETSI;

* The stateful TFO transform for this codec, per TS 28.062 section C.3.2.1.1;

* A rich set of stateless utility functions for format conversion and other
  common manipulations.

Compared to the alternative libgsmhr implemented as part of Osmocom gapk, our
implementation provides the following advantages:

* In our librification of the ETSI GSM-HR code, the speech encoder and decoder
  engines have been outfitted with proper state structures, as opposed to the
  hack of treating the entire bss segment of global variables as a poor man's
  state structure.

* The Rx front end has been factored out of the speech decoder and can also be
  used as a TFO transform.  Because the HRv1 codec falls chronologically between
  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
  future work on a TFO transform for EFR.

* We made a slight extension to this GSM-HR Rx front end, applicable to both
  the full decoder and TFO transform configurations, to support BFI without
  payload bits.  This condition occurs with FACCH stealing, with packet loss in
  IP transport, or with the non-modifiable DSP PHY implementation in sysmoBTS,
  which does not provide erroneous payload bits along with BFI.

* We added many format conversion and other utility functions that are
  Themyscira original work, not derived from the ETSI GSM-HR code.

However, because of the very limited practical utility of the GSM-HR codec,
almost no work has been done to speed up any of the grossly inefficient code
that originates from ETSI; see the HR-codec-limits article.

GSM-HR codec frame formats
==========================

Our speech encoder, speech decoder and TFO transform engines operate on the
same canonical formats as the original reference code from ETSI:

* The output from our speech encoder engine for each frame is an array of 20
  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
  the ETSI *.cod format.

* The input to our speech decoder engine is an array of 22 16-bit words (18
  codec parameters followed by BFI, UFI, SID and TAF) as represented in the ETSI
  *.dec format.

* The input to our TFO transform implementation is a decoder input frame
  (*.dec), and the output mimics an encoder output (*.cod) frame complete with
  VAD and SP flags.  (These flags are output as VAD=1 SP=1 when DTXd=0, or as a
  dummy VAD=0 together with a correct SP flag when DTXd=1.)
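To make the two canonical layouts concrete, the sketch below names the word
positions implied by the descriptions above and fills in the VAD and SP words
of a *.cod-style frame according to the DTXd rule just stated.  The HR_* index
names and the cod_set_vad_sp() helper are hypothetical illustrations, not part
of <tw_gsmhr.h>; only the word counts (18, 20 and 22) and the flag values come
from this article.

```c
#include <stdint.h>

/* Hypothetical index names for the canonical ETSI frame layouts;
 * <tw_gsmhr.h> may use different names for these positions. */
enum {
	HR_NUM_SPEECH_PARAMS = 18,	/* codec parameters in both formats */
	/* *.cod (encoder output): 18 params + VAD + SP = 20 words */
	HR_COD_VAD = 18,
	HR_COD_SP = 19,
	HR_COD_LEN = 20,
	/* *.dec (decoder input): 18 params + BFI, UFI, SID, TAF = 22 words */
	HR_DEC_BFI = 18,
	HR_DEC_UFI = 19,
	HR_DEC_SID = 20,
	HR_DEC_TAF = 21,
	HR_DEC_LEN = 22
};

/* Hypothetical helper: set the VAD and SP words of a *.cod-style output
 * frame per the TFO output convention described above.  is_sid matters
 * only when dtxd is nonzero, since SID output requires DTXd=1. */
static void cod_set_vad_sp(int16_t *cod, int dtxd, int is_sid)
{
	if (!dtxd) {
		/* DTXd=0: speech frames only, flagged VAD=1 SP=1 */
		cod[HR_COD_VAD] = 1;
		cod[HR_COD_SP] = 1;
	} else {
		/* DTXd=1: VAD is a dummy 0, SP tells speech (1) from SID (0) */
		cod[HR_COD_VAD] = 0;
		cod[HR_COD_SP] = is_sid ? 0 : 1;
	}
}
```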

All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
and TW-TS-002, are supported via stateless format conversion functions.

Representation and handling of BFI
----------------------------------

If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
SID=0, it is a non-SID BFI frame, also called an unusable frame in the GSM 06.41
spec.  If such a frame arrives in the comfort noise insertion state, all of its
parameters are ignored.  On the other hand, if such a frame arrives outside of
DTX state, where ECU logic is applied instead, our version retains the logic
from the ETSI reference code whereby codevector parameters from BFI frames are
used if the voiced vs unvoiced mode matches between the BFI frame and the saved
frame used by the ECU.  This logic resides in the Rx front end that is shared
between the full decoder and TFO transform implementations.

There is, however, an extension to this logic that is original to Themyscira:
if the BFI word in the 22-word decoder input frame equals 2 instead of 1 (not
allowed in the ETSI reference code, where BFI is strictly a binary flag), the
frame is treated as BFI-no-data and no parameter bits are ever used in any
state.  Higher-level RTP input functions, described later in this article, feed
this BFI=2 code to the speech decoder or TFO transform engine when RTP input is
BFI without payload bits.
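The BFI rules above can be modeled as a small decision function, together with
a helper that builds the kind of BFI=2 frame an RTP input layer might hand to
the engine when a packet carries no payload bits.  Both functions are
hypothetical illustrations of the documented behavior, not libgsmhr1 code: the
word positions assume the 22-word *.dec layout, and the cni_state and
mode_matches arguments stand in for decoder state that is internal to the
library.

```c
#include <stdint.h>
#include <string.h>

#define DEC_BFI 18	/* word positions in the 22-word *.dec frame */
#define DEC_UFI 19
#define DEC_SID 20
#define DEC_TAF 21
#define DEC_LEN 22

/* Hypothetical model of the documented rule for non-SID frames:
 * may codevector parameters from this frame be used?
 *   bfi          - BFI word: 0, 1, or 2 (BFI without payload bits)
 *   cni_state    - nonzero if Rx is in comfort noise insertion state
 *   mode_matches - nonzero if this frame's voiced/unvoiced mode matches
 *                  the saved frame used by the ECU */
static int bfi_codevectors_usable(int bfi, int cni_state, int mode_matches)
{
	if (bfi == 0)
		return 1;	/* good frame: parameters used normally */
	if (bfi == 2)
		return 0;	/* BFI-no-data: parameter bits never used */
	if (cni_state)
		return 0;	/* unusable frame in CNI state: all ignored */
	return mode_matches;	/* ECU case: reuse only on mode match */
}

/* Hypothetical builder for a BFI-no-data decoder input frame.  The 18
 * parameter words are zeroed only for determinism: with BFI=2 the
 * engine never looks at them.  TAF is passed through from the caller. */
static void make_bfi_nodata_frame(int16_t *dec, int taf)
{
	memset(dec, 0, DEC_LEN * sizeof(int16_t));	/* UFI=0, SID=0 */
	dec[DEC_BFI] = 2;
	dec[DEC_TAF] = taf ? 1 : 0;
}
```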

Representation and handling of invalid SID
------------------------------------------

If a decoder input frame in the canonical 22-word *.dec format has indicators
set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
UFI nonzero, that input frame is an invalid SID frame per GSM 06.41.  All 18
speech parameters are always fully ignored in such frames.
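Expressed as code, the invalid SID rule is a pair of simple predicates over the
metadata words.  This is a hypothetical sketch assuming the 22-word *.dec
layout described earlier; the function names are illustrative and are not
libgsmhr1 API.  The "valid SID" predicate is simply the complement of the rule
above for frames with SID=2.

```c
#include <stdint.h>

#define DEC_BFI 18	/* word positions in the 22-word *.dec frame */
#define DEC_UFI 19
#define DEC_SID 20

/* Hypothetical: nonzero if the frame is an invalid SID frame, i.e.
 * SID=1 regardless of other flags, or SID=2 with BFI or UFI nonzero. */
static int dec_is_invalid_sid(const int16_t *dec)
{
	if (dec[DEC_SID] == 1)
		return 1;
	if (dec[DEC_SID] == 2 && (dec[DEC_BFI] || dec[DEC_UFI]))
		return 1;
	return 0;
}

/* Hypothetical: nonzero only for a valid SID frame (SID=2, BFI=UFI=0). */
static int dec_is_valid_sid(const int16_t *dec)
{
	return dec[DEC_SID] == 2 && !dec[DEC_BFI] && !dec[DEC_UFI];
}
```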

In RTP transport, invalid SID frames can be represented only in TW-TS-002, not
in either of the two non-Themyscira standards (TS 101 318 and RFC 5993).
TW-TS-002 offers the option of either including or omitting payload bits in
invalid SID packets; however, if invalid SID payload bits are included, our
speech decoder and TFO transform engines ignore them.

Libgsmhr1 general usage
=======================

The external public interface to Themyscira libgsmhr1 consists of a single
header file <tw_gsmhr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (with
function prototypes); the const qualifier is used where appropriate, and the
interface is defined in terms of <stdint.h> types.  <tw_gsmhr.h> itself
includes <stdint.h>.

State allocation and freeing
============================

In order to use the speech encoder, you will need to allocate an encoder state
structure, and to use the speech decoder, you will need to allocate a decoder
state structure.  The same goes for the stateful TFO transform.  The necessary
state allocation functions are:

struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
struct gsmhr_decoder_state *gsmhr_decoder_create(void);
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void);	/* TFO transform */

struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
gsmhr_rxfe_state are opaque to library users: you only get pointers, which you
remember and pass around, but <tw_gsmhr.h> does not give you full definitions
of these structs.  As a library user, you ordinarily don't even need to know the
size of these structs, hence the necessary malloc() operation happens inside the
gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create()
functions.  (But see the following section regarding alternative memory
allocation schemes.)  Because each structure is malloc'ed as a single chunk,
when you are done with it, simply call free() to relinquish each encoder,
decoder or TFO state instance.

gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions
can fail if the malloc() call inside fails, in which case these libgsmhr1
functions return NULL.

The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
int; it tells the GSM-HR speech encoder whether it should operate with DTX
enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).

It should be noted that the original GSM-HR speech encoder from ETSI always runs
the GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is
disabled, the VAD flag is forced to 1, but all VAD and Tx DTX logic still
executes and burns CPU cycles.  Due to the work scope limits described in the
HR-codec-limits article, this poor design from ETSI has been retained for the
present.  However, the API allows the DTX enable/disable flag to be changed only
with a full reset of the speech encoder, rather than per frame; this API design
thus prepares for the possibility of cleaning up the implementation in the
future so that VAD and Tx DTX code executes only when DTX is enabled in the
speech encoding direction.

State reset functions
=====================

void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);

Each of these functions resets the state of the corresponding element to its
initial or "home" state.  Home states for the standard speech encoder and
decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
state for the separated-out RxFE block (used for TFO transform) is a subset of
the full speech decoder home state as relevant to the reduced functionality.

The following const "variables" are exported by the library in order to
facilitate alternative memory allocation schemes:

extern const unsigned gsmhr_encoder_state_size;
extern const unsigned gsmhr_decoder_state_size;
extern const unsigned gsmhr_rxfe_state_size;

Using these const "variables", an application can allocate buffers of the
correct size for each state structure and then initialize each newly allocated
state structure with gsmhr_*_reset(), as an alternative to the gsmhr_*_create()
functions.  Each of the standard gsmhr_*_create() functions allocates a buffer
of the correct size using standard malloc(), then initializes it with the
corresponding gsmhr_*_reset() function; hence the lower-level approach can be
used by applications that desire some memory allocation scheme other than
standard malloc().
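As one example of such an alternative scheme, the sketch below carves state
buffers out of a single static arena with a simple bump allocator instead of
calling malloc().  The arena, its size and the arena_alloc() name are
hypothetical.  In a real application each size argument would be one of the
gsmhr_*_state_size values and each returned buffer would then be passed to the
matching gsmhr_*_reset() function; those library calls appear only in a comment
so that the sketch compiles without libgsmhr1.  The sketch assumes a C11
compiler for max_align_t and _Alignof.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: an array of max_align_t, so its start is suitably
 * aligned for any object type, including the opaque state structs. */
#define ARENA_WORDS 4096
static max_align_t arena[ARENA_WORDS];
static size_t arena_used;	/* bytes handed out so far */

/* Hypothetical bump allocator: returns a maximally aligned block of
 * `size` bytes, or NULL if it does not fit.  Blocks are never freed
 * individually; the whole arena is discarded at once. */
static void *arena_alloc(size_t size)
{
	const size_t align = _Alignof(max_align_t);	/* power of two */
	size_t start = (arena_used + align - 1) & ~(align - 1);

	if (start > sizeof(arena) || size > sizeof(arena) - start)
		return NULL;
	arena_used = start + size;
	return (unsigned char *)arena + start;
}

/* Intended use with libgsmhr1 (not compiled here):
 *	struct gsmhr_encoder_state *enc = arena_alloc(gsmhr_encoder_state_size);
 *	if (enc)
 *		gsmhr_encoder_reset(enc, dtx);
 */
```

A bump allocator fits this use case because codec state instances are
typically created together and torn down together at the end of a session.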

Using the speech encoder
========================

To encode one 20 ms audio frame with GSM-HR, call gsmhr_encode_frame():

void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
			int16_t *param);

You need to provide an encoder state structure allocated earlier with
gsmhr_encoder_create(), a block of 160 linear PCM samples (20 ms at 8 kHz), and
an output buffer of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the
encoded GSM-HR frame will be written.  The encoded frame format emitted by this
function is the same as in the reference implementation from ETSI: 18 words of
speech parameters followed by VAD and SP flags.  Stateless format conversion
functions described later in this document can be used to emit more commonly
used RTP formats.

The mandatory encoder homing function is included: if the input frame matches
the encoder homing frame, the encoder state is reset to the home state at the
end of gsmhr_encode_frame() processing.
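A test harness that wants to force an encoder reset through this in-band
mechanism needs to produce the encoder homing frame.  The sketch below assumes
the EHF definition used by the ETSI GSM codec reference code: 160 identical
13-bit samples with only the least significant bit set, i.e. 0x0008 in
left-justified 16-bit form.  That value is an assumption here and should be
checked against GSM 06.20 before relying on it; the function names are
hypothetical and not part of libgsmhr1.

```c
#include <stdint.h>

#define HR_FRAME_SAMPLES 160	/* 20 ms at 8 kHz */
#define HR_EHF_SAMPLE 0x0008	/* assumed EHF sample value, see above */

/* Hypothetical: fill a PCM buffer with the encoder homing frame. */
static void make_encoder_homing_frame(int16_t *pcm)
{
	int i;

	for (i = 0; i < HR_FRAME_SAMPLES; i++)
		pcm[i] = HR_EHF_SAMPLE;
}

/* Hypothetical: nonzero if a 160-sample PCM frame exactly matches the EHF. */
static int is_encoder_homing_frame(const int16_t *pcm)
{
	int i;

	for (i = 0; i < HR_FRAME_SAMPLES; i++)
		if (pcm[i] != HR_EHF_SAMPLE)
			return 0;
	return 1;
}
```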

Using the speech decoder
========================

Our speech decoder main function is:

void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
			int16_t *pcm);

The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
16-bit words providing 18 speech parameters followed by BFI, UFI, SID and TAF
metadata flags.  In normal operation, this internal canonical form of speech
decoder input will be provided by one of the stateless format conversion
functions in libgsmhr1 itself.

Important note: the parameter frame input to this function is expected to be
valid, i.e., it is NOT subjected to explicit validation checks!  If your
application reads *.dec files or otherwise receives this format directly from
some external source (as opposed to output from one of our own format conversion
functions), you need to validate these bits with gsmhr_check_decoder_params()
before feeding them to the decoder engine!
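The sketch below shows the flavor of such a check for the four metadata words
only, using the ranges stated in this article: BFI of 0, 1 or 2 (2 being the
Themyscira BFI-no-data extension) and SID of 0, 1 or 2, with UFI and TAF
assumed here to be binary flags.  It is a hypothetical illustration, not a
replacement for gsmhr_check_decoder_params(): it does not check the 18 speech
parameter words, and it does not attempt to reproduce whatever the library
function actually verifies.

```c
#include <stdint.h>

#define DEC_BFI 18	/* word positions in the 22-word *.dec frame */
#define DEC_UFI 19
#define DEC_SID 20
#define DEC_TAF 21

/* Hypothetical partial check: returns 0 if the metadata words of a
 * 22-word decoder input frame are within range, -1 otherwise.
 * The 18 speech parameter words are NOT checked here. */
static int dec_check_flags(const int16_t *dec)
{
	if (dec[DEC_BFI] < 0 || dec[DEC_BFI] > 2)
		return -1;	/* 0, 1, or 2 = BFI without payload bits */
	if (dec[DEC_UFI] < 0 || dec[DEC_UFI] > 1)
		return -1;	/* assumed binary */
	if (dec[DEC_SID] < 0 || dec[DEC_SID] > 2)
		return -1;	/* 0 = speech, 1 = invalid SID, 2 = SID */
	if (dec[DEC_TAF] < 0 || dec[DEC_TAF] > 1)
		return -1;	/* assumed binary */
	return 0;
}
```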

This speech decoder function includes all mandatory logic for decoder homing:
special handling of the homed state, decoder homing frame checks of both full
and partial (to the first subframe) kinds, internal state reset and EHF output.

TFO transform function
======================

To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
structure (use gsmhr_rxfe_create() or your own allocation followed by
gsmhr_rxfe_reset()) and then call this function for every frame to be processed:

void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
		    int16_t *dl);

UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
caveats apply in terms of validation checks.  DL Tx output is emitted in the
same form as gsmhr_encode_frame() output, complete with VAD and SP flags.  The
VAD output should be considered a dummy, but the SP output flag is valid: 1 in
the case of a speech frame or 0 in the case of a SID frame; the latter is
possible only when DTXd is enabled.

DTXd control: the dtxd argument to gsmhr_tfo_xfrm() tells the transform whether
it should emit both speech and SID frames or speech frames only, corresponding
to the DTXd flag in TRAU-UL frames, which affects both the speech encoder and
TFO transform functions in traditional TRAUs.  This flag can be changed
mid-session: as explained in the HR-codec-Rx-logic article, our implementation
of the TFO transform proceeds by applying classic Rx front end processing that
only emits speech frames, and then replacing its output with SID frames under
certain conditions if DTXd is enabled.