diff doc/HR-codec-library @ 632:7fc57e2a6784

beginning of GSM-HR documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 19 Mar 2026 04:13:45 +0000
parents
children 3ab76caba41c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/HR-codec-library	Thu Mar 19 04:13:45 2026 +0000
@@ -0,0 +1,250 @@
+Themyscira libgsmhr1: library for GSM-HR codec
+==============================================
+
+The present library provides the following functionality related to the GSM-HR
+speech codec, also known as HRv1:
+
+* Stateful speech encoder and decoder engines based on GSM 06.06 reference code
+  from ETSI;
+
+* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;
+
+* A rich set of stateless utility functions for format conversion and other
+  common manipulations.
+
+Compared to the libgsmhr alternative implemented as part of Osmocom gapk, our
+implementation provides the following advantages:
+
+* In our librification of ETSI GSM-HR code, speech encoder and decoder engines
+  have been outfitted with proper state structures, as opposed to the hack of
+  treating the entire bss segment with global variables as a poor man's state
+  structure.
+
+* The Rx front end has been factored out of the speech decoder and can also be
+  used as a TFO transform.  Because HRv1 codec falls chronologically between
+  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
+  future work on TFO transform for EFR.
+
+* We made a slight extension to this GSM-HR Rx front end, applicable to both
+  full decoder and TFO transform configurations, to support BFI without payload
+  bits.  This condition occurs in the case of FACCH stealing, packet loss in IP
+  transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does
+  not provide erroneous payload bits along with BFI.
+
+* We added many format conversion and other utility functions that are
+  Themyscira original work, not from ETSI GSM-HR code.
+
+However, because of the very limited practical utility of the GSM-HR codec,
+almost no work has been done to speed up any of the grossly inefficient code
+that originates from ETSI - see the HR-codec-limits article.
+
+GSM-HR codec frame formats
+==========================
+
+Our speech encoder, speech decoder and TFO transform engines operate on the
+same canonical formats as the original reference code from ETSI:
+
+* The output from our speech encoder engine for each frame is an array of 20
+  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
+  ETSI *.cod format.
+
+* The input format to our speech decoder engine is an array of 22 16-bit words
+  (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
+  ETSI *.dec format.
+
+* The input format to our TFO transform implementation is a decoder input frame
+  (*.dec), and the output mimics an encoder output (*.cod) frame complete with
+  VAD and SP flags.  (The latter outputs are VAD=1 SP=1 in the case of DTXd=0,
+  or VAD=0 dummy and correct SP output in the case of DTXd=1.)
+
+All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
+and TW-TS-002, are supported via stateless format conversion functions.
+
+Representation and handling of BFI
+----------------------------------
+
+If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
+SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41
+spec.  If such a frame arrives in comfort noise insertion state, all
+parameters are ignored.  On the other hand, if such a frame arrives outside of
+DTX state, when ECU logic is applied instead, our version retains the logic
+from ETSI reference code in that codevector parameters from BFI frames are
+used if the voiced vs unvoiced mode matches between the BFI frame and the
+saved frame used by the ECU.  This logic resides in the Rx front end that is
+shared between full decoder and TFO transform implementations.
+
+There is, however, an extension to this logic original to Themyscira: if the
+BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
+in ETSI reference code where BFI is strictly a binary flag), the frame is
+treated as BFI-no-data and no parameter bits are ever used in any state.
+Higher-level RTP input functions, described later in this article, feed this
+BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
+without payload bits.
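+
+As an illustration of the BFI=2 extension, here is a minimal sketch of how an
+application might build a BFI-no-data frame by hand.  The helper name
+make_bfi_nodata_frame() and the HR_* macros are hypothetical, not part of
+libgsmhr1; the word positions follow the 22-word *.dec layout described above
+(18 parameters, then BFI, UFI, SID and TAF).  Zeroing the ignored parameter
+words and setting UFI=0 and SID=0 are assumptions made for tidiness; the
+caller supplies TAF from its own frame position tracking.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the 22-word *.dec layout (not from <tw_gsmhr.h>) */
#define HR_DEC_WORDS	22
#define HR_IDX_BFI	18
#define HR_IDX_UFI	19
#define HR_IDX_SID	20
#define HR_IDX_TAF	21

/*
 * Fill a decoder input frame that signals "bad frame, no payload bits".
 * BFI=2 is the Themyscira extension; the parameter words are zeroed only
 * for tidiness, as they are never used with this code.
 */
void make_bfi_nodata_frame(int16_t *frame, int taf)
{
	assert(frame != NULL);
	memset(frame, 0, HR_DEC_WORDS * sizeof(int16_t));
	frame[HR_IDX_BFI] = 2;
	frame[HR_IDX_TAF] = taf ? 1 : 0;
}
```
+
+In normal operation the RTP input functions produce this code themselves, so
+a helper like this is only of interest to applications that assemble *.dec
+frames on their own.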
+
+Representation and handling of invalid SID
+------------------------------------------
+
+If a decoder input frame in the canonical 22-word *.dec format has indicators
+set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
+UFI nonzero, that input frame is invalid SID per GSM 06.41.  All 18 speech
+parameters are fully ignored in such frames, always.
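+
+The flag rules from this subsection and the preceding one can be condensed
+into a small classifier.  This is only a sketch of the rules as stated in this
+article, not code from libgsmhr1: the enum, the index names and
+classify_dec_frame() are all hypothetical, and the "valid SID" case (SID=2
+with BFI=0 and UFI=0) is inferred as the complement of the invalid SID
+definition above.  The input is assumed to have passed validation already, so
+out-of-range flag values are not handled.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; word positions per the 22-word *.dec layout */
enum { HR_IDX_BFI = 18, HR_IDX_UFI = 19, HR_IDX_SID = 20, HR_IDX_TAF = 21 };

enum hr_frame_class {
	HR_FRAME_SPEECH,	/* BFI=0, SID=0 */
	HR_FRAME_BFI,		/* BFI=1, SID=0: parameters may still be used */
	HR_FRAME_BFI_NODATA,	/* BFI=2, SID=0: parameters never used */
	HR_FRAME_VALID_SID,	/* SID=2 with BFI=0 and UFI=0 (inferred) */
	HR_FRAME_INVALID_SID	/* SID=1, or SID=2 with BFI or UFI nonzero */
};

enum hr_frame_class classify_dec_frame(const int16_t *frame)
{
	int16_t bfi, ufi, sid;

	assert(frame != NULL);
	bfi = frame[HR_IDX_BFI];
	ufi = frame[HR_IDX_UFI];
	sid = frame[HR_IDX_SID];

	if (sid == 1)
		return HR_FRAME_INVALID_SID;
	if (sid == 2)
		return (bfi || ufi) ? HR_FRAME_INVALID_SID
				    : HR_FRAME_VALID_SID;
	/* SID=0 from here on */
	if (bfi == 2)
		return HR_FRAME_BFI_NODATA;
	if (bfi)
		return HR_FRAME_BFI;
	return HR_FRAME_SPEECH;
}
```
+
+Note that the SID checks come first: a frame with SID=1 is invalid SID
+regardless of what the BFI word says.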
+
+In RTP transport these invalid SID frames can be represented only in TW-TS-002,
+not in either of the two non-Themyscira standards.  TW-TS-002 offers the option
+of either including or omitting payload bits in invalid SID packets - however,
+if invalid SID payload bits are included, they are ignored by our speech decoder
+and TFO transform engines.
+
+Libgsmhr1 general usage
+=======================
+
+The external public interface to Themyscira libgsmhr1 consists of a single
+header file <tw_gsmhr.h>; it should be installed in some system include
+directory.
+
+The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
+prototypes); the const qualifier is used where appropriate, and the interface
+is defined in terms of <stdint.h> types.  <tw_gsmhr.h> includes <stdint.h>.
+
+State allocation and freeing
+============================
+
+In order to use the speech encoder, you will need to allocate an encoder state
+structure, and to use the speech decoder, you will need to allocate a decoder
+state structure.  The same goes for the stateful TFO transform.  The necessary
+state allocation functions are:
+
+struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
+struct gsmhr_decoder_state *gsmhr_decoder_create(void);
+struct gsmhr_rxfe_state *gsmhr_rxfe_create(void);	/* TFO transform */
+
+struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
+gsmhr_rxfe_state are opaque structures to library users: you only get pointers
+which you remember and pass around, but <tw_gsmhr.h> does not give you full
+definitions of these structs.  As a library user, you ordinarily don't even
+need to know the size of these structs, hence the necessary malloc() operation
+happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
+gsmhr_rxfe_create() functions.  (But see the following section regarding
+alternative memory allocation schemes.)  Each structure is malloc'ed as a
+single chunk, hence when you are done with it, simply call free() to
+relinquish each encoder, decoder or TFO state instance.
+
+gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions
+can fail if the malloc() call inside fails, in which case these libgsmhr1
+functions return NULL.
+
+The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
+int; it tells the GSM-HR speech encoder whether it should operate with DTX
+enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
+GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).
+
+It should be noted that the original GSM-HR speech encoder from ETSI always runs
+GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled,
+VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns
+up CPU cycles.  Due to work scope limits described in HR-codec-limits article,
+this poor design from ETSI has been retained for the present.  However, the API
+design allows DTX enable-or-disable flag to be changed only with a full reset
+of the speech encoder, rather than per frame - this API design thus prepares
+for the possibility of cleaning up this implementation in the future and
+executing VAD and Tx DTX code only when DTX is enabled in the speech encoding
+direction.
+
+State reset functions
+=====================
+
+void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
+void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
+void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);
+
+Each of these functions resets the state of the corresponding element to its
+initial or "home" state.  Home states for the standard speech encoder and
+decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
+state for the separated-out RxFE block (used for TFO transform) is a subset of
+the full speech decoder home state as relevant to the reduced functionality.
+
+The following const "variables" are exported by the library in order to
+facilitate alternative memory allocation schemes:
+
+extern const unsigned gsmhr_encoder_state_size;
+extern const unsigned gsmhr_decoder_state_size;
+extern const unsigned gsmhr_rxfe_state_size;
+
+Using these const "variables", an application can allocate buffers of the
+correct size for each state structure, and then initialize each newly allocated
+state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
+functions.  Each of the standard gsmhr_*_create() functions allocates a buffer
+of the correct size using standard malloc(), then initializes it with the
+corresponding gsmhr_*_reset() function - hence the lower-level approach can be
+used by applications that desire some other memory allocation scheme than
+standard malloc().
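+
+The lower-level approach can be sketched as follows, here placing a decoder
+state in a static arena instead of the heap.  The arena, arena_alloc() and
+decoder_create_in_arena() are hypothetical application code.  The block marked
+STAND-IN replaces <tw_gsmhr.h> and the library itself only so that the sketch
+compiles on its own: a real program would include <tw_gsmhr.h>, link with
+libgsmhr1 and delete that block, and the real state structure is opaque and
+has a different size.  This article does not state alignment requirements for
+state buffers; aligning to max_align_t (C11) is an assumption that mirrors
+what malloc() guarantees.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * STAND-IN for <tw_gsmhr.h> and libgsmhr1, present only to make this
 * sketch self-contained.  The real struct is opaque to applications.
 */
struct gsmhr_decoder_state { int16_t placeholder[8]; };
const unsigned gsmhr_decoder_state_size = sizeof(struct gsmhr_decoder_state);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st)
{
	memset(st, 0, sizeof *st);
}
/* END STAND-IN */

/* Hypothetical bump allocator over a static arena */
#define ARENA_BYTES	4096
static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t arena_used;

static void *arena_alloc(size_t len)
{
	size_t align = _Alignof(max_align_t);
	size_t start = (arena_used + align - 1) / align * align;

	assert(len > 0);
	if (start > ARENA_BYTES || len > ARENA_BYTES - start)
		return NULL;		/* arena exhausted */
	arena_used = start + len;
	return arena + start;
}

/* Allocate from the arena, then bring the state to its home state */
struct gsmhr_decoder_state *decoder_create_in_arena(void)
{
	struct gsmhr_decoder_state *st;

	st = arena_alloc(gsmhr_decoder_state_size);
	if (st)
		gsmhr_decoder_reset(st);
	return st;
}
```
+
+The same pattern applies to the encoder and RxFE states with their respective
+size constants and reset functions; gsmhr_encoder_reset() additionally takes
+the dtx flag.  A state obtained this way must not be passed to free().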
+
+Using the speech encoder
+========================
+
+To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():
+
+void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
+			int16_t *param);
+
+You need to provide an encoder state structure allocated earlier with
+gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
+of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
+will be written.  The encoded frame format emitted by this function is the same
+as in the reference implementation from ETSI: 18 words of speech parameters
+followed by VAD and SP flags.  Stateless format conversion functions described
+later in this document can be used to emit more commonly used RTP formats.
+
+The mandatory encoder homing function is included: if the input frame matches
+the encoder homing frame, the encoder state is reset to the home state at the
+end of gsmhr_encode_frame() processing.
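+
+The library performs the homing check internally, so applications do not need
+to do anything.  For test tools that want to know when a reset will occur,
+the check can be sketched as below, under the assumption that the encoder
+homing frame consists of 160 samples each equal to 0x0008, as we understand it
+from GSM 06.20 and the ETSI reference code.  The function and macro names are
+hypothetical and not part of libgsmhr1.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HR_FRAME_SAMPLES	160	/* one 20 ms frame at 8 kHz */
#define HR_EHF_SAMPLE		0x0008	/* assumed EHF sample value */

/* Return 1 if all 160 input samples equal the EHF value, 0 otherwise */
int is_encoder_homing_frame(const int16_t *pcm)
{
	size_t i;

	assert(pcm != NULL);
	for (i = 0; i < HR_FRAME_SAMPLES; i++) {
		if (pcm[i] != HR_EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```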
+
+Using the speech decoder
+========================
+
+Our speech decoder main function is:
+
+void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
+			int16_t *pcm);
+
+The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
+16-bit words providing speech parameters followed by BFI, UFI, SID and TAF
+metadata flags.  In normal operation, this internal canonical form of speech
+decoder input will be provided by one of the stateless format conversion
+functions in the same libgsmhr1.
+
+Important note: the parameter frame input to this function is expected to be
+valid, i.e., it is NOT subjected to explicit validation checks!  If your
+application reads *.dec files or otherwise receives this format directly from
+some external source (as opposed to output from one of our own format conversion
+functions), you need to validate these bits with gsmhr_check_decoder_params()
+before feeding them to the decoder engine!
+
+This speech decoder function includes all mandatory logic for decoder homing:
+special handling of the homed state, decoder homing frame checks of both full
+and partial (to the first subframe) kinds, internal state reset and EHF output.
+
+TFO transform function
+======================
+
+To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
+structure (use gsmhr_rxfe_create() or your own allocation followed by
+gsmhr_rxfe_reset()) and then call this function for every frame to be processed:
+
+void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
+		    int16_t *dl);
+
+UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
+caveats apply in terms of validation checks.  DL Tx output is emitted in the
+same form as gsmhr_encode_frame() output, complete with VAD and SP flags.  VAD
+output should be considered a dummy, but SP output flag is valid: 1 in the case
+of a speech frame or 0 in the case of a SID frame; the latter is possible only
+when DTXd is enabled.
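+
+Since the SP flag is the reliable indicator in this output, downstream code
+can tell speech from SID frames by looking at it alone.  A minimal sketch,
+with hypothetical helper and index names that are not part of libgsmhr1; the
+word positions follow the 20-word *.cod layout (18 parameters, then VAD and
+SP):
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; word positions per the 20-word *.cod layout */
enum { HR_COD_WORDS = 20, HR_IDX_VAD = 18, HR_IDX_SP = 19 };

/* Return 1 if a *.cod-format frame is a SID frame (SP=0), 0 if speech */
int cod_frame_is_sid(const int16_t *cod)
{
	assert(cod != NULL);
	return cod[HR_IDX_SP] == 0;
}

/* Count SID frames in a run of nframes consecutive 20-word frames */
size_t count_sid_frames(const int16_t *frames, size_t nframes)
{
	size_t i, n = 0;

	for (i = 0; i < nframes; i++)
		n += (size_t) cod_frame_is_sid(frames + i * HR_COD_WORDS);
	return n;
}
```
+
+The VAD word is deliberately not consulted, as it is only a dummy in TFO
+transform output.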
+
+DTXd control: the dtxd argument to gsmhr_tfo_xfrm() tells the transform
+whether it should emit both speech and SID frames or speech frames only,
+corresponding to the DTXd flag in TRAU-UL frames that affects both speech
+encoder and TFO transform functions in traditional TRAUs.  This flag can be
+changed mid-session: as explained in HR-codec-Rx-logic article, our
+implementation of TFO transform proceeds by applying classic Rx front end
+processing that only emits speech frames, and then replacing output with SID
+frames under certain conditions if DTXd is enabled.