view doc/HR-codec-library @ 640:e0e5905261e2 default tip

document tw5b-dump and tw5c-dump
author Mychaela Falconia <falcon@freecalypso.org>
date Fri, 20 Mar 2026 06:43:50 +0000

Themyscira libgsmhr1: library for GSM-HR codec
==============================================

The present library provides the following functionalities related to GSM-HR
speech codec, also known as HRv1:

* Stateful speech encoder and decoder engines based on GSM 06.06 reference code
  from ETSI;

* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;

* A rich set of stateless utility functions for format conversion and other
  common manipulations.

Compared to libgsmhr alternative implemented as part of Osmocom gapk, our
implementation provides the following advantages:

* In our librification of ETSI GSM-HR code, speech encoder and decoder engines
  have been outfitted with proper state structures, as opposed to the hack of
  treating the entire bss segment with global variables as a poor man's state
  structure.

* The Rx front end has been factored out of the speech decoder and can also be
  used as a TFO transform.  Because HRv1 codec falls chronologically between
  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
  future work on TFO transform for EFR.

* We made a slight extension to this GSM-HR Rx front end, applicable to both
  full decoder and TFO transform configurations, to support BFI without payload
  bits.  This condition occurs in the case of FACCH stealing, packet loss in IP
  transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does
  not provide erroneous payload bits along with BFI.

* We added many format conversion and other utility functions that are
  Themyscira original work, not from ETSI GSM-HR code.

However, because of very limited practical utility of GSM-HR codec, almost no
work has been done to speed up any of the grossly inefficient code that
originates from ETSI - see HR-codec-limits article.

GSM-HR codec frame formats
==========================

Our speech encoder, speech decoder and TFO transform engines operate on the
same canonical formats as the original reference code from ETSI:

* The output from our speech encoder engine for each frame is an array of 20
  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
  ETSI *.cod format.

* The input format to our speech decoder engine is an array of 22 16-bit words
  (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
  ETSI *.dec format.

* The input format to our TFO transform implementation is a decoder input frame
  (*.dec), and the output mimics an encoder output (*.cod) frame complete with
  VAD and SP flags.  (The latter outputs are VAD=1 SP=1 in the case of DTXd=0,
  or VAD=0 dummy and correct SP output in the case of DTXd=1.)
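
The word layout of these two canonical formats can be illustrated with a few
index constants.  The following is only a minimal sketch and not part of
libgsmhr1: the HR_IDX_* names and the cod_to_dec_loopback() helper are
hypothetical, and the mapping used here (SP=0 becomes SID=2, with BFI, UFI and
TAF all cleared) is an assumption that fits only an error-free loopback test.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical index names for the flag words after the 18 parameters */
enum {
	HR_IDX_VAD = 18,	/* *.cod format: VAD flag */
	HR_IDX_SP  = 19,	/* *.cod format: SP flag */
	HR_IDX_BFI = 18,	/* *.dec format: BFI */
	HR_IDX_UFI = 19,	/* *.dec format: UFI */
	HR_IDX_SID = 20,	/* *.dec format: SID */
	HR_IDX_TAF = 21		/* *.dec format: TAF */
};

/*
 * Turn a 20-word encoder output frame into a 22-word decoder input frame,
 * assuming an error-free path between the two (assumption of this sketch).
 */
static void cod_to_dec_loopback(const int16_t *cod, int16_t *dec)
{
	/* the 18 codec parameters are common to both formats */
	memcpy(dec, cod, 18 * sizeof(int16_t));
	dec[HR_IDX_BFI] = 0;
	dec[HR_IDX_UFI] = 0;
	/* SP=1 means speech (SID=0), SP=0 means SID frame (valid SID=2) */
	dec[HR_IDX_SID] = cod[HR_IDX_SP] ? 0 : 2;
	dec[HR_IDX_TAF] = 0;	/* no TAF position is modeled here */
}
```

A real receiver derives BFI, UFI, SID and TAF from the channel decoder or from
RTP input, hence this helper is suitable only for quick experiments.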

All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
and TW-TS-002, are supported via stateless format conversion functions.

Representation and handling of BFI
----------------------------------

If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41
spec.  If such a frame arrives in comfort noise insertion state, all
parameters are ignored.  On the other hand, if such a frame arrives outside of
DTX state, when ECU logic is applied instead, our version retains the logic
from ETSI reference code in that codevector parameters from BFI frames are used
if the voiced vs unvoiced mode matches between the BFI frame and the saved
frame used by the ECU.  This logic resides in the Rx front end that is shared
between full decoder and TFO transform implementations.

There is, however, an extension to this logic original to Themyscira: if the
BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
in ETSI reference code where BFI is strictly a binary flag), the frame is
treated as BFI-no-data and no parameter bits are ever used in any state.
Higher-level RTP input functions, described later in this article, feed this
BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
without payload bits.

Representation and handling of invalid SID
------------------------------------------

If a decoder input frame in the canonical 22-word *.dec format has indicators
set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
UFI nonzero, that input frame is invalid SID per GSM 06.41.  All 18 speech
parameters are fully ignored in such frames, always.
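
The flag combinations described in this section and the previous one can be
summarized as a small classifier.  This is a sketch for illustration, not the
code used inside our Rx front end: the enum and function names are
hypothetical, and the order of checks (SID rules first, then BFI) is an
assumption that follows the wording "irrespective of other flags" above.

```c
#include <stdint.h>

/* Hypothetical classification of a 22-word decoder input frame */
enum hr_rx_class {
	HR_RX_GOOD_SPEECH,
	HR_RX_BFI_WITH_DATA,	/* BFI=1: parameters may be used by ECU */
	HR_RX_BFI_NO_DATA,	/* BFI=2: Themyscira extension */
	HR_RX_VALID_SID,
	HR_RX_INVALID_SID
};

static enum hr_rx_class hr_classify_dec_frame(const int16_t *dec)
{
	/* flag words follow the 18 codec parameters */
	int16_t bfi = dec[18], ufi = dec[19], sid = dec[20];

	/* SID=1 always, or SID=2 spoiled by BFI or UFI: invalid SID */
	if (sid == 1 || (sid == 2 && (bfi || ufi)))
		return HR_RX_INVALID_SID;
	if (sid == 2)
		return HR_RX_VALID_SID;
	/* from here on SID=0 */
	if (bfi == 2)
		return HR_RX_BFI_NO_DATA;
	if (bfi)
		return HR_RX_BFI_WITH_DATA;
	/* UFI handling for speech frames is outside this sketch */
	return HR_RX_GOOD_SPEECH;
}
```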

In RTP transport these invalid SID frames can be represented only in TW-TS-002,
not in either of the two non-Themyscira standards.  TW-TS-002 offers the option
of either including or omitting payload bits in invalid SID packets - however,
if invalid SID payload bits are included, they are ignored by our speech decoder
and TFO transform engines.

Libgsmhr1 general usage
=======================

The external public interface to Themyscira libgsmhr1 consists of a single
header file <tw_gsmhr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
prototypes), const qualifier is used where appropriate, and the interface is
defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>.

State allocation and freeing
============================

In order to use the speech encoder, you will need to allocate an encoder state
structure, and to use the speech decoder, you will need to allocate a decoder
state structure.  The same goes for the stateful TFO transform.  The necessary
state allocation functions are:

struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
struct gsmhr_decoder_state *gsmhr_decoder_create(void);
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void);	/* TFO transform */

struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
gsmhr_rxfe_state are opaque structures to library users: you only get pointers
which you remember and pass around, but <tw_gsmhr.h> does not give you full
definitions of these structs.  As a library user, you ordinarily don't even
need to know the size of these structs, hence the necessary malloc() operation
happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
gsmhr_rxfe_create() functions.  (But see the following section regarding
alternative memory allocation schemes.)  However, each structure is malloc'ed
as a single chunk, hence when you are done with it, simply call free() to
relinquish each encoder, decoder or TFO state instance.

gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions
can fail if the malloc() call inside fails, in which case these libgsmhr1
functions return NULL.

The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
int; it tells the GSM-HR speech encoder whether it should operate with DTX
enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).

It should be noted that the original GSM-HR speech encoder from ETSI always runs
GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled,
VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns
up CPU cycles.  Due to work scope limits described in HR-codec-limits article,
this poor design from ETSI has been retained at present.  However, the API
design allows DTX enable-or-disable flag to be changed only with a full reset
of the speech encoder, rather than per frame - this API design thus prepares
for the possibility of cleaning up this implementation in the future and
executing VAD and Tx DTX code only when DTX is enabled in the speech encoding
direction.

State reset functions
=====================

void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);

Each of these functions resets the state of the corresponding element to its
initial or "home" state.  Home states for the standard speech encoder and
decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
state for the separated-out RxFE block (used for TFO transform) is a subset of
the full speech decoder home state as relevant to the reduced functionality.

The following const "variables" are exported by the library in order to
facilitate alternative memory allocation schemes:

extern const unsigned gsmhr_encoder_state_size;
extern const unsigned gsmhr_decoder_state_size;
extern const unsigned gsmhr_rxfe_state_size;

Using these const "variables", an application can allocate buffers of the
correct size for each state structure, and then initialize each newly allocated
state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
functions.  Each of the standard gsmhr_*_create() functions allocates a buffer
of the correct size using standard malloc(), then initializes it with the
corresponding gsmhr_*_reset() function - hence the lower-level approach can be
used by applications that desire some other memory allocation scheme than
standard malloc().
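
As an illustration of this lower-level approach, the following sketch creates
a decoder state through a caller-supplied allocator.  Everything outside of
gsmhr_decoder_state_size and gsmhr_decoder_reset() is hypothetical: the
hr_alloc_fn type, the hr_decoder_create_with() helper and the HAVE_TW_GSMHR_H
switch are assumptions of this example, and the stand-in definitions in the
#else branch exist only so that the sketch compiles without the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef HAVE_TW_GSMHR_H		/* hypothetical build-time switch */
#include <tw_gsmhr.h>
#else
/* Stand-ins so that this sketch compiles on its own; NOT the real library */
struct gsmhr_decoder_state { int16_t placeholder[64]; };
static const unsigned gsmhr_decoder_state_size =
				sizeof(struct gsmhr_decoder_state);
static void gsmhr_decoder_reset(struct gsmhr_decoder_state *st)
{
	st->placeholder[0] = 0;
}
#endif

/* Hypothetical allocator hook: any function with malloc()-like semantics */
typedef void *(*hr_alloc_fn)(size_t size);

static struct gsmhr_decoder_state *hr_decoder_create_with(hr_alloc_fn alloc)
{
	/* ask the library how much memory the opaque state needs */
	struct gsmhr_decoder_state *st = alloc(gsmhr_decoder_state_size);

	if (st)
		gsmhr_decoder_reset(st);	/* bring it to the home state */
	return st;
}
```

The allocator passed in must return memory suitably aligned for any object,
just like malloc() does, and the caller must release the state with the
matching deallocation function rather than unconditionally with free().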

Using the speech encoder
========================

To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():

void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
			int16_t *param);

You need to provide an encoder state structure allocated earlier with
gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
will be written.  The encoded frame format emitted by this function is the same
as in the reference implementation from ETSI: 18 words of speech parameters
followed by VAD and SP flags.  Stateless format conversion functions described
later in this document can be used to emit more commonly used RTP formats.

The mandatory encoder homing function is included: if the input frame matches
the encoder homing frame, the encoder state is reset to the home state at the
end of gsmhr_encode_frame() processing.
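
For reference, the encoder homing frame defined in GSM 06.20 consists of 160
identical samples whose 13-bit value has only the least significant bit set,
i.e., 0x0008 when left-justified in 16-bit words.  The sketch below shows how
an application could detect such input on its own; the function and macro
names are hypothetical, and the exact 16-bit comparison is an assumption: this
article does not state whether the library masks the low 3 bits first.

```c
#include <stdint.h>

#define HR_FRAME_SAMPLES	160	/* one 20 ms frame at 8 kHz */
#define HR_EHF_SAMPLE		0x0008	/* 13-bit value 1, left-justified */

/* Return 1 if the PCM block is an encoder homing frame, 0 otherwise */
static int hr_is_encoder_homing_frame(const int16_t *pcm)
{
	int i;

	for (i = 0; i < HR_FRAME_SAMPLES; i++) {
		if (pcm[i] != HR_EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```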

Using the speech decoder
========================

Our speech decoder main function is:

void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
			int16_t *pcm);

The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
16-bit words providing speech parameters followed by BFI, UFI, SID and TAF
metadata flags.  In normal operation, this internal canonical form of speech
decoder input will be provided by one of the stateless format conversion
functions in the same libgsmhr1.

Important note: the parameter frame input to this function is expected to be
valid, i.e., it is NOT subjected to explicit validation checks!  If your
application reads *.dec files or otherwise receives this format directly from
some external source (as opposed to output from one of our own format conversion
functions), you need to validate these bits with gsmhr_check_decoder_params()
before feeding them to the decoder engine!

This speech decoder function includes all mandatory logic for decoder homing:
special handling of the homed state, decoder homing frame checks of both full
and partial (to the first subframe) kinds, internal state reset and EHF output.

TFO transform function
======================

To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
structure (use gsmhr_rxfe_create() or your own allocation followed by
gsmhr_rxfe_reset()) and then call this function for every frame to be processed:

void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
		    int16_t *dl);

UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
caveats apply in terms of validation checks.  DL Tx output is emitted in the
same form as gsmhr_encode_frame() output, complete with VAD and SP flags.  VAD
output should be considered a dummy, but SP output flag is valid: 1 in the case
of a speech frame or 0 in the case of a SID frame; the latter is possible only
when DTXd is enabled.

DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it should
emit both speech and SID frames or speech frames only, corresponding to DTXd
flag in TRAU-UL frames that affects both speech encoder and TFO transform
functions in traditional TRAUs.  This flag can be changed mid-session: as
explained in HR-codec-Rx-logic article, our implementation of TFO transform
proceeds by applying classic Rx front end processing that only emits speech
frames, and then replacing output with SID frames under certain conditions if
DTXd is enabled.

Stateless utility functions
===========================

All functions in this section are stateless (no encoder, decoder or RxFE state
structure is needed); they merely manipulate data formats.

void gsmhr_pack_ts101318(const int16_t *param, uint8_t *payload);

This function converts a 112-bit GSM-HR codec frame from an array of speech
parameters (18 16-bit words) into the packed format of ETSI TS 101 318, which
is a buffer of 14 octets with every bit used for payload.  Any extraneous bits
in input 16-bit words (beyond the size of each parameter in bits) are ignored.

void gsmhr_unpack_ts101318(const uint8_t *payload, int16_t *param);

This function converts a 112-bit GSM-HR codec frame from the packed format of
TS 101 318 into an array of 18 speech parameters.

void gsmhr_encoder_twts002_out(const int16_t *param, uint8_t *payload);

This function converts a cod-style frame (output from gsmhr_encode_frame() or
gsmhr_tfo_xfrm(), or read from an ETSI *.cod file) into TW-TS-002 format.  The
output is always 15 octets long (the buffer must have this much room), and is
valid per both RFC 5993 and TW-TS-002 specs.  The only two possible frame types
in this context are good speech and good SID, distinguished by SP flag in the
cod-style input and by FT field in RFC 5993 output.

int gsmhr_decoder_twts002_in(const uint8_t *payload, int16_t *param);

This function reads a super-5993 frame in TW-TS-002 format from a buffer and
converts it into the required form for input to gsmhr_decode_frame() or
gsmhr_tfo_xfrm(), which is an extended form of ETSI's *.dec format.  The input
must be a valid super-5993 frame in the following sense:

* The first octet in the buffer must be valid ToC per TW-TS-002 section 5.1;

* F bit in this ToC octet must be cleared;

* FT field must equal 0, 1, 2, 6 or 7 per TW-TS-002 section 5.2;

* If FT equals 0, 2 or 6, the ToC octet must be followed by 14 octets of frame
  payload.

If any of these rules are violated, gsmhr_decoder_twts002_in() returns a
negative value (-1 if F bit is set or -2 if FT is invalid) and does not write
anything into the output array.  Otherwise, the function returns 0 (indicating
success) and the output array is filled as follows:

* For frame types 0, 2 and 6, the 18 speech parameters are filled from the
  TS-101-318-like payload portion of super-5993 input.

* For frame types 1 and 7, the 18 speech parameters are set to all zeros, with
  the expectation that gsmhr_decode_frame() or gsmhr_tfo_xfrm() will ignore
  them.  Please note that "verbose" invalid SID bits that may be present in
  TW-TS-002 transport are ignored.

* The 4 metadata flags BFI, UFI, SID and TAF are set based on FT and the
  additional ToC flags defined in TW-TS-002 section 5.3.

* Themyscira extension of BFI=2, described earlier in this document, is used
  to represent FT=7.

* Invalid SID frames (FT=1) are converted to BFI=1 SID=1.
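
The acceptance rules for the ToC octet can be sketched as follows.  This is an
illustration only, not the library's code: hr_toc_payload_octets() is a
hypothetical name, and the bit positions (F in the most significant bit, FT in
the 3 bits below it) are assumed from RFC 5993.  The low 4 bits of the ToC,
where TW-TS-002 places its additional flags, are not examined here.

```c
#include <stdint.h>

/*
 * Examine a ToC octet and return the number of payload octets that must
 * follow it (14 or 0), or a negative value mirroring the error codes
 * documented for gsmhr_decoder_twts002_in().
 */
static int hr_toc_payload_octets(uint8_t toc)
{
	unsigned ft = (toc >> 4) & 7;	/* assumed FT position */

	if (toc & 0x80)			/* F bit: more frames follow */
		return -1;
	switch (ft) {
	case 0:
	case 2:
	case 6:
		return 14;		/* frame payload is required */
	case 1:
	case 7:
		return 0;		/* payload not required */
	default:
		return -2;		/* FT value not allowed */
	}
}
```

For FT=1 the result of 0 is the required minimum: TW-TS-002 allows invalid SID
packets to carry payload bits, which are then ignored.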

int gsmhr_rtp_in_preen(const uint8_t *rtp_in, unsigned rtp_in_len,
			uint8_t *canon_pl);

This function performs initial processing of RTP input that is expected to be
one of the defined RTP formats for GSM-HR codec.  It accepts all possibilities
of TW-TS-002, RFC 5993 or TS 101 318 (listed in ThemWi order of preference) and
writes canonical TW-TS-002 super-5993 format into a buffer.  The output buffer
must have 15 bytes of space, and the frame written into this buffer will ALWAYS
be a valid input to gsmhr_decoder_twts002_in() function described above.

The input arguments are RTP payload and its length.  The return value is 0 if
RTP input was in a recognized format, or -1 if it is invalid.  In the case of
invalid RTP input, the output is filled with ToC of 0x70 (BFI with no data) -
the output is always valid.

Zero-length RTP payloads are acceptable; if rtp_in_len is 0, then rtp_in pointer
may be NULL.  The output in this case is filled with ToC of 0x70 (BFI with no
data), but the return value is 0, indicating success.  The intent is that truly
invalid RTP payloads are error events which should be counted, while NULL input
is a normal occurrence when ThemWi jitter buffer (twjit) does not hold a
previously received RTP packet that maps to the current tick.  (Actually
transmitted RTP packets with zero-length payloads are also possible: they are
ThemWi preferred alternative to IETF approach of intentional gaps in the RTP
stream.)

int gsmhr_rtp_in_direct(const uint8_t *rtp_in, unsigned rtp_in_len,
			int16_t *param);

This function is fully equivalent to calling first gsmhr_rtp_in_preen(), then
gsmhr_decoder_twts002_in().  It is however slightly more efficient, as it avoids
the intermediate buffer and some copying.  The return value is the same as
gsmhr_rtp_in_preen(), and just like with that function, the output is always
valid.

Reading *.cod and *.dec files
-----------------------------

The most native representation format for GSM-HR codec frames in libgsmhr1 is
arrays of broken-down speech parameters.  However, unlike TS 101 318 format in
which every possible bit pattern is a plausible GSM-HR codec frame, an array of
broken-down parameters that purports to be a GSM-HR frame can contain garbage.
The additional metadata flags in the canonical decoder input format can also
contain garbage - which our speech decoder and TFO transform engines are NOT
prepared for!  There is no potential for malfunction if these arrays of
parameters and metadata flags come only from libgsmhr1 functions - but if an
application needs to read *.cod or *.dec files, or otherwise accept external
input in any of these formats, then an explicit validation step is required.

int gsmhr_check_common_params(const int16_t *params);

This function examines an array of 18 codec parameters in the int16_t
representation used in this library, and checks if the unused upper bits of
each int16_t word are cleared as they should be.  The return value is 0 if the
frame is valid or -1 if some extraneous high bits are set.

int gsmhr_check_encoder_params(const int16_t *params);

This function examines a frame of 20 int16_t words that corresponds to GSM-HR
encoder output format, and checks if the unused upper bits of each int16_t word
are cleared as they should be.  This function should be used when reading from
ETSI-format *.cod files, to guard against reading garbage or wrong endian.  The
return value is 0 if the frame is valid or -1 if some extraneous high bits are
set.

int gsmhr_check_decoder_params(const int16_t *params);

This function examines a frame of 22 int16_t words that corresponds to GSM-HR
decoder input format, and checks if the unused upper bits of each int16_t word
are cleared as they should be.  This function should be used when reading from
ETSI-format *.dec files, to guard against reading garbage or wrong endian.  The
return value is 0 if the frame is valid or -1 if some extraneous high bits are
set.  Both BFI and SID words are limited to range [0,2], i.e., Themyscira BFI=2
extension is accepted.

SID field manipulation
----------------------

Unlike FR and EFR, GSM-HR codec lacks fixed rules for Rx frame classification
as valid SID, invalid SID or non-SID speech.  The BTS makes this classification
decision according to its internal private rules, and the SID flag then needs
to be carried out of band in Abis, Ater and TFO.  GSM 08.61 and TW-TS-002
(extended 5993) formats provide the necessary out-of-band SID indication, but
the bare format of TS 101 318 does not.  Therefore, the only kind of GSM-HR SID
that can be represented in TS 101 318 format is a perfect, 100% error-free SID
frame in which all 79 bits of the SID field are set to 1.

int gsmhr_ts101318_is_perfect_sid(const uint8_t *payload);

This function checks the given TS 101 318 payload for the possibility of
perfect SID.  The return value is 2 (GSM 06.41 code for valid SID) if the frame
is indeed a perfect SID, or 0 (GSM 06.41 code for non-SID speech) otherwise.

void gsmhr_ts101318_set_sid_codeword(uint8_t *payload);

This function sets all 79 bits of the SID field to 1s, forming a perfect SID
frame in the 14-byte buffer.  The first 33 bits that carry R0 and LPC parameters
must already be filled correctly.
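
The bit arithmetic behind these two functions can be sketched as follows.
These are not the library's implementations: the sketch_* names are
hypothetical, and the code assumes that TS 101 318 numbers bits MSB first
within each octet, so that the 33 bits of R0 and LPC occupy octets 0-3 plus
the top bit of octet 4, and the 79-bit SID field occupies the remaining 7 bits
of octet 4 plus octets 5 through 13.

```c
#include <stdint.h>
#include <string.h>

/* Return 2 if all 79 bits of the SID field are 1, or 0 otherwise */
static int sketch_is_perfect_sid(const uint8_t *payload)
{
	int i;

	/* low 7 bits of octet 4 are the first 7 bits of the SID field */
	if ((payload[4] & 0x7F) != 0x7F)
		return 0;
	/* octets 5..13 hold the remaining 72 bits */
	for (i = 5; i < 14; i++) {
		if (payload[i] != 0xFF)
			return 0;
	}
	return 2;
}

/* Set all 79 bits of the SID field, leaving R0 and LPC bits untouched */
static void sketch_set_sid_codeword(uint8_t *payload)
{
	payload[4] |= 0x7F;
	memset(payload + 5, 0xFF, 9);
}
```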

void gsmhr_set_sid_cw_params(int16_t *params);

This function fills parameters 4 through 17 of generated SID frames, setting
them to the required SID codeword.  It can also be used to transform a speech
frame into a SID frame with the same R0 and LPC parameters.  It is logically
equivalent to gsmhr_ts101318_set_sid_codeword(), but operates on the array of
parameters form, rather than TS 101 318 packed format.

Public constant definitions
===========================

Our public header file <tw_gsmhr.h> provides these constant definitions, which
should be self-explanatory:

#define	GSMHR_NUM_PARAMS	18	/* actual codec parameters */
#define	GSMHR_NUM_PARAMS_ENC	20	/* output from the encoder */
#define	GSMHR_NUM_PARAMS_DEC	22	/* input to the decoder */

#define	GSMHR_FRAME_LEN_RPF	14	/* raw packed format */
#define	GSMHR_FRAME_LEN_5993	15	/* RFC 5993 and TW-TS-002 */

Public const data items
=======================

The spec-defined decoder homing frame for GSM-HR is provided in both native
(array of parameters) and packed (TS 101 318) formats:

extern const int16_t gsmhr_dhf_params[GSMHR_NUM_PARAMS];
extern const uint8_t gsmhr_dhf_ts101318[GSMHR_FRAME_LEN_RPF];