diff doc/HR-codec-library @ 632:7fc57e2a6784
beginning of GSM-HR documentation
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 19 Mar 2026 04:13:45 +0000 |
| parents | |
| children | 3ab76caba41c |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/HR-codec-library Thu Mar 19 04:13:45 2026 +0000
@@ -0,0 +1,250 @@
+Themyscira libgsmhr1: library for GSM-HR codec
+==============================================
+
+The present library provides the following functionalities related to GSM-HR
+speech codec, also known as HRv1:
+
+* Stateful speech encoder and decoder engines based on GSM 06.06 reference code
+  from ETSI;
+
+* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;
+
+* A rich set of stateless utility functions for format conversion and other
+  common manipulations.
+
+Compared to libgsmhr alternative implemented as part of Osmocom gapk, our
+implementation provides the following advantages:
+
+* In our librification of ETSI GSM-HR code, speech encoder and decoder engines
+  have been outfitted with proper state structures, as opposed to the hack of
+  treating the entire bss segment with global variables as a poor man's state
+  structure.
+
+* The Rx front end has been factored out of the speech decoder and can also be
+  used as a TFO transform. Because HRv1 codec falls chronologically between
+  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
+  future work on TFO transform for EFR.
+
+* We made a slight extension to this GSM-HR Rx front end, applicable to both
+  full decoder and TFO transform configurations, to support BFI without payload
+  bits. This condition occurs in the case of FACCH stealing, packet loss in IP
+  transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does
+  not provide erroneous payload bits along with BFI.
+
+* We added many format conversion and other utility functions that are
+  Themyscira original work, not from ETSI GSM-HR code.
+
+However, because of very limited practical utility of GSM-HR codec, almost no
+work has been done to speed up any of the grossly inefficient code that
+originates from ETSI - see HR-codec-limits article.
+
+GSM-HR codec frame formats
+==========================
+
+Our speech encoder, speech decoder and TFO transform engines operate on the
+same canonical formats as the original reference code from ETSI:
+
+* The output from our speech encoder engine for each frame is an array of 20
+  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
+  ETSI *.cod format.
+
+* The input format to our speech decoder engine is an array of 22 16-bit words
+  (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
+  ETSI *.dec format.
+
+* The input format to our TFO transform implementation is a decoder input frame
+  (*.dec), and the output mimics an encoder output (*.cod) frame complete with
+  VAD and SP flags. (The latter outputs are VAD=1 SP=1 in the case of DTXd=0,
+  or VAD=0 dummy and correct SP output in the case of DTXd=1.)
+
+All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
+and TW-TS-002, are supported via stateless format conversion functions.
+
+Representation and handling of BFI
+----------------------------------
+
+If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
+SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41
+spec. If such frame arrives in comfort noise insertion state, all parameters
+are ignored. On the other hand, if such frame arrives outside of DTX state,
+when ECU logic is applied instead, our version retains the logic from ETSI
+reference code in that codevector parameters from BFI frames are used if the
+voiced vs unvoiced mode matches between the BFI frame and the saved frame used
+by the ECU. This logic resides in the Rx front end that is shared between full
+decoder and TFO transform implementations.
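As a reading aid, the canonical word layouts and the BFI=1, SID=0 condition described above can be sketched in C as follows. All EX_* names and the ex_is_non_sid_bfi() helper are hypothetical, introduced only for this illustration; they are not taken from <tw_gsmhr.h>. The word offsets assume that the flags follow the 18 parameters in exactly the order listed in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only, not from <tw_gsmhr.h>. */
#define EX_NUM_SPEECH_PARAMS  18
#define EX_ENC_FRAME_WORDS    20   /* 18 parameters + VAD + SP */
#define EX_DEC_FRAME_WORDS    22   /* 18 parameters + BFI + UFI + SID + TAF */

/* Word offsets of the metadata flags, in the order given in the text. */
enum { EX_ENC_VAD = 18, EX_ENC_SP = 19 };
enum { EX_DEC_BFI = 18, EX_DEC_UFI = 19, EX_DEC_SID = 20, EX_DEC_TAF = 21 };

/*
 * Returns 1 if a 22-word decoder input frame is a non-SID BFI frame
 * (BFI=1 and SID=0), i.e. an "unusable frame" in GSM 06.41 terms,
 * and 0 otherwise.  Other BFI codes are deliberately not covered here.
 */
static int ex_is_non_sid_bfi(const int16_t *dec_frame)
{
    assert(dec_frame != NULL);
    return dec_frame[EX_DEC_BFI] == 1 && dec_frame[EX_DEC_SID] == 0;
}
```

A caller would pass a pointer to a 22-word buffer; the helper only inspects the BFI and SID words and never touches the 18 speech parameters.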
+
+There is, however, an extension to this logic original to Themyscira: if the
+BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
+in ETSI reference code where BFI is strictly a binary flag), the frame is
+treated as BFI-no-data and no parameter bits are ever used in any state.
+Higher-level RTP input functions, described later in this article, feed this
+BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
+without payload bits.
+
+Representation and handling of invalid SID
+------------------------------------------
+
+If a decoder input frame in the canonical 22-word *.dec format has indicators
+set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
+UFI nonzero, that input frame is invalid SID per GSM 06.41. All 18 speech
+parameters are fully ignored in such frames, always.
+
+In RTP transport these invalid SID frames can be represented only in TW-TS-002,
+not in either of the two non-Themyscira standards. TW-TS-002 offers the option
+of either including or omitting payload bits in invalid SID packets - however,
+if invalid SID payload bits are included, they are ignored by our speech decoder
+and TFO transform engines.
+
+Libgsmhr1 general usage
+=======================
+
+The external public interface to Themyscira libgsmhr1 consists of a single
+header file <tw_gsmhr.h>; it should be installed in some system include
+directory.
+
+The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
+prototypes), const qualifier is used where appropriate, and the interface is
+defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>.
+
+State allocation and freeing
+============================
+
+In order to use the speech encoder, you will need to allocate an encoder state
+structure, and to use the speech decoder, you will need to allocate a decoder
+state structure. The same goes for the stateful TFO transform.
+The necessary state allocation functions are:
+
+struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
+struct gsmhr_decoder_state *gsmhr_decoder_create(void);
+struct gsmhr_rxfe_state *gsmhr_rxfe_create(void); /* TFO transform */
+
+struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
+gsmhr_rxfe_state are opaque structures to library users: you only get pointers
+which you remember and pass around, but <tw_gsmhr.h> does not give you full
+definitions of these structs. As a library user, you ordinarily don't even
+need to know the size of these structs, hence the necessary malloc() operation
+happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
+gsmhr_rxfe_create() functions. (But see the following section regarding
+alternative memory allocation schemes.) However, each structure is malloc'ed
+as a single chunk, hence when you are done with it, simply call free() to
+relinquish each encoder, decoder or TFO state instance.
+
+gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions
+can fail if the malloc() call inside fails, in which case these libgsmhr1
+functions return NULL.
+
+The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
+int; it tells the GSM-HR speech encoder whether it should operate with DTX
+enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
+GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).
+
+It should be noted that the original GSM-HR speech encoder from ETSI always runs
+GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled,
+VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns
+up CPU cycles. Due to work scope limits described in HR-codec-limits article,
+this poor design from ETSI has been retained at the present.
+However, the API design allows DTX enable-or-disable flag to be changed only
+with a full reset of the speech encoder, rather than per frame - this API
+design thus prepares for the possibility of cleaning up this implementation in
+the future and executing VAD and Tx DTX code only when DTX is enabled in the
+speech encoding direction.
+
+State reset functions
+=====================
+
+void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
+void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
+void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);
+
+Each of these functions resets the state of the corresponding element to its
+initial or "home" state. Home states for the standard speech encoder and
+decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
+state for the separated-out RxFE block (used for TFO transform) is a subset of
+the full speech decoder home state as relevant to the reduced functionality.
+
+The following const "variables" are exported by the library in order to
+facilitate alternative memory allocation schemes:
+
+extern const unsigned gsmhr_encoder_state_size;
+extern const unsigned gsmhr_decoder_state_size;
+extern const unsigned gsmhr_rxfe_state_size;
+
+Using these const "variables", an application can allocate buffers of the
+correct size for each state structure, and then initialize each newly allocated
+state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
+functions. Each of the standard gsmhr_*_create() functions allocates a buffer
+of the correct size using standard malloc(), then initializes it with the
+corresponding gsmhr_*_reset() function - hence the lower-level approach can be
+used by applications that desire some other memory allocation scheme than
+standard malloc().
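The lower-level approach just described (allocate gsmhr_decoder_state_size bytes by some means, then call gsmhr_decoder_reset()) might look like the following minimal sketch. The HAVE_TW_GSMHR_H macro, the ex_* arena allocator and the 64 KiB arena size are assumptions made for this example only. When the macro is not defined, trivial stand-in definitions take the place of the library so that the sketch compiles on its own; they do not reflect the real state layout or the real reset logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_TW_GSMHR_H   /* hypothetical build macro: define when libgsmhr1 is installed */
#include <tw_gsmhr.h>
#else
/* Stand-ins so that the sketch compiles without libgsmhr1.  The real struct
   is opaque and the real reset function establishes the decoder home state. */
struct gsmhr_decoder_state { int16_t placeholder[8]; };
const unsigned gsmhr_decoder_state_size = sizeof(struct gsmhr_decoder_state);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st)
{
    int i;

    for (i = 0; i < 8; i++)
        st->placeholder[i] = 0;
}
#endif

/* Hypothetical bump allocator over a static arena; the union members exist
   only to force conservative alignment of the byte array. */
#define EX_ARENA_BYTES 65536
typedef union { long double ld; long l; void *p; } ex_align_t;
static union {
    ex_align_t force_align;
    unsigned char bytes[EX_ARENA_BYTES];
} ex_arena;
static size_t ex_arena_used;

static void *ex_arena_alloc(size_t size)
{
    size_t align = sizeof(ex_align_t);
    size_t start = (ex_arena_used + align - 1) / align * align;

    assert(size > 0);
    if (start > EX_ARENA_BYTES || size > EX_ARENA_BYTES - start)
        return NULL;            /* arena exhausted */
    ex_arena_used = start + size;
    return ex_arena.bytes + start;
}

/* Counterpart of gsmhr_decoder_create() that uses the arena instead of malloc(). */
struct gsmhr_decoder_state *ex_decoder_create_in_arena(void)
{
    struct gsmhr_decoder_state *st = ex_arena_alloc(gsmhr_decoder_state_size);

    if (st == NULL)
        return NULL;
    gsmhr_decoder_reset(st);
    return st;
}
```

A state obtained this way must not be passed to free(); it lives as long as the arena does. The same pattern would apply to the encoder and RxFE states with their respective size variables and reset functions.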
+
+Using the speech encoder
+========================
+
+To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():
+
+void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
+                        int16_t *param);
+
+You need to provide an encoder state structure allocated earlier with
+gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
+of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
+will be written. The encoded frame format emitted by this function is the same
+as in the reference implementation from ETSI: 18 words of speech parameters
+followed by VAD and SP flags. Stateless format conversion functions described
+later in this document can be used to emit more commonly used RTP formats.
+
+The mandatory encoder homing function is included: if the input frame matches
+the encoder homing frame, the encoder state is reset to the home state at the
+end of gsmhr_encode_frame() processing.
+
+Using the speech decoder
+========================
+
+Our speech decoder main function is:
+
+void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
+                        int16_t *pcm);
+
+The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC)
+16-bit words providing speech parameters followed by BFI, UFI, SID and TAF
+metadata flags. In normal operation, this internal canonical form of speech
+decoder input will be provided by one of the stateless format conversion
+functions in the same libgsmhr1.
+
+Important note: the parameter frame input to this function is expected to be
+valid, i.e., it is NOT subjected to explicit validation checks! If your
+application reads *.dec files or otherwise receives this format directly from
+some external source (as opposed to output from one of our own format conversion
+functions), you need to validate these bits with gsmhr_check_decoder_params()
+before feeding them to the decoder engine!
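One possible way of driving gsmhr_decode_frame() from an application that may lose frames is sketched below, combining the decoder call with the BFI=2 (BFI-no-data) code described earlier. This is an illustration under stated assumptions, not the library's own RTP input path: the ex_* helpers and the HAVE_TW_GSMHR_H macro are hypothetical, GSMHR_NUM_PARAMS_DEC is assumed to be defined by <tw_gsmhr.h>, and setting UFI=0 and SID=0 in the substituted frame is a choice made for this example. Without the macro, stand-ins that merely output silence replace the library so that the sketch compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_TW_GSMHR_H   /* hypothetical build macro: define when libgsmhr1 is installed */
#include <tw_gsmhr.h>
#else
/* Stand-ins so that the sketch compiles without libgsmhr1; no real decoding. */
#define GSMHR_NUM_PARAMS_DEC 22
struct gsmhr_decoder_state { int unused; };
struct gsmhr_decoder_state *gsmhr_decoder_create(void)
{
    return calloc(1, sizeof(struct gsmhr_decoder_state));
}
void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
                        int16_t *pcm)
{
    (void) st;
    (void) param;
    memset(pcm, 0, 160 * sizeof(int16_t));   /* placeholder: silence */
}
#endif

/* Fills a 22-word frame with the BFI-no-data code: parameters zeroed (they
   are never used for BFI=2), BFI=2, UFI=0 and SID=0 (assumed), TAF as given. */
void ex_make_bfi_no_data(int16_t *param, int taf)
{
    memset(param, 0, GSMHR_NUM_PARAMS_DEC * sizeof(int16_t));
    param[18] = 2;              /* BFI word: Themyscira BFI-no-data extension */
    param[21] = taf ? 1 : 0;    /* TAF word */
}

/* Decodes one 20 ms frame into 160 PCM samples.  A NULL param pointer means
   the frame was lost and a BFI-no-data frame is substituted.  A non-NULL
   frame must already be known to be valid: from an external source it should
   first be checked with gsmhr_check_decoder_params(), which is not called
   here because its signature is not shown in this document. */
void ex_decode_or_conceal(struct gsmhr_decoder_state *st, const int16_t *param,
                          int taf, int16_t *pcm)
{
    int16_t lost[GSMHR_NUM_PARAMS_DEC];

    assert(st != NULL && pcm != NULL);
    if (param == NULL) {
        ex_make_bfi_no_data(lost, taf);
        param = lost;
    }
    gsmhr_decode_frame(st, param, pcm);
}
```

A decoder state created with gsmhr_decoder_create() is released with free() once the stream ends, as explained in the state allocation section.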
+
+This speech decoder function includes all mandatory logic for decoder homing:
+special handling of the homed state, decoder homing frame checks of both full
+and partial (to the first subframe) kinds, internal state reset and EHF output.
+
+TFO transform function
+======================
+
+To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
+structure (use gsmhr_rxfe_create() or your own allocation followed by
+gsmhr_rxfe_reset()) and then call this function for every frame to be processed:
+
+void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
+                    int16_t *dl);
+
+UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
+caveats apply in terms of validation checks. DL Tx output is emitted in the
+same form as gsmhr_encode_frame() output, complete with VAD and SP flags. VAD
+output should be considered a dummy, but SP output flag is valid: 1 in the case
+of a speech frame or 0 in the case of a SID frame; the latter is possible only
+when DTXd is enabled.
+
+DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it should
+emit both speech and SID frames or speech frames only, corresponding to DTXd
+flag in TRAU-UL frames that affects both speech encoder and TFO transform
+functions in traditional TRAUs. This flag can be changed mid-session: as
+explained in HR-codec-Rx-logic article, our implementation of TFO transform
+proceeds by applying classic Rx front end processing that only emits speech
+frames, and then replacing output with SID frames under certain conditions if
+DTXd is enabled.
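The per-frame calling pattern for the TFO transform can be sketched as follows: each 22-word UL frame is passed through gsmhr_tfo_xfrm() and the SP flag of the 20-word DL output is inspected to tell speech frames from SID frames. The ex_* names and the HAVE_TW_GSMHR_H macro are hypothetical, and the SP flag is assumed to sit in word 19 because the text lists VAD and SP after the 18 parameters. Without the macro, a stand-in that copies parameters and always reports SP=1 replaces the library; it is not a real transform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef HAVE_TW_GSMHR_H   /* hypothetical build macro: define when libgsmhr1 is installed */
#include <tw_gsmhr.h>
#else
/* Stand-ins so that the sketch compiles without libgsmhr1. */
struct gsmhr_rxfe_state { int unused; };
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void)
{
    return calloc(1, sizeof(struct gsmhr_rxfe_state));
}
void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
                    int16_t *dl)
{
    int i;

    (void) st;
    (void) dtxd;
    for (i = 0; i < 18; i++)
        dl[i] = ul[i];          /* placeholder copy, not a real transform */
    dl[18] = 1;                 /* VAD (dummy) */
    dl[19] = 1;                 /* SP: this stand-in never emits SID */
}
#endif

#define EX_UL_WORDS 22   /* decoder-style input frame */
#define EX_DL_WORDS 20   /* encoder-style output frame */
#define EX_SP_WORD  19   /* assumed position of the SP flag in the output */

struct ex_tfo_counts { unsigned speech; unsigned sid; };

/* Runs nframes UL frames through the transform, writes the DL frames and
   counts how many came out as speech (SP=1) versus SID (SP=0).  UL frames
   are assumed to have been validated already. */
void ex_tfo_run(struct gsmhr_rxfe_state *st, int dtxd,
                const int16_t *ul_frames, int16_t *dl_frames,
                unsigned nframes, struct ex_tfo_counts *counts)
{
    unsigned n;

    assert(st != NULL && counts != NULL);
    counts->speech = 0;
    counts->sid = 0;
    for (n = 0; n < nframes; n++) {
        int16_t *dl = dl_frames + (size_t) n * EX_DL_WORDS;

        gsmhr_tfo_xfrm(st, dtxd, ul_frames + (size_t) n * EX_UL_WORDS, dl);
        if (dl[EX_SP_WORD])
            counts->speech++;
        else
            counts->sid++;
    }
}
```

Because dtxd is passed on every call, an application can change it between frames, in line with the statement above that this flag can be changed mid-session. With dtxd set to 0 the SID count is expected to stay at zero.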
