view doc/HR-codec-library @ 632:7fc57e2a6784
beginning of GSM-HR documentation
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 19 Mar 2026 04:13:45 +0000 |
| parents | |
| children | 3ab76caba41c |
Themyscira libgsmhr1: library for GSM-HR codec
==============================================

The present library provides the following functionalities related to GSM-HR
speech codec, also known as HRv1:

* Stateful speech encoder and decoder engines based on GSM 06.06 reference code
  from ETSI;

* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;

* A rich set of stateless utility functions for format conversion and other
  common manipulations.

Compared to libgsmhr alternative implemented as part of Osmocom gapk, our
implementation provides the following advantages:

* In our librification of ETSI GSM-HR code, speech encoder and decoder engines
  have been outfitted with proper state structures, as opposed to the hack of
  treating the entire bss segment with global variables as a poor man's state
  structure.

* The Rx front end has been factored out of the speech decoder and can also be
  used as a TFO transform.  Because HRv1 codec falls chronologically between
  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
  future work on TFO transform for EFR.

* We made a slight extension to this GSM-HR Rx front end, applicable to both
  full decoder and TFO transform configurations, to support BFI without payload
  bits.  This condition occurs in the case of FACCH stealing, packet loss in IP
  transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does
  not provide erroneous payload bits along with BFI.

* We added many format conversion and other utility functions that are
  Themyscira original work, not from ETSI GSM-HR code.

However, because of very limited practical utility of GSM-HR codec, almost no
work has been done to speed up any of the grossly inefficient code that
originates from ETSI - see HR-codec-limits article.

GSM-HR codec frame formats
==========================

Our speech encoder, speech decoder and TFO transform engines operate on the
same canonical formats as the original reference code from ETSI:

* The output from our speech encoder engine for each frame is an array of 20
  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
  ETSI *.cod format.

* The input format to our speech decoder engine is an array of 22 16-bit words
  (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
  ETSI *.dec format.

* The input format to our TFO transform implementation is a decoder input frame
  (*.dec), and the output mimics an encoder output (*.cod) frame complete with
  VAD and SP flags.  (The latter outputs are VAD=1 SP=1 in the case of DTXd=0,
  or VAD=0 dummy and correct SP output in the case of DTXd=1.)

All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
and TW-TS-002, are supported via stateless format conversion functions.

Representation and handling of BFI
----------------------------------

If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41
spec.  If such a frame arrives in comfort noise insertion state, all parameters
are ignored.  On the other hand, if such a frame arrives outside of DTX state,
when ECU logic is applied instead, our version retains the logic from ETSI
reference code in that codevector parameters from BFI frames are used if the
voiced vs unvoiced mode matches between the BFI frame and the saved frame used
by the ECU.  This logic resides in the Rx front end that is shared between full
decoder and TFO transform implementations.

There is, however, an extension to this logic original to Themyscira: if the
BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
in ETSI reference code where BFI is strictly a binary flag), the frame is
treated as BFI-no-data and no parameter bits are ever used in any state.
Higher-level RTP input functions, described later in this article, feed this
BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
without payload bits.

Representation and handling of invalid SID
------------------------------------------

If a decoder input frame in the canonical 22-word *.dec format has indicators
set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
UFI nonzero, that input frame is invalid SID per GSM 06.41.  All 18 speech
parameters are fully ignored in such frames, always.

In RTP transport these invalid SID frames can be represented only in TW-TS-002,
not in either of the two non-Themyscira standards.  TW-TS-002 offers the option
of either including or omitting payload bits in invalid SID packets - however,
if invalid SID payload bits are included, they are ignored by our speech
decoder and TFO transform engines.

Libgsmhr1 general usage
=======================

The external public interface to Themyscira libgsmhr1 consists of a single
header file <tw_gsmhr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
prototypes), const qualifier is used where appropriate, and the interface is
defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>.

State allocation and freeing
============================

In order to use the speech encoder, you will need to allocate an encoder state
structure, and to use the speech decoder, you will need to allocate a decoder
state structure.  The same goes for the stateful TFO transform.
The necessary state allocation functions are:

struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
struct gsmhr_decoder_state *gsmhr_decoder_create(void);
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void);	/* TFO transform */

struct gsmhr_encoder_state, struct gsmhr_decoder_state and
struct gsmhr_rxfe_state are opaque structures to library users: you only get
pointers which you remember and pass around, but <tw_gsmhr.h> does not give you
full definitions of these structs.  As a library user, you ordinarily don't
even need to know the size of these structs, hence the necessary malloc()
operation happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
gsmhr_rxfe_create() functions.  (But see the following section regarding
alternative memory allocation schemes.)  However, each structure is malloc'ed
as a single chunk, hence when you are done with it, simply call free() to
relinquish each encoder, decoder or TFO state instance.

gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create()
functions can fail if the malloc() call inside fails, in which case these
libgsmhr1 functions return NULL.

The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
int; it tells the GSM-HR speech encoder whether it should operate with DTX
enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).

It should be noted that the original GSM-HR speech encoder from ETSI always
runs GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is
disabled, VAD flag is forced to 1, but all VAD and Tx DTX logic still executes
and burns up CPU cycles.  Due to work scope limits described in HR-codec-limits
article, this poor design from ETSI has been retained at present.
However, the API design allows DTX enable-or-disable flag to be changed only
with a full reset of the speech encoder, rather than per frame - this API
design thus prepares for the possibility of cleaning up this implementation in
the future and executing VAD and Tx DTX code only when DTX is enabled in the
speech encoding direction.

State reset functions
=====================

void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);

Each of these functions resets the state of the corresponding element to its
initial or "home" state.  Home states for the standard speech encoder and
decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
state for the separated-out RxFE block (used for TFO transform) is a subset of
the full speech decoder home state as relevant to the reduced functionality.

The following const "variables" are exported by the library in order to
facilitate alternative memory allocation schemes:

extern const unsigned gsmhr_encoder_state_size;
extern const unsigned gsmhr_decoder_state_size;
extern const unsigned gsmhr_rxfe_state_size;

Using these const "variables", an application can allocate buffers of the
correct size for each state structure, and then initialize each newly allocated
state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
functions.  Each of the standard gsmhr_*_create() functions allocates a buffer
of the correct size using standard malloc(), then initializes it with the
corresponding gsmhr_*_reset() function - hence the lower-level approach can be
used by applications that desire some other memory allocation scheme than
standard malloc().

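As an illustration of the lower-level approach, the following is a minimal
sketch of one possible alternative allocation scheme: a fixed pool from which
state buffers are carved out.  The struct state_pool type and the pool_alloc()
function are hypothetical names invented for this example and are not part of
libgsmhr1; only the gsmhr_decoder_state_size and gsmhr_decoder_reset() names
shown in the trailing comment come from the library.  The sketch assumes a C11
compiler and pads each allocation to the strictest fundamental alignment,
because the library exports only the size of each opaque state structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size pool; this type is NOT part of libgsmhr1. */
struct state_pool {
	unsigned char *base;	/* start of backing storage */
	size_t size;		/* total bytes in the pool */
	size_t used;		/* bytes handed out so far */
};

/*
 * Carve nbytes out of the pool, aligned for any object type.
 * Returns NULL when the remaining space is insufficient.
 */
void *pool_alloc(struct state_pool *p, size_t nbytes)
{
	const size_t align = _Alignof(max_align_t);
	uintptr_t next = (uintptr_t) p->base + p->used;
	size_t pad = (align - next % align) % align;
	size_t room = p->size - p->used;

	if (pad > room || nbytes > room - pad)
		return NULL;
	p->used += pad + nbytes;
	return p->base + p->used - nbytes;
}

/*
 * Intended use with libgsmhr1 (needs <tw_gsmhr.h> and the library):
 *
 *	struct gsmhr_decoder_state *dec;
 *
 *	dec = pool_alloc(&pool, gsmhr_decoder_state_size);
 *	if (dec)
 *		gsmhr_decoder_reset(dec);
 */
```

A state structure obtained this way must not be passed to free(); its lifetime
is that of the pool it came from.
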
Using the speech encoder
========================

To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():

void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
			int16_t *param);

You need to provide an encoder state structure allocated earlier with
gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
will be written.  The encoded frame format emitted by this function is the same
as in the reference implementation from ETSI: 18 words of speech parameters
followed by VAD and SP flags.  Stateless format conversion functions described
later in this document can be used to emit more commonly used RTP formats.

The mandatory encoder homing function is included: if the input frame matches
the encoder homing frame, the encoder state is reset to the home state at the
end of gsmhr_encode_frame() processing.

Using the speech decoder
========================

Our speech decoder main function is:

void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
			int16_t *pcm);

The input frame format is the canonical one from ETSI: 22
(GSMHR_NUM_PARAMS_DEC) 16-bit words providing speech parameters followed by
BFI, UFI, SID and TAF metadata flags.  In normal operation, this internal
canonical form of speech decoder input will be provided by one of the stateless
format conversion functions in the same libgsmhr1.

Important note: the parameter frame input to this function is expected to be
valid, i.e., it is NOT subjected to explicit validation checks!  If your
application reads *.dec files or otherwise receives this format directly from
some external source (as opposed to output from one of our own format
conversion functions), you need to validate these bits with
gsmhr_check_decoder_params() before feeding them to the decoder engine!

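To make the relation between the two canonical layouts concrete, here is a
minimal sketch of a local loopback helper that turns a 20-word encoder output
frame into a 22-word decoder input frame, as an application might do for an
error-free test path.  The enc_to_dec_frame() name is hypothetical and not part
of libgsmhr1.  The sketch assumes that the flag words follow the 18 parameters
in the order listed in this document (VAD, SP on output; BFI, UFI, SID, TAF on
input) and that an SP=0 encoder frame maps to SID=2 with BFI=0 and UFI=0; TAF
is not produced by the encoder, so the caller has to supply it.

```c
#include <stdint.h>
#include <string.h>

/* Normally provided by <tw_gsmhr.h>; repeated here to stay self-contained. */
#ifndef GSMHR_NUM_PARAMS_ENC
#define GSMHR_NUM_PARAMS_ENC	20
#endif
#ifndef GSMHR_NUM_PARAMS_DEC
#define GSMHR_NUM_PARAMS_DEC	22
#endif

/*
 * Hypothetical helper, not a libgsmhr1 function: build decoder input
 * (dec, 22 words) from encoder output (enc, 20 words) for an error-free
 * loopback path.  The VAD flag in enc[18] is not carried over.
 */
void enc_to_dec_frame(const int16_t *enc, int taf, int16_t *dec)
{
	memcpy(dec, enc, 18 * sizeof(int16_t));	/* 18 codec parameters */
	dec[18] = 0;			/* BFI: frame is good */
	dec[19] = 0;			/* UFI: frame is reliable */
	dec[20] = enc[19] ? 0 : 2;	/* SID: SP=1 speech, SP=0 assumed SID */
	dec[21] = taf ? 1 : 0;		/* TAF: supplied by the caller */
}

/*
 * Intended use with libgsmhr1 (needs <tw_gsmhr.h> and the library):
 *
 *	int16_t enc[GSMHR_NUM_PARAMS_ENC], dec[GSMHR_NUM_PARAMS_DEC];
 *	int16_t pcm_in[160], pcm_out[160];
 *
 *	gsmhr_encode_frame(enc_state, pcm_in, enc);
 *	enc_to_dec_frame(enc, 0, dec);
 *	gsmhr_decode_frame(dec_state, dec, pcm_out);
 */
```
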
This speech decoder function includes all mandatory logic for decoder homing:
special handling of the homed state, decoder homing frame checks of both full
and partial (to the first subframe) kinds, internal state reset and EHF output.

TFO transform function
======================

To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
structure (use gsmhr_rxfe_create() or your own allocation followed by
gsmhr_rxfe_reset()) and then call this function for every frame to be
processed:

void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
		    int16_t *dl);

UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
caveats apply in terms of validation checks.  DL Tx output is emitted in the
same form as gsmhr_encode_frame() output, complete with VAD and SP flags.  VAD
output should be considered a dummy, but SP output flag is valid: 1 in the case
of a speech frame or 0 in the case of a SID frame; the latter is possible only
when DTXd is enabled.

DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it
should emit both speech and SID frames or speech frames only, corresponding to
DTXd flag in TRAU-UL frames that affects both speech encoder and TFO transform
functions in traditional TRAUs.  This flag can be changed mid-session: as
explained in HR-codec-Rx-logic article, our implementation of TFO transform
proceeds by applying classic Rx front end processing that only emits speech
frames, and then replacing output with SID frames under certain conditions if
DTXd is enabled.
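
Because UL Rx input to the TFO transform uses the same 22-word form as decoder
input, an application may find it handy to label each frame before feeding it
in, for example for logging or statistics.  The following is a minimal sketch
that restates the BFI and SID rules given earlier in this document as code.
The enum dec_frame_kind type and the classify_dec_frame() function are
hypothetical names, not part of libgsmhr1, and the sketch is not a substitute
for gsmhr_check_decoder_params().  It assumes that BFI, UFI and SID occupy
words 18, 19 and 20 (the order in which this document lists them), that SID=2
with BFI=0 and UFI=0 denotes valid SID, and that flag values outside 0..2 are
out of range.

```c
#include <stdint.h>

/* Hypothetical classification labels; NOT part of libgsmhr1. */
enum dec_frame_kind {
	FRAME_SPEECH,		/* SID=0, BFI=0 (UFI is left to the decoder) */
	FRAME_VALID_SID,	/* SID=2 with BFI=0 and UFI=0 (assumed) */
	FRAME_INVALID_SID,	/* SID=1, or SID=2 with BFI or UFI nonzero */
	FRAME_BFI,		/* SID=0, BFI=1: non-SID BFI, "unusable frame" */
	FRAME_BFI_NODATA,	/* SID=0, BFI=2: Themyscira BFI-no-data code */
	FRAME_OUT_OF_RANGE	/* flag values this sketch does not recognize */
};

/* Label a 22-word decoder input frame according to its metadata flags. */
enum dec_frame_kind classify_dec_frame(const int16_t *frame)
{
	int bfi = frame[18], ufi = frame[19], sid = frame[20];

	if (bfi < 0 || bfi > 2 || sid < 0 || sid > 2)
		return FRAME_OUT_OF_RANGE;
	if (sid == 1 || (sid == 2 && (bfi || ufi)))
		return FRAME_INVALID_SID;
	if (sid == 2)
		return FRAME_VALID_SID;
	if (bfi == 2)
		return FRAME_BFI_NODATA;
	if (bfi == 1)
		return FRAME_BFI;
	return FRAME_SPEECH;
}
```

The invalid SID test comes first because SID=1 takes precedence irrespective
of the other flags.
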
