view doc/HR-codec-library @ 633:3ab76caba41c
doc/HR-codec-library: document stateless utility functions
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Fri, 20 Mar 2026 02:17:55 +0000 |
| parents | 7fc57e2a6784 |
| children | 723265aea9f8 |
Themyscira libgsmhr1: library for GSM-HR codec
==============================================

The present library provides the following functionalities related to GSM-HR
speech codec, also known as HRv1:

* Stateful speech encoder and decoder engines based on GSM 06.06 reference
  code from ETSI;

* TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec;

* A rich set of stateless utility functions for format conversion and other
  common manipulations.

Compared to libgsmhr alternative implemented as part of Osmocom gapk, our
implementation provides the following advantages:

* In our librification of ETSI GSM-HR code, speech encoder and decoder engines
  have been outfitted with proper state structures, as opposed to the hack of
  treating the entire bss segment with global variables as a poor man's state
  structure.

* The Rx front end has been factored out of the speech decoder and can also be
  used as a TFO transform.  Because HRv1 codec falls chronologically between
  FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward
  future work on TFO transform for EFR.

* We made a slight extension to this GSM-HR Rx front end, applicable to both
  full decoder and TFO transform configurations, to support BFI without
  payload bits.  This condition occurs in the case of FACCH stealing, packet
  loss in IP transport, or the non-modifiable DSP PHY implementation in
  sysmoBTS that does not provide erroneous payload bits along with BFI.

* We added many format conversion and other utility functions that are
  Themyscira original work, not from ETSI GSM-HR code.

However, because of very limited practical utility of GSM-HR codec, almost no
work has been done to speed up any of the grossly inefficient code that
originates from ETSI - see HR-codec-limits article.

GSM-HR codec frame formats
==========================

Our speech encoder, speech decoder and TFO transform engines operate on the
same canonical formats as the original reference code from ETSI:

* The output from our speech encoder engine for each frame is an array of 20
  16-bit words (18 codec parameters followed by VAD and SP flags) that matches
  ETSI *.cod format.

* The input format to our speech decoder engine is an array of 22 16-bit words
  (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in
  ETSI *.dec format.

* The input format to our TFO transform implementation is a decoder input
  frame (*.dec), and the output mimics an encoder output (*.cod) frame
  complete with VAD and SP flags.  (The latter outputs are VAD=1 SP=1 in the
  case of DTXd=0, or VAD=0 dummy and correct SP output in the case of DTXd=1.)

All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993
and TW-TS-002, are supported via stateless format conversion functions.

Representation and handling of BFI
----------------------------------

If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and
SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41
spec.  If such frame arrives in comfort noise insertion state, all parameters
are ignored.  On the other hand, if such frame arrives outside of DTX state,
when ECU logic is applied instead, our version retains the logic from ETSI
reference code in that codevector parameters from BFI frames are used if the
voiced vs unvoiced mode matches between the BFI frame and the saved frame used
by the ECU.  This logic resides in the Rx front end that is shared between full
decoder and TFO transform implementations.

There is, however, an extension to this logic original to Themyscira: if the
BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed
in ETSI reference code where BFI is strictly a binary flag), the frame is
treated as BFI-no-data and no parameter bits are ever used in any state.
Higher-level RTP input functions, described later in this article, feed this
BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI
without payload bits.

Representation and handling of invalid SID
------------------------------------------

If a decoder input frame in the canonical 22-word *.dec format has indicators
set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or
UFI nonzero, that input frame is invalid SID per GSM 06.41.  All 18 speech
parameters are fully ignored in such frames, always.

In RTP transport these invalid SID frames can be represented only in TW-TS-002,
not in either of the two non-Themyscira standards.  TW-TS-002 offers the option
of either including or omitting payload bits in invalid SID packets - however,
if invalid SID payload bits are included, they are ignored by our speech
decoder and TFO transform engines.

Libgsmhr1 general usage
=======================

The external public interface to Themyscira libgsmhr1 consists of a single
header file <tw_gsmhr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
prototypes), const qualifier is used where appropriate, and the interface is
defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>.

State allocation and freeing
============================

In order to use the speech encoder, you will need to allocate an encoder state
structure, and to use the speech decoder, you will need to allocate a decoder
state structure.  The same goes for the stateful TFO transform.
The necessary state allocation functions are:

struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx);
struct gsmhr_decoder_state *gsmhr_decoder_create(void);
struct gsmhr_rxfe_state *gsmhr_rxfe_create(void);    /* TFO transform */

struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct
gsmhr_rxfe_state are opaque structures to library users: you only get pointers
which you remember and pass around, but <tw_gsmhr.h> does not give you full
definitions of these structs.  As a library user, you ordinarily don't even
need to know the size of these structs, hence the necessary malloc() operation
happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and
gsmhr_rxfe_create() functions.  (But see the following section regarding
alternative memory allocation schemes.)  However, each structure is malloc'ed
as a single chunk, hence when you are done with it, simply call free() to
relinquish each encoder, decoder or TFO state instance.

gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create()
functions can fail if the malloc() call inside fails, in which case these
libgsmhr1 functions return NULL.

The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an
int; it tells the GSM-HR speech encoder whether it should operate with DTX
enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per
GSM 06.41) or DTX disabled (skip VAD and always emit speech frames).

It should be noted that the original GSM-HR speech encoder from ETSI always
runs GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is
disabled, VAD flag is forced to 1, but all VAD and Tx DTX logic still executes
and burns up CPU cycles.  Due to work scope limits described in HR-codec-limits
article, this poor design from ETSI has been retained at present.
However, the API design allows DTX enable-or-disable flag to be changed only
with a full reset of the speech encoder, rather than per frame - this API
design thus prepares for the possibility of cleaning up this implementation in
the future and executing VAD and Tx DTX code only when DTX is enabled in the
speech encoding direction.

State reset functions
=====================

void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx);
void gsmhr_decoder_reset(struct gsmhr_decoder_state *st);
void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st);

Each of these functions resets the state of the corresponding element to its
initial or "home" state.  Home states for the standard speech encoder and
decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home
state for the separated-out RxFE block (used for TFO transform) is a subset of
the full speech decoder home state as relevant to the reduced functionality.

The following const "variables" are exported by the library in order to
facilitate alternative memory allocation schemes:

extern const unsigned gsmhr_encoder_state_size;
extern const unsigned gsmhr_decoder_state_size;
extern const unsigned gsmhr_rxfe_state_size;

Using these const "variables", an application can allocate buffers of the
correct size for each state structure, and then initialize each newly allocated
state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create()
functions.  Each of the standard gsmhr_*_create() functions allocates a buffer
of the correct size using standard malloc(), then initializes it with the
corresponding gsmhr_*_reset() function - hence the lower-level approach can be
used by applications that desire some other memory allocation scheme than
standard malloc().

Using the speech encoder
========================

To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame():

void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm,
                        int16_t *param);

You need to provide an encoder state structure allocated earlier with
gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer
of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame
will be written.  The encoded frame format emitted by this function is the same
as in the reference implementation from ETSI: 18 words of speech parameters
followed by VAD and SP flags.  Stateless format conversion functions described
later in this document can be used to emit more commonly used RTP formats.

The mandatory encoder homing function is included: if the input frame matches
the encoder homing frame, the encoder state is reset to the home state at the
end of gsmhr_encode_frame() processing.

Using the speech decoder
========================

Our speech decoder main function is:

void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param,
                        int16_t *pcm);

The input frame format is the canonical one from ETSI: 22
(GSMHR_NUM_PARAMS_DEC) 16-bit words providing speech parameters followed by
BFI, UFI, SID and TAF metadata flags.  In normal operation, this internal
canonical form of speech decoder input will be provided by one of the stateless
format conversion functions in the same libgsmhr1.

Important note: the parameter frame input to this function is expected to be
valid, i.e., it is NOT subjected to explicit validation checks!  If your
application reads *.dec files or otherwise receives this format directly from
some external source (as opposed to output from one of our own format
conversion functions), you need to validate these bits with
gsmhr_check_decoder_params() before feeding them to the decoder engine!

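The four metadata words at the end of the canonical decoder input frame
determine how the frame will be treated.  The following is only a minimal
sketch of the classification rules given earlier in this document; the enum
names and classify_dec_frame() are hypothetical and are not part of libgsmhr1,
and the handling of UFI in non-SID frames is deliberately left out.

```c
#include <stdint.h>

/* Word positions in the 22-word decoder input frame: 18 codec
 * parameters come first, followed by BFI, UFI, SID and TAF. */
enum { IDX_BFI = 18, IDX_UFI = 19, IDX_SID = 20, IDX_TAF = 21 };

/* Hypothetical labels for illustration only */
enum rx_frame_class {
	RX_GOOD_SPEECH,		/* SID=0 BFI=0 */
	RX_VALID_SID,		/* SID=2 with BFI=0 and UFI=0 */
	RX_INVALID_SID,		/* SID=1, or SID=2 with BFI or UFI set */
	RX_BFI_WITH_DATA,	/* SID=0 BFI=1: unusable frame */
	RX_BFI_NO_DATA		/* SID=0 BFI=2: Themyscira extension */
};

static enum rx_frame_class classify_dec_frame(const int16_t *param)
{
	int16_t bfi = param[IDX_BFI];
	int16_t ufi = param[IDX_UFI];
	int16_t sid = param[IDX_SID];

	if (sid == 1)
		return RX_INVALID_SID;
	if (sid == 2)
		return (bfi || ufi) ? RX_INVALID_SID : RX_VALID_SID;
	/* SID=0 from here on; UFI is not examined in this sketch */
	if (bfi == 2)
		return RX_BFI_NO_DATA;
	if (bfi == 1)
		return RX_BFI_WITH_DATA;
	return RX_GOOD_SPEECH;
}
```

This sketch assumes the frame has already passed validation, i.e., BFI and SID
words are within the range [0,2].
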
This speech decoder function includes all mandatory logic for decoder homing:
special handling of the homed state, decoder homing frame checks of both full
and partial (to the first subframe) kinds, internal state reset and EHF output.

TFO transform function
======================

To operate our TFO transform for GSM-HR codec, create a standalone RxFE state
structure (use gsmhr_rxfe_create() or your own allocation followed by
gsmhr_rxfe_reset()) and then call this function for every frame to be
processed:

void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul,
                    int16_t *dl);

UL Rx input has the same form as input to gsmhr_decode_frame(), and the same
caveats apply in terms of validation checks.  DL Tx output is emitted in the
same form as gsmhr_encode_frame() output, complete with VAD and SP flags.  VAD
output should be considered a dummy, but SP output flag is valid: 1 in the case
of a speech frame or 0 in the case of a SID frame; the latter is possible only
when DTXd is enabled.

DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it
should emit both speech and SID frames or speech frames only, corresponding to
DTXd flag in TRAU-UL frames that affects both speech encoder and TFO transform
functions in traditional TRAUs.  This flag can be changed mid-session: as
explained in HR-codec-Rx-logic article, our implementation of TFO transform
proceeds by applying classic Rx front end processing that only emits speech
frames, and then replacing output with SID frames under certain conditions if
DTXd is enabled.

Stateless utility functions
===========================

All functions in this section are stateless (no encoder, decoder or RxFE state
structure is needed); they merely manipulate data formats.

void gsmhr_pack_ts101318(const int16_t *param, uint8_t *payload);

This function converts a 112-bit GSM-HR codec frame from an array of speech
parameters (18 16-bit words) into the packed format of ETSI TS 101 318, which
is a buffer of 14 octets with every bit used for payload.  Any extraneous bits
in input 16-bit words (beyond the size of each parameter in bits) are ignored.

void gsmhr_unpack_ts101318(const uint8_t *payload, int16_t *param);

This function converts a 112-bit GSM-HR codec frame from the packed format of
TS 101 318 into an array of 18 speech parameters.

void gsmhr_encoder_twts002_out(const int16_t *param, uint8_t *payload);

This function converts a cod-style frame (output from gsmhr_encode_frame() or
gsmhr_tfo_xfrm(), or read from an ETSI *.cod file) into TW-TS-002 format.  The
output is always 15 octets long (the buffer must have this much room), and is
valid per both RFC 5993 and TW-TS-002 specs.  The only two possible frame types
in this context are good speech and good SID, distinguished by SP flag in the
cod-style input and by FT field in RFC 5993 output.

int gsmhr_decoder_twts002_in(const uint8_t *payload, int16_t *param);

This function reads a super-5993 frame in TW-TS-002 format from a buffer and
converts it into the required form for input to gsmhr_decode_frame() or
gsmhr_tfo_xfrm(), which is an extended form of ETSI's *.dec format.  The input
must be a valid super-5993 in the following sense:

* The first octet in the buffer must be valid ToC per TW-TS-002 section 5.1;

* F bit in this ToC octet must be cleared;

* FT field must equal 0, 1, 2, 6 or 7 per TW-TS-002 section 5.2;

* If FT equals 0, 2 or 6, the ToC octet must be followed by 14 octets of frame
  payload.

If any of these rules are violated, gsmhr_decoder_twts002_in() returns a
negative value (-1 if F bit is set or -2 if FT is invalid) and does not write
anything into the output array.

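The ToC validity rules listed above can be illustrated with a short sketch.  It
assumes the RFC 5993 ToC layout (F in the most significant bit, followed by the
3-bit FT field), which agrees with ToC 0x70 standing for FT=7; the additional
TW-TS-002 flags in the low 4 bits are not examined.  toc_required_payload() is
a hypothetical helper, not a libgsmhr1 function.

```c
#include <stdint.h>

/* Hypothetical helper: examine the ToC octet of a super-5993 frame.
 * Returns the number of payload octets that must follow the ToC
 * (14 or 0), -1 if the F bit is set, or -2 if FT is invalid. */
static int toc_required_payload(uint8_t toc)
{
	unsigned ft = (toc >> 4) & 7;	/* ASSUMED position of FT */

	if (toc & 0x80)			/* ASSUMED position of F bit */
		return -1;
	switch (ft) {
	case 0:
	case 2:
	case 6:
		return 14;	/* frame payload is mandatory */
	case 1:
	case 7:
		return 0;	/* no payload required; any that is
				 * present with FT=1 gets ignored */
	default:
		return -2;
	}
}
```

The negative return codes in this sketch mirror those documented for
gsmhr_decoder_twts002_in(); the non-negative ones are specific to the sketch.
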
If the input is valid, the function returns 0 (indicating success) and the
output array is filled as follows:

* For frame types 0, 2 and 6, the 18 speech parameters are filled from the
  TS-101-318-like payload portion of super-5993 input.

* For frame types 1 and 7, the 18 speech parameters are set to all zeros, with
  the expectation that gsmhr_decode_frame() or gsmhr_tfo_xfrm() will ignore
  them.  Please note that "verbose" invalid SID bits that may be present in
  TW-TS-002 transport are ignored.

* The 4 metadata flags BFI, UFI, SID and TAF are set based on FT and the
  additional ToC flags defined in TW-TS-002 section 5.3.

* Themyscira extension of BFI=2, described earlier in this document, is used
  to represent FT=7.

* Invalid SID frames (FT=1) are converted to BFI=1 SID=1.

int gsmhr_rtp_in_preen(const uint8_t *rtp_in, unsigned rtp_in_len,
                       uint8_t *canon_pl);

This function performs initial processing of RTP input that is expected to be
one of the defined RTP formats for GSM-HR codec.  It accepts all possibilities
of TW-TS-002, RFC 5993 or TS 101 318 (listed in ThemWi order of preference) and
writes canonical TW-TS-002 super-5993 format into a buffer.  The output buffer
must have 15 bytes of space, and the frame written into this buffer will ALWAYS
be a valid input to gsmhr_decoder_twts002_in() function described above.

The input arguments are RTP payload and its length.  The return value is 0 if
RTP input was in a recognized format, or -1 if it is invalid.  In the case of
invalid RTP input, the output is filled with ToC of 0x70 (BFI with no data) -
the output is always valid.

Zero-length RTP payloads are acceptable; if rtp_in_len is 0, then rtp_in
pointer may be NULL.  The output in this case is filled with ToC of 0x70 (BFI
with no data), but the return value is 0, indicating success.
The intent is that truly invalid RTP payloads are error events which should be
counted, while NULL input is a normal occurrence when ThemWi jitter buffer
(twjit) does not hold a previously received RTP packet that maps to the current
tick.  (Actually transmitted RTP packets with zero-length payloads are also
possible: they are ThemWi preferred alternative to IETF approach of intentional
gaps in the RTP stream.)

int gsmhr_rtp_in_direct(const uint8_t *rtp_in, unsigned rtp_in_len,
                        int16_t *param);

This function is fully equivalent to calling first gsmhr_rtp_in_preen(), then
gsmhr_decoder_twts002_in().  It is however slightly more efficient, as it
avoids the intermediate buffer and some copying.  The return value is the same
as gsmhr_rtp_in_preen(), and just like with that function, the output is always
valid.

Reading *.cod and *.dec files
-----------------------------

The most native representation format for GSM-HR codec frames in libgsmhr1 is
arrays of broken-down speech parameters.  However, unlike TS 101 318 format in
which every possible bit pattern is a plausible GSM-HR codec frame, an array of
broken-down parameters that purports to be a GSM-HR frame can contain garbage.
The additional metadata flags in the canonical decoder input format can also
contain garbage - which our speech decoder and TFO transform engines are NOT
prepared for!  There is no potential for malfunction if these arrays of
parameters and metadata flags come only from libgsmhr1 functions - but if an
application needs to read *.cod or *.dec files, or otherwise accept external
input in any of these formats, then an explicit validation step is required.

int gsmhr_check_common_params(const int16_t *params);

This function examines an array of 18 codec parameters in the int16_t
representation used in this library, and checks if the unused upper bits of
each int16_t word are cleared as they should be.  The return value is 0 if the
frame is valid or -1 if some extraneous high bits are set.

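To make the "unused upper bits" check concrete, here is a rough sketch of one
way such a test could be written.  The parameter bit widths and the MODE word
position (index 5, with 0 meaning unvoiced) are assumptions drawn from GSM
06.20, not from this document, although they do add up to the 112 bits of a
frame.  check_params_sketch() and both tables are hypothetical and should not
be taken as the library's actual code.

```c
#include <stdint.h>

/* ASSUMED bit widths of the 18 codec parameters per GSM 06.20.
 * The first 6 (R0, LPC1, LPC2, LPC3, INT_LPC, MODE) are common;
 * the remaining 12 depend on voiced vs unvoiced mode. */
static const uint8_t widths_unvoiced[18] = {
	5, 11, 9, 8, 1, 2,
	7, 7, 5,  7, 7, 5,  7, 7, 5,  7, 7, 5
};
static const uint8_t widths_voiced[18] = {
	5, 11, 9, 8, 1, 2,
	8, 9, 5,  4, 9, 5,  4, 9, 5,  4, 9, 5
};

/* Hypothetical sketch: returns 0 if no extraneous high bits are set,
 * -1 otherwise. */
static int check_params_sketch(const int16_t *params)
{
	const uint8_t *w;
	int i;

	/* MODE must itself be valid before it selects a width table */
	if (params[5] & ~0x3)
		return -1;
	w = params[5] ? widths_voiced : widths_unvoiced;
	for (i = 0; i < 18; i++) {
		/* anything left after shifting out the valid bits is junk */
		if ((uint16_t) params[i] >> w[i])
			return -1;
	}
	return 0;
}
```

A check of this kind also catches most wrong-endian reads, since byte-swapped
small values end up with bits set in the upper half of each word.
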
int gsmhr_check_encoder_params(const int16_t *params);

This function examines a frame of 20 int16_t words that corresponds to GSM-HR
encoder output format, and checks if the unused upper bits of each int16_t word
are cleared as they should be.  This function should be used when reading from
ETSI-format *.cod files, to guard against reading garbage or wrong endian.  The
return value is 0 if the frame is valid or -1 if some extraneous high bits are
set.

int gsmhr_check_decoder_params(const int16_t *params);

This function examines a frame of 22 int16_t words that corresponds to GSM-HR
decoder input format, and checks if the unused upper bits of each int16_t word
are cleared as they should be.  This function should be used when reading from
ETSI-format *.dec files, to guard against reading garbage or wrong endian.  The
return value is 0 if the frame is valid or -1 if some extraneous high bits are
set.  Both BFI and SID words are limited to range [0,2], i.e., Themyscira BFI=2
extension is accepted.

SID field manipulation
----------------------

Unlike FR and EFR, GSM-HR codec lacks fixed rules for Rx frame classification
as valid SID, invalid SID or non-SID speech.  The BTS makes this classification
decision according to its internal private rules, and the SID flag then needs
to be carried out of band in Abis, Ater and TFO.  GSM 08.61 and TW-TS-002
(extended 5993) formats provide the necessary out-of-band SID indication, but
the bare format of TS 101 318 does not.  Therefore, the only kind of GSM-HR SID
that can be represented in TS 101 318 format is a perfect, 100% error-free SID
frame in which all 79 bits of the SID field are set to 1.

int gsmhr_ts101318_is_perfect_sid(const uint8_t *payload);

This function checks the given TS 101 318 payload for the possibility of
perfect SID.  The return value is 2 (GSM 06.41 code for valid SID) if the frame
is indeed a perfect SID, or 0 (GSM 06.41 code for non-SID speech) otherwise.

void gsmhr_ts101318_set_sid_codeword(uint8_t *payload);

This function sets all 79 bits of the SID field to 1s, forming a perfect SID
frame in the 14-byte buffer.  The first 33 bits that carry R0 and LPC
parameters must already be filled correctly.

void gsmhr_set_sid_cw_params(int16_t *params);

This function fills parameters 4 through 17 of generated SID frames, setting
them to the required SID codeword.  It can also be used to transform a speech
frame into a SID frame with the same R0 and LPC parameters.  It is logically
equivalent to gsmhr_ts101318_set_sid_codeword(), but operates on the array of
parameters form, rather than TS 101 318 packed format.
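
The perfect SID operations on TS 101 318 packed format can be pictured with the
following sketch.  It assumes that bits are packed most significant bit first,
so that the 33 bits of R0 and LPC occupy octets 0 through 3 plus the top bit of
octet 4, and the 79-bit SID field fills everything after that; this bit order
is an assumption, not something stated in this document.  Both functions are
hypothetical illustrations, not the libgsmhr1 implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: returns 2 (valid SID) if all 79 bits of the
 * SID field are 1, or 0 (non-SID speech) otherwise.
 * ASSUMPTION: MSB-first packing, SID field = bits 33 through 111. */
static int is_perfect_sid_sketch(const uint8_t *payload)
{
	int i;

	/* low 7 bits of octet 4 are the first 7 bits of the SID field */
	if ((payload[4] & 0x7F) != 0x7F)
		return 0;
	/* octets 5 through 13 hold the remaining 72 bits */
	for (i = 5; i < 14; i++) {
		if (payload[i] != 0xFF)
			return 0;
	}
	return 2;
}

/* Hypothetical sketch: set the SID codeword while leaving the
 * first 33 bits (R0 and LPC) untouched. */
static void set_sid_codeword_sketch(uint8_t *payload)
{
	payload[4] |= 0x7F;	/* top bit is the last LPC bit: keep it */
	memset(payload + 5, 0xFF, 9);
}
```

Note that 7 bits in octet 4 plus 9 full octets give exactly 79 bits, matching
the size of the SID field.
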
