comparison doc/HR-codec-library @ 632:7fc57e2a6784
beginning of GSM-HR documentation
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 19 Mar 2026 04:13:45 +0000 |
| parents | |
| children | 3ab76caba41c |
| 631:6bad9af66f69 | 632:7fc57e2a6784 |
|---|---|
| 1 Themyscira libgsmhr1: library for GSM-HR codec | |
| 2 ============================================== | |
| 3 | |
| 4 This library provides the following functionality related to the GSM-HR | |
| 5 speech codec, also known as HRv1: | |
| 6 | |
| 7 * Stateful speech encoder and decoder engines based on GSM 06.06 reference code | |
| 8 from ETSI; | |
| 9 | |
| 10 * TS 28.062 section C.3.2.1.1 stateful TFO transform for this codec; | |
| 11 | |
| 12 * A rich set of stateless utility functions for format conversion and other | |
| 13 common manipulations. | |
| 14 | |
| 15 Compared to the alternative libgsmhr implemented as part of Osmocom gapk, our | |
| 16 implementation provides the following advantages: | |
| 17 | |
| 18 * In our librification of ETSI GSM-HR code, speech encoder and decoder engines | |
| 19 have been outfitted with proper state structures, as opposed to the hack of | |
| 20 treating the entire bss segment with global variables as a poor man's state | |
| 21 structure. | |
| 22 | |
| 23 * The Rx front end has been factored out of the speech decoder and can also be | |
| 24 used as a TFO transform. Because HRv1 codec falls chronologically between | |
| 25 FRv1 and EFR, our TFO transform for HRv1 serves as a stepping stone toward | |
| 26 future work on TFO transform for EFR. | |
| 27 | |
| 28 * We made a slight extension to this GSM-HR Rx front end, applicable to both | |
| 29 full decoder and TFO transform configurations, to support BFI without payload | |
| 30 bits. This condition occurs in the case of FACCH stealing, packet loss in IP | |
| 31 transport, or the non-modifiable DSP PHY implementation in sysmoBTS that does | |
| 32 not provide erroneous payload bits along with BFI. | |
| 33 | |
| 34 * We added many format conversion and other utility functions that are | |
| 35 Themyscira original work, not from ETSI GSM-HR code. | |
| 36 | |
| 37 However, because of very limited practical utility of GSM-HR codec, almost no | |
| 38 work has been done to speed up any of the grossly inefficient code that | |
| 39 originates from ETSI - see HR-codec-limits article. | |
| 40 | |
| 41 GSM-HR codec frame formats | |
| 42 ========================== | |
| 43 | |
| 44 Our speech encoder, speech decoder and TFO transform engines operate on the | |
| 45 same canonical formats as the original reference code from ETSI: | |
| 46 | |
| 47 * The output from our speech encoder engine for each frame is an array of 20 | |
| 48 16-bit words (18 codec parameters followed by VAD and SP flags) that matches | |
| 49 ETSI *.cod format. | |
| 50 | |
| 51 * The input format to our speech decoder engine is an array of 22 16-bit words | |
| 52 (18 codec parameters followed by BFI, UFI, SID and TAF) as represented in | |
| 53 ETSI *.dec format. | |
| 54 | |
| 55 * The input format to our TFO transform implementation is a decoder input frame | |
| 56 (*.dec), and the output mimics an encoder output (*.cod) frame complete with | |
| 57 VAD and SP flags. (These two flags are VAD=1 SP=1 in the case of DTXd=0, or | |
| 58 a dummy VAD=0 with a correct SP flag in the case of DTXd=1.) | |
| 59 | |
| 60 All other standard formats for GSM-HR codec frames, namely TS 101 318, RFC 5993 | |
| 61 and TW-TS-002, are supported via stateless format conversion functions. | |
| 62 | |
| 63 Representation and handling of BFI | |
| 64 ---------------------------------- | |
| 65 | |
| 66 If a decoder input frame in the canonical 22-word *.dec format has BFI=1 and | |
| 67 SID=0, it is a non-SID BFI frame, also called an unusable frame in GSM 06.41 | |
| 68 spec. If such a frame arrives in comfort noise insertion state, all parameters | |
| 69 are ignored. On the other hand, if such a frame arrives outside of DTX state, | |
| 70 when ECU logic is applied instead, our version retains the logic from ETSI | |
| 71 reference code in that codevector parameters from BFI frames are used if the | |
| 72 voiced vs unvoiced mode matches between the BFI frame and the saved frame used | |
| 73 by the ECU. This logic resides in the Rx front end that is shared between full | |
| 74 decoder and TFO transform implementations. | |
| 75 | |
| 76 There is, however, an extension to this logic original to Themyscira: if the | |
| 77 BFI word in the 22-word decoder input frame equals 2 instead of 1 (not allowed | |
| 78 in ETSI reference code where BFI is strictly a binary flag), the frame is | |
| 79 treated as BFI-no-data and no parameter bits are ever used in any state. | |
| 80 Higher-level RTP input functions, described later in this article, feed this | |
| 81 BFI=2 code to the speech decoder or TFO transform engine when RTP input is BFI | |
| 82 without payload bits. | |
| 83 | |
| 84 Representation and handling of invalid SID | |
| 85 ------------------------------------------ | |
| 86 | |
| 87 If a decoder input frame in the canonical 22-word *.dec format has indicators | |
| 88 set to either SID=1 (irrespective of other flags) or SID=2 with either BFI or | |
| 89 UFI nonzero, that input frame is an invalid SID frame per GSM 06.41. All 18 speech | |
| 90 parameters are fully ignored in such frames, always. | |
| 91 | |
| 92 In RTP transport these invalid SID frames can be represented only in TW-TS-002, | |
| 93 not in either of the two non-Themyscira standards. TW-TS-002 offers the option | |
| 94 of either including or omitting payload bits in invalid SID packets - however, | |
| 95 if invalid SID payload bits are included, they are ignored by our speech decoder | |
| 96 and TFO transform engines. | |
| 97 | |
| 98 Libgsmhr1 general usage | |
| 99 ======================= | |
| 100 | |
| 101 The external public interface to Themyscira libgsmhr1 consists of a single | |
| 102 header file <tw_gsmhr.h>; it should be installed in some system include | |
| 103 directory. | |
| 104 | |
| 105 The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function | |
| 106 prototypes); the const qualifier is used where appropriate, and the interface is | |
| 107 defined in terms of <stdint.h> types; <tw_gsmhr.h> includes <stdint.h>. | |
| 108 | |
| 109 State allocation and freeing | |
| 110 ============================ | |
| 111 | |
| 112 In order to use the speech encoder, you will need to allocate an encoder state | |
| 113 structure, and to use the speech decoder, you will need to allocate a decoder | |
| 114 state structure. The same goes for the stateful TFO transform. The necessary | |
| 115 state allocation functions are: | |
| 116 | |
| 117 struct gsmhr_encoder_state *gsmhr_encoder_create(int dtx); | |
| 118 struct gsmhr_decoder_state *gsmhr_decoder_create(void); | |
| 119 struct gsmhr_rxfe_state *gsmhr_rxfe_create(void); /* TFO transform */ | |
| 120 | |
| 121 struct gsmhr_encoder_state, struct gsmhr_decoder_state and struct | |
| 122 gsmhr_rxfe_state are opaque structures to library users: you only get pointers | |
| 123 which you remember and pass around, but <tw_gsmhr.h> does not give you full | |
| 124 definitions of these structs. As a library user, you ordinarily don't even | |
| 125 need to know the size of these structs, hence the necessary malloc() operation | |
| 126 happens inside gsmhr_encoder_create(), gsmhr_decoder_create() and | |
| 127 gsmhr_rxfe_create() functions. (But see the following section regarding | |
| 128 alternative memory allocation schemes.) However, each structure is malloc'ed | |
| 129 as a single chunk, hence when you are done with it, simply call free() to | |
| 130 relinquish each encoder, decoder or TFO state instance. | |
| 131 | |
| 132 gsmhr_encoder_create(), gsmhr_decoder_create() and gsmhr_rxfe_create() functions | |
| 133 can fail if the malloc() call inside fails, in which case these libgsmhr1 | |
| 134 functions return NULL. | |
| 135 | |
| 136 The dtx argument to gsmhr_encoder_create() is a Boolean flag represented as an | |
| 137 int; it tells the GSM-HR speech encoder whether it should operate with DTX | |
| 138 enabled (run GSM 06.42 VAD and emit SID frames instead of speech frames per | |
| 139 GSM 06.41) or DTX disabled (skip VAD and always emit speech frames). | |
| 140 | |
| 141 It should be noted that the original GSM-HR speech encoder from ETSI always runs | |
| 142 GSM 06.42 VAD algorithm, whether DTX is enabled or disabled; if DTX is disabled, | |
| 143 VAD flag is forced to 1, but all VAD and Tx DTX logic still executes and burns | |
| 144 up CPU cycles. Due to work scope limits described in HR-codec-limits article, | |
| 145 this poor design from ETSI has been retained for now. However, the API | |
| 146 design allows DTX enable-or-disable flag to be changed only with a full reset | |
| 147 of the speech encoder, rather than per frame - this API design thus prepares | |
| 148 for the possibility of cleaning up this implementation in the future and | |
| 149 executing VAD and Tx DTX code only when DTX is enabled in the speech encoding | |
| 150 direction. | |
| 151 | |
| 152 State reset functions | |
| 153 ===================== | |
| 154 | |
| 155 void gsmhr_encoder_reset(struct gsmhr_encoder_state *st, int dtx); | |
| 156 void gsmhr_decoder_reset(struct gsmhr_decoder_state *st); | |
| 157 void gsmhr_rxfe_reset(struct gsmhr_rxfe_state *st); | |
| 158 | |
| 159 Each of these functions resets the state of the corresponding element to its | |
| 160 initial or "home" state. Home states for the standard speech encoder and | |
| 161 decoder are given in GSM 06.20 sections 5.5 and 5.6, respectively; the home | |
| 162 state for the separated-out RxFE block (used for TFO transform) is a subset of | |
| 163 the full speech decoder home state as relevant to the reduced functionality. | |
| 164 | |
| 165 The following const "variables" are exported by the library in order to | |
| 166 facilitate alternative memory allocation schemes: | |
| 167 | |
| 168 extern const unsigned gsmhr_encoder_state_size; | |
| 169 extern const unsigned gsmhr_decoder_state_size; | |
| 170 extern const unsigned gsmhr_rxfe_state_size; | |
| 171 | |
| 172 Using these const "variables", an application can allocate buffers of the | |
| 173 correct size for each state structure, and then initialize each newly allocated | |
| 174 state structure with gsmhr_*_reset(), as an alternative to gsmhr_*_create() | |
| 175 functions. Each of the standard gsmhr_*_create() functions allocates a buffer | |
| 176 of the correct size using standard malloc(), then initializes it with the | |
| 177 corresponding gsmhr_*_reset() function - hence the lower-level approach can be | |
| 178 used by applications that desire some other memory allocation scheme than | |
| 179 standard malloc(). | |
| 180 | |
| 181 Using the speech encoder | |
| 182 ======================== | |
| 183 | |
| 184 To encode one 20 ms audio frame per GSM-HR, call gsmhr_encode_frame(): | |
| 185 | |
| 186 void gsmhr_encode_frame(struct gsmhr_encoder_state *st, const int16_t *pcm, | |
| 187 int16_t *param); | |
| 188 | |
| 189 You need to provide an encoder state structure allocated earlier with | |
| 190 gsmhr_encoder_create(), a block of 160 linear PCM samples, and an output buffer | |
| 191 of 20 (GSMHR_NUM_PARAMS_ENC) 16-bit words into which the encoded GSM-HR frame | |
| 192 will be written. The encoded frame format emitted by this function is the same | |
| 193 as in the reference implementation from ETSI: 18 words of speech parameters | |
| 194 followed by VAD and SP flags. Stateless format conversion functions described | |
| 195 later in this document can be used to emit more commonly used RTP formats. | |
| 196 | |
| 197 The mandatory encoder homing function is included: if the input frame matches | |
| 198 the encoder homing frame, the encoder state is reset to the home state at the | |
| 199 end of gsmhr_encode_frame() processing. | |
| 200 | |
| 201 Using the speech decoder | |
| 202 ======================== | |
| 203 | |
| 204 Our speech decoder main function is: | |
| 205 | |
| 206 void gsmhr_decode_frame(struct gsmhr_decoder_state *st, const int16_t *param, | |
| 207 int16_t *pcm); | |
| 208 | |
| 209 The input frame format is the canonical one from ETSI: 22 (GSMHR_NUM_PARAMS_DEC) | |
| 210 16-bit words providing speech parameters followed by BFI, UFI, SID and TAF | |
| 211 metadata flags. In normal operation, this internal canonical form of speech | |
| 212 decoder input will be provided by one of the stateless format conversion | |
| 213 functions in the same libgsmhr1. | |
| 214 | |
| 215 Important note: the parameter frame input to this function is expected to be | |
| 216 valid, i.e., it is NOT subjected to explicit validation checks! If your | |
| 217 application reads *.dec files or otherwise receives this format directly from | |
| 218 some external source (as opposed to output from one of our own format conversion | |
| 219 functions), you need to validate these bits with gsmhr_check_decoder_params() | |
| 220 before feeding them to the decoder engine! | |
| 221 | |
| 222 This speech decoder function includes all mandatory logic for decoder homing: | |
| 223 special handling of the homed state, decoder homing frame checks of both full | |
| 224 and partial (to the first subframe) kinds, internal state reset and EHF output. | |
| 225 | |
| 226 TFO transform function | |
| 227 ====================== | |
| 228 | |
| 229 To operate our TFO transform for GSM-HR codec, create a standalone RxFE state | |
| 230 structure (use gsmhr_rxfe_create() or your own allocation followed by | |
| 231 gsmhr_rxfe_reset()) and then call this function for every frame to be processed: | |
| 232 | |
| 233 void gsmhr_tfo_xfrm(struct gsmhr_rxfe_state *st, int dtxd, const int16_t *ul, | |
| 234 int16_t *dl); | |
| 235 | |
| 236 UL Rx input has the same form as input to gsmhr_decode_frame(), and the same | |
| 237 caveats apply in terms of validation checks. DL Tx output is emitted in the | |
| 238 same form as gsmhr_encode_frame() output, complete with VAD and SP flags. VAD | |
| 239 output should be considered a dummy, but SP output flag is valid: 1 in the case | |
| 240 of a speech frame or 0 in the case of a SID frame; the latter is possible only | |
| 241 when DTXd is enabled. | |
| 242 | |
| 243 DTXd control: dtxd argument to gsmhr_tfo_xfrm() tells the transform if it should | |
| 244 emit both speech and SID frames or speech frames only, corresponding to DTXd | |
| 245 flag in TRAU-UL frames that affects both speech encoder and TFO transform | |
| 246 functions in traditional TRAUs. This flag can be changed mid-session: as | |
| 247 explained in HR-codec-Rx-logic article, our implementation of TFO transform | |
| 248 proceeds by applying classic Rx front end processing that only emits speech | |
| 249 frames, and then replacing output with SID frames under certain conditions if | |
| 250 DTXd is enabled. |
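
Both gsmhr_decode_frame() and gsmhr_tfo_xfrm() take the same 22-word input frame, so the BFI and SID flag rules given earlier in this article apply to both. The following is a minimal sketch of those rules as a stand-alone classifier. It is not part of libgsmhr1: the enum, the macro names and both function names are hypothetical and were invented for illustration. The word offsets follow from the stated layout (18 speech parameters followed by BFI, UFI, SID and TAF). Treating SID=2 with BFI=0 and UFI=0 as valid SID is inferred from the invalid SID definition, and the range limits for the flag words are an assumption.

```c
#include <stdint.h>

/* Layout of the canonical 22-word decoder input frame:
 * 18 speech parameters followed by BFI, UFI, SID and TAF. */
#define N_SPEECH_PARAMS	18
#define IDX_BFI		18
#define IDX_UFI		19
#define IDX_SID		20
#define IDX_TAF		21

/* Hypothetical class names for this sketch; not defined in <tw_gsmhr.h>. */
enum frame_class {
	FRAME_SPEECH,		/* SID=0, BFI=0 */
	FRAME_BFI_WITH_DATA,	/* SID=0, BFI=1: "unusable frame" */
	FRAME_BFI_NO_DATA,	/* SID=0, BFI=2: Themyscira extension */
	FRAME_VALID_SID,	/* SID=2, BFI=0, UFI=0 (inferred) */
	FRAME_INVALID_SID,	/* SID=1, or SID=2 with BFI or UFI nonzero */
	FRAME_OUT_OF_RANGE	/* flag values outside the assumed ranges */
};

/* Classify a 22-word frame by looking only at its metadata words. */
enum frame_class classify_dec_frame(const int16_t *frame)
{
	int16_t bfi = frame[IDX_BFI];
	int16_t ufi = frame[IDX_UFI];
	int16_t sid = frame[IDX_SID];

	/* Assumed ranges: BFI 0..2, UFI 0..1, SID 0..2. */
	if (bfi < 0 || bfi > 2 || ufi < 0 || ufi > 1 || sid < 0 || sid > 2)
		return FRAME_OUT_OF_RANGE;
	/* SID=1 is invalid SID irrespective of other flags. */
	if (sid == 1)
		return FRAME_INVALID_SID;
	/* SID=2 is invalid SID if either BFI or UFI is nonzero. */
	if (sid == 2)
		return (bfi || ufi) ? FRAME_INVALID_SID : FRAME_VALID_SID;
	/* SID=0: the BFI word selects between the non-SID cases. */
	if (bfi == 2)
		return FRAME_BFI_NO_DATA;
	if (bfi == 1)
		return FRAME_BFI_WITH_DATA;
	return FRAME_SPEECH;
}

/* Returns 1 for frame classes whose speech parameter words are never
 * used in any state, per the text: invalid SID and BFI-no-data. */
int params_never_used(enum frame_class fc)
{
	return fc == FRAME_INVALID_SID || fc == FRAME_BFI_NO_DATA;
}
```

A caller could run classify_dec_frame() on a frame before handing it to the decoder or TFO transform, for example to count BFI-no-data frames in a stream. Note that FRAME_BFI_WITH_DATA is deliberately not covered by params_never_used(): outside of DTX state the ECU may still use codevector parameters from such frames when the voiced vs unvoiced mode matches. This sketch is also no substitute for gsmhr_check_decoder_params(), which remains the function to use for validating externally supplied frames.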
