FreeCalypso gsm-codec-lib: doc/AMR-library-API @ 476:c84bf526c7eb
Author: Mychaela Falconia <falcon@freecalypso.org>
Date: Sat, 18 May 2024 21:22:07 +0000

Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file <tw_amr.h>, which should be installed in a system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (with
function prototypes). The const qualifier is used where appropriate, and the
interface is defined in terms of <stdint.h> types. <tw_amr.h> includes
<stdint.h> itself.

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

        #define AMR_MAX_PRM             57      /* max. num. of params */
        #define AMR_IETF_MAX_PL         32      /* max bytes in RFC 4867 frame */
        #define AMR_IETF_HDR_LEN        6       /* .amr file header bytes */
        #define AMR_COD_WORDS           250     /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters. This
  maximum is reached in the highest mode of AMR (12k2). The definition is
  needed for struct amr_param_frame, which is covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(). It is also the natural size for the staging buffer
  that most applications will use to gather the input to
  amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of the public const datum
  amr_file_header_magic[], which is covered later in this document. Any
  application that reads or writes the fixed header at the beginning of .amr
  files will also need this constant.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
  sequence format (.cod). The public definition is needed for sizing the
  arrays used with the amr_frame_to_tseq() and amr_frame_from_tseq() API
  functions.
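
To illustrate AMR_IETF_HDR_LEN, the following minimal sketch checks whether a byte buffer begins with the fixed .amr file header. It assumes that amr_file_header_magic[] holds the RFC 4867 single-channel AMR magic "#!AMR\n", which is 6 bytes long. The names SKETCH_HDR_LEN, sketch_amr_magic[] and is_amr_file_header() are hypothetical local stand-ins that let the code compile without <tw_amr.h>. Real code would use the definitions from that header instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local stand-ins so this sketch builds without <tw_amr.h>.
 * ASSUMPTION: amr_file_header_magic[] is the RFC 4867 single-channel AMR
 * magic "#!AMR\n"; real code should use AMR_IETF_HDR_LEN and
 * amr_file_header_magic[] from <tw_amr.h>. */
#define SKETCH_HDR_LEN 6
static const uint8_t sketch_amr_magic[SKETCH_HDR_LEN] =
    {'#', '!', 'A', 'M', 'R', '\n'};

/* Returns 1 if buf (len bytes long) begins with the .amr file header,
 * 0 otherwise, including when buf is too short to hold the header. */
int is_amr_file_header(const uint8_t *buf, size_t len)
{
    if (len < SKETCH_HDR_LEN)
        return 0;
    return memcmp(buf, sketch_amr_magic, SKETCH_HDR_LEN) == 0;
}
```

A reader would fetch the first AMR_IETF_HDR_LEN bytes of a file and reject the file if the helper returns 0. The length check prevents a comparison from running past the end of a truncated file.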

Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

        enum RXFrameType {
                RX_SPEECH_GOOD = 0,
                RX_SPEECH_DEGRADED,
                RX_ONSET,
                RX_SPEECH_BAD,
                RX_SID_FIRST,
                RX_SID_UPDATE,
                RX_SID_BAD,
                RX_NO_DATA,
                RX_N_FRAMETYPES         /* number of frame types */
        };

        enum TXFrameType {
                TX_SPEECH_GOOD = 0,
                TX_SID_FIRST,
                TX_SID_UPDATE,
                TX_NO_DATA,
                TX_SPEECH_DEGRADED,
                TX_SPEECH_BAD,
                TX_SID_BAD,
                TX_ONSET,
                TX_N_FRAMETYPES         /* number of frame types */
        };

        enum Mode {
                MR475 = 0,
                MR515,
                MR59,
                MR67,
                MR74,
                MR795,
                MR102,
                MR122,
                MRDTX
        };

Rx and Tx frame types are as defined by 3GPP. The numeric value assigned to
each type is the same one used by the official TS 26.073 encoder and decoder
programs. Note that Rx and Tx frame types are NOT numerically equal: the same
type has different values in the two enums. For example, TX_SID_FIRST is 1,
whereas RX_SID_FIRST is 4, so a simple cast cannot convert one enum to the
other.
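
The two enums order the same frame types differently. Converting a Tx frame type into the corresponding Rx frame type therefore needs an explicit mapping rather than a cast. The sketch below shows one way to write that mapping. tx_to_rx_type() is a hypothetical helper, not a libtwamr function. The enums are copied from the listing above only so that the sketch compiles without <tw_amr.h>.

```c
/* Copies of the frame type enums listed above, reproduced only so that
 * this sketch builds without <tw_amr.h>. */
enum RXFrameType {
    RX_SPEECH_GOOD = 0,
    RX_SPEECH_DEGRADED,
    RX_ONSET,
    RX_SPEECH_BAD,
    RX_SID_FIRST,
    RX_SID_UPDATE,
    RX_SID_BAD,
    RX_NO_DATA,
    RX_N_FRAMETYPES
};

enum TXFrameType {
    TX_SPEECH_GOOD = 0,
    TX_SID_FIRST,
    TX_SID_UPDATE,
    TX_NO_DATA,
    TX_SPEECH_DEGRADED,
    TX_SPEECH_BAD,
    TX_SID_BAD,
    TX_ONSET,
    TX_N_FRAMETYPES
};

/* Hypothetical helper: maps a Tx frame type to the Rx frame type of the
 * same name.  A cast would be wrong because the numbering differs. */
enum RXFrameType tx_to_rx_type(enum TXFrameType tx)
{
    switch (tx) {
    case TX_SPEECH_GOOD:     return RX_SPEECH_GOOD;
    case TX_SID_FIRST:       return RX_SID_FIRST;
    case TX_SID_UPDATE:      return RX_SID_UPDATE;
    case TX_NO_DATA:         return RX_NO_DATA;
    case TX_SPEECH_DEGRADED: return RX_SPEECH_DEGRADED;
    case TX_SPEECH_BAD:      return RX_SPEECH_BAD;
    case TX_SID_BAD:         return RX_SID_BAD;
    case TX_ONSET:           return RX_ONSET;
    default:                 return RX_NO_DATA;  /* invalid input: arbitrary choice */
    }
}
```

Mapping invalid input to RX_NO_DATA is a choice made for this sketch only. Other applications may prefer to report an error instead.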

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode MRDTX, which is used for packing and format
manipulation of SID frames.

State allocation and freeing
============================

To use the AMR encoder, you will need to allocate an encoder state structure.
To use the AMR decoder, you will need to allocate a decoder state structure.
The necessary state allocation functions are:

        struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
        struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures to
library users. You only get pointers, which you store and pass around;
<tw_amr.h> does not give you full definitions of these structs. As a library
user, you don't even get to know their size, hence the necessary malloc()
operation happens inside amr_encoder_create() and amr_decoder_create().
However, each structure is malloc'ed as a single chunk. When you are done with
an encoder or decoder state instance, simply call free() on it.

amr_encoder_create() and amr_decoder_create() can fail if the malloc() call
inside fails. In that case they return NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int. It tells the AMR encoder whether to operate with DTX enabled or disabled.
(Note that some AMR specs call DTX SCR, for Source-Controlled Rate.) The
use_vad2 argument is another Boolean flag, also represented as an int. It
tells the AMR encoder to use the VAD2 algorithm instead of VAD1. Having both
VAD versions included and selectable at run time is a novel feature of
libtwamr; see the AMR-library-desc article for details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

        void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
        void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create(). The reset operation is complete: it leaves
the state exactly as a fresh create call would. In fact, amr_encoder_create()
is a wrapper around malloc() followed by amr_encoder_reset(), and
amr_decoder_create() is a wrapper around malloc() followed by
amr_decoder_reset().

Using the AMR encoder
=====================

To encode one 20 ms audio frame with AMR, call amr_encode_frame():

        void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                              const int16_t *pcm, struct amr_param_frame *frame);

You need to provide three things: an encoder state structure allocated earlier
with amr_encoder_create(), the codec mode to use, and a block of 160 linear
PCM samples (20 ms at 8 kHz). Only modes MR475 through MR122 are valid for the
'mode' argument to amr_encode_frame(); MRDTX is not allowed in this context.

The output from amr_encode_frame() is written into this structure:

        struct amr_param_frame {
                uint8_t type;
                uint8_t mode;
                int16_t param[AMR_MAX_PRM];
        };

This structure is public, but it is defined by libtwamr (not by any external
standard). It is intended as an intermediate stage before output encoding.
Library functions exist for generating 3 output formats: 3GPP AMR test
sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:  Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or
       TX_NO_DATA, as defined by 3GPP. The last 3 are possible only when the
       encoder operates with DTX enabled.

mode:  One of MR475 through MR122, the same as the 'mode' argument to
       amr_encode_frame().

param: Array of codec parameters. TX_SPEECH_GOOD output has from 17 (MR475)
       to 57 (MR122) parameters. TX_SID_FIRST, TX_SID_UPDATE and TX_NO_DATA
       DTX output has 5 parameters in MRDTX format.
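
The range from 17 to 57 can be broken down per mode, as sketched below. The counts are the per-mode parameter counts of the 3GPP AMR reference codec (TS 26.073). libtwamr is not assumed to export such a table, so sketch_prm_count[] and num_params_used() are hypothetical names. enum Mode is copied from the listing earlier in this document.

```c
/* Copy of enum Mode from this document, so the sketch builds without
 * <tw_amr.h>. */
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };

/* Number of speech parameters per mode in the 3GPP AMR reference codec
 * (TS 26.073).  Hypothetical table for illustration; libtwamr is not
 * assumed to export it. */
static const int sketch_prm_count[MRDTX + 1] = {
    17, /* MR475 */
    19, /* MR515 */
    19, /* MR59 */
    19, /* MR67 */
    19, /* MR74 */
    23, /* MR795 */
    39, /* MR102 */
    57, /* MR122: the maximum, hence AMR_MAX_PRM == 57 */
    5   /* MRDTX: SID parameters */
};

/* Hypothetical helper: how many leading entries of param[] carry data for
 * one encoder output frame.  speech_good is nonzero for TX_SPEECH_GOOD and
 * zero for the DTX types (TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA); mode is
 * the frame's mode field.  Returns -1 for a mode the encoder cannot emit. */
int num_params_used(int speech_good, enum Mode mode)
{
    if ((int)mode < (int)MR475 || (int)mode > (int)MR122)
        return -1;
    if (!speech_good)
        return sketch_prm_count[MRDTX];  /* DTX output uses the MRDTX layout */
    return sketch_prm_count[mode];
}
```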

3GPP AMR test sequence output
-----------------------------

The following function converts the above encoder output into the test
sequence format that 3GPP defined for AMR, the insanely inefficient one with
250 (AMR_COD_WORDS) 16-bit words per frame:

        void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows the libtwamr encoder to be tested for correctness against
the set of test sequences in 3GPP TS 26.074. The output is in the local
machine's native byte order.

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame, call this function. The output consists of a
ToC octet followed by speech or SID data, with no CMR payload header.

        unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);

The output buffer must have room for 32 bytes (AMR_IETF_MAX_PL). The return
value is the actual number of bytes used. The shortest possible output is 1
byte, for TX_NO_DATA. The longest possible output is 32 bytes, for
TX_SPEECH_GOOD in mode MR122.
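
An application that gathers RFC 4867 storage-format frames into a staging buffer for amr_frame_from_ietf() needs to know each frame's length before reading it. That length follows from the frame type (FT) field of the ToC octet. The sketch below tabulates the total frame size, including the ToC octet, for each FT value. It relies on FT values 0 through 8 denoting MR475 through MR122 and SID, with the same order and numbering as enum Mode. The table and the helpers make_toc() and ietf_frame_len() are hypothetical. Treating FT 9 through 14 as invalid is an assumption of this sketch, not a statement about what amr_frame_from_ietf() accepts.

```c
#include <stdint.h>

/* Total size in bytes (ToC octet included) of an octet-aligned RFC 4867
 * single frame, indexed by the 4-bit FT field.  FT 0-7 are MR475..MR122 and
 * FT 8 is AMR SID, numbered like enum Mode; FT 15 is NO_DATA.
 * ASSUMPTION for this sketch: FT 9-14 are rejected (size 0). */
static const uint8_t sketch_frame_len[16] = {
    13, 14, 16, 18, 20, 21, 27, 32,  /* MR475 .. MR122: 1 + speech bytes */
    6,                               /* SID: 1 + 5 bytes (39 bits) */
    0, 0, 0, 0, 0, 0,                /* FT 9-14: not handled here */
    1                                /* NO_DATA: ToC octet only */
};

/* Hypothetical helper: builds a storage-format ToC octet (F=0, Q=1). */
uint8_t make_toc(unsigned ft)
{
    return (uint8_t)(((ft & 0x0F) << 3) | 0x04);
}

/* Hypothetical helper: returns the total length of the frame that starts
 * with ToC octet toc, or 0 if its FT value is not handled.
 * ToC layout, MSB first: F(1) FT(4) Q(1) padding(2). */
unsigned ietf_frame_len(uint8_t toc)
{
    return sketch_frame_len[(toc >> 3) & 0x0F];
}
```

Every handled length is at most 32, so a staging buffer of AMR_IETF_MAX_PL bytes is sufficient. A reader can fetch the ToC octet, look up the frame length, and then read the remaining length minus 1 bytes.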

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame passed to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input,
meaning anything that cannot occur in the output from amr_encode_frame(). If
fed such invalid input, they are allowed to segfault, corrupt memory, etc.

This lack of guards is justified here because struct amr_param_frame is never
intended to serve as an external interface to untrusted entities. It is
intended only as an intermediate staging buffer between a call to
amr_encode_frame() and an immediately following call to one of the provided
output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, observe these constraints:

* The 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* The 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

        void amr_dhf_subst_efr(struct amr_param_frame *frame);
        void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check whether the encoded frame is the MR122 DHF (decoder
homing frame). This means that type equals TX_SPEECH_GOOD, mode equals MR122,
and the param array equals the fixed bit pattern of MR122 DHF. If so, they
overwrite the param[] array in the structure with the different bit pattern of
EFR DHF. The two functions differ in their conditions. amr_dhf_subst_efr()
performs this substitution whenever the encoded frame is MR122 DHF, without
looking at the PCM input. amr_dhf_subst_efr2() applies the substitution only
if the PCM input is also EHF (encoder homing frame). The latter function
matches the observed behavior of T-Mobile USA, but perhaps some other
implementations use the simpler logic equivalent to our first function.
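
The condition "the PCM input is EHF" can be sketched as follows. In the 3GPP reference code, the encoder homing frame is a block of 160 input samples that all equal 0x0008. is_ehf() is a hypothetical helper that illustrates this test. It is not libtwamr's actual implementation of amr_dhf_subst_efr2().

```c
#include <stdint.h>

#define SKETCH_FRAME_LEN  160     /* samples per 20 ms frame at 8 kHz */
#define SKETCH_EHF_SAMPLE 0x0008  /* every sample of an EHF has this value */

/* Hypothetical helper: returns 1 if the 160-sample block pcm is the
 * encoder homing frame (EHF), 0 otherwise. */
int is_ehf(const int16_t *pcm)
{
    int i;

    for (i = 0; i < SKETCH_FRAME_LEN; i++)
        if (pcm[i] != SKETCH_EHF_SAMPLE)
            return 0;
    return 1;
}
```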

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with the param[] array of struct amr_param_frame as input.

Using the AMR decoder
=====================
