FreeCalypso > hg > gsm-codec-lib
changeset 641:83de961cc54b
document GSM-HR codec utilities
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Thu, 26 Mar 2026 22:48:20 +0000 |
| parents | e0e5905261e2 |
| children | 4122baa843c5 |
| files | doc/HR-codec-utils doc/TFO-transform doc/Utils-overview |
| diffstat | 3 files changed, 350 insertions(+), 1 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/HR-codec-utils Thu Mar 26 22:48:20 2026 +0000 @@ -0,0 +1,331 @@ +Beginning with gsm-codec-lib-r5 release, Themyscira Wireless GSM codec libraries +and utilities package includes support for GSM-HR codec for the sake of +completeness, alongside with more useful FR, EFR and AMR codecs. The set of +command line utilities for GSM-HR codec includes speech encoding and decoding, +conversion of encoded speech between different formats, display of various +encoded formats and certain specialized utilities described later in this +article. + +File formats for GSM-HR encoded speech +====================================== + +The present suite of tools supports ETSI *.cod and *.dec formats, TW-TS-005 +Annex B hexadecimal format, and a simple "raw packed" binary format. These +file formats are explained below. + +ETSI *.cod encoder output format +-------------------------------- + +ETSI reference implementation of GSM-HR speech encoder writes its output in +this format; for each encoded 20 ms frame the output consists of 18 speech +parameters followed by VAD and SP flags. Each parameter or flag is written +into the file as a 16-bit word, hence each encoded 20 ms frame turns into 40 +bytes in this format. + +ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.cod +files; we have an ETSI-style speech encoder utility that emits this reference +format and a TFO transform utility that does the same, and we support conversion +from *.cod into our preferred TW-TS-005 Annex B hex format. + +ETSI *.dec decoder input format +------------------------------- + +ETSI reference implementation of GSM-HR speech decoder takes this format as its +input. Each 20 ms Rx unit (traffic frame or garbage received in the place of +one) consists of 18 speech parameters followed by 4 words of flags (BFI, UFI, +SID and TAF), stored as 22 16-bit words. 
As explained in HR-codec-library +article, ThemWi implementation of GSM-HR includes an extension to this decoder +input format: BFI=1 means BFI with payload bits included in both ETSI and +ThemWi versions, but BFI=2 (ThemWi extension) means BFI without payload bits. + +ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.dec +files; we have an ETSI-style speech decoder utility that reads this reference +format and a TFO transform utility that does the same, and we support +bidirectional conversion between this *.dec format and our preferred TW-TS-005 +Annex B hex format. + +TW-TS-005 Annex B hexadecimal format +------------------------------------ + +Themyscira Wireless Technical Specification TW-TS-005 defines a hexadecimal file +format for sequences of RTP payloads for GSM speech codecs; TW-TS-005 Annex B +specifies application of this hex file format to GSM-HR codec. + +This TW-TS-005 Annex B hex is the preferred file format for GSM-HR encoded +speech recordings for most workflows. It can represent both Tx semantics (every +20 ms frame position is filled with either a good speech frame or a perfect SID; +during DTX pauses a new SID appears in every frame) and Rx semantics (BFI frame +gaps can occur anywhere and are expected during DTX pauses; SIDs can be valid +or invalid) in the same file format, hence it is the operator's responsibility +to know the semantics of each given recording file and to use it in the correct +context. + +As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the most +up-to-date link to the actual spec document), each frame can be represented +either in the basic RTP format of ETSI TS 101 318 section 5.2, or in the +extended RTP format of RFC 5993 and TW-TS-002. 
Utilities in the present suite +that write TW-TS-005 Annex B hex files can be told to emit either format, with +the exception of gsmhr-dec2hex utility which always emits the extended format; +utilities that read these hex files accept both formats. The two RTP formats +carry different information content: the basic format can only represent Tx +semantics, while the extended format can represent both semantics. + +Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics lacks the +VAD flag, although the extended RTP format does represent an equivalent of SP +flag. However, the inclusion of VAD flag in *.cod format is only a debug +feature for speech encoder test sequences; it is not a part of the interface +from the Tx DTX handler to the Tx RSS as defined in GSM 06.41 Tx chapter. + +Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics (which has +to use the extended RTP format) collapses 3 possible invalid SID conditions +(BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same TW-TS-002 representation +of FT=1. However, per GSM 06.41 Table 1 the Rx DTX handler for GSM-HR is +required to apply exactly the same handling to all 3 possibilities, and the +same collapsing of invalid SID conditions also happens on TDM-based (8 kbit/s) +Abis and Ater interfaces and in TFO, as detailed in GSM 08.61 and 08.62 specs. + +Raw packed binary format +------------------------ + +When working with FR and EFR codecs, a speech recording with Tx semantics can be +stored in a gsmx binary file (see Binary-file-format article) that consists of +directly abutted codec frames (good speech or SID) in RTP format, with exactly +33 (FR) or 31 (EFR) bytes per frame. We offer an equivalent ability for GSM-HR +with the so-called raw packed format. It is a binary format that consists of +directly abutted frames; each frame is 14 bytes long and stores a GSM-HR codec +frame in the basic RTP format of TS 101 318 section 5.2, which we also call the +raw packed format. 
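As a minimal sketch of the raw packed layout just described (directly abutted frames of exactly 14 bytes, i.e. 112 bits each), the following Python fragment splits a byte string into frames. The function name and constant are hypothetical and chosen for illustration only; this is not the code of gsmhr-rpf2hex or gsmhr-hex2rpf, and the hex strings it can produce are not claimed to match TW-TS-005 Annex B syntax.

```python
# Sketch only: split a raw packed GSM-HR recording into per-frame byte strings.
# RPF_FRAME_LEN and split_rpf_frames are hypothetical names, not part of
# gsm-codec-lib.

RPF_FRAME_LEN = 14  # bytes per frame: 112 bits of GSM-HR payload


def split_rpf_frames(data):
    """Return a list of 14-byte frames; reject files with a partial frame."""
    if len(data) % RPF_FRAME_LEN != 0:
        raise ValueError("length is not a multiple of 14 bytes")
    return [data[off:off + RPF_FRAME_LEN]
            for off in range(0, len(data), RPF_FRAME_LEN)]


if __name__ == "__main__":
    # Two all-zero frames stand in for real encoder output.
    for n, frame in enumerate(split_rpf_frames(bytes(2 * RPF_FRAME_LEN))):
        print(n, frame.hex())
```

Because the format carries no framing or header, the only structural check available to a reader is the length test shown above.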
+ +This raw packed binary file format is not used directly by any of our speech +encoder or decoder utilities, instead it is supported via gsmhr-hex2rpf and +gsmhr-rpf2hex format conversion utilities. + +Common command line options +=========================== + +Certain flag options are common across different utilities in the present suite +of command line tools for GSM-HR codec; these common flags are as follows: + +-b and -l Utilities that read or write ETSI *.cod or *.dec format emit + and expect the local machine's native byte order by default. + -b option forces big-endian byte order; -l forces little-endian. + +-d Speech encoder utilities run with Tx DTX disabled by default; + -d option enables speech encoding with DTX. The same logic + applies to DTXd control in TFO transform utilities. + +-x Utilities that emit TW-TS-005 Annex B hex format with Tx + semantics emit the basic RTP format (TS 101 318) by default; + -x option switches to the extended RTP format. (The latter + format is TW-TS-002, but it also constitutes valid RFC 5993 in + the case of Tx semantics.) + +Inspecting encoded speech file formats +====================================== + +In common with other GSM speech codecs supported by ThemWi GSM codec libraries +and utilities suite, utilities are provided that read GSM-HR encoded speech +recording files and display all codec frames contained therein, in terms of +compressed speech parameters and accompanying flags. These utilities are as +follows: + +Utility Reads file format +----------------------------------------- +gsmhr-cod-parse ETSI *.cod +gsmhr-dec-parse ETSI *.dec +tw5b-dump TW-TS-005 Annex B + +gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native byte order +by default; -b and -l override options are supported. + +ThemWi utilities for FR, EFR and AMR codecs display compressed speech parameters +in decimal form separated by spaces, with each subframe on its own line after +per-frame LPC parameters. 
A different format has been adopted for GSM-HR: + +* Individual speech parameters are displayed in hex, with a fixed number of + digits corresponding to the size of each parameter in bits; + +* Only two lines are used to display the actual speech parameters for each + frame, with per-frame parameters on the first line and all subframe parameters + on the second line; + +* The set of LPC parameters and each of the 4 subframe parameter sets are + displayed as comma-separated triplets; R0, Int and Mode parameters are + displayed as singletons; + +* Each just-described triplet or singleton is displayed as Name=value for better + readability; + +* Ignoring Name= annotations and treating commas and spaces as equivalent, all + 18 speech parameters are printed in their standard order as defined by ETSI. + +File format conversion utilities +================================ + +The following format conversions are supported between different GSM-HR encoded +speech formats: + +Utility From format To format +--------------------------------------------------------- +gsmhr-cod2hex ETSI *.cod TW-TS-005 Annex B +gsmhr-dec2hex ETSI *.dec TW-TS-005 Annex B +gsmhr-hex2dec TW-TS-005 Annex B ETSI *.dec +gsmhr-hex2rpf TW-TS-005 Annex B Raw packed format +gsmhr-rpf2hex Raw packed format TW-TS-005 Annex B + +The hexadecimal format of TW-TS-005 Annex B is treated as central; all provided +file format conversion utilities convert either from or to this central format. +Additional notes follow regarding each supported conversion. + +Conversion from ETSI *.cod to TW-TS-005 Annex B +----------------------------------------------- + +ETSI *.cod format naturally represents only Tx semantics, while TW-TS-005 +Annex B supports both semantics. Semantics don't change with file format +conversion, hence the output of gsmhr-cod2hex still has Tx semantics. 
-b and -l +options are supported for *.cod input; hex output is written in the basic RTP +format by default or in the extended RTP format with -x option. + +Conversion in the opposite direction is not supported, as there is no way to +resurrect VAD debug flag from a data source that lacks such. + +Conversion between ETSI *.dec and TW-TS-005 Annex B +--------------------------------------------------- + +Bidirectional conversion is supported between these two formats, carrying Rx +semantics. However, this conversion may be slightly lossy in each direction: + +* gsmhr-hex2dec is nothing more than a command line utility around libgsmhr1 + function gsmhr_rtp_in_direct() described in HR-codec-library article. The + exact same preprocessing step is done by every libgsmhr1-based program + whenever RTP input (be it real RTP or hex lines read from a TW-TS-005 Annex B + file) needs to be fed to GSM-HR speech decoder or TFO transform, hence the + output of gsmhr-hex2dec elucidates what always happens under the hood anyway. + + The extended RTP format with Rx semantics defined in TW-TS-002 allows RTP + payloads carrying GSM-HR invalid SID to either include or omit payload bits. + As explained in HR-codec-library article, gsmhr_decoder_twts002_in() function + and gsmhr_rtp_in_direct() wrapper around it ignore these optional payload + bits for invalid SID frames and always set all 18 speech parameters in the + dec-style frame to 0. This same behaviour becomes explicitly visible when + using gsmhr-hex2dec - but if the input contains invalid SID frames with + payload bits included, then the conversion is lossy in the strict sense. + +* gsmhr-dec2hex is an ad hoc program, not a wrapper around a library function, + as this operation is not needed in any standard workflow. 
The conversion may + be lossy in two cases: + + - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1 SID=1, + BFI=1 SID=2, plus variants of the same with BFI=2) collapse into the same + representation in TW-TS-002, just like in 8 kbit/s TRAU frame format. + + - Whatever payload bits were given for these invalid SID frames in the 18 + speech parameter words are discarded, i.e., non-verbose invalid SID format + is written in TW-TS-002 output. + +Additional notes: + +* gsmhr-dec2hex expects the local machine's native byte order by default, but + supports -b and -l options. OTOH, gsmhr-hex2dec writes *.dec output in the + local machine's native byte order only. + +* By default gsmhr-hex2dec refuses to process files that contain BFI-no-data + frame gaps, as no such support exists in the standard GSM-HR speech decoder + from ETSI or its *.dec input format. (BFI=2 representation of such gaps is a + Themyscira extension.) -f option allows BFI=2 frames to be emitted. + +Conversion between TW-TS-005 Annex B and raw packed format +---------------------------------------------------------- + +Bidirectional conversion is supported between these two formats, carrying Tx +semantics. In gsmhr-hex2rpf conversion direction, the input hex file may be in +either basic or extended RTP format; if the latter is used, the only allowed +frame types are good speech (FT=0) and good SID (FT=2). The conversion is +lossless as long as Tx semantics are maintained, more specifically, as long as +the extended RTP format hex input does not contain any frames that are marked +as FT=2, but are not perfect SID with all 79 bits of SID codeword set to 1. If +such imperfect valid SID frames are present, they are converted to perfect SID. + +In gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318) frame is +written out in hex, either unchanged (basic RTP format) or with a prepended +RFC 5993 ToC octet (extended RTP format, enabled with -x option). 
If -x option +is given, the classification of good speech vs good SID for the purpose of +emitted ToC octet is a check for perfect SID with all 79 bits of SID codeword +set to 1. + +gsmhr-rpf2hex conversion is always lossless. + +Speech encoder and decoder utilities +==================================== + +The present suite of tools provides 3 styles of speech encoder and decoder +utilities: + +gsmhr-encode Speech encoder, PCM speech input is in WAV format, compressed + speech output is in TW-TS-005 Annex B format. + +gsmhr-decode Speech decoder, compressed speech input is in TW-TS-005 Annex B + format, PCM speech output is in WAV format. + +gsmhr-encode-r Speech encoder, PCM speech input is in robe (raw big-endian) + format, compressed speech output is in TW-TS-005 Annex B format. + +gsmhr-decode-r Speech decoder, compressed speech input is in TW-TS-005 Annex B + format, PCM speech output is in robe format. + +gsmhr-etsi-enc Speech encoder, ETSI style, operating from *.inp to *.cod in + ETSI test sequence format. + +gsmhr-etsi-dec Speech decoder, ETSI style, operating from *.dec to *.out in + ETSI test sequence format. + +gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and write +their outputs in the local machine's native byte order by default. Both +utilities also accept -b and -l options that select the desired byte order +explicitly; these options affect both input and output for both encoder and +decoder utilities. The other two styles of speech encoder and decoder utilities +have no byte order concerns. + +TFO transform utilities +======================= + +TFO-transform article explains the general concept of TFO transform; +HR-codec-Rx-logic article explains ThemWi implementation of this transform for +GSM-HR codec. 
HR-codec-library article describes libgsmhr1 API functions for +this GSM-HR TFO transform; here are 2 command line utilities that exercise it: + +gsmhr-tfo-xfrm This TFO transform exerciser reads a stream of radio + leg A Rx frames from a TW-TS-005 Annex B hex file and + writes the "pristine" stream intended for radio leg B + Tx into another TW-TS-005 Annex B hex file. -d option + enables DTXd (disabled by default); -x option switches + the output RTP format from basic to extended. + +gsmhr-tfo-xfrm-dc This TFO transform utility reads radio leg A Rx input + in ETSI *.dec format and emits radio leg B Tx output in + ETSI *.cod format, thus acting as an inverse of + GSM 06.06 REID utility that was originally used to + generate test sequence *.dec files. This variant + exercises libgsmhr1 TFO transform function in its most + native form. + +gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in the +local machine's native byte order by default. -b and -l options are supported, +selecting either big-endian or little-endian byte order explicitly; these +options affect both *.dec input and *.cod output. OTOH, the more FR-like +gsmhr-tfo-xfrm utility has no byte order concerns. + +Hand-crafting *.cod and *.dec files +=================================== + +The present suite of GSM-HR codec utilities includes specialized tools for +hand-crafting *.cod and *.dec files: gsmhr-cod-craft and gsmhr-dec-craft, +respectively. Each utility reads an ad hoc line-based ASCII source file and +emits its respective binary format. The ad hoc source language for these two +special-purpose tools is the same except for parts that set frame metadata +flags, which are different between *.cod and *.dec given their opposite +semantics. + +Because of their highly specialized nature, these two utilities are not +documented further - please read the source code for further understanding. +Work scope limits explained in HR-codec-limits article apply here.
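The ETSI *.cod and *.dec layouts described in the HR-codec-utils text above (18 speech parameters followed by 2 or 4 flag words, each stored as a 16-bit word, in native byte order unless -b or -l is given) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of gsmhr-cod-parse or gsmhr-dec-parse: all names are hypothetical, words are read as unsigned 16-bit values, and the flag order (VAD, SP and BFI, UFI, SID, TAF) is assumed to match the order in which the text lists them.

```python
import struct

# Sketch only: hypothetical names, not part of gsm-codec-lib.
N_PARAMS = 18
COD_WORDS = N_PARAMS + 2   # 18 parameters + VAD, SP        -> 40 bytes
DEC_WORDS = N_PARAMS + 4   # 18 parameters + BFI, UFI, SID, TAF -> 44 bytes

# struct prefixes: native order by default, "big" ~ -b, "little" ~ -l
BYTE_ORDER = {"native": "=", "big": ">", "little": "<"}


def read_frames(data, words_per_frame, order="native"):
    """Unpack a *.cod or *.dec byte string into tuples of 16-bit words."""
    frame_len = 2 * words_per_frame
    if len(data) % frame_len != 0:
        raise ValueError("file length is not a whole number of frames")
    fmt = "%s%dH" % (BYTE_ORDER[order], words_per_frame)
    return [struct.unpack_from(fmt, data, off)
            for off in range(0, len(data), frame_len)]


def split_cod(frame):
    """Separate a 20-word *.cod frame (assumed flag order: VAD, SP)."""
    return {"params": frame[:N_PARAMS],
            "VAD": frame[N_PARAMS], "SP": frame[N_PARAMS + 1]}


def split_dec(frame):
    """Separate a 22-word *.dec frame (assumed order: BFI, UFI, SID, TAF)."""
    bfi, ufi, sid, taf = frame[N_PARAMS:]
    return {"params": frame[:N_PARAMS],
            "BFI": bfi, "UFI": ufi, "SID": sid, "TAF": taf}
```

A caller would pick COD_WORDS or DEC_WORDS according to the file type, for example `read_frames(open(path, "rb").read(), DEC_WORDS, "big")` for a big-endian *.dec file.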
--- a/doc/TFO-transform Fri Mar 20 06:43:50 2026 +0000 +++ b/doc/TFO-transform Thu Mar 26 22:48:20 2026 +0000 @@ -115,5 +115,5 @@ the library that implements the present TFO transform along with other GSM-HR codec functions. -HR-codec-utils gsmhr-tfo-xfrm and gsmhr-tfo-xfrm-dc utilities will be +HR-codec-utils gsmhr-tfo-xfrm and gsmhr-tfo-xfrm-dc utilities are documented here.
--- a/doc/Utils-overview Fri Mar 20 06:43:50 2026 +0000 +++ b/doc/Utils-overview Thu Mar 26 22:48:20 2026 +0000 @@ -77,6 +77,24 @@ gsmfr-tfo-xfrm See TFO-transform article. +gsmhr-cod-craft See HR-codec-utils article. +gsmhr-cod-parse +gsmhr-cod2hex +gsmhr-dec-craft +gsmhr-dec-parse +gsmhr-dec2hex +gsmhr-decode +gsmhr-decode-r +gsmhr-encode +gsmhr-encode-r +gsmhr-etsi-dec +gsmhr-etsi-enc +gsmhr-hex2dec +gsmhr-hex2rpf +gsmhr-rpf2hex +gsmhr-tfo-xfrm +gsmhr-tfo-xfrm-dc + gsmrec-dump See Binary-file-format article. gsmx-to-tw5a See TW-TS-005 article.
