view doc/HR-codec-utils @ 642:4122baa843c5 default tip
CHANGES: gsm-codec-lib-r5
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Fri, 27 Mar 2026 00:13:07 +0000 |
| parents | 83de961cc54b |
| children |

Beginning with gsm-codec-lib-r5 release, Themyscira Wireless GSM codec
libraries and utilities package includes support for GSM-HR codec for
the sake of completeness, alongside the more useful FR, EFR and AMR
codecs. The set of command line utilities for GSM-HR codec includes
speech encoding and decoding, conversion of encoded speech between
different formats, display of various encoded formats and certain
specialized utilities described later in this article.

File formats for GSM-HR encoded speech
======================================

The present suite of tools supports ETSI *.cod and *.dec formats,
TW-TS-005 Annex B hexadecimal format, and a simple "raw packed" binary
format. These file formats are explained below.

ETSI *.cod encoder output format
--------------------------------

ETSI reference implementation of GSM-HR speech encoder writes its output
in this format; for each encoded 20 ms frame the output consists of 18
speech parameters followed by VAD and SP flags. Each parameter or flag
is written into the file as a 16-bit word, hence each encoded 20 ms
frame turns into 40 bytes in this format.

ThemWi suite of GSM-HR codec utilities allows examining and
hand-crafting *.cod files; we have an ETSI-style speech encoder utility
that emits this reference format and a TFO transform utility that does
the same, and we support conversion from *.cod into our preferred
TW-TS-005 Annex B hex format.

ETSI *.dec decoder input format
-------------------------------

ETSI reference implementation of GSM-HR speech decoder takes this format
as its input. Each 20 ms Rx unit (traffic frame or garbage received in
the place of one) consists of 18 speech parameters followed by 4 words
of flags (BFI, UFI, SID and TAF), stored as 22 16-bit words.
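
The two ETSI frame layouts just described can be illustrated with a
short Python sketch. It is not part of the ThemWi suite: the function
and field names are hypothetical, and it relies only on what the text
states, namely 20 words per *.cod frame (18 parameters, then VAD and SP)
and 22 words per *.dec frame (18 parameters, then BFI, UFI, SID and
TAF). Byte order is an explicit argument because the file itself does
not record it.

```python
import struct

COD_WORDS = 20  # 18 speech parameters + VAD + SP  (40 bytes)
DEC_WORDS = 22  # 18 speech parameters + BFI, UFI, SID, TAF  (44 bytes)


def split_frames(data, words_per_frame, big_endian=False):
    """Split a byte string into tuples of 16-bit words, one per frame.

    Words are read as unsigned; this is an illustrative choice that is
    adequate for the non-negative parameter and flag values.
    """
    frame_bytes = words_per_frame * 2
    if len(data) % frame_bytes:
        raise ValueError("length is not a whole number of frames")
    fmt = (">" if big_endian else "<") + "%dH" % words_per_frame
    return [struct.unpack_from(fmt, data, off)
            for off in range(0, len(data), frame_bytes)]


def parse_cod(data, big_endian=False):
    """Hypothetical reader for encoder output (*.cod) frames."""
    return [{"params": w[:18], "vad": w[18], "sp": w[19]}
            for w in split_frames(data, COD_WORDS, big_endian)]


def parse_dec(data, big_endian=False):
    """Hypothetical reader for decoder input (*.dec) frames."""
    return [{"params": w[:18], "bfi": w[18], "ufi": w[19],
             "sid": w[20], "taf": w[21]}
            for w in split_frames(data, DEC_WORDS, big_endian)]
```

For real inspection work the gsmhr-cod-parse and gsmhr-dec-parse
utilities described later in this article are the proper tools; the
sketch only makes the word counts and field order concrete.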

As explained in HR-codec-library article, ThemWi implementation of
GSM-HR includes an extension to this decoder input format: BFI=1 means
BFI with payload bits included in both ETSI and ThemWi versions, but
BFI=2 (ThemWi extension) means BFI without payload bits.

ThemWi suite of GSM-HR codec utilities allows examining and
hand-crafting *.dec files; we have an ETSI-style speech decoder utility
that reads this reference format and a TFO transform utility that does
the same, and we support bidirectional conversion between this *.dec
format and our preferred TW-TS-005 Annex B hex format.

TW-TS-005 Annex B hexadecimal format
------------------------------------

Themyscira Wireless Technical Specification TW-TS-005 defines a
hexadecimal file format for sequences of RTP payloads for GSM speech
codecs; TW-TS-005 Annex B specifies application of this hex file format
to GSM-HR codec.

This TW-TS-005 Annex B hex is the preferred file format for GSM-HR
encoded speech recordings for most workflows. It can represent both Tx
semantics (every 20 ms frame position is filled with either a good
speech frame or a perfect SID; during DTX pauses a new SID appears in
every frame) and Rx semantics (BFI frame gaps can occur anywhere and are
expected during DTX pauses; SIDs can be valid or invalid) in the same
file format, hence it is the operator's responsibility to know the
semantics of each given recording file and to use it in the correct
context.

As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the
most up-to-date link to the actual spec document), each frame can be
represented either in the basic RTP format of ETSI TS 101 318 section
5.2, or in the extended RTP format of RFC 5993 and TW-TS-002. Utilities
in the present suite that write TW-TS-005 Annex B hex files can be told
to emit either format, with the exception of gsmhr-dec2hex utility which
always emits the extended format; utilities that read these hex files
accept both formats.

The two RTP formats carry different information content: the basic
format can only represent Tx semantics, while the extended format can
represent both semantics.

Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics
lacks the VAD flag, although the extended RTP format does represent an
equivalent of SP flag. However, the inclusion of VAD flag in *.cod
format is only a debug feature for speech encoder test sequences; it is
not a part of the interface from the Tx DTX handler to the Tx RSS as
defined in GSM 06.41 Tx chapter.

Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics
(which has to use the extended RTP format) collapses 3 possible invalid
SID conditions (BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same
TW-TS-002 representation of FT=1. However, per GSM 06.41 Table 1 the Rx
DTX handler for GSM-HR is required to apply exactly the same handling to
all 3 possibilities, and the same collapsing of invalid SID conditions
also happens on TDM-based (8 kbit/s) Abis and Ater interfaces and in
TFO, as detailed in GSM 08.61 and 08.62 specs.

Raw packed binary format
------------------------

When working with FR and EFR codecs, a speech recording with Tx
semantics can be stored in a gsmx binary file (see Binary-file-format
article) that consists of directly abutted codec frames (good speech or
SID) in RTP format, with exactly 33 (FR) or 31 (EFR) bytes per frame.
We offer an equivalent ability for GSM-HR with the so-called raw packed
format. It is a binary format that consists of directly abutted frames;
each frame is 14 bytes long and stores a GSM-HR codec frame in the basic
RTP format of TS 101 318 section 5.2, which we also call the raw packed
format.

This raw packed binary file format is not used directly by any of our
speech encoder or decoder utilities, instead it is supported via
gsmhr-hex2rpf and gsmhr-rpf2hex format conversion utilities.
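
Because raw packed frames are fixed-size and directly abutted, frame
boundaries and recording duration follow from file length alone. The
Python sketch below shows that arithmetic; the function names are
hypothetical and not taken from the ThemWi sources. The only inputs are
the 14-byte frame size and the 20 ms frame period stated in this
article.

```python
RPF_FRAME_LEN = 14   # bytes per raw packed GSM-HR frame
FRAME_MS = 20        # each codec frame covers 20 ms of speech


def read_rpf_frames(data):
    """Split raw packed file content into 14-byte frames."""
    if len(data) % RPF_FRAME_LEN:
        raise ValueError("length is not a multiple of 14 bytes")
    return [data[i:i + RPF_FRAME_LEN]
            for i in range(0, len(data), RPF_FRAME_LEN)]


def duration_ms(data):
    """Recording duration implied by the number of frames."""
    return len(read_rpf_frames(data)) * FRAME_MS


def bitrate_bps():
    """Payload bit rate: 14 bytes = 112 bits every 20 ms."""
    return RPF_FRAME_LEN * 8 * 1000 // FRAME_MS
```

With these constants bitrate_bps() works out to 5600, i.e., 112 bits
every 20 ms.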

Common command line options
===========================

Certain flag options are common across different utilities in the
present suite of command line tools for GSM-HR codec; these common flags
are as follows:

-b and -l

    Utilities that read or write ETSI *.cod or *.dec format emit and
    expect the local machine's native byte order by default. -b option
    forces big-endian byte order; -l forces little-endian.

-d

    Speech encoder utilities run with Tx DTX disabled by default; -d
    option enables speech encoding with DTX. The same logic applies to
    DTXd control in TFO transform utilities.

-x

    Utilities that emit TW-TS-005 Annex B hex format with Tx semantics
    emit the basic RTP format (TS 101 318) by default; -x option
    switches to the extended RTP format. (The latter format is
    TW-TS-002, but it also constitutes valid RFC 5993 in the case of Tx
    semantics.)

Inspecting encoded speech file formats
======================================

In common with other GSM speech codecs supported by ThemWi GSM codec
libraries and utilities suite, utilities are provided that read GSM-HR
encoded speech recording files and display all codec frames contained
therein, in terms of compressed speech parameters and accompanying
flags. These utilities are as follows:

Utility            Reads file format
-----------------------------------------
gsmhr-cod-parse    ETSI *.cod
gsmhr-dec-parse    ETSI *.dec
tw5b-dump          TW-TS-005 Annex B

gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native
byte order by default; -b and -l override options are supported.

ThemWi utilities for FR, EFR and AMR codecs display compressed speech
parameters in decimal form separated by spaces, with each subframe on
its own line after per-frame LPC parameters.
A different format has been adopted for GSM-HR:

* Individual speech parameters are displayed in hex, with a fixed number
  of digits corresponding to the size of each parameter in bits;

* Only two lines are used to display the actual speech parameters for
  each frame, with per-frame parameters on the first line and all
  subframe parameters on the second line;

* The set of LPC parameters and each of the 4 subframe parameter sets
  are displayed as comma-separated triplets; R0, Int and Mode parameters
  are displayed as singletons;

* Each just-described triplet or singleton is displayed as Name=value
  for better readability;

* Ignoring Name= annotations and treating commas and spaces as
  equivalent, all 18 speech parameters are printed in their standard
  order as defined by ETSI.

File format conversion utilities
================================

The following format conversions are supported between different GSM-HR
encoded speech formats:

Utility          From format          To format
---------------------------------------------------------
gsmhr-cod2hex    ETSI *.cod           TW-TS-005 Annex B
gsmhr-dec2hex    ETSI *.dec           TW-TS-005 Annex B
gsmhr-hex2dec    TW-TS-005 Annex B    ETSI *.dec
gsmhr-hex2rpf    TW-TS-005 Annex B    Raw packed format
gsmhr-rpf2hex    Raw packed format    TW-TS-005 Annex B

The hexadecimal format of TW-TS-005 Annex B is treated as central; all
provided file format conversion utilities convert either from or to this
central format. Additional notes follow regarding each supported
conversion.

Conversion from ETSI *.cod to TW-TS-005 Annex B
-----------------------------------------------

ETSI *.cod format naturally represents only Tx semantics, while
TW-TS-005 Annex B supports both semantics. Semantics don't change with
file format conversion, hence the output of gsmhr-cod2hex still has Tx
semantics. -b and -l options are supported for *.cod input; hex output
is written in the basic RTP format by default or in the extended RTP
format with -x option.

Conversion in the opposite direction is not supported, as there is no
way to resurrect VAD debug flag from a data source that lacks such.

Conversion between ETSI *.dec and TW-TS-005 Annex B
---------------------------------------------------

Bidirectional conversion is supported between these two formats,
carrying Rx semantics. However, this conversion may be slightly lossy
in each direction:

* gsmhr-hex2dec is nothing more than a command line utility around
  libgsmhr1 function gsmhr_rtp_in_direct() described in HR-codec-library
  article. The exact same preprocessing step is done by every
  libgsmhr1-based program whenever RTP input (be it real RTP or hex
  lines read from a TW-TS-005 Annex B file) needs to be fed to GSM-HR
  speech decoder or TFO transform, hence the output of gsmhr-hex2dec
  elucidates what always happens under the hood anyway.

  The extended RTP format with Rx semantics defined in TW-TS-002 allows
  RTP payloads carrying GSM-HR invalid SID to either include or omit
  payload bits. As explained in HR-codec-library article,
  gsmhr_decoder_twts002_in() function and gsmhr_rtp_in_direct() wrapper
  around it ignore these optional payload bits for invalid SID frames
  and always set all 18 speech parameters in the dec-style frame to 0.
  This same behaviour becomes explicitly visible when using
  gsmhr-hex2dec - but if the input contains invalid SID frames with
  payload bits included, then the conversion is lossy in the strict
  sense.

* gsmhr-dec2hex is an ad hoc program, not a wrapper around a library
  function, as this operation is not needed in any standard workflow.
  The conversion may be lossy in two cases:

  - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1
    SID=1, BFI=1 SID=2, plus variants of the same with BFI=2) collapse
    into the same representation in TW-TS-002, just like in 8 kbit/s
    TRAU frame format.

  - Whatever payload bits were given for these invalid SID frames in the
    18 speech parameter words are discarded, i.e., non-verbose invalid
    SID format is written in TW-TS-002 output.

Additional notes:

* gsmhr-dec2hex expects the local machine's native byte order by
  default, but supports -b and -l options. OTOH, gsmhr-hex2dec writes
  *.dec output in the local machine's native byte order only.

* By default gsmhr-hex2dec refuses to process files that contain
  BFI-no-data frame gaps, as no such support exists in the standard
  GSM-HR speech decoder from ETSI or its *.dec input format. (BFI=2
  representation of such gaps is a Themyscira extension.) -f option
  allows BFI=2 frames to be emitted.

Conversion between TW-TS-005 Annex B and raw packed format
----------------------------------------------------------

Bidirectional conversion is supported between these two formats,
carrying Tx semantics.

In gsmhr-hex2rpf conversion direction, the input hex file may be in
either basic or extended RTP format; if the latter is used, the only
allowed frame types are good speech (FT=0) and good SID (FT=2). The
conversion is lossless as long as Tx semantics are maintained, more
specifically, as long as the extended RTP format hex input does not
contain any frames that are marked as FT=2, but are not perfect SID with
all 79 bits of SID codeword set to 1. If such imperfect valid SID
frames are present, they are converted to perfect SID.

In gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318)
frame is written out in hex, either unchanged (basic RTP format) or with
a prepended RFC 5993 ToC octet (extended RTP format, enabled with -x
option). If -x option is given, the classification of good speech vs
good SID for the purpose of emitted ToC octet is a check for perfect SID
with all 79 bits of SID codeword set to 1. gsmhr-rpf2hex conversion is
always lossless.
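
The per-frame logic of the rpf2hex direction can be sketched in Python
as follows. This is an illustration only, not the actual gsmhr-rpf2hex
code, and it rests on two assumptions not stated in this article: that
the 79-bit SID codeword occupies the last 79 bits of the 112-bit packed
frame (everything after the first 33 bits), and that the RFC 5993 ToC
octet carries FT in bits 6 through 4 with all other bits zero. Function
names are hypothetical, and the sketch returns only the hex digits of
one payload; the line syntax of the hex file is defined by TW-TS-005
and is not reproduced here.

```python
FRAME_LEN = 14          # bytes in one raw packed GSM-HR frame
SID_CODEWORD_BITS = 79  # per the text; bit position is an assumption

FT_GOOD_SPEECH = 0
FT_GOOD_SID = 2


def is_perfect_sid(frame):
    """True if the last 79 bits of the frame are all 1 (assumed layout)."""
    if len(frame) != FRAME_LEN:
        raise ValueError("raw packed frame must be 14 bytes")
    value = int.from_bytes(frame, "big")
    mask = (1 << SID_CODEWORD_BITS) - 1  # low 79 bits of the 112
    return value & mask == mask


def rpf_frame_to_hex(frame, extended=False):
    """Hex digits for one frame, with an optional prepended ToC octet."""
    if len(frame) != FRAME_LEN:
        raise ValueError("raw packed frame must be 14 bytes")
    if not extended:
        return frame.hex()  # basic format: payload unchanged
    ft = FT_GOOD_SID if is_perfect_sid(frame) else FT_GOOD_SPEECH
    toc = ft << 4           # assumed ToC layout: F=0, FT, 4 zero bits
    return (bytes([toc]) + frame).hex()
```

Since the ToC octet is derived entirely from the payload bits, stripping
it again recovers the original 14 bytes, which is consistent with the
statement that this direction is lossless.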

Speech encoder and decoder utilities
====================================

The present suite of tools provides 3 styles of speech encoder and
decoder utilities:

gsmhr-encode      Speech encoder, PCM speech input is in WAV format,
                  compressed speech output is in TW-TS-005 Annex B
                  format.

gsmhr-decode      Speech decoder, compressed speech input is in
                  TW-TS-005 Annex B format, PCM speech output is in WAV
                  format.

gsmhr-encode-r    Speech encoder, PCM speech input is in robe (raw
                  big-endian) format, compressed speech output is in
                  TW-TS-005 Annex B format.

gsmhr-decode-r    Speech decoder, compressed speech input is in
                  TW-TS-005 Annex B format, PCM speech output is in robe
                  format.

gsmhr-etsi-enc    Speech encoder, ETSI style, operating from *.inp to
                  *.cod in ETSI test sequence format.

gsmhr-etsi-dec    Speech decoder, ETSI style, operating from *.dec to
                  *.out in ETSI test sequence format.

gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and
write their outputs in the local machine's native byte order by default.
Both utilities also accept -b and -l options that select the desired
byte order explicitly; these options affect both input and output for
both encoder and decoder utilities. The other two styles of speech
encoder and decoder utilities have no byte order concerns.

TFO transform utilities
=======================

TFO-transform article explains the general concept of TFO transform;
HR-codec-Rx-logic article explains ThemWi implementation of this
transform for GSM-HR codec. HR-codec-library article describes
libgsmhr1 API functions for this GSM-HR TFO transform; here are 2
command line utilities that exercise it:

gsmhr-tfo-xfrm

    This TFO transform exerciser reads a stream of radio leg A Rx frames
    from a TW-TS-005 Annex B hex file and writes the "pristine" stream
    intended for radio leg B Tx into another TW-TS-005 Annex B hex file.
    -d option enables DTXd (disabled by default); -x option switches the
    output RTP format from basic to extended.

gsmhr-tfo-xfrm-dc

    This TFO transform utility reads radio leg A Rx input in ETSI *.dec
    format and emits radio leg B Tx output in ETSI *.cod format, thus
    acting as an inverse of GSM 06.06 REID utility that was originally
    used to generate test sequence *.dec files. This variant exercises
    libgsmhr1 TFO transform function in its most native form.

gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in
the local machine's native byte order by default. -b and -l options are
supported, selecting either big-endian or little-endian byte order
explicitly; these options affect both *.dec input and *.cod output.
OTOH, the more FR-like gsmhr-tfo-xfrm utility has no byte order
concerns.

Hand-crafting *.cod and *.dec files
===================================

The present suite of GSM-HR codec utilities includes specialized tools
for hand-crafting *.cod and *.dec files: gsmhr-cod-craft and
gsmhr-dec-craft, respectively. Each utility reads an ad hoc line-based
ASCII source file and emits its respective binary format. The ad hoc
source language for these two special-purpose tools is the same except
for parts that set frame metadata flags, which are different between
*.cod and *.dec given their opposite semantics.

Because of their highly specialized nature, these two utilities are not
documented further - please read the source code for further
understanding. Work scope limits explained in HR-codec-limits article
apply here.
