changeset 641:83de961cc54b

document GSM-HR codec utilities
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 26 Mar 2026 22:48:20 +0000
parents e0e5905261e2
children 4122baa843c5
files doc/HR-codec-utils doc/TFO-transform doc/Utils-overview
diffstat 3 files changed, 350 insertions(+), 1 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/HR-codec-utils	Thu Mar 26 22:48:20 2026 +0000
@@ -0,0 +1,331 @@
+Beginning with the gsm-codec-lib-r5 release, the Themyscira Wireless GSM codec
+libraries and utilities package includes support for the GSM-HR codec for the
+sake of completeness, alongside the more useful FR, EFR and AMR codecs.  The
+set of command line utilities for GSM-HR includes speech encoding and decoding,
+conversion of encoded speech between different formats, display of various
+encoded formats, and certain specialized utilities described later in this
+article.
+
+File formats for GSM-HR encoded speech
+======================================
+
+The present suite of tools supports ETSI *.cod and *.dec formats, TW-TS-005
+Annex B hexadecimal format, and a simple "raw packed" binary format.  These
+file formats are explained below.
+
+ETSI *.cod encoder output format
+--------------------------------
+
+The ETSI reference implementation of the GSM-HR speech encoder writes its
+output in this format: for each encoded 20 ms frame, the output consists of 18
+speech parameters followed by VAD and SP flags.  Each parameter or flag is
+written into the file as a 16-bit word, hence each encoded 20 ms frame turns
+into 20 words or 40 bytes in this format.
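The *.cod record layout can be illustrated with a minimal C sketch that splits one 40-byte record into the 18 speech parameters plus the VAD and SP flags, with the byte order passed in explicitly (big_endian=1 corresponds to what -b selects, 0 to -l). The names cod_frame, cod_get_word() and cod_parse_frame() are hypothetical and are not part of libgsmhr1; the SP convention in the comment (1 = speech, 0 = SID) follows GSM 06.41.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COD_WORDS_PER_FRAME	20	/* 18 speech params + VAD + SP */
#define COD_BYTES_PER_FRAME	(COD_WORDS_PER_FRAME * 2)	/* 40 */

struct cod_frame {
	uint16_t params[18];	/* speech parameters in ETSI order */
	uint16_t vad;		/* VAD flag (encoder debug output) */
	uint16_t sp;		/* SP flag: 1 = speech frame, 0 = SID frame */
};

/* Fetch one 16-bit word stored in the requested byte order. */
static uint16_t cod_get_word(const uint8_t *p, int big_endian)
{
	if (big_endian)
		return (uint16_t)(p[0] << 8 | p[1]);
	return (uint16_t)(p[1] << 8 | p[0]);
}

/* Split one 40-byte *.cod record into 18 parameters plus VAD and SP. */
static void cod_parse_frame(const uint8_t *rec, int big_endian,
			    struct cod_frame *out)
{
	size_t i;

	assert(rec != NULL && out != NULL);
	for (i = 0; i < 18; i++)
		out->params[i] = cod_get_word(rec + 2 * i, big_endian);
	out->vad = cod_get_word(rec + 36, big_endian);
	out->sp = cod_get_word(rec + 38, big_endian);
}
```

Emulating the default (native) byte order would only require the caller to pick the flag matching the host.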
+
+The ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting
+*.cod files: we have an ETSI-style speech encoder utility and a TFO transform
+utility that both emit this reference format, and we support conversion from
+*.cod into our preferred TW-TS-005 Annex B hex format.
+
+ETSI *.dec decoder input format
+-------------------------------
+
+The ETSI reference implementation of the GSM-HR speech decoder takes this format
+as its input.  Each 20 ms Rx unit (a traffic frame or garbage received in its
+place) consists of 18 speech parameters followed by 4 flag words (BFI, UFI, SID
+and TAF), stored as 22 16-bit words or 44 bytes.  As explained in the
+HR-codec-library article, the ThemWi implementation of GSM-HR extends this
+input format: BFI=1 means a bad frame with payload bits included (as in ETSI),
+while BFI=2 (a ThemWi extension) means a bad frame without payload bits.
+
+The ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting
+*.dec files: we have an ETSI-style speech decoder utility and a TFO transform
+utility that both read this reference format, and we support bidirectional
+conversion between this *.dec format and our preferred TW-TS-005 Annex B hex
+format.
+
+TW-TS-005 Annex B hexadecimal format
+------------------------------------
+
+Themyscira Wireless Technical Specification TW-TS-005 defines a hexadecimal file
+format for sequences of RTP payloads for GSM speech codecs; TW-TS-005 Annex B
+specifies application of this hex file format to GSM-HR codec.
+
+This TW-TS-005 Annex B hex is the preferred file format for GSM-HR encoded
+speech recordings for most workflows.  It can represent both Tx semantics (every
+20 ms frame position is filled with either a good speech frame or a perfect SID;
+during DTX pauses a new SID appears in every frame) and Rx semantics (BFI frame
+gaps can occur anywhere and are expected during DTX pauses; SIDs can be valid
+or invalid) in the same file format, hence it is the operator's responsibility
+to know the semantics of each given recording file and to use it in the correct
+context.
+
+As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the most
+up-to-date link to the actual spec document), each frame can be represented
+either in the basic RTP format of ETSI TS 101 318 section 5.2, or in the
+extended RTP format of RFC 5993 and TW-TS-002.  Utilities in the present suite
+that write TW-TS-005 Annex B hex files can be told to emit either format, with
+the exception of the gsmhr-dec2hex utility, which always emits the extended
+format; utilities that read these hex files accept both formats.  The two RTP
+formats carry different information content: the basic format can only
+represent Tx semantics, while the extended format can represent both.
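Because the two RTP formats differ in length, a reader can tell them apart frame by frame. The sketch below assumes one RTP payload per hex line, a 14-byte basic payload, and an extended payload made of an RFC 5993 ToC octet (F bit, 3-bit FT, 4 reserved bits) followed by either 14 payload bytes or nothing (the ToC-only case is an assumption for no-data and non-verbose invalid SID frames). Real TW-TS-005 files may contain other line types that this sketch does not handle, and all function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define HR_FRAME_BYTES	14	/* 112-bit GSM-HR codec frame */

enum payload_kind { PL_INVALID = -1, PL_BASIC = 0, PL_EXTENDED = 1 };

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode a hex string (up to NUL or newline); returns bytes or -1. */
static int hex_decode(const char *s, uint8_t *buf, size_t maxlen)
{
	size_t n = 0;
	int hi, lo;

	while (*s && *s != '\n') {
		hi = hexval((unsigned char)s[0]);
		lo = s[1] ? hexval((unsigned char)s[1]) : -1;
		if (hi < 0 || lo < 0 || n >= maxlen)
			return -1;
		buf[n++] = (uint8_t)(hi << 4 | lo);
		s += 2;
	}
	return (int)n;
}

/* 14 bytes = basic TS 101 318 format; 1 or 15 bytes = extended format
 * with a leading ToC octet, whose FT field is returned via *ft. */
static enum payload_kind classify_payload(const uint8_t *buf, int len,
					  int *ft)
{
	assert(buf != NULL && ft != NULL);
	*ft = -1;
	if (len == HR_FRAME_BYTES)
		return PL_BASIC;
	if (len == 1 || len == 1 + HR_FRAME_BYTES) {
		*ft = (buf[0] >> 4) & 7;	/* F(1) FT(3) R(4) */
		return PL_EXTENDED;
	}
	return PL_INVALID;
}
```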
+
+Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics lacks the
+VAD flag, although the extended RTP format does represent an equivalent of the
+SP flag.  However, the inclusion of the VAD flag in *.cod format is only a debug
+feature for speech encoder test sequences; it is not part of the interface from
+the Tx DTX handler to the Tx RSS as defined in the Tx chapter of GSM 06.41.
+
+Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics (which has
+to use the extended RTP format) collapses 3 possible invalid SID conditions
+(BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same TW-TS-002 representation
+of FT=1.  However, per GSM 06.41 Table 1 the Rx DTX handler for GSM-HR is
+required to apply exactly the same handling to all 3 possibilities, and the
+same collapsing of invalid SID conditions also happens on TDM-based (8 kbit/s)
+Abis and Ater interfaces and in TFO, as detailed in GSM 08.61 and 08.62 specs.
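The collapsing described above is a many-to-one mapping from *.dec flag combinations to a TW-TS-002 frame type. This sketch covers only the cases named in this article: good speech (FT=0), good SID (FT=2) and invalid SID (FT=1). Bad frames with SID=0 are deliberately returned as FT_OTHER because their TW-TS-002 representation is not described here; the function name is hypothetical.

```c
#include <assert.h>

/* Frame types named in this article (TW-TS-002 / RFC 5993 ToC FT). */
enum { FT_OTHER = -1, FT_GOOD_SPEECH = 0, FT_INVALID_SID = 1,
       FT_GOOD_SID = 2 };

/* Map *.dec BFI (0..2, 2 = ThemWi no-data) and SID (0..2) to an FT. */
static int dec_flags_to_ft(unsigned bfi, unsigned sid)
{
	assert(bfi <= 2 && sid <= 2);
	if (sid == 2 && bfi == 0)
		return FT_GOOD_SID;
	if (sid != 0)	/* BFI=0 SID=1, BFI=1/2 with SID=1/2 */
		return FT_INVALID_SID;
	if (bfi == 0)
		return FT_GOOD_SPEECH;
	return FT_OTHER;	/* bad frame with SID=0: not covered here */
}
```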
+
+Raw packed binary format
+------------------------
+
+When working with FR and EFR codecs, a speech recording with Tx semantics can be
+stored in a gsmx binary file (see the Binary-file-format article) that consists
+of directly abutted codec frames (good speech or SID) in RTP format, with
+exactly 33 (FR) or 31 (EFR) bytes per frame.  We offer an equivalent ability
+for GSM-HR with the so-called raw packed format: a binary format consisting of
+directly abutted frames, each 14 bytes (112 bits) long and storing a GSM-HR
+codec frame in the basic RTP format of TS 101 318 section 5.2.  We also use the
+name "raw packed format" for this basic per-frame format itself.
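As a cross-check on the 14-byte figure, the GSM-HR parameter widths from GSM 06.20 add up to 112 bits in both unvoiced (Mode=0) and voiced (Mode=1..3) frames. The sketch packs 18 parameters into one raw packed frame; it assumes each parameter is written MSB first in ETSI parameter order, which should be checked against TS 101 318 section 5.2 and libgsmhr1 before relying on it. hr_pack() is a hypothetical name, not a libgsmhr1 function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HR_NPARAMS	18
#define HR_FRAME_BYTES	14	/* 112 bits */

/* Widths in ETSI order: R0, LPC1-3, Int, Mode, then 4 subframes.
 * Unvoiced (Mode=0): two 7-bit codebook indices + 5-bit gain.
 * Voiced: lag (8 bits, then 4-bit deltas), 9-bit codebook, 5-bit gain. */
static const uint8_t hr_bits_unvoiced[HR_NPARAMS] = {
	5, 11, 9, 8, 1, 2, 7, 7, 5, 7, 7, 5, 7, 7, 5, 7, 7, 5
};
static const uint8_t hr_bits_voiced[HR_NPARAMS] = {
	5, 11, 9, 8, 1, 2, 8, 9, 5, 4, 9, 5, 4, 9, 5, 4, 9, 5
};

/* Pack 18 parameters MSB-first (assumed order) into 14 bytes. */
static void hr_pack(const uint16_t params[HR_NPARAMS],
		    uint8_t out[HR_FRAME_BYTES])
{
	const uint8_t *bits = params[5] ? hr_bits_voiced : hr_bits_unvoiced;
	unsigned pos = 0;
	int i, b;

	memset(out, 0, HR_FRAME_BYTES);
	for (i = 0; i < HR_NPARAMS; i++) {
		for (b = bits[i] - 1; b >= 0; b--, pos++) {
			if (params[i] >> b & 1)
				out[pos >> 3] |= (uint8_t)(0x80 >> (pos & 7));
		}
	}
	assert(pos == 112);	/* both width tables sum to 112 bits */
}
```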
+
+This raw packed binary file format is not used directly by any of our speech
+encoder or decoder utilities; instead, it is supported via the gsmhr-hex2rpf and
+gsmhr-rpf2hex format conversion utilities.
+
+Common command line options
+===========================
+
+Certain flag options are common across different utilities in the present suite
+of command line tools for GSM-HR codec; these common flags are as follows:
+
+-b and -l	Utilities that read or write ETSI *.cod or *.dec format emit
+		and expect the local machine's native byte order by default.
+		-b option forces big-endian byte order; -l forces little-endian.
+
+-d		Speech encoder utilities run with Tx DTX disabled by default;
+		-d option enables speech encoding with DTX.  The same logic
+		applies to DTXd control in TFO transform utilities.
+
+-x		Utilities that emit TW-TS-005 Annex B hex format with Tx
+		semantics emit the basic RTP format (TS 101 318) by default;
+		-x option switches to the extended RTP format.  (The latter
+		format is TW-TS-002, but it also constitutes valid RFC 5993 in
+		the case of Tx semantics.)
+
+Inspecting encoded speech file formats
+======================================
+
+In common with other GSM speech codecs supported by ThemWi GSM codec libraries
+and utilities suite, utilities are provided that read GSM-HR encoded speech
+recording files and display all codec frames contained therein, in terms of
+compressed speech parameters and accompanying flags.  These utilities are as
+follows:
+
+Utility			Reads file format
+-----------------------------------------
+gsmhr-cod-parse		ETSI *.cod
+gsmhr-dec-parse		ETSI *.dec
+tw5b-dump		TW-TS-005 Annex B
+
+gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native byte order
+by default; -b and -l override options are supported.
+
+ThemWi utilities for FR, EFR and AMR codecs display compressed speech parameters
+in decimal form separated by spaces, with each subframe on its own line after
+per-frame LPC parameters.  A different format has been adopted for GSM-HR:
+
+* Individual speech parameters are displayed in hex, with a fixed number of
+  digits corresponding to the size of each parameter in bits;
+
+* Only two lines are used to display the actual speech parameters for each
+  frame, with per-frame parameters on the first line and all subframe parameters
+  on the second line;
+
+* The set of LPC parameters and each of the 4 subframe parameter sets are
+  displayed as comma-separated triplets; R0, Int and Mode parameters are
+  displayed as singletons;
+
+* Each just-described triplet or singleton is displayed as Name=value for better
+  readability;
+
+* Ignoring Name= annotations and treating commas and spaces as equivalent, all
+  18 speech parameters are printed in their standard order as defined by ETSI.
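The display rules listed above can be approximated as follows: each field is printed with ceil(bits/4) hex digits, per-frame parameters go on one line and the four subframe triplets on the next. The labels R0, LPC, Int, Mode and sf1..sf4 are placeholders chosen for this sketch and are not necessarily what gsmhr-cod-parse, gsmhr-dec-parse or tw5b-dump actually print; hr_format() is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of hex digits needed to show an n-bit field. */
#define HEXDIGITS(n)	(((n) + 3) / 4)

/* GSM-HR parameter widths in ETSI order (GSM 06.20). */
static const uint8_t bits_unvoiced[18] = {
	5, 11, 9, 8, 1, 2, 7, 7, 5, 7, 7, 5, 7, 7, 5, 7, 7, 5
};
static const uint8_t bits_voiced[18] = {
	5, 11, 9, 8, 1, 2, 8, 9, 5, 4, 9, 5, 4, 9, 5, 4, 9, 5
};

/* Render one frame as two lines of Name=value fields; returns the
 * snprintf-style total length.  Labels are illustrative only. */
static int hr_format(const uint16_t p[18], char *out, size_t size)
{
	const uint8_t *w = p[5] ? bits_voiced : bits_unvoiced;
	int n, sf, k;

	assert(out != NULL && size > 0);
	n = snprintf(out, size,
		     "R0=%0*X LPC=%0*X,%0*X,%0*X Int=%0*X Mode=%0*X\n",
		     HEXDIGITS(w[0]), (unsigned)p[0],
		     HEXDIGITS(w[1]), (unsigned)p[1],
		     HEXDIGITS(w[2]), (unsigned)p[2],
		     HEXDIGITS(w[3]), (unsigned)p[3],
		     HEXDIGITS(w[4]), (unsigned)p[4],
		     HEXDIGITS(w[5]), (unsigned)p[5]);
	/* Subframe sf occupies parameters 6+3*sf .. 8+3*sf. */
	for (sf = 0; sf < 4 && n >= 0 && (size_t)n < size; sf++) {
		k = 6 + 3 * sf;
		n += snprintf(out + n, size - (size_t)n,
			      "%ssf%d=%0*X,%0*X,%0*X", sf ? " " : "", sf + 1,
			      HEXDIGITS(w[k]), (unsigned)p[k],
			      HEXDIGITS(w[k + 1]), (unsigned)p[k + 1],
			      HEXDIGITS(w[k + 2]), (unsigned)p[k + 2]);
	}
	return n;
}
```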
+
+File format conversion utilities
+================================
+
+The following format conversions are supported between different GSM-HR encoded
+speech formats:
+
+Utility		From format		To format
+---------------------------------------------------------
+gsmhr-cod2hex	ETSI *.cod		TW-TS-005 Annex B
+gsmhr-dec2hex	ETSI *.dec		TW-TS-005 Annex B
+gsmhr-hex2dec	TW-TS-005 Annex B	ETSI *.dec
+gsmhr-hex2rpf	TW-TS-005 Annex B	Raw packed format
+gsmhr-rpf2hex	Raw packed format	TW-TS-005 Annex B
+
+The hexadecimal format of TW-TS-005 Annex B is treated as central; all provided
+file format conversion utilities convert either from or to this central format.
+Additional notes follow regarding each supported conversion.
+
+Conversion from ETSI *.cod to TW-TS-005 Annex B
+-----------------------------------------------
+
+ETSI *.cod format naturally represents only Tx semantics, while TW-TS-005
+Annex B supports both semantics.  Semantics don't change with file format
+conversion, hence the output of gsmhr-cod2hex still has Tx semantics.  -b and -l
+options are supported for *.cod input; hex output is written in the basic RTP
+format by default or in the extended RTP format with -x option.
+
+Conversion in the opposite direction is not supported, as there is no way to
+resurrect the VAD debug flag from a data source that lacks it.
+
+Conversion between ETSI *.dec and TW-TS-005 Annex B
+---------------------------------------------------
+
+Bidirectional conversion is supported between these two formats, carrying Rx
+semantics.  However, this conversion may be slightly lossy in each direction:
+
+* gsmhr-hex2dec is nothing more than a command line wrapper around the libgsmhr1
+  function gsmhr_rtp_in_direct() described in the HR-codec-library article.  The
+  exact same preprocessing step is done by every libgsmhr1-based program
+  whenever RTP input (be it real RTP or hex lines read from a TW-TS-005 Annex B
+  file) needs to be fed to the GSM-HR speech decoder or TFO transform, hence the
+  output of gsmhr-hex2dec elucidates what always happens under the hood anyway.
+
+  The extended RTP format with Rx semantics defined in TW-TS-002 allows RTP
+  payloads carrying GSM-HR invalid SID to either include or omit payload bits.
+  As explained in the HR-codec-library article, the gsmhr_decoder_twts002_in()
+  function and the gsmhr_rtp_in_direct() wrapper around it ignore these optional
+  payload bits for invalid SID frames and always set all 18 speech parameters in
+  the dec-style frame to 0.  This same behaviour becomes explicitly visible when
+  using gsmhr-hex2dec, which makes the conversion lossy in the strict sense if
+  the input contains invalid SID frames with payload bits included.
+
+* gsmhr-dec2hex is an ad hoc program, not a wrapper around a library function,
+  as this operation is not needed in any standard workflow.  The conversion may
+  be lossy in two cases:
+
+  - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1 SID=1,
+    BFI=1 SID=2, plus variants of the same with BFI=2) collapse into the same
+    representation in TW-TS-002, just like in 8 kbit/s TRAU frame format.
+
+  - Whatever payload bits were given for these invalid SID frames in the 18
+    speech parameter words are discarded, i.e., non-verbose invalid SID format
+    is written in TW-TS-002 output.
+
+Additional notes:
+
+* gsmhr-dec2hex expects the local machine's native byte order by default, but
+  supports -b and -l options.  OTOH, gsmhr-hex2dec writes *.dec output in the
+  local machine's native byte order only.
+
+* By default gsmhr-hex2dec refuses to process files that contain BFI-no-data
+  frame gaps, as no such support exists in the standard GSM-HR speech decoder
+  from ETSI or its *.dec input format.  (BFI=2 representation of such gaps is a
+  Themyscira extension.)  -f option allows BFI=2 frames to be emitted.
+
+Conversion between TW-TS-005 Annex B and raw packed format
+----------------------------------------------------------
+
+Bidirectional conversion is supported between these two formats, carrying Tx
+semantics.  In the gsmhr-hex2rpf conversion direction, the input hex file may
+be in either basic or extended RTP format; if the latter is used, the only
+allowed frame types are good speech (FT=0) and good SID (FT=2).  The conversion
+is lossless as long as Tx semantics are maintained; more specifically, as long
+as the extended RTP format hex input does not contain any frames that are
+marked as FT=2 but are not a perfect SID with all 79 bits of the SID codeword
+set to 1.  Any such imperfect valid SID frames are converted to perfect SID.
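A perfect SID can be recognized in the parameter domain: the first 33 bits (R0 plus LPC1-3, 5+11+9+8 bits) carry comfort noise parameters, and the remaining 79 bits form the SID codeword, i.e. Int, Mode and all subframe parameters. Since an all-ones Mode field means Mode=3, the voiced parameter widths apply. The sketch checks for a perfect SID and rewrites an imperfect one; keeping R0 and LPC1-3 unchanged is an assumption about what the gsmhr-hex2rpf conversion preserves, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widths of parameters 4..17 (Int, Mode, 4 voiced subframes):
 * 1 + 2 + (8+9+5) + 3*(4+9+5) = 79 bits of SID codeword. */
static const uint8_t sid_cw_bits[14] = {
	1, 2, 8, 9, 5, 4, 9, 5, 4, 9, 5, 4, 9, 5
};

/* Nonzero if all 79 SID codeword bits are 1 (a "perfect" SID). */
static int hr_is_perfect_sid(const uint16_t params[18])
{
	int i;

	for (i = 0; i < 14; i++)
		if (params[4 + i] != (1u << sid_cw_bits[i]) - 1)
			return 0;
	return 1;
}

/* Force the SID codeword to all ones; R0 and LPC1-3 are left as-is. */
static void hr_make_perfect_sid(uint16_t params[18])
{
	int i;

	for (i = 0; i < 14; i++)
		params[4 + i] = (uint16_t)((1u << sid_cw_bits[i]) - 1);
	assert(hr_is_perfect_sid(params));
}
```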
+
+In the gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318) frame
+is written out in hex, either unchanged (basic RTP format) or with a prepended
+RFC 5993 ToC octet (extended RTP format, enabled with the -x option).  With -x,
+each frame is classified as good speech or good SID for the emitted ToC octet
+by checking for a perfect SID with all 79 bits of the SID codeword set to 1.
+
+gsmhr-rpf2hex conversion is always lossless.
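The same perfect SID test can be done directly on a 14-byte raw packed frame, which is what the -x classification needs. Assuming MSB-first packing in ETSI parameter order (bits 0-32 holding R0 and LPC1-3), the SID codeword occupies the low 7 bits of byte 4 and all of bytes 5-13. The sketch emits one hex line with an optional ToC octet of 0x20 (FT=2) or 0x00 (FT=0), assuming the F and reserved bits are zero; the uppercase hex and the function names are choices of this sketch, not necessarily those of gsmhr-rpf2hex.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HR_FRAME_BYTES	14

/* Perfect SID check on a packed frame (assumed MSB-first layout). */
static int rpf_is_perfect_sid(const uint8_t f[HR_FRAME_BYTES])
{
	int i;

	if ((f[4] & 0x7F) != 0x7F)	/* SID codeword bits 33..39 */
		return 0;
	for (i = 5; i < HR_FRAME_BYTES; i++)	/* bits 40..111 */
		if (f[i] != 0xFF)
			return 0;
	return 1;
}

/* Write one frame as hex into out (at least 31 chars): optional ToC
 * octet followed by the 14 frame bytes, NUL-terminated. */
static void rpf_frame_to_hex(const uint8_t f[HR_FRAME_BYTES], int extended,
			     char *out)
{
	int i;

	assert(out != NULL);
	if (extended)	/* ToC: F=0, FT in bits 6..4, reserved bits 0 */
		out += sprintf(out, "%02X",
			       rpf_is_perfect_sid(f) ? 0x20u : 0x00u);
	for (i = 0; i < HR_FRAME_BYTES; i++)
		out += sprintf(out, "%02X", (unsigned)f[i]);
}
```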
+
+Speech encoder and decoder utilities
+====================================
+
+The present suite of tools provides 3 styles of speech encoder and decoder
+utilities:
+
+gsmhr-encode	Speech encoder, PCM speech input is in WAV format, compressed
+		speech output is in TW-TS-005 Annex B format.
+
+gsmhr-decode	Speech decoder, compressed speech input is in TW-TS-005 Annex B
+		format, PCM speech output is in WAV format.
+
+gsmhr-encode-r	Speech encoder, PCM speech input is in robe (raw big-endian)
+		format, compressed speech output is in TW-TS-005 Annex B format.
+
+gsmhr-decode-r	Speech decoder, compressed speech input is in TW-TS-005 Annex B
+		format, PCM speech output is in robe format.
+
+gsmhr-etsi-enc	Speech encoder, ETSI style, operating from *.inp to *.cod in
+		ETSI test sequence format.
+
+gsmhr-etsi-dec	Speech decoder, ETSI style, operating from *.dec to *.out in
+		ETSI test sequence format.
+
+gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and write
+their outputs in the local machine's native byte order by default.  Both
+utilities also accept -b and -l options that select the desired byte order
+explicitly; these options affect both input and output for both encoder and
+decoder utilities.  The other two styles of speech encoder and decoder utilities
+have no byte order concerns.
+
+TFO transform utilities
+=======================
+
+The TFO-transform article explains the general concept of TFO transform; the
+HR-codec-Rx-logic article explains the ThemWi implementation of this transform
+for GSM-HR.  The HR-codec-library article describes the libgsmhr1 API
+functions for this GSM-HR TFO transform; here are 2 command line utilities
+that exercise it:
+
+gsmhr-tfo-xfrm		This TFO transform exerciser reads a stream of radio
+			leg A Rx frames from a TW-TS-005 Annex B hex file and
+			writes the "pristine" stream intended for radio leg B
+			Tx into another TW-TS-005 Annex B hex file.  -d option
+			enables DTXd (disabled by default); -x option switches
+			the output RTP format from basic to extended.
+
+gsmhr-tfo-xfrm-dc	This TFO transform utility reads radio leg A Rx input
+			in ETSI *.dec format and emits radio leg B Tx output in
+			ETSI *.cod format, thus acting as an inverse of
+			GSM 06.06 REID utility that was originally used to
+			generate test sequence *.dec files.  This variant
+			exercises libgsmhr1 TFO transform function in its most
+			native form.
+
+gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in the
+local machine's native byte order by default.  -b and -l options are supported,
+selecting either big-endian or little-endian byte order explicitly; these
+options affect both *.dec input and *.cod output.  OTOH, the more FR-like
+gsmhr-tfo-xfrm utility has no byte order concerns.
+
+Hand-crafting *.cod and *.dec files
+===================================
+
+The present suite of GSM-HR codec utilities includes specialized tools for
+hand-crafting *.cod and *.dec files: gsmhr-cod-craft and gsmhr-dec-craft,
+respectively.  Each utility reads an ad hoc line-based ASCII source file and
+emits its respective binary format.  The ad hoc source language for these two
+special-purpose tools is the same except for parts that set frame metadata
+flags, which are different between *.cod and *.dec given their opposite
+semantics.
+
+Because of their highly specialized nature, these two utilities are not
+documented beyond this overview; please read the source code for details.  The
+work scope limits explained in the HR-codec-limits article apply to them.
--- a/doc/TFO-transform	Fri Mar 20 06:43:50 2026 +0000
+++ b/doc/TFO-transform	Thu Mar 26 22:48:20 2026 +0000
@@ -115,5 +115,5 @@
 			the library that implements the present TFO transform
 			along with other GSM-HR codec functions.
 
-HR-codec-utils		gsmhr-tfo-xfrm and gsmhr-tfo-xfrm-dc utilities will be
+HR-codec-utils		gsmhr-tfo-xfrm and gsmhr-tfo-xfrm-dc utilities are
 			documented here.
--- a/doc/Utils-overview	Fri Mar 20 06:43:50 2026 +0000
+++ b/doc/Utils-overview	Thu Mar 26 22:48:20 2026 +0000
@@ -77,6 +77,24 @@
 
 gsmfr-tfo-xfrm		See TFO-transform article.
 
+gsmhr-cod-craft		See HR-codec-utils article.
+gsmhr-cod-parse
+gsmhr-cod2hex
+gsmhr-dec-craft
+gsmhr-dec-parse
+gsmhr-dec2hex
+gsmhr-decode
+gsmhr-decode-r
+gsmhr-encode
+gsmhr-encode-r
+gsmhr-etsi-dec
+gsmhr-etsi-enc
+gsmhr-hex2dec
+gsmhr-hex2rpf
+gsmhr-rpf2hex
+gsmhr-tfo-xfrm
+gsmhr-tfo-xfrm-dc
+
 gsmrec-dump		See Binary-file-format article.
 
 gsmx-to-tw5a		See TW-TS-005 article.