view doc/HR-codec-utils @ 642:4122baa843c5 default tip

CHANGES: gsm-codec-lib-r5
author Mychaela Falconia <falcon@freecalypso.org>
date Fri, 27 Mar 2026 00:13:07 +0000
parents 83de961cc54b

Beginning with gsm-codec-lib-r5 release, Themyscira Wireless GSM codec libraries
and utilities package includes support for GSM-HR codec for the sake of
completeness, alongside the more useful FR, EFR and AMR codecs.  The set of
command line utilities for GSM-HR codec includes speech encoding and decoding,
conversion of encoded speech between different formats, display of various
encoded formats and certain specialized utilities described later in this
article.

File formats for GSM-HR encoded speech
======================================

The present suite of tools supports ETSI *.cod and *.dec formats, TW-TS-005
Annex B hexadecimal format, and a simple "raw packed" binary format.  These
file formats are explained below.

ETSI *.cod encoder output format
--------------------------------

ETSI reference implementation of GSM-HR speech encoder writes its output in
this format; for each encoded 20 ms frame the output consists of 18 speech
parameters followed by VAD and SP flags.  Each parameter or flag is written
into the file as a 16-bit word, hence each encoded 20 ms frame turns into 40
bytes in this format.
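
The frame layout just described can be sketched in a few lines of Python.  This
is only an illustration of the 20-word arithmetic, not code from the ThemWi
package: the function name and the returned dictionary keys are hypothetical,
and the words are read as unsigned 16-bit values for simplicity.

```python
import struct

COD_WORDS_PER_FRAME = 20                      # 18 speech parameters + VAD + SP
COD_FRAME_BYTES = COD_WORDS_PER_FRAME * 2     # 16-bit words, hence 40 bytes

def parse_cod(data, byte_order="="):
    """Split the content of a *.cod file into per-frame dictionaries.

    byte_order is a struct prefix: "=" native, ">" big-endian,
    "<" little-endian.
    """
    if len(data) % COD_FRAME_BYTES != 0:
        raise ValueError("length is not a multiple of 40 bytes")
    fmt = byte_order + "%dH" % COD_WORDS_PER_FRAME
    frames = []
    for words in struct.iter_unpack(fmt, data):
        frames.append({
            "params": list(words[:18]),   # 18 speech parameters
            "vad": words[18],             # VAD flag
            "sp": words[19],              # SP flag
        })
    return frames
```

The byte_order argument exists because the same 40-byte frame can be stored in
either byte order, as covered under common command line options below.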

ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.cod
files; we have an ETSI-style speech encoder utility that emits this reference
format and a TFO transform utility that does the same, and we support conversion
from *.cod into our preferred TW-TS-005 Annex B hex format.

ETSI *.dec decoder input format
-------------------------------

ETSI reference implementation of GSM-HR speech decoder takes this format as its
input.  Each 20 ms Rx unit (traffic frame or garbage received in the place of
one) consists of 18 speech parameters followed by 4 words of flags (BFI, UFI,
SID and TAF), stored as 22 16-bit words.  As explained in HR-codec-library
article, ThemWi implementation of GSM-HR includes an extension to this decoder
input format: BFI=1 means BFI with payload bits included in both ETSI and
ThemWi versions, but BFI=2 (ThemWi extension) means BFI without payload bits.
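
As a rough illustration of this 22-word layout and of the BFI values, here is
a minimal Python sketch.  The function name, dictionary keys and descriptive
strings are hypothetical and do not come from libgsmhr1; words are read as
unsigned 16-bit values.

```python
import struct

DEC_WORDS_PER_FRAME = 22                      # 18 parameters + BFI, UFI, SID, TAF
DEC_FRAME_BYTES = DEC_WORDS_PER_FRAME * 2     # 44 bytes per 20 ms Rx unit

# Meaning of the BFI word as described in this article
BFI_MEANING = {
    0: "no bad frame indication",
    1: "BFI, payload bits included",
    2: "BFI, no payload bits (ThemWi extension)",
}

def parse_dec_frame(raw, byte_order="="):
    """Decode one *.dec frame; byte_order is "=", ">" or "<" as in struct."""
    if len(raw) != DEC_FRAME_BYTES:
        raise ValueError("a *.dec frame is exactly 44 bytes")
    words = struct.unpack(byte_order + "22H", raw)
    bfi, ufi, sid, taf = words[18:]
    return {"params": list(words[:18]),
            "bfi": bfi, "ufi": ufi, "sid": sid, "taf": taf}
```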

ThemWi suite of GSM-HR codec utilities allows examining and hand-crafting *.dec
files; we have an ETSI-style speech decoder utility that reads this reference
format and a TFO transform utility that does the same, and we support
bidirectional conversion between this *.dec format and our preferred TW-TS-005
Annex B hex format.

TW-TS-005 Annex B hexadecimal format
------------------------------------

Themyscira Wireless Technical Specification TW-TS-005 defines a hexadecimal file
format for sequences of RTP payloads for GSM speech codecs; TW-TS-005 Annex B
specifies application of this hex file format to GSM-HR codec.

This TW-TS-005 Annex B hex is the preferred file format for GSM-HR encoded
speech recordings for most workflows.  It can represent both Tx semantics (every
20 ms frame position is filled with either a good speech frame or a perfect SID;
during DTX pauses a new SID appears in every frame) and Rx semantics (BFI frame
gaps can occur anywhere and are expected during DTX pauses; SIDs can be valid
or invalid) in the same file format, hence it is the operator's responsibility
to know the semantics of each given recording file and to use it in the correct
context.

As explained in TW-TS-005 Annex B itself (see TW-TS-005 article for the most
up-to-date link to the actual spec document), each frame can be represented
either in the basic RTP format of ETSI TS 101 318 section 5.2, or in the
extended RTP format of RFC 5993 and TW-TS-002.  Utilities in the present suite
that write TW-TS-005 Annex B hex files can be told to emit either format, with
the exception of gsmhr-dec2hex utility which always emits the extended format;
utilities that read these hex files accept both formats.  The two RTP formats
carry different information content: the basic format can only represent Tx
semantics, while the extended format can represent both semantics.

Compared to *.cod format, TW-TS-005 Annex B format with Tx semantics lacks the
VAD flag, although the extended RTP format does represent an equivalent of SP
flag.  However, the inclusion of VAD flag in *.cod format is only a debug
feature for speech encoder test sequences; it is not a part of the interface
from the Tx DTX handler to the Tx RSS as defined in GSM 06.41 Tx chapter.

Compared to *.dec format, TW-TS-005 Annex B format with Rx semantics (which has
to use the extended RTP format) collapses 3 possible invalid SID conditions
(BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2) into the same TW-TS-002 representation
of FT=1.  However, per GSM 06.41 Table 1 the Rx DTX handler for GSM-HR is
required to apply exactly the same handling to all 3 possibilities, and the
same collapsing of invalid SID conditions also happens on TDM-based (8 kbit/s)
Abis and Ater interfaces and in TFO, as detailed in GSM 08.61 and 08.62 specs.
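
The collapsing of invalid SID conditions can be pictured with the following
Python sketch.  It is not the logic of any ThemWi function; the function name
and labels are hypothetical.  FT values 0, 1 and 2 are those named in this
article; the FT used for bad non-SID frames is not stated here, so the sketch
returns None for that case rather than guessing.

```python
def classify_rx_frame(bfi, sid):
    """Map *.dec style BFI and SID flags to (label, FT or None)."""
    if bfi not in (0, 1, 2) or sid not in (0, 1, 2):
        raise ValueError("BFI and SID must each be 0, 1 or 2")
    bad = bfi != 0          # BFI=2 (ThemWi extension) is also a bad frame
    if sid == 0:
        # FT for a bad non-SID frame is outside the scope of this sketch
        return ("bad frame", None) if bad else ("good speech", 0)
    if sid == 2 and not bad:
        return ("valid SID", 2)
    # BFI=0 SID=1, BFI=1 SID=1 and BFI=1 SID=2 all end up here
    return ("invalid SID", 1)
```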

Raw packed binary format
------------------------

When working with FR and EFR codecs, a speech recording with Tx semantics can be
stored in a gsmx binary file (see Binary-file-format article) that consists of
directly abutted codec frames (good speech or SID) in RTP format, with exactly
33 (FR) or 31 (EFR) bytes per frame.  We offer an equivalent ability for GSM-HR
with the so-called raw packed format.  It is a binary format that consists of
directly abutted frames; each frame is 14 bytes long and stores a GSM-HR codec
frame in the basic RTP format of TS 101 318 section 5.2, which we also call the
raw packed format.

This raw packed binary file format is not used directly by any of our speech
encoder or decoder utilities; instead it is supported via gsmhr-hex2rpf and
gsmhr-rpf2hex format conversion utilities.
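
Because frames are directly abutted with a fixed size, a recording in any of
these binary formats can be split with simple arithmetic.  A minimal Python
sketch follows; the function names are hypothetical, and the frame sizes are
the ones quoted above.

```python
# Bytes per 20 ms frame for directly abutted Tx-semantics recordings
FRAME_BYTES = {"FR": 33, "EFR": 31, "HR": 14}
FRAME_MS = 20

def split_frames(data, codec="HR"):
    """Return the list of fixed-size frames contained in data."""
    size = FRAME_BYTES[codec]
    if len(data) % size != 0:
        raise ValueError("length is not a whole number of %s frames" % codec)
    return [data[i:i + size] for i in range(0, len(data), size)]

def duration_ms(data, codec="HR"):
    """Duration of the recording in milliseconds."""
    return len(split_frames(data, codec)) * FRAME_MS
```

For example, one second of GSM-HR speech in raw packed format is 50 frames,
or 700 bytes.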

Common command line options
===========================

Certain flag options are common across different utilities in the present suite
of command line tools for GSM-HR codec; these common flags are as follows:

-b and -l	Utilities that read or write ETSI *.cod or *.dec format emit
		and expect the local machine's native byte order by default.
		-b option forces big-endian byte order; -l forces little-endian.

-d		Speech encoder utilities run with Tx DTX disabled by default;
		-d option enables speech encoding with DTX.  The same logic
		applies to DTXd control in TFO transform utilities.

-x		Utilities that emit TW-TS-005 Annex B hex format with Tx
		semantics emit the basic RTP format (TS 101 318) by default;
		-x option switches to the extended RTP format.  (The latter
		format is TW-TS-002, but it also constitutes valid RFC 5993 in
		the case of Tx semantics.)

Inspecting encoded speech file formats
======================================

In common with other GSM speech codecs supported by ThemWi GSM codec libraries
and utilities suite, utilities are provided that read GSM-HR encoded speech
recording files and display all codec frames contained therein, in terms of
compressed speech parameters and accompanying flags.  These utilities are as
follows:

Utility			Reads file format
-----------------------------------------
gsmhr-cod-parse		ETSI *.cod
gsmhr-dec-parse		ETSI *.dec
tw5b-dump		TW-TS-005 Annex B

gsmhr-cod-parse and gsmhr-dec-parse expect the local machine's native byte order
by default; -b and -l override options are supported.

ThemWi utilities for FR, EFR and AMR codecs display compressed speech parameters
in decimal form separated by spaces, with each subframe on its own line after
per-frame LPC parameters.  A different format has been adopted for GSM-HR:

* Individual speech parameters are displayed in hex, with a fixed number of
  digits corresponding to the size of each parameter in bits;

* Only two lines are used to display the actual speech parameters for each
  frame, with per-frame parameters on the first line and all subframe parameters
  on the second line;

* The set of LPC parameters and each of the 4 subframe parameter sets are
  displayed as comma-separated triplets; R0, Int and Mode parameters are
  displayed as singletons;

* Each just-described triplet or singleton is displayed as Name=value for better
  readability;

* Ignoring Name= annotations and treating commas and spaces as equivalent, all
  18 speech parameters are printed in their standard order as defined by ETSI.
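
To make these rules concrete, here is a Python sketch that prints one frame in
a two-line form of this kind.  It is an approximation only: the labels LPC and
sf1 to sf4 are hypothetical and the exact spacing of the real utilities is not
reproduced.  The bit widths are taken from the author's reading of the GSM-HR
parameter table in GSM 06.20 (they sum to 112 bits, matching a 14-byte frame)
and should be checked against the spec before being relied upon.

```python
# Assumed bit widths, to be verified against GSM 06.20
FRAME_BITS = (5, 11, 9, 8, 1, 2)                 # R0, LPC1, LPC2, LPC3, Int, Mode
VOICED_BITS = ((8, 9, 5),) + ((4, 9, 5),) * 3    # per subframe, Mode != 0
UNVOICED_BITS = ((7, 7, 5),) * 4                 # per subframe, Mode == 0

def hexfield(value, bits):
    """Hex string with as many digits as a field of this many bits needs."""
    digits = (bits + 3) // 4
    return f"{value:0{digits}x}"

def format_frame(params):
    """Format 18 speech parameters (standard order) as two text lines."""
    if len(params) != 18:
        raise ValueError("expected 18 speech parameters")
    head = [hexfield(v, b) for v, b in zip(params[:6], FRAME_BITS)]
    line1 = "R0=%s LPC=%s Int=%s Mode=%s" % (
        head[0], ",".join(head[1:4]), head[4], head[5])
    sub_bits = VOICED_BITS if params[5] != 0 else UNVOICED_BITS
    groups = []
    for n in range(4):
        vals = params[6 + 3 * n : 9 + 3 * n]
        fields = ",".join(hexfield(v, b) for v, b in zip(vals, sub_bits[n]))
        groups.append("sf%d=%s" % (n + 1, fields))
    return line1 + "\n" + " ".join(groups)
```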

File format conversion utilities
================================

The following format conversions are supported between different GSM-HR encoded
speech formats:

Utility		From format		To format
---------------------------------------------------------
gsmhr-cod2hex	ETSI *.cod		TW-TS-005 Annex B
gsmhr-dec2hex	ETSI *.dec		TW-TS-005 Annex B
gsmhr-hex2dec	TW-TS-005 Annex B	ETSI *.dec
gsmhr-hex2rpf	TW-TS-005 Annex B	Raw packed format
gsmhr-rpf2hex	Raw packed format	TW-TS-005 Annex B

The hexadecimal format of TW-TS-005 Annex B is treated as central; all provided
file format conversion utilities convert either from or to this central format.
Additional notes follow regarding each supported conversion.

Conversion from ETSI *.cod to TW-TS-005 Annex B
-----------------------------------------------

ETSI *.cod format naturally represents only Tx semantics, while TW-TS-005
Annex B supports both semantics.  Semantics don't change with file format
conversion, hence the output of gsmhr-cod2hex still has Tx semantics.  -b and -l
options are supported for *.cod input; hex output is written in the basic RTP
format by default or in the extended RTP format with -x option.

Conversion in the opposite direction is not supported, as there is no way to
recover the VAD debug flag from a data source that lacks it.

Conversion between ETSI *.dec and TW-TS-005 Annex B
---------------------------------------------------

Bidirectional conversion is supported between these two formats, carrying Rx
semantics.  However, this conversion may be slightly lossy in each direction:

* gsmhr-hex2dec is nothing more than a command line utility around libgsmhr1
  function gsmhr_rtp_in_direct() described in HR-codec-library article.  The
  exact same preprocessing step is done by every libgsmhr1-based program
  whenever RTP input (be it real RTP or hex lines read from a TW-TS-005 Annex B
  file) needs to be fed to GSM-HR speech decoder or TFO transform, hence the
  output of gsmhr-hex2dec elucidates what always happens under the hood anyway.

  The extended RTP format with Rx semantics defined in TW-TS-002 allows RTP
  payloads carrying GSM-HR invalid SID to either include or omit payload bits.
  As explained in HR-codec-library article, gsmhr_decoder_twts002_in() function
  and gsmhr_rtp_in_direct() wrapper around it ignore these optional payload
  bits for invalid SID frames and always set all 18 speech parameters in the
  dec-style frame to 0.  This same behaviour becomes explicitly visible when
  using gsmhr-hex2dec - but if the input contains invalid SID frames with
  payload bits included, then the conversion is lossy in the strict sense.

* gsmhr-dec2hex is an ad hoc program, not a wrapper around a library function,
  as this operation is not needed in any standard workflow.  The conversion may
  be lossy in two cases:

  - All possible combinations that mean invalid SID (BFI=0 SID=1, BFI=1 SID=1,
    BFI=1 SID=2, plus variants of the same with BFI=2) collapse into the same
    representation in TW-TS-002, just like in 8 kbit/s TRAU frame format.

  - Whatever payload bits were given for these invalid SID frames in the 18
    speech parameter words are discarded, i.e., non-verbose invalid SID format
    is written in TW-TS-002 output.

Additional notes:

* gsmhr-dec2hex expects the local machine's native byte order by default, but
  supports -b and -l options.  OTOH, gsmhr-hex2dec writes *.dec output in the
  local machine's native byte order only.

* By default gsmhr-hex2dec refuses to process files that contain BFI-no-data
  frame gaps, as no such support exists in the standard GSM-HR speech decoder
  from ETSI or its *.dec input format.  (BFI=2 representation of such gaps is a
  Themyscira extension.)  -f option allows BFI=2 frames to be emitted.

Conversion between TW-TS-005 Annex B and raw packed format
----------------------------------------------------------

Bidirectional conversion is supported between these two formats, carrying Tx
semantics.  In gsmhr-hex2rpf conversion direction, the input hex file may be in
either basic or extended RTP format; if the latter is used, the only allowed
frame types are good speech (FT=0) and good SID (FT=2).  The conversion is
lossless as long as Tx semantics are maintained, more specifically, as long as
the extended RTP format hex input does not contain any frames that are marked
as FT=2, but are not perfect SID with all 79 bits of SID codeword set to 1.  If
such imperfect valid SID frames are present, they are converted to perfect SID.

In gsmhr-rpf2hex conversion direction, each raw packed (TS 101 318) frame is
written out in hex, either unchanged (basic RTP format) or with a prepended
RFC 5993 ToC octet (extended RTP format, enabled with -x option).  If -x option
is given, the classification of good speech vs good SID for the purpose of
emitted ToC octet is a check for perfect SID with all 79 bits of SID codeword
set to 1.

gsmhr-rpf2hex conversion is always lossless.
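
The gsmhr-rpf2hex direction can be sketched in Python as follows.  This is not
the source of the actual utility; function names are hypothetical.  Two
assumptions are made: the ToC octet carries F in its top bit, FT in the next 3
bits and 4 reserved bits, per the author's reading of RFC 5993; and the 79
bits of SID codeword are all bits of the packed frame after the first 33,
which needs to be verified against the specs.  Only the hex digits of each
payload are produced; the line syntax of TW-TS-005 files is not reproduced.

```python
HR_FRAME_BYTES = 14
FT_GOOD_SPEECH = 0
FT_GOOD_SID = 2

def is_perfect_sid(frame):
    """True if all 79 assumed SID codeword bits (bits 34..112) are 1."""
    if len(frame) != HR_FRAME_BYTES:
        raise ValueError("a raw packed GSM-HR frame is 14 bytes")
    # low 7 bits of byte 4, then bytes 5..13: 7 + 72 = 79 bits
    return (frame[4] & 0x7F) == 0x7F and all(b == 0xFF for b in frame[5:])

def rpf_frame_to_hex(frame, extended=False):
    """Hex digits of one frame, basic format or with a prepended ToC octet."""
    if len(frame) != HR_FRAME_BYTES:
        raise ValueError("a raw packed GSM-HR frame is 14 bytes")
    if not extended:
        return frame.hex()                 # basic RTP format, unchanged
    ft = FT_GOOD_SID if is_perfect_sid(frame) else FT_GOOD_SPEECH
    toc = ft << 4                          # F=0, FT in bits 6..4, reserved 0
    return (bytes([toc]) + frame).hex()
```

The frame bytes themselves are never modified in either mode, which is why
this direction loses no information.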

Speech encoder and decoder utilities
====================================

The present suite of tools provides 3 styles of speech encoder and decoder
utilities:

gsmhr-encode	Speech encoder, PCM speech input is in WAV format, compressed
		speech output is in TW-TS-005 Annex B format.

gsmhr-decode	Speech decoder, compressed speech input is in TW-TS-005 Annex B
		format, PCM speech output is in WAV format.

gsmhr-encode-r	Speech encoder, PCM speech input is in robe (raw big-endian)
		format, compressed speech output is in TW-TS-005 Annex B format.

gsmhr-decode-r	Speech decoder, compressed speech input is in TW-TS-005 Annex B
		format, PCM speech output is in robe format.

gsmhr-etsi-enc	Speech encoder, ETSI style, operating from *.inp to *.cod in
		ETSI test sequence format.

gsmhr-etsi-dec	Speech decoder, ETSI style, operating from *.dec to *.out in
		ETSI test sequence format.

gsmhr-etsi-enc and gsmhr-etsi-dec utilities both read their inputs and write
their outputs in the local machine's native byte order by default.  Both
utilities also accept -b and -l options that select the desired byte order
explicitly; these options affect both input and output for both encoder and
decoder utilities.  The other two styles of speech encoder and decoder utilities
have no byte order concerns.

TFO transform utilities
=======================

TFO-transform article explains the general concept of TFO transform;
HR-codec-Rx-logic article explains ThemWi implementation of this transform for
GSM-HR codec.  HR-codec-library article describes libgsmhr1 API functions for
this GSM-HR TFO transform; here are 2 command line utilities that exercise it:

gsmhr-tfo-xfrm		This TFO transform exerciser reads a stream of radio
			leg A Rx frames from a TW-TS-005 Annex B hex file and
			writes the "pristine" stream intended for radio leg B
			Tx into another TW-TS-005 Annex B hex file.  -d option
			enables DTXd (disabled by default); -x option switches
			the output RTP format from basic to extended.

gsmhr-tfo-xfrm-dc	This TFO transform utility reads radio leg A Rx input
			in ETSI *.dec format and emits radio leg B Tx output in
			ETSI *.cod format, thus acting as an inverse of
			GSM 06.06 REID utility that was originally used to
			generate test sequence *.dec files.  This variant
			exercises libgsmhr1 TFO transform function in its most
			native form.

gsmhr-tfo-xfrm-dc reads its *.dec input and writes its *.cod output in the
local machine's native byte order by default.  -b and -l options are supported,
selecting either big-endian or little-endian byte order explicitly; these
options affect both *.dec input and *.cod output.  OTOH, the more FR-like
gsmhr-tfo-xfrm utility has no byte order concerns.

Hand-crafting *.cod and *.dec files
===================================

The present suite of GSM-HR codec utilities includes specialized tools for
hand-crafting *.cod and *.dec files: gsmhr-cod-craft and gsmhr-dec-craft,
respectively.  Each utility reads an ad hoc line-based ASCII source file and
emits its respective binary format.  The ad hoc source language for these two
special-purpose tools is the same except for parts that set frame metadata
flags, which are different between *.cod and *.dec given their opposite
semantics.

Because of their highly specialized nature, these two utilities are not
documented further - please read the source code for further understanding.
Work scope limits explained in HR-codec-limits article apply here.