view doc/Audio-mode-config @ 847:6e137995c9c8

doc/Audio-mode-config: elaborate on AEC and FIR blocks
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 10 Aug 2021 01:50:41 +0000
parents 6a0fcbca8ac7
children 6c306705f503

There exist a number of tunable settings in the Iota ABB (the chip that performs
A-to-D and D-to-A conversion for the voice path) and in the Calypso DSP which
in TI's firmware architecture are meant to be configured through the audio mode
facility of the RiViera Audio Service.  The ABB settings grouped under the audio
mode are as follows:

* The selection of which analog interface pins the downlink audio should be
  sent to: EARN&EARP (earpiece), AUXON&AUXOP (auxiliary) or HSO (headset).

* The selection of which analog interface pins the uplink audio should be taken
  from: MICIN&MICIP (main microphone), AUXI (auxiliary input) or HSMICP
  (headset microphone).

* The selection of AUXI input levels when this analog input is in use for the
  voice uplink.

* Analog gains for the uplink, the downlink and the analog sidetone from the
  uplink input to the downlink output.

* Selection of a special filter bypass mode for the voice downlink.

* The selection of MICBIAS (or HSMICBIAS) voltage between 2.0 V and 2.5 V.

The DSP voice path settings grouped under the audio mode are as follows:

* The selection of the digital voice path as being between GSM and the ABB (the
  default for analog voice interfaces), between GSM and MCSI (the external
  digital voice interface) or between MCSI and the ABB (non-GSM operation).

* FIR filter coefficients for the voice uplink and for the voice downlink.

* Enabling/disabling and configuration of the Acoustic Echo Cancellation (AEC)
  mechanism.

The firmware paradigm for working with all of the above settings is as follows:

* In a lab environment, each of the listed settings can be independently tweaked
  and read back through ETM packets over the RVTMUX debug serial interface; the
  corresponding fc-tmsh commands (matching TI's original Windows-based TMSH)
  are auw for writing individual audio parameters and aur for reading them back.

* In end-use operation, TI's intent as realized in the firmware design is that
  all of the listed audio settings will only be changed as a group, loaded from
  audio mode configuration files in FFS.

Each audio mode configuration needs to be assigned a name between 1 and 9
characters long, and for each named configuration there are two files in FFS:

/aud/modename.cfg is the main configuration file
/aud/modename.vol is the corresponding volume setting file

This paradigm is a good fit for "dumbphone" handsets, which usually need
several different voice audio configurations: one for classic handheld
operation, one for the hands-free loudspeaker mode and one for operation with a
wired headset.  If the phone uses its hands-free loudspeaker plus the Calypso
DSP to play ringtones (as opposed to using a buzzer on BU/PWT or a ringtone
player chip that drives the speaker bypassing the voice path), there will also
need to be an output-only audio configuration for ringing.

How do the audio mode config files under /aud come into being?  It appears that
TI's original intent was that a configuration would be manually constructed on
a test device via TMSH auw commands, saved in the FFS of that test device with
the aus command, then read out of that test device FFS in binary form and
reuploaded as an opaque blob to all devices on the production line.  One can do
the same procedure with our fc-tmsh and fc-fsio which fully replicate the
relevant functionality of TI's original TMSH (to the best of our knowledge),
but in FreeCalypso we have an alternate way which fits better with our UNIX
philosophy: we have created our own ASCII text format for representing all of
the content in TI's /aud/*.cfg binary files and tiaud-* utilities for compiling
TI's binary cfg files from our ASCII source format, disassembling a *.cfg file
read out of FFS into the same ASCII format, and creating the required *.vol
companion files, which are also binary.

A note about volume settings: the Iota ABB has two variable gain controls in
the voice downlink path: the main "volume" gain in rather coarse 6 dB steps
(the choices being 0 dB, -6 dB, -12 dB, -18 dB, -24 dB and mute) and a finer
"calibration" gain in 1 dB steps between -6 and +6 dB.  It appears that TI's
intent was that only the coarse volume control in 6 dB steps is to be visible
to the user, with just 5 possible non-mute volume levels, and that the finer
gain control be set at the factory in the audio mode config files for each mode
as some form of calibration.  Pirelli DP-L10 significantly deviates from this
model by providing 10 non-mute volume levels to the user with 2 dB or 3 dB steps
between them by changing both VOLCTL and VDLPG fields in the VBDCTRL register,
but at the present time we have no plans to make a similar drastic change in
FreeCalypso.

Another noteworthy feature of the audio mode system with respect to volume
control is that there is a separate *.vol file that stores the current volume
setting for each mode.  In a "dumbphone" handset firmware built according to
TI's paradigm, the /aud/*.cfg files will be written once on the factory
production line and only read afterward, but whenever the user turns the volume
up or down in the UI, the *.vol file _corresponding to the current mode_ will
be updated by the running fw.  Thus the fw would maintain a separate notion of
the current volume for ringing, for the earpiece speaker, for the hands-free
loudspeaker and for the wired headset, something which Pirelli's fw very
notoriously fails to do.

Old vs. new AEC
===============

One of the settings in the audio mode config structure underwent an evolutionary
change within the span of history that is relevant to FreeCalypso - this setting
is the configuration for AEC, the Acoustic Echo Cancellation functional block
of the Calypso DSP.  As TI's GSM DSPs evolved (before, during and after the
Calypso era), their AEC implementation evolved along with the rest, and
different evolutionary versions of AEC require different configuration and
tuning parameters.  When the audio mode facility was first implemented, the AEC
block in TI's GSM DSPs of that time was controlled with a single 16-bit control
word; the people in the SSA group who implemented RiViera Audio Service then
decided to split different bits from this one DSP control word into 5 different
parameter words, and the result was the "old" 5-word AEC config.

But the version of AEC implemented in the DSP ROM in the Calypso silicon version
we work with is slightly newer; this version corresponds to what TI's L1 code
calls L1_NEW_AEC.  However, the waters then got muddied: for reasons which we
(FreeCalypso team) cannot understand (perhaps miscommunication between different
groups at TI), TI's TCS211 reference firmware shipped with L1_NEW_AEC disabled
(C preprocessor symbol set to 0 instead of 1), even though the underlying DSP
AEC block (combination of ROM and official patches) is the "new" kind and not
the "old" one.  There are two fallouts from this software misconfiguration on
TI's part:

1) If one takes stock TCS211 from TI or any derivative version in which this
   aspect is unchanged (all mokoN firmwares, and all FC firmwares up to
   Magnetite) and tries to enable AEC, the result will be a poor AEC
   configuration: the old echo level and long vs short settings do nothing on
   the new DSP, whereas the new tunable parameters will remain at their defaults
   with no way to tweak them.  I (Mother Mychaela) can only guess that this
   situation is what Openmoko must have run into when they tried to get AEC
   working.

2) When someone downstream of TI figures out that L1_NEW_AEC needs to be changed
   from 0 to 1 and actually makes that change, like we did in our Tourmaline fw,
   the format and size of the audio mode binary structure change, and all old
   audio mode config files become invalid.

Our FreeCalypso work is affected by point 2 above: we started working with audio
mode config files in 2017, using the old AEC configuration, and only made the
switch to L1_NEW_AEC in 2021.  We now have two kinds of audio mode config binary
files: the old kind that are 164 bytes long, and the new kind that are 176 bytes
long.  Our Tourmaline firmware has L1_NEW_AEC enabled, while Magnetite (our
legacy backward compatibility fw) has it disabled.

To prevent loading of garbage into AEC config when an audio mode file of the
wrong kind is loaded, we have implemented the following workaround in both
Tourmaline and Magnetite: if the loaded mode config file has the wrong length,
the AEC config is set to the default disabled state instead of whatever is in
the mode file - loading an AEC config of the wrong format is not possible.
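
The length check described above can be sketched in Python as follows.  This
is only an illustration of the rule, not the firmware code: the function name
and the returned strings are hypothetical, and the sketch says nothing about
how the firmware treats lengths other than 164 or 176 bytes beyond "wrong
length means AEC disabled".

```python
OLD_AEC_FILE_LEN = 164   # mode file produced without L1_NEW_AEC
NEW_AEC_FILE_LEN = 176   # mode file produced with L1_NEW_AEC


def aec_source(file_len, fw_has_new_aec):
    """Decide where the AEC config comes from when a mode file is loaded.

    Returns "file" if the AEC words stored in the mode file match the
    firmware's AEC flavor, or "default-disabled" if the file has the
    wrong length and AEC must fall back to its default disabled state.
    (Hypothetical helper; names are not taken from the firmware.)
    """
    expected = NEW_AEC_FILE_LEN if fw_has_new_aec else OLD_AEC_FILE_LEN
    return "file" if file_len == expected else "default-disabled"


# An old 164 byte file loaded by Tourmaline (L1_NEW_AEC enabled):
print(aec_source(164, fw_has_new_aec=True))
# A new 176 byte file loaded by Tourmaline:
print(aec_source(176, fw_has_new_aec=True))
```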

Default audio configuration
===========================

The default audio config set in the Iota ABB registers and in the DSP when no
named audio mode config has been loaded with the audio_mode_load() API call
(accessible via AT@AUL or via fc-tmsh aul command) is as follows, in the syntax
which our tiaud-compile utility accepts as input and which our tiaud-decomp
utility emits as output:

voice-path 0
mic default {
	gain 3
	output-bias 0
	fir  0 0x4000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir  8 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir 16 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir 24 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
}
speaker ear+aux {
	gain 0
	audio-filter 0
	fir  0 0x4000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir  8 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir 16 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
	fir 24 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
}
sidetone -5
aec 0 0 0 0 0

The above version is the one produced by Magnetite and earlier firmwares without
L1_NEW_AEC; in the new Tourmaline version the last line changes to:

aec-new 0 0 0x1 0x7FFF 0x1FFF 0x4000 0x32 0x1000 0x1000 0 0 0

The meaning is as follows:

* voice-path is the DSP digital voice path setting, 0 means the standard
  configuration with the voice channel going between GSM and the local analog
  voice hardware attached to the ABB.

* The default microphone input is used for the voice uplink (MICIN&MICIP pins),
  whereas the voice downlink is presented on both EARN&EARP and AUXON&AUXOP
  pins, i.e., both "ear" and "aux" VDL amplifiers are enabled.

* The microphone gain is 3 dB, the fine gain adjustment in the voice downlink
  path is 0 dB, and the sidetone gain is -5 dB.

* output-bias 0 under mic means that the MICBIAS voltage is set to 2.0 V.

* audio-filter 0 under speaker means that the VFBYP bit in the VBCTRL1 register
  is NOT set, i.e., the normal configuration.

* DSP FIR filters do nothing, as coefficient 0 is set to unity and all other
  coefficients are set to zero.

* The AEC mechanism in the DSP is disabled, although the format of the bits that
  say so is different between old and new AEC versions.  In the new version
  there are a number of tunable settings that only kick in when AEC is enabled;
  when AEC is disabled by default, these tunable knobs still have sensible
  defaults that aren't all zeros.

Creating your own audio mode configurations
===========================================

The input to our tiaud-compile utility can contain every setting shown in the
default case above, or any desired subset thereof.  For any settings not given
in the input, the defaults from the above will be used, except that
tiaud-compile's current default for the speaker mode is just ear rather than
ear+aux.  (It is a default which you should NOT depend on; set it explicitly if
it matters!)  A few notes:

* For all settings given as numbers, the number given in the ASCII input is the
  number that goes into TI's binary structure, without any transformation, even
  in those cases where the result is counter-intuitive, such as "audio-filter 0"
  meaning that the filter is *enabled*.

* The 3 possible mode keywords for the mic mode are default, aux and headset,
  corresponding to MICIN&MICIP, AUXI and HSMICP analog inputs, respectively.

* The 5 possible mode keywords for the speaker mode are ear, aux, headset,
  buzzer and ear+aux.  The buzzer speaker mode exists only on TI's Nausica ABB
  predating Iota, i.e., it won't work on any of the Calypso+Iota+Rita devices
  built or supported by FreeCalypso, but our tiaud-compile and tiaud-decomp
  utilities support it because it is nominally supported by TI's RiViera Audio
  Service and its binary data structure for audio mode configuration.

* When mic is set to aux, an additional mic setting called extra-gain becomes
  available.  If extra-gain is set to 0, the AUXI gain will be set to 28.2 dB,
  if extra-gain is set to 1, the AUXI gain will be set to 4.6 dB; all other
  values will be considered invalid by the firmware.

* Each of the two FIR filters in the DSP (one for uplink, one for downlink) has
  a total of 31 coefficients, numbered 0 through 30, inclusive.  In the ASCII
  input to tiaud-compile you can put each coefficient on its own fir line, put
  all 31 coefficients on the same line, or group them in any other way you like.
  The grouping used in the tiaud-decomp output has been chosen for line length
  reasons.
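
As an illustration of the fir line grouping, here is a minimal Python sketch
that prints 31 coefficients 8 per line, like the default listing shown earlier.
It assumes that the number following the fir keyword is the index of the first
coefficient on that line, as that listing suggests; the function name is
hypothetical and this is not the tiaud-decomp source.

```python
def format_fir_lines(coeffs, per_line=8):
    """Format 31 FIR coefficient words as 'fir N w w w ...' lines."""
    if len(coeffs) != 31:
        raise ValueError("a Calypso DSP FIR filter has exactly 31 coefficients")
    lines = []
    for start in range(0, len(coeffs), per_line):
        group = coeffs[start:start + per_line]
        # Mask to 16 bits so that negative Python ints print as DSP words.
        words = " ".join(f"0x{c & 0xFFFF:04X}" for c in group)
        lines.append(f"fir {start:2d} {words}")
    return lines


# Identity filter: coefficient 0 is unity (0x4000), all others are zero.
identity = [0x4000] + [0] * 30
for line in format_fir_lines(identity):
    print(line)
```

Calling format_fir_lines(identity, per_line=1) gives the other extreme, with
each coefficient on its own fir line.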

aec vs aec-new in tiaud-compile input
=====================================

tiaud-compile accepts both aec (old) and aec-new settings; aec must be followed
by 5 numbers, aec-new must be followed by 12 numbers.  Each number is a 16-bit
value, and they go into the binary structure without further interpretation by
tiaud-compile - instead the firmware is the entity that gives them meaning.
Numbers without 0x prefix are interpreted as decimal.

tiaud-compile will generate one type or the other of the binary output file,
following these rules:

* If an aec setting is given, a 164 byte file will be produced, with the 5 AEC
  words being the given ones.

* If an aec-new setting is given, a 176 byte file will be produced, with the 12
  AEC words being the given ones.

* If neither setting is given, a 164 byte file will be produced, with the 5 AEC
  words of the old type being all zeros.  Thanks to the modified audio mode
  loading code in our firmwares, these 164 byte mode files can still be used
  with current Tourmaline fw, with AEC set to its default disabled state.
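
The three rules above amount to the following selection, sketched here in
Python.  The function name and return format are hypothetical.  The rules do
not say what tiaud-compile does when both aec and aec-new are given, so this
sketch rejects that case purely as an assumption.

```python
def tiaud_output_layout(setting_keywords):
    """Pick output file size and AEC word count from the settings present.

    setting_keywords: collection of setting keywords found in the input,
    e.g. {"voice-path", "mic", "aec-new"}.
    """
    has_old = "aec" in setting_keywords
    has_new = "aec-new" in setting_keywords
    if has_old and has_new:
        # Assumption: behavior with both settings is not documented here.
        raise ValueError("both aec and aec-new given")
    if has_new:
        return {"file_bytes": 176, "aec_words": 12}
    # Either an old-style aec line was given, or no AEC setting at all
    # (in which case the 5 old-style AEC words are all zeros).
    return {"file_bytes": 164, "aec_words": 5}


print(tiaud_output_layout({"mic", "speaker", "aec-new"}))
print(tiaud_output_layout({"mic", "speaker"}))
```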

New AEC parameter words
=======================

The 12 words that configure AEC of the L1_NEW_AEC flavor (appearing on the
aec-new line in tiaud-compile input or in an fc-tmsh auw 12 command) map as
follows:

Word 0: aec_enable

This word must be set to 0 to disable AEC or 2 to enable it.  This word is
translated to a single DSP control bit by the Audio Service layer, so no other
values may be written into it.

Word 1: continuous_filtering

This word is written directly into the DSP, and we have no documentation for it
beyond "enable (1) or disable (0) continuous mode filtering".

Word 2: granularity_attenuation

This word is written directly into the DSP, and we have no documentation for it
beyond "granularity of the smoothed attenuation".

Word 3: smoothing_coefficient

This word is written directly into the DSP, and we have no documentation for it
beyond "smoothing coefficient".

Word 4: max_echo_suppression_level

This word is written directly into the DSP; it is described as "maximum
attenuation level", and the following constants are defined for it:

  #define AUDIO_MAX_ECHO_0dB              (0x7FFF)
  #define AUDIO_MAX_ECHO_2dB              (0x65AA)
  #define AUDIO_MAX_ECHO_3dB              (0x59AD)
  #define AUDIO_MAX_ECHO_6dB              (0x4000)
  #define AUDIO_MAX_ECHO_12dB             (0x1FFF)
  #define AUDIO_MAX_ECHO_18dB             (0x0FFF)
  #define AUDIO_MAX_ECHO_24dB             (0x07FF)
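
The constants above look like linear gain factors in Q15 fixed point format
(value divided by 32768).  That reading is our assumption, suggested by 0x7FFF
being labeled 0 dB and 0x4000 being labeled 6 dB; we have no documentation
confirming it.  Under that assumption, the following Python sketch converts
each word back to decibels for comparison with the name of the constant:

```python
import math

# Constants copied from the #define list above.
MAX_ECHO_WORDS = {
    "0dB": 0x7FFF,
    "2dB": 0x65AA,
    "3dB": 0x59AD,
    "6dB": 0x4000,
    "12dB": 0x1FFF,
    "18dB": 0x0FFF,
    "24dB": 0x07FF,
}


def q15_attenuation_db(word):
    """Attenuation in dB, ASSUMING the word is a Q15 linear gain factor."""
    return -20.0 * math.log10(word / 32768.0)


for name, word in MAX_ECHO_WORDS.items():
    print(f"{name:>5}  0x{word:04X}  {q15_attenuation_db(word):6.2f} dB")
```

If the assumption holds, intermediate attenuation levels could be produced by
inverting the same formula, but we have not tested any values other than the
defined constants.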

Word 5: vad_factor

This word is written directly into the DSP, and we have no documentation for it
beyond "VAD factor relative to the current estimated energy".  VAD must stand
for "voice activity detector", but our knowledge ends here.

Word 6: absolute_threshold

This word is written directly into the DSP, and we have no documentation for it
beyond "VAD absolute offset relative to the current estimated energy".

Word 7: factor_asd_filtering

This word is written directly into the DSP, and we have no documentation for it
beyond "modifying factor of d_far_end_noise for filtering decision".

Word 8: factor_asd_muting

This word is written directly into the DSP, and we have no documentation for it
beyond "modifying factor of d_far_end_noise for muting decision".

Word 9: aec_visibility

This word must be set to 0 for normal operation or 0x200 for "AEC visibility"
debug mode.  This word is translated to a single L1 control bit by the Audio
Service layer, so no other values may be written into it.

Word 10: noise_suppression_enable

This word must be set to 0 to disable the SPENH algorithm or 4 to enable it.
This word is translated to a single DSP control bit by the Audio Service layer,
so no other values may be written into it.  We don't know what this "speech
enhancement" algorithm does, or whether it is the same as "noise suppression".

Word 11: noise_suppression_level

This config word is mapped to just two bits in the actual DSP control word by
the Audio Service layer, thus there are only 4 possible valid values here:

  #define AUDIO_NOISE_NO_LIMITATION       (0x0000)
  #define AUDIO_NOISE_6dB                 (0x0020)
  #define AUDIO_NOISE_12dB                (0x0040)
  #define AUDIO_NOISE_18dB                (0x0060)
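
To tie the 12 words together, here is a minimal Python sketch that builds an
aec-new line for tiaud-compile input from named parameters and rejects values
which the descriptions above call invalid.  The function and table names are
hypothetical; the default values are the ones from the Tourmaline default
aec-new line shown earlier, and the words that go directly into the DSP are
only checked for fitting in 16 bits.

```python
AEC_NEW_DEFAULTS = {          # word order matters: word 0 first
    "aec_enable": 0,
    "continuous_filtering": 0,
    "granularity_attenuation": 0x0001,
    "smoothing_coefficient": 0x7FFF,
    "max_echo_suppression_level": 0x1FFF,
    "vad_factor": 0x4000,
    "absolute_threshold": 0x0032,
    "factor_asd_filtering": 0x1000,
    "factor_asd_muting": 0x1000,
    "aec_visibility": 0,
    "noise_suppression_enable": 0,
    "noise_suppression_level": 0,
}

# Words translated by the Audio Service layer accept only these values.
ALLOWED_VALUES = {
    "aec_enable": {0, 2},
    "aec_visibility": {0, 0x200},
    "noise_suppression_enable": {0, 4},
    "noise_suppression_level": {0x0000, 0x0020, 0x0040, 0x0060},
}


def aec_new_line(**overrides):
    """Return an 'aec-new ...' line; unspecified words keep their defaults."""
    unknown = set(overrides) - set(AEC_NEW_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown AEC parameter(s): {sorted(unknown)}")
    cfg = {**AEC_NEW_DEFAULTS, **overrides}
    for name, value in cfg.items():
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{name} does not fit in 16 bits")
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            raise ValueError(f"{name}: invalid value 0x{value:X}")
    words = ["0" if v == 0 else f"0x{v:X}" for v in cfg.values()]
    return "aec-new " + " ".join(words)


print(aec_new_line())
print(aec_new_line(aec_enable=2, max_echo_suppression_level=0x0FFF))
```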

Some known-good AEC configurations
==================================

The terse descriptions of parameter words given above unfortunately constitute
the total extent of our knowledge of the AEC block in our dear Calypso DSP and
its tuning parameters - we don't know anything more.  However, we do have 3
example configurations to look at: we have the default values of the tuning
parameters that appear to be initialized by the DSP itself on boot, and we have
two AEC-enabled configurations set by Pirelli DP-L10 firmware: one for handheld
and wired headset modes, the other for the hands-free loudspeaker mode.  Here
are the 12 parameter words in the 3 available configurations:

Parameter word			Default		Pirelli		Pirelli
				(AEC disabled)	handheld	hands-free
--------------------------------------------------------------------------
aec_enable			0 (off)		2 (on)		2 (on)
continuous_filtering		0 (off)		1 (on)		1 (on)
granularity_attenuation		0x0001		0x0014		0x0014
smoothing_coefficient		0x7FFF		0x0CCC		0x0CCC
max_echo_suppression_level	0x1FFF (12 dB)	0x59AD (3 dB)	0x0FFF (18 dB)
vad_factor			0x4000		0x4000		0x4000
absolute_threshold		0x0032		0x0032		0x0032
factor_asd_filtering		0x1000		0x1000		0x1000
factor_asd_muting		0x1000		0x1000		0x1000
aec_visibility			0 (off)		0 (off)		0 (off)
noise_suppression_enable	0 (off)		4 (on)		4 (on)
noise_suppression_level		0 (none)	0 (none)	0x0060 (18 dB)

The following observations can be made:

* The 4 parameters vad_factor, absolute_threshold, factor_asd_filtering and
  factor_asd_muting remain unchanged between TI's DSP default and Pirelli's
  production configs.  On the basis of this observation, I (Mother Mychaela)
  get the feeling that these four should be left alone.

* Besides the obvious steps of enabling AEC and SPENH, Pirelli did change
  continuous_filtering (from off to on), granularity_attenuation and
  smoothing_coefficient.  Unfortunately, unless we recover the source code for
  our Calypso DSP ROM or some documents explaining this version of AEC in
  detail, we have no way of understanding what these parameters do, let alone
  evaluating the merits of Pirelli's change.

* max_echo_suppression_level and noise_suppression_level seem to be the two
  parameters most amenable to tuning.

* It is interesting to note that Pirelli's fw enables AEC not only in the
  loudspeaker mode, but also in the more basic handheld and wired headset
  modes.  The two AEC configs differ only in max_echo_suppression_level and
  noise_suppression_level parameters, with the loudspeaker mode AEC config
  being more aggressive.

Prior to seeing what Pirelli's fw does, my (Mychaela's) own thinking was that
AEC is only needed in loudspeaker configurations, not handheld or headset.
After seeing Pirelli's AEC configs, I reason that enabling a less aggressive
AEC configuration in those less echo-prone modes probably doesn't hurt - thus
until and unless we recover more documentation or other knowledge, the plan for
our own FreeCalypso Libre Dumbphone handset is to do what Pirelli does: use a
less aggressive AEC config in handheld and headset modes, and a more aggressive
one in the hands-free loudspeaker mode.

On our current FCDEV3B setup with a SparkFun COM-09151 loudspeaker and a CUI
CMC-9745-130T microphone, applying Pirelli's loudspeaker-mode AEC config
produces echo cancellation that sounds acceptable to our subjective human
evaluator on the far end of test calls.

FIR filter details
==================

Calypso DSP has two FIR filters in the voice paths, one in the uplink path and
one in the downlink path.  Aside from their placement, the two FIR filters are
identical.  Each FIR block has 31 taps (making a 30th order filter), and each of
the 31 coefficients is a 16-bit fixed-point number.  The fixed point format is
F2.14 aka Q14: to get the real coefficient from the physical 16 bits, treat the
16-bit datum as a two's complement signed integer, then divide by 16384.
Examples: 0x4000 means 1, 0x2000 means 0.5, 0xC000 means -1, 0xE000 means -0.5.
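
The conversion rule just described can be written out as a short Python sketch.
The function names are hypothetical, and the reverse direction (float to word)
uses simple rounding, which is our choice for illustration rather than anything
prescribed by TI.

```python
def q14_to_float(word):
    """Convert a 16-bit FIR coefficient word to its real value."""
    word &= 0xFFFF
    if word & 0x8000:          # two's complement sign bit set
        word -= 0x10000
    return word / 16384


def float_to_q14(value):
    """Convert a real coefficient to the 16-bit word for the DSP."""
    n = round(value * 16384)
    if not -32768 <= n <= 32767:
        raise ValueError("coefficient out of range for this 16-bit format")
    return n & 0xFFFF


for w in (0x4000, 0x2000, 0xC000, 0xE000):
    print(f"0x{w:04X} -> {q14_to_float(w)}")
```

Note that the representable range is from -2 up to just under +2, so a main tap
greater than unity fits in this format.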

In principle you can set all 31 coefficients to whatever you like, but in
practice only two possible configurations are used:

* When the FIR filter is disabled (identity transform), coefficient 0 is set to
  0x4000 (unity) and all other coefficients are set to 0.  In this configuration
  the FIR block does not introduce any extra delay: all delayed samples are
  multiplied by 0 and thus produce no effect.

* When some non-identity frequency response transformation is desired, a linear
  phase filter is set up: coefficient #15 becomes the main tap (significantly
  greater in absolute value than all others) and all other coefficients mirror
  around it symmetrically: #0 equals #30, #1 equals #29 and so forth, until #14
  equals #16.  This filter adds 1.875 ms of delay (15 sample times) to the voice
  path in which it is active, and an equal amount of "pre-ringing".
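
The two configurations can be told apart mechanically.  The following Python
sketch checks a set of 31 coefficient words for the identity case and for the
symmetric linear phase case, and reports the delay each one adds.  The function
names are hypothetical, and the 8000 samples per second rate is the one implied
by 15 sample times being 1.875 ms.

```python
SAMPLE_RATE_HZ = 8000   # implied by 15 samples = 1.875 ms
CENTER_TAP = 15         # main tap of a 31-tap linear phase filter


def is_identity(words):
    """Coefficient 0 is unity (0x4000), all others are zero."""
    return len(words) == 31 and words[0] == 0x4000 and not any(words[1:])


def is_linear_phase(words):
    """Coefficients mirror around #15: #0 == #30, #1 == #29, ... #14 == #16."""
    return len(words) == 31 and all(
        words[i] == words[30 - i] for i in range(CENTER_TAP)
    )


def added_delay_ms(words):
    """Delay added to the voice path by one of the two usual filter kinds."""
    if is_identity(words):
        return 0.0
    if is_linear_phase(words):
        return 1000 * CENTER_TAP / SAMPLE_RATE_HZ
    raise ValueError("neither identity nor symmetric linear phase")


identity = [0x4000] + [0] * 30
# Made-up symmetric example: a pure 15 sample delay, not a real-world filter.
pure_delay = [0] * 15 + [0x4000] + [0] * 15
print(added_delay_ms(identity), added_delay_ms(pure_delay))
```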

The presumed purpose of these two FIR filters (uplink and downlink) is to
flatten the frequency response of the speaker and microphone transducers, or
perhaps even more ambitiously, the frequency response of the modeled acoustic
environment.  However, actually coming up with a good set of FIR filter
coefficients given a desired frequency response is a hard problem, one where
forward engineering is much more difficult than reverse.

When it comes to reverse engineering of existing Calypso DSP FIR filters, a
total of 7 specimens have been captured out in the wild so far: one downlink FIR
filter from Openmoko's non-functional para0.cfg (no way of knowing which speaker
it was once designed for), and a set of 6 filters extracted from Pirelli DP-L10,
3 uplink and 3 downlink, corresponding to the 3 audio routing modes supported on
this phone model (handheld, hands-free and wired headset).  All 7 are linear
phase filters as described above.  Analyzing the frequency response of a given
already existing FIR filter is easy: just use the fir2freq program in our
freecalypso-reveng Hg repository.  OTOH, coming up with a new set of FIR filter
coefficients for some desired frequency response (e.g., for a new phone handset
being designed) is a much harder problem, one which we will probably have to
outsource to a hired DSP/FIR expert.
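
To show why the analysis direction is the easy one, here is a minimal Python
sketch that evaluates the gain of a 31-tap filter at a given frequency by
direct summation.  It is not the fir2freq source and does not reproduce its
output format; the function name is hypothetical, and the 8000 samples per
second rate is an assumption consistent with the delay figures given earlier.

```python
import cmath
import math


def fir_gain_db(words, freq_hz, sample_rate_hz=8000):
    """Gain in dB of a FIR filter (16-bit coefficient words) at freq_hz."""
    # Convert each word from two's complement fixed point to a real number.
    taps = [((w & 0xFFFF) - 0x10000 if w & 0x8000 else (w & 0xFFFF)) / 16384
            for w in words]
    omega = 2 * math.pi * freq_hz / sample_rate_hz
    # H(f) = sum over n of h[n] * exp(-j * omega * n)
    response = sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))
    magnitude = abs(response)
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")


identity = [0x4000] + [0] * 30
for f in (300, 1000, 3400):
    print(f, "Hz:", round(fir_gain_db(identity, f), 3), "dB")
```

The identity filter has the same gain at every frequency; feeding in a captured
coefficient set instead shows how that filter shapes the voice band.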

Calypso FIR support in FC host tools
------------------------------------

Uplink and downlink FIR filter coefficients can be included in the input to
tiaud-compile.  Each coefficient is given as the actual 16-bit word going into
the DSP (Q14 scaling included), and can be specified either in hex or as a
signed decimal integer.

We also have a dedicated ASCII file format for a FIR filter coefficient set by
itself, like this example:

fir-coeff-table

0x0178 0x0AB5 0xF43D 0xFED5 0xFCA7 0x04D8 0x00B8 0x0371
0x032F 0x0007 0x151C 0xF24C 0x19A6 0xE918 0xF7CD 0x7D0C
0xF7CD 0xE918 0x19A6 0xF24C 0x151C 0x0007 0x032F 0x0371
0x00B8 0x04D8 0xFCA7 0xFED5 0xF43D 0x0AB5 0x0178

(This example is the FIR filter extracted from Openmoko's non-functional
para0.cfg.)  This by-itself FIR filter coeff set format is accepted as input to
the auw-fir command in fc-tmsh (allowing experimental FIR filters to be uploaded
to a running Calypso device for testing) and to our fir2freq analysis program.
We will probably use the same format if and when we embark on a venture to
design our own FIR filters for our own handset hardware.
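
For illustration, a file in this format could be read with a few lines of
Python.  The sketch below assumes only what the example shows: a
fir-coeff-table keyword line followed by 31 whitespace-separated numbers, with
blank lines ignored.  Any further syntax accepted by fc-tmsh or fir2freq is not
covered, and the function name is hypothetical.

```python
def parse_fir_coeff_table(text):
    """Parse a fir-coeff-table file body into a list of 31 16-bit words."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "fir-coeff-table":
        raise ValueError("missing fir-coeff-table keyword line")
    words = []
    for line in lines[1:]:
        for token in line.split():
            value = int(token, 0)          # accepts 0x-prefixed hex or decimal
            if not -32768 <= value <= 0xFFFF:
                raise ValueError(f"{token} does not fit in 16 bits")
            words.append(value & 0xFFFF)
    if len(words) != 31:
        raise ValueError(f"expected 31 coefficients, got {len(words)}")
    return words


OPENMOKO_PARA0 = """\
fir-coeff-table

0x0178 0x0AB5 0xF43D 0xFED5 0xFCA7 0x04D8 0x00B8 0x0371
0x032F 0x0007 0x151C 0xF24C 0x19A6 0xE918 0xF7CD 0x7D0C
0xF7CD 0xE918 0x19A6 0xF24C 0x151C 0x0007 0x032F 0x0371
0x00B8 0x04D8 0xFCA7 0xFED5 0xF43D 0x0AB5 0x0178
"""

coeffs = parse_fir_coeff_table(OPENMOKO_PARA0)
print(len(coeffs), hex(coeffs[15]))
```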

fc-tmsync aur and aur-all addition
==================================

New addition as of fc-host-tools-r16: our aur command which natively resides in
fc-tmsh (audio mode full access read operation via ETM) has also been
implemented in fc-tmsync for scripted usage.  Furthermore, we also implemented
an aur-all command that issues the same sequence of aur operations as the
firmware's built-in audio_mode_save() and emits the output on stdout in the same
format as tiaud-decomp.  The end effect is that fc-tmsync aur-all is a much
shorter and more direct way of obtaining exactly the same result as would
previously be obtained by saving the current audio mode config with aus, reading
out the resulting binary file with fc-fsio and decoding it with tiaud-decomp.

The implementation of aur-all and the more elementary aur 12 command in
fc-tmsync works only with firmware versions that have L1_NEW_AEC enabled -
therefore, these commands work with FC Tourmaline but not Magnetite.

Furthermore, our aur command in both fc-tmsh and fc-tmsync and the new fc-tmsync
aur-all command also work against Pirelli's firmware - this alien fw implements
ETM aur operation exactly the same as standard TCS211, and it has L1_NEW_AEC
enabled, such that aur 12 returns the 24 byte long L1_NEW_AEC version of
T_AUDIO_AEC_CFG structure.  The combination of this functionality in Pirelli's
fw and our fc-tmsync addition makes it possible to read out Pirelli's highly
tuned audio configurations in a very convenient manner, much more convenient
than reading ABB registers with abbr and reading DSP API words with r16.