Firmware debug: handover to the community

Spacefalcon the Outlaw falcon at ivan.Harhan.ORG
Sun Jun 7 21:53:38 CEST 2015


Hello FC community,

As I wrote earlier, I would like to switch my focus to finishing the
design of the Calypso GSM development board I proposed - an important
step toward hardware independence from Openmoko.  While I work on this
hw design, DS and other community members can try their hands at the
task of debugging our not-quite-working fw.  And don't worry: if no
one else has any success with debugging the fw, I *will* get back to
it myself after I take the hw subproject detour.  But if someone does
manage to make some progress with the fw while I work on the hw, we
will have achieved at least a little bit of parallelism.

I have now produced all of the tools that should be necessary for
someone like DS to take a stab at firmware bring-up debugging; here is
a tour of what they are:

* A version of TCS211/leo2moko reference fw with debug extensions:

https://bitbucket.org/falconian/leo2moko-debug

This version of leo2moko includes an implementation of my AT-over-RVTMUX
mechanism, thus one can send AT commands to it and receive responses
(as well as unsolicited notifications of incoming calls etc) over the
RVTMUX interface presented on the Freerunner's headset jack.  Advantages:

* No need to fuss with getting a decent terminal program running on
  the FR's AP;

* When going over RVTMUX, all AT commands sent to the fw and all
  responses and unsolicited notifications appear in the rvinterf
  session log, correlated with the traces from the fw activity they
  invoke.

Usage:

1. If running QtMoko, stop the regular sw that manipulates the modem:

/etc/init.d/qtmoko-neo stop

2. Manipulate the modem power and headset jack switches as necessary:

cd /sys/bus/platform/devices/gta02-pm-gsm.0
echo 0 > power_on
echo 1 > download

The modem should now be off and the headset jack should be connected
to the IrDA UART signals.

3. Compile leo2moko-debug fw and flash it into the modem with
fc-loadtool.  Run fc-loadtool -h gta02 /dev/ttyUSBx on your host
machine, and do 'echo 1 > power_on' on the GTA02 AP to power the modem
up for the flashing operation.

4. After flashing is complete, exit fc-loadtool, power the modem off
(echo 0 > power_on), then run rvinterf /dev/ttyUSBx on your host
machine and power the modem back on: echo 1 > power_on

5. The fw should now be running, and rvinterf should be printing and
optionally logging all debug trace output.

6. Run fc-shell to send AT commands to the fw over RVTMUX; it will
connect to the already running rvinterf process via a local UNIX domain
socket on your host machine.  fc-shell will present a '>' prompt; you
can type AT commands there.
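
The AP-side switch manipulation from step 2 gets repeated several
times during a flash-and-test cycle, so it may be convenient to wrap
it in a few shell functions.  The following is only a minimal sketch:
the function names and the GSM_PM_DIR override variable are made up
for illustration, and the 1 s pause in modem_cycle is an assumption,
not a documented requirement.

```shell
# AP-side helpers for the modem power and headset jack switches (step 2).
# GSM_PM_DIR is a hypothetical override; by default it points at the
# sysfs directory given in step 2.
GSM_PM_DIR="${GSM_PM_DIR:-/sys/bus/platform/devices/gta02-pm-gsm.0}"

# Power the modem off or on.
modem_off() { echo 0 > "$GSM_PM_DIR/power_on"; }
modem_on()  { echo 1 > "$GSM_PM_DIR/power_on"; }

# Connect the headset jack to the UART signals, as in step 2.
headset_to_uart() { echo 1 > "$GSM_PM_DIR/download"; }

# Power-cycle the modem; the 1 s pause is an assumed value.
modem_cycle() { modem_off; sleep 1; modem_on; }
```

Source the file on the AP, then call modem_off, modem_on etc in place
of the raw echo commands.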

Here is a session log of this leo2moko-debug fw successfully connecting
to T-Mobile USA and dialing 13034997111 (a public time-by-phone service):

https://www.freecalypso.org/traces/20150606-leo2moko-at-rvt.txt

Grep for 'ATI:' to see what commands I sent to it and all returned
responses; the surrounding traces show L1 activity etc.
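
A minimal sketch of that grep, assuming the log has been saved locally
under the file name from the URL above.  The at_lines function name is
made up for illustration; the -n option prefixes each match with its
line number, so one can jump to the surrounding traces in an editor.

```shell
# Print only the AT command/response lines from an rvinterf session
# log, each prefixed with its line number in the log file.
at_lines() { grep -n 'ATI:' "$1"; }

# Example (file name assumed to match the URL above):
#   at_lines 20150606-leo2moko-at-rvt.txt
```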

Besides the addition of my AT-over-RVTMUX hack, the other change in
leo2moko-debug relative to standard leo2moko is that I patched the GPF
binary lib that contains pf_TaskEntry() to not disable system traces.
You can see my mechanism for applying these binary patches (so you can
easily make other similar patches on your own) here:

https://bitbucket.org/falconian/tcs211-patches

Verbose traces from various protocol stack entities still need
explicit enabling (having *everything* enabled would overwhelm the
trace mechanism, so one needs to be judicious with what needs to be
enabled in a given debug session), but one can do so easily with
fc-shell without any further patching.  For example, to enable all
traces from the ACI/MMI entity, type the following command at the '>'
fc-shell prompt:

>sp MMI TRACECLASS FFFFFFFF

The names of protocol stack entities like MMI and "system primitive"
keywords like TRACECLASS need to be in all uppercase as that is what
GPF on the target expects.
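
To avoid tripping over the uppercase requirement, one could generate
the command line with a small helper.  This is only a sketch: sp_line
is a made-up name, and it merely prints the line to be typed or pasted
at the '>' prompt; nothing is assumed here about fc-shell accepting
piped input.  Only the entity name and the keyword are forced to
uppercase; any remaining arguments are passed through unchanged.

```shell
# Build an fc-shell "sp" line with the entity name and the system
# primitive keyword forced to uppercase.
sp_line() {
    entity=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    keyword=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    shift 2
    printf 'sp %s %s %s\n' "$entity" "$keyword" "$*"
}

sp_line mmi traceclass FFFFFFFF   # prints: sp MMI TRACECLASS FFFFFFFF
```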

OK, so that is the working but Windows-built and blob-laden TCS211
reference version.  Now let us look and see how our own gcc-built fw
fares currently.  When debugging on the Freerunner (i.e., until we get
our own development board built), our gsm-fw needs to be flashed with
fc-loadtool the same way as TCS211/leo2moko, except for a different
variant of the flash program command: it is flash program-m0 with
TI-built stuff but flash program-bin with gcc-built fw, owing to the
different binary output formats used by the two very different
compilation toolchains.
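
The choice between the two variants can be written down as a tiny
lookup.  This is only a sketch: the flash_variant name and the
"ti"/"gcc" labels are made up for illustration, and it prints just the
command name; any arguments the command takes are deliberately left
out here.

```shell
# Map a build type to the fc-loadtool flash command variant named
# above.  The "ti" and "gcc" labels are hypothetical.
flash_variant() {
    case "$1" in
        ti)  echo "flash program-m0" ;;    # TI-built (TCS211/leo2moko)
        gcc) echo "flash program-bin" ;;   # our gcc-built gsm-fw
        *)   echo "unknown build type: $1" >&2; return 1 ;;
    esac
}
```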

So you program finlink/flashImage.bin into flash after compiling it,
and then power the modem up with rvinterf running just like we did
with leo2moko-debug.  Connect with fc-shell and issue AT commands the
same way.  Here is a log showing how it currently fails miserably:

https://www.freecalypso.org/traces/20150606-fc-failed-call.txt

As you can see by grepping both logs for 'ATI:', I was commanding both
firmwares to do the same sequence of operations: enable radio
functionality (+cfun=1), connect to the default operator (+cops) and
then dial a test call.  Analyzing the failing log above, we see the
following:

* The sequence leading up to the initial establishment of network
  registration appears to be all good: the fw runs a power measurement
  over all frequency channels in all supported bands, selects the
  strongest ones it found, attempts to sync to them, and that sync
  succeeds on those that really are local GSM base stations.  The fw
  picks the strongest to be its serving cell, executes all appropriate
  L1 procedures to make it so, and sends a RACH to it.  The network
  responds with an immediate assignment, the MS goes into dedicated
  mode and performs whatever exchange with the network is done upon
  registration (location update, I guess - I'm not a real GSM expert,
  just a firmware tinkerer :).  The AT command channel now gives an OK
  response to the AT+COPS command and +CREG and %CSQ indications
  showing the registered state and RSSI.  So far, so good.

* Once the fw quiesces in the registered state, trouble begins: the
  ATI output indicates that the registration state bounces, i.e., the
  modem loses network registration and then reacquires it.  Looking in
  the log, we see that deep sleep is the triggering condition: as soon
  as the fw allows the hw to go into deep sleep, things go wrong.  Once
  the modem wakes up from deep sleep, all subsequent L1 traces
  corresponding to burst Rx (NP_I for paging channel etc) show some
  error_flag being set to 1 instead of 0.  (This error flag is 0 with
  our fw prior to going into deep sleep, and always 0 with the working
  leo2moko fw even though it does go into deep sleep.)  Once this
  error_flag has been set, it is "game over": no further Rx succeeds
  until the fw declares the network connection to have been lost and
  restarts the network search process cold.

* When I tried to dial an outgoing call, the fw seemed to go berserk.
  One can see traces related to failure to allocate a GPF partition,
  freeing a NULL pointer and other major badness.

The next thing I tried was taking deep sleep out of the equation, at
least for the time being, to see what else is broken before we dive in
deep to see what's wrong with deep sleep.  The command to disable deep
sleep and use big sleep instead (26 MHz VCXO stays running) is
AT%SLEEP=2, although I first had to fix a bug in the TCS3.2-based
version of ACI we are using that made this command inaccessible.  Here
is the log with deep sleep disabled:

https://www.freecalypso.org/traces/20150607-fc-bigsleep.txt

Analysis:

* With big sleep taking the place of deep sleep, the burst Rx errors
  no longer occur, and network registration remains solid until I try
  to dial a call.

* When I do dial that outgoing call, traces about memory allocation
  and freeing errors are seen once again, and everything seems to go
  haywire after that.  The call fails like before.

Thus we know that deep sleep is not the only problem we've got on our
hands, i.e., our fw is broken even without it.  This is the point at
which I would like to hand this debug chase over to other community
members, and let DS and others get a taste for what GSM fw debugging
is like. :)

Happy hacking,
SF

