Firmware bring-up status

Spacefalcon the Outlaw falcon at ivan.Harhan.ORG
Sat May 16 04:24:06 CEST 2015


Hello project followers,

I have an update: I found and fixed the issue that caused our experimental
fw to crash on boot when the modem is still "hot" from a previous power
cycle, i.e., has only been off for a few seconds rather than minutes.

The issue was a pure firmware bug, nothing to blame on the hardware.
The GSM protocol stack implementation we are using is built atop
Condat's framework called GPF, which is a layer above or a wrapper
around Nucleus.  GPF manages several dynamic memory allocation pools,
both the fully dynamic kind (like classic malloc) and the "partition"
kind that cannot get fragmented because all allocation and freeing is
done in terms of preset partitions.  Each memory pool (either DM or PM)
needs a Nucleus control block, and GPF's OS layer (the one which I had
to reconstruct from disassembly of binary objects around this time a
year ago) is responsible for calling NU_Create_Memory_Pool() or
NU_Create_Partition_Pool() to initialize these control blocks.
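
To illustrate why a partition pool cannot fragment, here is a minimal
sketch of a fixed-size partition allocator.  It is not GPF's or Nucleus's
actual code: the toy_* names, the partition size and the partition count
are all hypothetical, picked only for the example.

```c
#include <stddef.h>

#define PART_SIZE   32  /* bytes per partition (hypothetical value) */
#define PART_COUNT  4   /* partitions in the pool (hypothetical value) */

/* While a partition is free, its storage holds the free-list link. */
union toy_partition {
    union toy_partition *next_free;
    unsigned char payload[PART_SIZE];
};

struct toy_part_pool {
    union toy_partition parts[PART_COUNT];
    union toy_partition *free_list;
};

/* Thread every partition onto the free list. */
void toy_part_pool_init(struct toy_part_pool *p)
{
    size_t i;

    p->free_list = NULL;
    for (i = 0; i < PART_COUNT; i++) {
        p->parts[i].next_free = p->free_list;
        p->free_list = &p->parts[i];
    }
}

/* Hand out one whole partition, or NULL if none are left. */
void *toy_part_alloc(struct toy_part_pool *p)
{
    union toy_partition *part = p->free_list;

    if (part)
        p->free_list = part->next_free;
    return part;
}

/* Return a partition to the pool; it is reusable as-is. */
void toy_part_free(struct toy_part_pool *p, void *mem)
{
    union toy_partition *part = mem;

    part->next_free = p->free_list;
    p->free_list = part;
}
```

Because every request is served with exactly one preset partition, a
freed partition always fits the next request, so there are never any
unusable holes between blocks.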

The trouble happened because most of these control blocks are themselves
allocated dynamically - only the very first dynamic memory pool's control
block is statically allocated in the bss segment, and all subsequently
needed ones are allocated from that first pool.  And it just so happens
that Nucleus includes an "error check" function whereby if a control
block passed to a NU_Create_<whatever>() function already has its
signature word filled in, the "create" function fails on the assumption
that someone tried to re-initialize an already active object.
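
The kind of check described above can be sketched roughly as follows.
This is only a toy model and not the Nucleus source: the structure
layout, the TOY_POOL_ID value and the return codes are assumptions made
up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real Nucleus definitions. */
#define TOY_POOL_ID  0x504F4F4Cu  /* arbitrary "magic" signature value */
#define TOY_OK       0
#define TOY_IN_USE   (-1)

struct toy_pool_cb {
    uint32_t id;     /* signature word: TOY_POOL_ID once created */
    void    *start;  /* raw memory managed by this pool */
    size_t   size;
};

int toy_create_pool(struct toy_pool_cb *cb, void *mem, size_t size)
{
    /* "Error check": a block that already carries the signature is
     * assumed to be an active object, so the create call refuses it. */
    if (cb->id == TOY_POOL_ID)
        return TOY_IN_USE;

    cb->start = mem;
    cb->size = size;
    cb->id = TOY_POOL_ID;  /* mark the control block as active */
    return TOY_OK;
}
```

The check cannot tell a genuinely active control block from
uninitialized memory that just happens to contain the signature value.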

The power cycle dependency happened because of data retention in the
external SRAM - the one inside the weird Samsung K5A3281 flash+RAM
chip used in Openmoko's modem.  If the modem hasn't been powered off
long enough, this SRAM will retain (some of) its content from the
previous power cycle - and if enough bits have survived such that the
memory where the control blocks in question happen to be allocated
retains the "magic" signature word values (32 bits in each control
block), the new boot cycle crashes spectacularly as the firmware fails
to properly create its GPF memory management structures.

The issue never occurred with TI's original TCS211 firmware because
they put the "raw" memory for the pools in the bss segment which is
explicitly zeroed out by the firmware's early init code on every boot.
But in our FreeCalypso fw I moved those "raw" memory chunks into
separate int.ram and ext.ram sections which are not part of the
zeroed-out bss segment: I figured why waste CPU cycles zeroing out
memory whose initial content is not supposed to be depended on...

My current fix: I added a bzero() call to zero out just these specific
control blocks right before passing them to NU_Create_<xxx>_Pool()
functions.
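
In outline, the fix looks something like the sketch below.  It is not
the actual FreeCalypso code: toy_create_pool() is a hypothetical
stand-in for the NU_Create_<xxx>_Pool() functions, the control block
layout and signature value are made up, and memset() is used here as
the portable equivalent of bzero().

```c
#include <stdint.h>
#include <string.h>

#define TOY_POOL_ID  0x504F4F4Cu  /* hypothetical signature value */

struct toy_pool_cb {
    uint32_t id;     /* signature word */
    void    *start;
    size_t   size;
};

/* Toy "create" with the already-active check; returns 0 on success. */
static int toy_create_pool(struct toy_pool_cb *cb, void *mem, size_t size)
{
    if (cb->id == TOY_POOL_ID)
        return -1;
    cb->start = mem;
    cb->size = size;
    cb->id = TOY_POOL_ID;
    return 0;
}

/* Boot-time pool setup with the fix applied. */
int toy_init_pool_fixed(struct toy_pool_cb *cb, void *mem, size_t size)
{
    /* The control block may hold anything left over from the previous
     * power cycle, so wipe it before the create call inspects it. */
    memset(cb, 0, sizeof *cb);
    return toy_create_pool(cb, mem, size);
}
```

Only the control block is cleared; the raw pool memory itself is left
untouched, so the cost is a few words per pool rather than zeroing the
whole int.ram and ext.ram sections.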

With this fix, the "hot modem" boot crash no longer occurs.  An
effective test is to send a tgtreset command to a running fw via
fc-tmsh: it reboots the Calypso via its watchdog timer without cycling
the power to anything, and prior to the fix I just made, it reliably
caused the boot crash to occur.

I haven't done any further troubleshooting on the other issue yet -
AT+COPS failing to connect to the GSM network - I'm going to take
another look at it now.

Happy hacking,
Mychaela aka Space Falcon

