FCDEV3B hardware bug: sleep mode self-reboot

Mychaela Falconia mychaela.falconia at gmail.com
Sat Jul 22 00:23:24 UTC 2017


Hello FreeCalypso community,

When I originally set out to build our own FCDEV3B modem board, one of
my goals was to produce a proper replacement for the no-longer-made,
surplus-exhausted and not very convenient Neo FreeRunner, i.e., to
produce a modem that is strictly no worse than Openmoko's on every
point.  This goal has now been achieved on all points except two:

* Our current FCDEV3B modems have a problem with sleep modes as
  described in detail below.  This problem has never been seen on
  Openmoko hardware or on any of our other pre-existing hw targets
  (Mot C1xx or Pirelli), thus it is a regression defect in our FCDEV3B
  hardware.

* Testing voice calls and Calypso audio features is very inconvenient
  but possible on the FreeRunner.  On our own FCDEV3B hardware I still
  haven't got around to adding the loudspeaker and microphone, hence
  those circuits on our modem boards have not been exercised at all
  yet.

Note that RF calibration is no longer in the list of our own hardware's
regressions relative to Openmoko: even though my CMU200 still has not
been properly repaired and I still have to use the Aux Tx generator
(B96 variant) instead of the broken main Tx, I have completed the
development of fully automated calibration software that produces
calibration results which I believe to be no worse than Openmoko's.  I
can now take a freshly assembled board, connect it to the CMU200 and
run a single shell script (fc-rfcal-tri900) - this script will run all
7 required calibration parts (VCXO, 3 Rx bands and 3 Tx bands), saving
all produced calibration results in the flash file system.  The accuracy
of this calibration is probably less than ideal because I don't have a
fancy metrology-grade cabling setup with precisely measured insertion
loss and because the calibration status of my CMU200 itself is unknown,
but it should definitely be within the GSM 05.05 spec tolerances.

Now onto the sleep mode problem, which is definitely a regression in
our current hw relative to Openmoko and all other known pre-existing
Calypso hw targets.  Before I get into further details, I need to
emphasize that the sleep mode problem on our current FCDEV3B boards is
quite different from the infamous deep sleep problem (bug #1024) that
used to plague Openmoko devices.  Openmoko's bug #1024 affected only
deep sleep, not other sleep modes, and its manifestation was that
while camped on a cell in idle mode, the modem would wake up from deep
sleep in a messed-up state and not be able to receive the paging
channel correctly any more until it gave up and reestablished a new
network sync, causing the network registration status to bounce.

Our sleep mode problem is quite different: the manifestation of our
sleep mode hw bug is that certain sleep-wake sequences cause the modem
to suddenly reboot: yes, a total reboot out of the blue, completely
blowing away whatever you were doing at the time.  This behaviour is
something that was never experienced by Openmoko, nor by us on any of
our pre-existing Calypso hw targets, thus it is a new problem.

Out of the 5 FCDEV3B boards I have left, I have 3 that aren't broken
in other ways, and on all 3 boards the sleep mode self-reboot bug is
reproducible 100% of the time under the following conditions:

* The firmware needs to be Magnetite-l1reconst.  I have heard reports
  that the self-reboot bug does not happen with Citrine fw (and thus
  Magnetite-hybrid may be similarly avoiding the bug), but it is a
  *hardware* bug, and one of the first steps in debugging such is
  getting it reproducible.  Magnetite-l1reconst fw does the job of
  making the hw bug 100% reproducible.

* This Magnetite-l1reconst fw needs to be flashed, not loaded via
  fc-xram, as the manifestation of the bug involves the modem self-
  rebooting.

* There needs to be a SIM inserted in the socket.  It doesn't matter
  whether or not this SIM is recognized as valid by any network
  operator, as reproducing the sleep mode self-reboot bug does not
  involve connecting to a network or even bringing up the radio at
  all, i.e., the antenna can be disconnected.  The SIM only needs to
  be valid for the AT+CFUN=1 command.

* Boot the board with Magnetite-l1reconst fw, and see the boot output
  in the rvinterf window.  Without changing any sleep modes with
  AT%SLEEP commands, i.e., with all sleep modes enabled by default,
  give it an AT+CFUN=1 command, either through fc-shell or through the
  dedicated AT command UART.  Instead of the command completing with
  an OK response, the modem will self-reboot, which you should observe
  in the rvinterf window.

Disabling all sleep modes with AT%SLEEP=0 stops this self-reboot from
happening, allowing us to get past the AT+CFUN=1 command and on to
subsequent radio bring-up, which is what we've been doing since early
April when I first brought our FCDEV3B boards home and immediately hit
the self-reboot bug.  However, only in the last few days have I done a
deeper investigation.

The first noteworthy observation is that the sleep mode that triggers
the self-reboot (at least in this scenario) is not deep sleep or even
big sleep, but rather small sleep.  If I issue AT%SLEEP=3 (big sleep
and deep sleep enabled, small sleep disabled) before AT+CFUN=1, the
latter command always succeeds, and I can then proceed to radio
bring-up (AT+COPS=0) in this state.  On rare occasions the modem will
self-reboot on a sleep-wake sequence while connected to a network and
listening for paging with big sleep and deep sleep enabled; I haven't
tried big sleep only, but even with both big and deep sleep enabled,
the reboot is quite rare: most of the time the modem is fine like that.

On the other hand, if I issue AT%SLEEP=1 (enable small sleep only)
before AT+CFUN=1, the latter always triggers the self-reboot, just
like with all sleep modes enabled, thus we know that small sleep is
the culprit in this self-reboot scenario.
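
The elimination reasoning above can be written out as a small Python
sketch.  It is only an illustration, not part of any FreeCalypso tool:
the mode labels and the function name suspect_modes are made up for
this example, and the observations are just the four AT+CFUN=1 outcomes
reported in this post.

```python
# Outcomes of AT+CFUN=1 as reported in this post, keyed by the set of
# sleep modes enabled beforehand.  The mode names are labels chosen for
# this sketch, not firmware identifiers.  The rare reboot seen while
# camped on a network is a different scenario and is left out.
OBSERVATIONS = [
    ({"small", "big", "deep"}, "reboot"),  # default, no AT%SLEEP issued
    (set(), "ok"),                         # AT%SLEEP=0
    ({"big", "deep"}, "ok"),               # AT%SLEEP=3
    ({"small"}, "reboot"),                 # AT%SLEEP=1
]


def suspect_modes(observations):
    """Return the modes enabled in every failing run and in no passing run."""
    failing = [modes for modes, result in observations if result == "reboot"]
    passing = [modes for modes, result in observations if result == "ok"]
    if not failing:
        return set()
    common_to_failures = set.intersection(*failing)
    seen_in_passes = set().union(*passing)
    return common_to_failures - seen_in_passes


print(suspect_modes(OBSERVATIONS))
```

With these four observations the only mode left standing is small
sleep, which matches the conclusion drawn above.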

At this point I need to explain the 3 Calypso sleep modes.  Big sleep
and deep sleep involve L1 code calculating how long the system is
going to remain idle and programming the hardware to disable TDMA
frame interrupts for that long, and in the case of deep sleep, also
stopping the VCXO for that duration.  Small sleep OTOH happens in the
idle thread of Nucleus RTOS scheduler: when all tasks are blocked
waiting for something and the task scheduler falls into the idle
thread waiting for the next interrupt, if small sleep is enabled, TI's
modified version of Nucleus' idle thread will stop the ARM7 CPU by
cutting off its clock with a control register bit; this CPU clock is
then re-enabled by the hardware when the next interrupt occurs.

Small sleep cannot have a duration longer than one TDMA frame
(4.615 ms), as it always ends as soon as an interrupt occurs, and
unless some other interrupt occurs sooner, the wake-up event will be
the next TDMA frame interrupt - except when suppressed by big sleep or
deep sleep logic, these interrupts always occur on every TDMA frame.
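
For reference, the 4.615 ms figure follows from standard GSM timing, in
which a 26-frame multiframe lasts exactly 120 ms.  The short Python
calculation below is just that arithmetic; the variable names are
chosen for this example.

```python
# GSM timing: a 26-frame multiframe lasts exactly 120 ms.
MULTIFRAME_MS = 120.0
FRAMES_PER_MULTIFRAME = 26

# Duration of one TDMA frame, i.e. the longest possible small sleep
# when no other interrupt arrives first.
tdma_frame_ms = MULTIFRAME_MS / FRAMES_PER_MULTIFRAME

# How many TDMA frame interrupts occur per second when neither big
# sleep nor deep sleep is suppressing them.
frames_per_second = 1000.0 / tdma_frame_ms

print(f"TDMA frame: {tdma_frame_ms:.3f} ms")
print(f"Frame interrupts per second: {frames_per_second:.1f}")
```

This prints a frame duration of 4.615 ms and about 216.7 frame
interrupts per second, so an otherwise idle modem with small sleep
enabled goes through roughly that many sleep-wake cycles every second.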

But here is what happens on the AT+CFUN=1 command: this command brings
up the SIM interface, and while bits are being transferred to and from
the SIM, the responsible Nucleus task is blocked waiting on a timer.
But while the Nucleus task is blocked thusly, interrupts occur quite
frequently, as the SIM interface hardware interrupts on every byte and
the interrupt handler feeds it the next byte to be sent.  Because no
Nucleus tasks are in a runnable state, when the SIM interrupt handler
returns, control falls back into the idle thread which activates small
sleep, only to be woken up by the next SIM interrupt one byte later.
Thus the execution of the AT+CFUN=1 command with all sleep modes
enabled involves a lot of back-to-back sleep-wake sequences in rapid
succession, and it is my hypothesis that the latter act as the
triggering condition for our hardware bug.
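
To give a feel for how much the SIM traffic raises the sleep-wake rate,
here is a rough Python model.  It is a sketch under loudly stated
assumptions, not a measurement: the 3.25 MHz SIM clock is assumed, the
372 clocks per bit time and 12 bit times per character are the ISO
7816-3 defaults, and the model supposes that every interrupt handler
returns to the idle thread before the next interrupt arrives.  The
function names are invented for this example.

```python
import math

TDMA_FRAME_MS = 120.0 / 26.0  # about 4.615 ms

# ASSUMPTIONS (not from the post): a SIM clock of 3.25 MHz, and the
# ISO 7816-3 defaults of 372 clock cycles per bit time (etu) and
# 12 etu per character (10-bit frame plus 2 etu guard time).
SIM_CLOCK_HZ = 3.25e6
CLOCKS_PER_ETU = 372
ETU_PER_BYTE = 12


def sim_byte_interval_ms():
    """Time between SIM byte interrupts under the assumptions above."""
    return ETU_PER_BYTE * CLOCKS_PER_ETU / SIM_CLOCK_HZ * 1000.0


def sleep_wake_cycles(window_ms, sim_active):
    """Count interrupts in [0, window_ms); each one ends a small sleep.

    Simplified model: the CPU is otherwise idle, and the SIM interface
    (when active) transfers bytes back to back for the whole window.
    """
    cycles = math.ceil(window_ms / TDMA_FRAME_MS)
    if sim_active:
        cycles += math.ceil(window_ms / sim_byte_interval_ms())
    return cycles


idle = sleep_wake_cycles(1000.0, sim_active=False)
busy = sleep_wake_cycles(1000.0, sim_active=True)
print(f"SIM byte interval: {sim_byte_interval_ms():.3f} ms")
print(f"sleep-wake cycles per second: idle={idle}, SIM active={busy}")
```

Under these assumptions the sleep-wake rate during continuous SIM
traffic is several times the idle rate.  Real SIM exchanges are bursty
rather than continuous, so the model only indicates the order of
magnitude of the difference.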

The next question becomes: what do we do about it?  We know that we have
a hardware bug on our hands because the exact same firmware works
without a hitch on Openmoko-made FreeRunners and other pre-existing
Calypso hw targets.  The first thought that naturally comes to mind is
that the rapid back-to-back sleep-wake sequences cause an increased
current draw on one of the power supply rails, which in turn causes
too great of a voltage drop somewhere.  But the perplexing thing is
that our entire modem core design comes straight from Openmoko,
virtually unchanged at the physical layout level, and at the schematic
level (including all capacitor values) Openmoko's modem is completely
unchanged from TI's Leonardo reference in this area.  So how come we
have a hw problem in a section that directly follows both TI and
Openmoko known-good references?

My other first thought from months back was that perhaps the series
jumpers I inserted into VBAT power paths for current measurement
purposes were the culprit, by adding too much series resistance into
those power current paths, but the following experiments strongly
suggest that the problem ought to be elsewhere:

* I tried removing the two-post headers with shorting blocks from JP2
  and JP3, and replaced them with solid wire jumpers soldered directly
  into the PCB holes - no improvement.

* I tried upping the 0402 caps at C220 and C221 (on the VBAT net near
  the inputs to Iota LDO regulators) from 100 nF (Leonardo value that
  always worked fine for Openmoko) to 1 uF - no improvement.

* I tried feeding higher voltages to our board's "battery" power input,
  up to 5 V, to cancel out any voltage drop in the VBAT path before
  the regulators - no improvement.

The above results all suggest that the problem is not likely to be on
the VBAT input side of the LDO regulators in the Iota chip, but is
more likely to be on the regulator output side, i.e., between the
regulated voltage outputs from Iota and the corresponding power
consumers in the Calypso itself or perhaps in the flash+XRAM chip.

I have a sinking feeling that we may not be able to fix this hw bug,
and we may have to publicly admit that our hardware is defective in a
way which we are not able to fully understand, let alone fix, and tell
our users to disable all sleep modes as a workaround.  Or try to hide
the embarrassment by checking in a code change to our firmwares that
disables sleep by default if the hardware target is FCDEV3B, and hope
that no one notices...  However, if it is possible to "fix" the sleep
mode self-reboot bug allopathically by upping some other
capacitor(s) on one or more regulated power nets, then I would like to
apply that allopathic fix to our next batch of boards.

Right now I could really use some help from one of the other community
members who has an FCDEV3B board and would be able to do some probing
with an oscilloscope.  At the moment there are only two other FC
community members who have FCDEV3B boards: Das Signal and whoever in
the Serg+Kent+possible_others team currently has the board I sent to
them.  (Harald Welte also has a board, but I doubt that he would be
interested in helping fix something that only affects FreeCalypso and
not OsmocomBB.)  If either of you happens to have access to an
oscilloscope and the knowledge of how to operate it, as well as sharp
eyes and a steady hand (or an assistant who can offer such) to hold an
o'scope probe on one side of an 0402 capacitor, you could help the
project by probing at several points to see if a voltage drop can be
spotted when an AT+CFUN=1 command triggers a self-reboot as described
earlier.  If you are able and willing to help, please let me know and
I'll provide more detailed instructions as to exactly where to probe
and what to look for.

Hasta la Victoria, Siempre,
Mychaela aka The Mother

