Firmware deblobbing stats

Mychaela Falconia mychaela.falconia at gmail.com
Sun Sep 22 03:50:58 UTC 2019


Hello FreeCalypso community,

As a little fun exercise, I have just written a tool that allows us to
quantitatively measure exactly how much we have deblobbed our TI-based
modem firmware.  For years I've been saying that our starting point
(the TCS211 semi-src that has been salvaged from the ruins of former
Openmoko) was approximately half source, half blobs, and that our
current FreeCalypso modem fw is almost completely blob-free - but of
course such statements are made only in general terms and lack
quantitative substance.  The new blobstat tool finally gives us the
actual numbers, which we haven't had until now.

Given a final build product that has been produced from a mixture of
source components and linkable binary objects, just how can one
quantify precisely what percentage is source and what percentage is
blobs?  The answer can be obtained from the map file produced by the
linker.  Just like the more popular ELF, TI's object format (a COFF
variant) is based around sections: linkable objects consist of
sections, and so does the final link output.  Every byte in the final
fw image belongs to some section, and each of these output sections
from the linker's perspective is made up of various input sections
(meaning sections taken from linkable objects), with a few bits added
by the linker itself (long Thumb call trampolines and some filler and
padding bytes).  The map file produced by the linker shows the
allocation of every byte in the final fw image: it lists all generated
output sections, shows what input sections each output section is made
of, and shows all linker-added fillers and trampolines.

My blobstat program reads and parses these map files, taking account
of every code section that went into that final link.  It also reads
another file (a classification spec) that indicates which linkable
*.lib files (or which individual objects within these libs) should be
counted in the src category, versus which ones should be counted in
the blob category.  One can also define any other classification
categories as desired.
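
To make the classification step concrete, here is a minimal sketch in
Python.  It is not the actual blobstat code and does not parse real map
files: the library names, object names, sizes and the rule format below
are all invented for illustration, and the assumption that per-object
rules override per-library rules is mine.

```python
from collections import defaultdict

# Hypothetical, already-parsed input-section records:
# (library, object file, size in bytes).  Names and sizes are invented.
INPUT_SECTIONS = [
    ("libA.lib", "foo.obj", 0x1200),
    ("libA.lib", "bar.obj", 0x0340),
    ("libB.lib", "baz.obj", 0x0800),
    ("libB.lib", "qux.obj", 0x0100),
]

# Hypothetical classification spec.  Assumption: a rule for an individual
# object takes precedence over the rule for the library containing it.
LIB_RULES = {"libA.lib": "src", "libB.lib": "blob"}
OBJ_RULES = {("libB.lib", "qux.obj"): "src"}


def classify(lib, obj):
    """Return the category for one input section, or 'unclassified'."""
    if (lib, obj) in OBJ_RULES:
        return OBJ_RULES[(lib, obj)]
    return LIB_RULES.get(lib, "unclassified")


def tally(sections):
    """Sum input-section sizes per classification category."""
    totals = defaultdict(int)
    for lib, obj, size in sections:
        totals[classify(lib, obj)] += size
    return dict(totals)


if __name__ == "__main__":
    totals = tally(INPUT_SECTIONS)
    grand = sum(totals.values())
    for cat, nbytes in sorted(totals.items()):
        print(f"{cat}: 0x{nbytes:X} bytes ({100 * nbytes / grand:.1f}%)")
```

Any further categories would simply be additional values in the rule
tables; sections matching no rule land in "unclassified" so that they
are not silently dropped from the totals.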

Let's look at our starting point first.  There are no surviving map or
COFF files corresponding to moko11 or moko10, but there is a surviving
map file corresponding to moko10-beta1; we can use this map file
because there is no difference in the state of source vs blobs between
moko10-beta1 and moko11.  Analyzing this gsm_<blah>.map file from
moko10-beta1 with blobstat, we get the following numbers:

* The total number of bytes in the final fw image that came from
linkable code bits (as opposed to linker-generated fillers and
trampolines) is 0x2156FC.  For comparison, the total image size is
0x2255B4 - there is almost 64 KiB of dead space in there, filled with
padding.

* The portion of the bits which were either compiled from source by OM
or for which they had the exact corresponding source which they chose
not to touch is 0xD3D34 bytes, or about 40% of the total.

* The portion of the bits coming from linkable *.lib files for which
OM did not have corresponding source is 0x1419C8 bytes, or about 60%
of the total.

Thus my original assessment of OM's firmware being about half source,
half blobs was pretty close to the true numbers, which turn out to be
60% blobs, 40% source.
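
The arithmetic behind these moko10-beta1 figures can be checked with a
few lines of Python.  The hex constants are the ones quoted above; the
variable names are just mine, and the "non-code" figure is simply the
difference between the image size and the code bytes.

```python
# Figures quoted above for the moko10-beta1 map file.
total_image = 0x2255B4  # total image size
total_code = 0x2156FC   # bytes that came from linkable code bits
src_bytes = 0xD3D34     # source-backed portion
blob_bytes = 0x1419C8   # portion with no corresponding source

# The two categories should account for every code byte.
assert src_bytes + blob_bytes == total_code

# Whatever is in the image but not in the code total is linker-added.
non_code = total_image - total_code

src_pct = 100 * src_bytes / total_code
blob_pct = 100 * blob_bytes / total_code

print(f"non-code bytes: {non_code} ({non_code / 1024:.1f} KiB)")
print(f"src:  {src_pct:.1f}%")
print(f"blob: {blob_pct:.1f}%")
```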

Now let's look at our current FreeCalypso production firmware: namely,
the 20190409 build of FC Magnetite hybrid for the fcdev3b target.  Here
the total number of non-padding, non-filler, non-trampoline bytes (the
actual code size) is 0x23F7A4, and guess what the blob percentage is...
The only parts of FC Magnetite hybrid fw which exist as blobs with no
exact corresponding source are Nucleus, the OSL and OSX glue layers of
GPF, and the TMS470 compiler's RTS library.  These blob bits add up to
a grand total of 0xA82C (43052) bytes, comprising about 1.8% of the
total fw code size.  Thus we have gone from 60% blobs, 40% source to
1.8% blobs, 98.2% source.
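
The same check for the Magnetite numbers, again as a small Python
sketch.  The two hex constants come from the paragraph above; deriving
the source byte count by subtraction assumes that every code byte falls
into exactly one of the two categories, src or blob.

```python
# Figures quoted above for the 20190409 FC Magnetite hybrid build.
total_code = 0x23F7A4  # non-padding, non-filler, non-trampoline bytes
blob_bytes = 0xA82C    # Nucleus + OSL/OSX + TMS470 RTS library

# Assumption: only two categories exist, so src is the remainder.
src_bytes = total_code - blob_bytes

blob_pct = 100 * blob_bytes / total_code
src_pct = 100 * src_bytes / total_code

print(f"blob: {blob_bytes} bytes, {blob_pct:.1f}%")
print(f"src:  {src_bytes} bytes, {src_pct:.1f}%")
```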

So what are we going to do with these last remaining 43052 bytes of
code which we currently use in the form of binary objects with no
corresponding source?  Out of this entire blob division, the one part
which currently stands as the last remaining bone in our figurative
throat (OSL and OSX bits of GPF) weighs 0x3A90 (14992) bytes: about
one third of the total blob division, or just 0.6% of the total fw
code size.  Needless to say, a blob that weighs a total of 14992 bytes
and comes in the form of COFF objects with full symbolic info (-g
style) is very easy to reverse-engineer and thoroughly understand.
There are no mysteries in this OSL/OSX glue code: it is very thoroughly
understood - at least by me.  Instead, the problem is that I am not
able to turn this disassembly understanding into recompilable C code -
more precisely, I am not able to produce os_???.c code that can be fed
to TI's TMS470 compiler (the specific version used in the TCS211
program) and which would produce output exactly matching the original
blobs.

I have recently written an article explaining the situation with these
OSL/OSX components and where our Magnetite and Selenite firmwares
stand with respect to them:

https://www.freecalypso.org/hg/freecalypso-docs/file/tip/Firmware-deblobbing

Oh, and the new blobstat program resides in the freecalypso-reveng
repository:

https://www.freecalypso.org/hg/freecalypso-reveng/file/tip/blobstat

Hasta la Victoria, Siempre,
Mychaela aka The Mother

