comparison doc/Loadtools-performance @ 671:e66fafeeb377

doc/Loadtools-performance: new faster flash operations
author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 08 Mar 2020 03:43:11 +0000
parents 8c6e7b7e701c
children f2a023c20653
670:815c3f8bcff1 671:e66fafeeb377
1 Dumping and programming flash 1 Memory dump performance
2 ============================= 2 =======================
3 3
4 Here are the expected run times for the flash dump2bin operation of dumping the 4 Here are the expected run times for the flash dump2bin operation of dumping the
5 entire flash content of a Calypso GSM device: 5 entire flash content of a Calypso GSM device with the current version of
6 fc-loadtool which uses the new binary transfer protocol:
6 7
7 Dump of 4 MiB flash (e.g., Openmoko GTA01/02 or Mot C139/140) at 115200 baud: 8 Dump of 4 MiB flash (e.g., Openmoko GTA01/02 or Mot C139/140) at 115200 baud:
8 12m53s 9 6m4s
9 10
10 The same 4 MiB flash dump at 812500 baud: 1m50s 11 The same 4 MiB flash dump at 812500 baud: 0m52s
11 12
12 Dump of 8 MiB flash (e.g., Mot C155/156) at 812500 baud: 3m40s 13 Dump of 8 MiB flash (e.g., Mot C155/156) at 812500 baud: 1m44s
14
15 These times are a 2x improvement compared to all previous versions of
16 fc-loadtool (prior to fc-host-tools-r13) which used a hex-based transfer
17 protocol.
13 18
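As a rough sanity check on the new dump times quoted above, the following sketch computes the raw time needed to push the flash content over the serial line. It assumes 8N1 framing (10 bits on the wire per data byte) and zero protocol overhead. Neither assumption is stated in the text, and the helper names are made up for illustration.

```python
# Estimate the raw serial transfer time for a flash dump.
# Assumption (not from the original text): 8N1 framing, i.e. 10 bits on
# the wire per data byte, and no protocol overhead at all.
BITS_PER_BYTE_ON_WIRE = 10
MIB = 1024 * 1024


def wire_time_seconds(num_bytes, baud):
    """Time to move num_bytes over a serial line running at baud."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / baud


def fmt_mmss(seconds):
    """Format a duration the way the document does, e.g. 6m4s."""
    total = round(seconds)
    return f"{total // 60}m{total % 60}s"


# The three dump cases listed in the document.
for size_mib, baud in [(4, 115200), (4, 812500), (8, 812500)]:
    t = wire_time_seconds(size_mib * MIB, baud)
    print(f"{size_mib} MiB at {baud} baud: {fmt_mmss(t)}")
```

Under these assumptions the estimates come out to 6m4s, 0m52s and 1m43s, against the quoted 6m4s, 0m52s and 1m44s, which suggests that the binary protocol keeps the serial line almost fully busy.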
14 Because of the architecture of fc-loadtool and its loadagent back-end, the run 19 Because of the architecture of fc-loadtool and its loadagent back-end, the run
15 time of a flash dump operation depends only on the serial baud rate and the 20 time of a flash dump operation depends only on the serial baud rate and the
16 size of the flash area to be dumped; it should not depend on the USB-serial 21 size of the flash area to be dumped; it should not depend on the USB-serial
17 adapter type or any host system properties, as long as the host system and 22 adapter type or any host system properties, as long as the host system and
19 programming and fc-xram loading operations are quite different in that their 24 programming and fc-xram loading operations are quite different in that their
20 run times do depend on the host system and USB-serial adapter or other serial 25 run times do depend on the host system and USB-serial adapter or other serial
21 port hardware - this host system dependency exists because of the way these 26 port hardware - this host system dependency exists because of the way these
22 operations are implemented in our architecture. 27 operations are implemented in our architecture.
23 28
29 Flash programming operations
30 ============================
31
24 Here are some examples of expected flash programming times, all obtained on the 32 Here are some examples of expected flash programming times, all obtained on the
25 Mother's Slackware 14.2 host system: 33 Mother's Slackware 14.2 host system:
26 34
27 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware 35 Flashing an Openmoko GTA02 modem (K5A3281CTM flash chip) with a new firmware
28 image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 7m35s 36 image (2376448 bytes), using a PL2303 USB-serial cable at 115200 baud: 0m19s to
37 erase 37 sectors, 3m45s to program the image.
29 38
30 Flashing the same OM GTA02 modem with the same fw image, using a CP2102 39 Flashing the same OM GTA02 modem with the same fw image, using a CP2102
31 USB-serial cable at 812500 baud: 1m52s 40 USB-serial cable at 812500 baud: 0m19s to erase, 0m51s to program.
32 41
33 Flashing a Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board 42 Flashing a Magnetite hybrid fw image (2378084 bytes) into an FCDEV3B board
34 (S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 2m11s 43 (S71PL129N flash chip) via an FT2232D adapter at 812500 baud: 0m24s to erase
44 13 sectors (4 small and 9 large), 1m27s to program the image.
35 45
36 These times are just for the flash program-bin operation, not counting the 46 Regardless of whether you execute these two steps separately or use one of our
37 flash erase which must be done first. Flash erase times are determined 47 new flash e-program-{bin,m0,srec} commands, flash programming is always done in
38 entirely by physical processes inside the flash chip and are not affected by 48 two steps: first the erase operation covering the needed range of sectors, then
39 software design or the serial link: for each sector to be erased, fc-loadtool 49 the actual programming operation that includes the data transfer.
40 issues the sector erase command to the flash chip and then polls the chip for 50
41 operation completion status; the polling is done over the serial link and thus 51 Flash erase times are determined entirely by physical processes inside the
42 may seem very slow, but the extra bit of latency added by the finite polling 52 flash chip and thus should not be affected by software design or the serial
43 speed is still negligible compared to the time of the actual sector erase 53 link: for each sector to be erased, fc-loadtool issues the sector erase command
44 operation inside the flash chip. In contrast, the execution time of a flash 54 to the flash chip and then polls the chip for operation completion status; the
45 program-bin operation is a sum of 3 components: 55 polling is done over the serial link and thus may seem very slow, but the extra
56 bit of latency added by the finite polling speed is still negligible (at least
57 on the Mother's Slackware system) compared to the time of the actual sector
58 erase operation inside the flash chip. One remaining flaw is that in our
59 current implementation the issuance of each individual sector erase command to
60 the flash chip takes 6 command-response exchanges between fc-loadtool and
61 loadagent; on my Slackware host system this extra overhead is still negligible
62 compared to the 0.5s or more for the actual erase operation time, but this
63 overhead may become more significant on host systems with higher latency.
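To put the erase figures in perspective, this small sketch divides the total erase times quoted in the examples above by the number of sectors erased. The result is only an average: it includes whatever polling and command-response overhead was present on the measuring host, and for the FCDEV3B case it mixes small and large sectors.

```python
# Average erase time per sector, from the figures quoted in the document.
# Each entry: (label, total erase time in seconds, sectors erased)
erase_runs = [
    ("GTA02 / K5A3281CTM", 19, 37),
    ("FCDEV3B / S71PL129N", 24, 13),  # 4 small + 9 large sectors
]


def avg_erase_seconds(total_seconds, sectors):
    """Mean wall-clock time per erased sector, overhead included."""
    return total_seconds / sectors


for label, total_seconds, sectors in erase_runs:
    avg = avg_erase_seconds(total_seconds, sectors)
    print(f"{label}: {avg:.2f} s per sector")
```

The averages work out to about 0.51 s per sector on the GTA02 and about 1.85 s per sector on the FCDEV3B, in line with the "0.5s or more" figure given for the actual erase operation.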
64
65 After the erase operation, the execution time of the main flash programming
66 operation is a sum of 3 components:
46 67
47 * The time it takes for the bits to be transferred over the serial link; 68 * The time it takes for the bits to be transferred over the serial link;
48 * The time it takes for the flash programming operation to complete on the 69 * The time it takes for the flash programming operation to complete on the
49 target (physics inside the flash chip); 70 target (physics inside the flash chip);
50 * The overhead of command-response exchanges between fc-loadtool and loadagent. 71 * The overhead of command-response exchanges between fc-loadtool and loadagent.
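The first of these three components can be estimated from the image size and baud rate, which leaves the other two as a combined remainder. The sketch below does this for the three programming times quoted earlier. It assumes 8N1 framing (10 bits per byte) and that data transfer does not overlap with the other two components. Both are assumptions made for illustration, not statements about how fc-loadtool actually works.

```python
# Split each measured programming time into an estimated serial transfer
# part and a remainder (flash chip programming time plus command-response
# overhead, which this estimate cannot tell apart).
# Assumption (not from the original text): 8N1 framing, 10 bits per byte.
BITS_PER_BYTE_ON_WIRE = 10

# (label, image size in bytes, baud rate, measured programming seconds)
runs = [
    ("GTA02 via PL2303 at 115200", 2376448, 115200, 3 * 60 + 45),
    ("GTA02 via CP2102 at 812500", 2376448, 812500, 51),
    ("FCDEV3B via FT2232D at 812500", 2378084, 812500, 1 * 60 + 27),
]


def split_time(num_bytes, baud, measured_seconds):
    """Return (estimated wire time, remainder) in seconds."""
    wire = num_bytes * BITS_PER_BYTE_ON_WIRE / baud
    return wire, measured_seconds - wire


for label, num_bytes, baud, measured in runs:
    wire, rest = split_time(num_bytes, baud, measured)
    print(f"{label}: wire {wire:.1f} s, remainder {rest:.1f} s")
```

With these inputs the remainder is roughly 19 s and 22 s for the two GTA02 runs but about 58 s for the FCDEV3B run, even though the estimated wire time at 812500 baud is about 29 s in both cases.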
72
73 Because image data transfer is taking place in this step, flash programming at
74 812500 baud is faster than 115200 baud, although it is not the same 7x
75 improvement as happens with flash dumps. The present version of fc-loadtool
76 also uses a new binary transfer protocol instead of the hex-based one used in
77 previous versions (prior to fc-host-tools-r13); this change produces a 2x
78 improvement for OM GTA02 flashing, but only a smaller improvement for FCDEV3B
79 flashing.
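The ratios mentioned here can be checked directly against the times quoted in this document. The sketch below uses only those figures. The old program-bin times are taken from the replaced side of this comparison, which did not include erase time. Note that the two GTA02 programming runs also used different USB-serial adapters, so the baud rate is not the only variable.

```python
def seconds(minutes, secs):
    """Convert an XmYs figure from the document to seconds."""
    return minutes * 60 + secs


# Speedup from raising the baud rate (new binary protocol on both sides).
baud_ratio = 812500 / 115200
dump_speedup = seconds(6, 4) / seconds(0, 52)        # 4 MiB flash dump
program_speedup = seconds(3, 45) / seconds(0, 51)    # GTA02 programming

# Speedup from the hex-to-binary protocol change (programming step only).
gta02_115200 = seconds(7, 35) / seconds(3, 45)
gta02_812500 = seconds(1, 52) / seconds(0, 51)
fcdev3b_812500 = seconds(2, 11) / seconds(1, 27)

print(f"baud ratio:            {baud_ratio:.2f}x")
print(f"dump speedup:          {dump_speedup:.2f}x")
print(f"programming speedup:   {program_speedup:.2f}x")
print(f"binary vs hex, GTA02 at 115200:   {gta02_115200:.2f}x")
print(f"binary vs hex, GTA02 at 812500:   {gta02_812500:.2f}x")
print(f"binary vs hex, FCDEV3B at 812500: {fcdev3b_812500:.2f}x")
```

The dump speedup (7.00x) tracks the baud ratio (7.05x) closely, while programming gains only about 4.4x. The protocol change gives about 2.0x to 2.2x on the GTA02 but only about 1.5x on the FCDEV3B.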
80
81 Notice the difference in flash programming times between GTA02 and FCDEV3B: the
82 fw image size is almost exactly the same, any difference in latency between
83 CP2102 and FT2232D is less likely to produce such significant time difference
84 given our current 2048 byte transfer block size, thus the difference in physical
85 flash program operation times between K5A3281CTM and S71PL129N flash chips seems
86 to be the most likely explanation.
51 87
52 Programming flash using program-m0 or program-srec 88 Programming flash using program-m0 or program-srec
53 ================================================== 89 ==================================================
54 90
55 Prior to fc-host-tools-r12 flash programming via flash program-m0 or 91 Prior to fc-host-tools-r12 flash programming via flash program-m0 or