FreeEMS Issues - Tunix
View Issue Details
0000478TunixLoaderspublic2011-12-12 12:392012-03-30 20:32
unknown, updating issue to appear in roadmap
0000478: Un-required Pause of 1.87179 Seconds And Double Query
14.166 seconds of actual loading, 4.469 seconds of waiting around before hand. 18.635 total when 14.3 or so was possible?

0.34ms between initial req 0x0D and end of first answer 0xE0 0x08 0x3E
1.87179 seconds huge unrequired pause
then same for 0x0D / 0xE1 0x00 0x3E
then 7.635ms short pause
then 0xB6 then 2.5834 seconds HUGE delay by SM before response of 0xE0 0x00 0x3E
then 5.5ms short pause
then "complex" transmission of 0xA2 0x00 0x30 0xE0 with response of 0xE0 0x00 0x3E
then 1.7ms short pause
then 0xB8
then 0.328 second erase pause which is consistent with solid burn, erase from that point on with efficient 3ms per write packet/response/delay set which are 20 bytes of data and structure/address, 3 byte response and sub 1ms pause before next cycle


 - Why does the SM take 2.6 seconds to do the first erase? This may be unanswerable LOL.
 - Why does mtxloader query the device twice before beginning comms?
 - Why does mtxloader not verify what it wrote? This should be available and optional, but on by default, with setting it off remembered for next time.

The first response is a positive "we're fresh out of boot" and the second is a "we're warm", both are positive.

Attached is a saleae session showing the timings. Adding Sean to this as he may have some insight.
Output from the app is:

Attempting Wakeup...
Command not recognized, e=0xe1
Attempting to erase main flash!
select() (read) timeout, attempts 0!
select() (read) timeout, attempts 1!
Main Flash Erased...
Uploading ECU firmware, eta 0 seconds...

Then a bunch of these with different pages:

Setting page to 0xe0
Erasing page 0xe0
Page 0xe0 erased...

And never a mention of writing anything. I realise that you write in small bits, but maybe "writing X kb in Y chunks of Z bytes" would be nice for the user to see :-)

Also, verifying as an option would be super! I've been using mtxloader exclusively (on linux) for the last few weeks without a hitch. I just happened to have the saleae hooked up and thought I'd see why it paused at the beginning.
No tags attached.
? mtxloader.firmware.load.logicdata (1,895,617) 2011-12-12 12:39
log msloader_strace.log (1,644,049) 2011-12-12 22:49

2011-12-12 12:40   
Corrected initial statement. Sean, any thoughts?
2011-12-12 12:47   
Go and rerun msloader (same code as mtxloader without the GUI) under strace and use the strace man page to get full function call output and timings to see how long each call is siting for. Its up to you to LEARN how to use strace properly as it'll be invaluable for debugging issues on your end that I can't replicate due to lack of hardware or other reasons.
2011-12-12 12:54   
I said:

0.34ms between initial req 0x0D and end of first answer 0xE0 0x08 0x3E
1.87179 seconds huge unrequired pause
then same for 0x0D / 0xE1 0x00 0x3E

MTX said:

Attempting Wakeup...
Command not recognized, e=0xe1

Do you or do you not get this output and behaviour?

Assigning back. This is an informational report with target set to "future", for a reason. I'm interested to hear Sean's comments and I'm not spending time on it, I've got LOTS of other stuff to do. Especially on a simple app like this, finding the reason for it calling "hello" twice should be straight forward without any form of debug. You may care to address some of the other issues that I raised at the same time, or not.
2011-12-12 13:08   
That's what you see on YOUR SALAE, not what the OS sees. Do a strace with timings and full function call output to COMPARE.
2011-12-12 22:47   
Since I have no idea what kind of serial adapter, what OS, driver version or whatever is on YOUR end, my strace is likely not to match up or make sense, hence why YOU need to do it. Attaching my strace of the following:

strace -tt -o /tmp/msloader_strace.log msloader freeems /dev/ttyUSB0 firmware/FreeEMS-0.2.0-SNAPSHOT-49-g7f8d64c-BenchTest.s19
2011-12-12 23:50   
Since I expect you to not understand the trace and keep poking me for info I'll digest the important parts

17:54:53.825720 write(1, "Attempting Wakeup...\n", 21) = 21

Go and check the device to make sure you can WRITE to it, set 200 ms timeout
17:54:53.825848 select(4, NULL, [3], [3], {0, 200000}) = 1 (out [3], left {0, 199976})
Response came back saying FD3 is writable, took 24 microseconds to let us know.

Send a carriage return
17:54:53.825995 write(3, "\r", 1) = 1
Write was successfull

Check device for readability, set 1 second timeout
17:54:53.826080 select(4, [3], NULL, [3], {1, 0}) = 1 (in [3], left {0, 999082})
Response that FD 3 is readable, tool 918 microseconds to let us know

Attempt to read 3 charactors
17:54:53.827078 read(3, "\341", 3) = 1
Only got one charactor out of the 3 requested

Tell the OS to wake us up when FD3 has more data available, 1 second timeout
17:54:53.827143 select(4, [3], NULL, [3], {1, 0}) = 1 (in [3], left {0, 996154})
OS wakes us up, response took 3846 microseconds (3.846 ms)

Ask for 2 more chars from FD3
17:54:53.831069 read(3, "\0>", 2) = 2
Got 2 more chars from FD3
17:54:53.831161 write(1, "Command not recognized, e=0xe1\n", 31) = 31
Code didn't like the FIRST byte in the response, "\341", and bitches about it, however the remaining bytes are OK, and that function eventually returns true and mtxloader continues. This may be a logic/sequencing thing within Mtxloader, with regards to properly handling the response from the SM, can you clarify the response codes?

Spit out error message
17:54:53.831257 write(1, "Attempting to erase main flash!\n", 32) = 32
Tell OS to wake us up when device is writable, 200ms timeout
17:54:53.831339 select(4, NULL, [3], [3], {0, 200000}) = 1 (out [3], left {0, 199994})
Resposne back in 6us, device is writable.
Write "\266" to device
17:54:53.831421 write(3, "\266", 1) = 1
Tell us to wake us up when device is READABLE, 1 second timeout
17:54:53.831483 select(4, [3], NULL, [3], {1, 0}) = 0 (Timeout)
Timeout, nothing happened
Tell the user we had a timeout of 1 seconds
17:54:54.835441 write(1, "select() (read) timeout, attempt"..., 37) = 37
Tell us to wake us up when device is READABLE, 1 second timeout
17:54:54.835589 select(4, [3], NULL, [3], {1, 0}) = 0 (Timeout)
Timeout, nothing happened
Tell the user we had a timeout of 1 seconds
17:54:55.839826 write(1, "select() (read) timeout, attempt"..., 37) = 37
Tell us to wake us up when device is READABLE, 1 second timeout
17:54:55.840042 select(4, [3], NULL, [3], {1, 0}) = 1 (in [3], left {0, 320377})
Device has data waiting, we slept for 1.679623 seconds on this last attempt
Read 3 bytes from device
17:54:56.519907 read(3, "\340\0>", 3) = 3
Got all three
17:54:56.520045 write(1, "Main Flash Erased...\n", 21) = 21
and away we go.
2011-12-13 13:00   
Great info, I'm going to sit on this and work on some other things, but will get back to this and give you more info on what is wrong/right etc. Thanks for taking the time to educate me. You're right that I would have asked and I appreciate you not forcing me to pointlessly ask when you could write it up front like this. Thanks!

God damn it! The decoding you did is for a different execution run. I just went to search for the time stamps to check some things and they're not there... matching data is essential. If you have the log of the one you decoded, please attach...
2011-12-13 13:01   
I found the matching stuff, still, next time, same file, please.
2011-12-13 13:19   
OK, you're doing a bunch of stuff wrong. Please read this document in detail. [^]

You can only send 0x0D once, the first message, the response from that will be 0xE0 0x08 0x3E which means "happy, fresh out of reset, prompt".

The second time you send it, it's an unknown command because you're already awake, so you get an error 0xE1 0x00 0x3E which means "unknown command, SM running, prompt". Why do you send it a second time? Don't!

The long pause is from the B6 command, which I've stated many times shouldn't be used with TA card FreeEMS stuff. It only erases one of the four blocks, not all four. Furthermore, you send 0xB6 more than once. Even if it worked, that doesn't make sense. How MTX loader burns a full image with this approach I do not know. It shouldn't work. I should investigate this further, perhaps you found the way to make it work as is.

What you should be doing is this:

Send 0x0D
Receive either 0xE0 0x08 0x3E "thanks for the wakeup" or 0xE1 0x00 0x3E "we're already awake"
Set PPAGE to whatever is needed
Send 0xB8 to erase that page
Write data for that page
Verify data for that page
Set PPAGE to next needed page

You can also do all the erasing first, then all the writing, then all the erasing, however it is slower and requires a lot more changing page. Sean pointed this out to me some time ago.

I hope this helps.
2011-12-13 18:01   
>The second time you send it, it's an unknown command because you're already awake, so you get an error 0xE1 0x00 0x3E which means "unknown command, SM running, prompt". Why do you send it a second time? Don't!

Where in the trace that I annotated above am I sending 0x0d TWICE??

mtxloader already does something similar to this, so point out EXACTLY WHERE I am sending 0x0D twice? I've read that document, so point me to the EXACT PLACE that's applicable, as 90% of the doc isn't applicable to me, and I don't have the time/ability to memorize it in full.

If I'm doing stuff wrong, then why does the firmware LOAD AND WORK every time? It should be: "I'm not doing it FREDS WAY", and since fred's way isn't actually published, how could I possibly match that to your satisfaction??

Also point out where 0xB6 is broken in the APPnote/Doc (i.e. page + context to locate it)
2011-12-13 18:40   
(edited on: 2011-12-13 18:41)
It's not. But check out the saleae log, and stop being a difficult arsehole all the time. We're supposed to be working together and you consistently spit the dummy and throw your toys out of the cot every single time that I raise an issue. MTXLoader, which this issue is about, not msloader, which it is not about, though, they SHOULD share the exact same code base and only have variance in the interface to the core code. From what you say above, it sounds like they do not. I'd strongly suggest merging the two code bases and simply linking twice from the same set of object files to generate your two distinct binaries. This will save you a lot of time on maintenance and reduce errors/mistakes from duplicated and forgotten work. MTXLoader, in the Saleae log attached, DOES send 0x0D twice, with predictable results. Both results should be handled gracefully. No result should be a fail. Results other than those two results should be a fail. Fails should tell the user what happened.

As I stated above "How MTX loader burns a full image with this approach I do not know.", which, paraphrased, says "I do not know how or why the firmware LOADS AND WORKS every time." as it certainly does not make sense. EDIT, I found my mistake, though it's still wrong, see below. Re doing it Fred's way, I think you'll agree that it does not make sense to erase the memory after writing firmware data to it, or even just to erase it more than once. From the app note:

$B6 — Erase_ALL B6 — Erase all of FLASH and EEPROM memory, preserving only the 2K monitor area. The $E6 error code will be returned if the command does not complete successfully.

The app note predates the multi bank processors. The SM should erase all, but in my previous testing, it did not occur. I should go back and verify that that was indeed the case with the Saleae.

EDIT: It turns out that I misread the log. What you do is B6 erase ALL, then B8, erase page. There is absolutely zero point to this. It just wastes time and flash cycles. You're doing the right thing, already, for most of it, which is why it works, but you're running B6 first, which is pointless on the current SM as it only erases 1/4 of the flash.

So, to fully fix this app, do the following:

 - Send 0x0D once only.
 - Handle BOTH positive responses, even though one is an "error".
 - Error on any other scenario, such as no data, incomplete data, etc.
 - Do not pause for long periods after receiving successful responses.
 - Stop doing the "erase main flash" B6 command as you don't need to when doing the B8 step by step erase.
 - Verify the burn went well by reading the data back out (make this optional or compulsory, at your choice, but it should be possible/easy).

With those minor changes, the app will behave admirably. It already does a good job, but it doesn't entirely make sense. I'll update the issue body to reflect the truth now by changing a 6 to an 8.

2011-12-13 18:41   
Change a 6 to an 8.
2011-12-13 18:49   
You're the one being difficult, how fucking hard is it to run "man strace" and spend 60 seconds scanning through the options?? You utterly REFUSE to even use the knowledge I give you, have you run strace on your end yet and correlated with your salea trace? if not, go back and do that and let me know when you're done and have finished your correlation.

mtxloader and msloader are IDENTICAL functionality, they are basically just WRAPPERS of the exact same underlying "loader" code, one being CLI one being command line, strace the CLI tool, as the trace will be 1000x cleaner and won't have the gui system calls in the output.

You have yet to provide WHERE the 0x0D is sent twice like I asked for. Failure to provide the information means it just gets bounced back to you until you do your homework properly. the loader tool has no idea if it's talking to a "good" or "bad" SM (banked/unbanked) device hence it takes the SAFE route, by doing it both ways, is it perfect, no, does it do the job correctly always? yup.
2011-12-13 19:11   
It's good to hear that the code is the same, hooray. I'm not on a machine attached to a device right now, so I can't do your ssssstrace. Furthermore, you have exact instructions on how to fix your program's semantics provided step by step, in plain simple English, such that even HentaiXP could understand it, above.

As for good/bad safe/not, doing it both ways makes zero sense on any device. ALL devices work with the paged method. Some don't work with the full method. If you're going to do the paged method, then it makes zero sense to do the full method too, no matter which device is attached...

It's dirty, and wrong, and you've got all the info that you need to fix it. Stop whinging and either ignore the ticket, which is a perfectly reasonable approach, or fix it. End of fucking story.

You have enough info, do NOT bounce it back to me. People are already complaining about the amount of time that I'm spending on dealing with your childish antics. Give them what they want.
2011-12-13 19:12   
Edit original description to say 14.3 seconds possible, because it is, and reassign to dave, who it should be assigned to, again, for the lat time, hopefully.
2011-12-13 22:41   
if you gave me what I asked for (you know, the obvious, like complete context, actual work/testing that was requested (strace) and so on and so forth, like where you see the duplicate 0x0D which you STILL HAVE NOT ANSWERED), this back and forth shit wouldn't happen... The fault does not lie 100% with me...
2011-12-13 23:18   
<scrolls up>

Yep, there is still an attachement showing the sending 0x0D twice and the timing of it, which give clues as to which code is responsible, in case it's such a mess that it's not already obvious. Look, fix it, or forget it. You do not need an strace showing that shit, just open the source files and fix it. OK, fuck this. But, this is the LAST time I touch MTX source, just like at least 4 keen developers before me who also found you absolutely impossible to work with.


Pull request: 8c15225

Fixing the erase all was trivial, however as the commit comment notes the sending the 0x0D twice shit is a shambles and I don't have time for fixing that anymore than I do dealing with your childish behaviour. Enjoy.
2011-12-14 00:22   
I do need the strace. As I cannot readily make sense of salea logs on my ancient (crap) machine. since I do NOT have a Salea I have no way to do verification here, but with strace, we BOTH can examine what the software is DOING which is the point I've been trying to get across to you repeatedly and have failed utterly at. I've checked my trace logs and do NOT see any duplicate 0x0D's and it would be helpful to see if perhaps msloader is NOT DOING IT AT ALL, but that its originating from the usb device, perhaps on init, or some other phase.

Have you tested your erase fix to make sure the firmware is still fully functional?
2011-12-14 00:26   
And my point is, no, wait, I said I wasn't going to bother anymore. Closing, half fixed.
2011-12-19 20:10   
Tested the fix in 8c15225, works as advertised.
2011-12-19 22:55   
revised fix in 01242f887cacad00068002ab8266780819620bdf which improves error messages slightly.
2011-12-20 21:53   
Fix works EXACTLY the same as mine, what a surprise! Pushed my commit to a named branch in my repo as a record of this lack of merging/fastforwarding silliness.
2011-12-20 22:17   
You didnt fix the log msgs nor did you properly rebase hence your pull req was denied.
2011-12-20 22:23   
Woops, I was under the false impression that it was up to date. You could have mentioned it, though, or rebased it yourself, it would have rebased cleanly.
2011-12-20 22:31   
Another assumption yet again. Get your facts straight...
2011-12-20 22:56   
I did, just before I posted. It was a clean commit onto a file that hadn't been touched for 10 days, and wasn't touched in any of the commits that I failed to have in my local master at the time when I made the change. IE, my commit and change were perfectly valid and clean to pull in, either by instant merge recursive or instant rebase. Last significant change was nearly a month before mine.

commit 3e1705556a9c53d2ede672a16863aab1c75f7e23
Author: David Andruczyk <>
Date: Sun Dec 4 13:16:22 2011 -0500

    Finished debug transition, enabled -std=c99 compile flag

commit c9be9e6861f08735e74d51545a2608bd629773c4
Author: David Andruczyk <>
Date: Fri Nov 18 11:54:34 2011 -0500

    Loader seems to be semi-reliably working on OS-X now HOWEVER only when using /dev/cu.PL2303-<SERIALNUM> vs /dev/cu.usbserial which still only seems to work the first time after plugin), using 1.4.x curren