Mantis Bug Tracker

View Issue Details
ID: 0000270
Project: Loader
Category: Serial Monitor Comms
View Status: public
Date Submitted: 2011-09-24 13:41
Last Update: 2013-01-18 23:07
Reporter: Fred
Assigned To: sean94z
Priority: high
Severity: minor
Reproducibility: always
Status: assigned
Resolution: reopened
Platform: Linux
OS: Debian
OS Version: Sid/Unstable
Product Version:
Target Version: 0.1.0
Fixed in Version: 0.1.0
Summary: 0000270: First Connect Fails Several Times
Description:
First hit of connect returns instantly without success
Second hit hangs for a while, then returns without success
Third hit returns instantly without success
Fourth hit works reasonably

And variations of the above. Please fix; it is costing me time while loading firmware during dev.
Tags: No tags attached.
Issue Type: Bug
Risk of Breakage: medium
Attached Files:

Relationships:

Notes
(0000470)
sean94z (reporter)
2011-10-24 21:36

fixed since commit f5627738cac3a3dc06dda087a1d0b761aacd7411
(0000481)
Fred (administrator)
2011-10-24 23:13

Will test and close tomorrow!
(0000483)
Fred (administrator)
2011-10-26 08:50

I saw this yesterday, but not the same as before, and it was probably a one off, but, it got me thinking... :-) Yes, I know, dangerous...

Does the code currently do this:

 - Flush
 - Send init sequence
 - Listen for reply with short timeout
 - If receive reply, set connected state and mark load button
 - If not get reply, mark connect button again
 - Stop

If so, what we should probably do is have a setting for connect retries, such that the sequence above, not including stop, is repeated N times in quick succession if not successful the first time. Because the timeout is naturally short, the user would barely notice if it had to do it 3 times, and the chances are it would *always* work on the second round.

This covers the case where you just HAPPEN to get a framing error in that first try, or something like that. It should also be really easy to do in code and really effective! Thoughts?
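
A minimal sketch of that retry loop, for discussion only. The port interface (flush_input, write, read), the FlakyPort stand-in and the ACK bytes are all hypothetical placeholders, not the Loader's actual code:

```python
def connect_with_retries(port, init_sequence, is_ack, retries=3, reply_timeout=0.05):
    """Repeat flush / send / listen up to `retries` times.

    `port` is any object with flush_input(), write(data) and
    read(timeout) -> bytes. These method names are assumptions.
    Returns the attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, retries + 1):
        port.flush_input()                # drop stale bytes before talking
        port.write(init_sequence)         # send init sequence
        reply = port.read(reply_timeout)  # listen for reply, short timeout
        if is_ack(reply):
            return attempt                # caller sets connected state
    return 0                              # caller marks the connect button again


class FlakyPort:
    """Stand-in port that hands back canned replies, one per read."""

    def __init__(self, replies):
        self._replies = list(replies)

    def flush_input(self):
        pass

    def write(self, data):
        pass

    def read(self, timeout):
        return self._replies.pop(0) if self._replies else b""


# First reply is garbage (think framing error), second is a good ACK.
port = FlakyPort([b"\xff", b"\xe0\x08\x3e"])
print(connect_with_retries(port, b"\r", lambda r: r == b"\xe0\x08\x3e"))  # prints 2
```

With a short reply_timeout, three failed rounds still cost well under a second, which is the point of the proposal.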
(0000484)
Fred (administrator)
2011-10-26 08:51

Back to you for comment and/or minor change as per last note.
(0000487)
sean94z (reporter)
2011-10-27 00:39

the code currently does this:

 - Send init sequence
 - Flush
 - Listen for reply with short timeout 3x
 - If receive reply, set connected state and mark load button
 - I could change the text to "try to connect again"
 - Stop
(0000488)
Fred (administrator)
2011-10-27 07:22

Hmmm, I may be off track here, but that seems wrong to me.

If you flush after you send the init sequence, you risk flushing the reply!

Also, what good does listening three times do? Unless you're listening too soon, in which case a very short delay in the thread doing the listening (it is threaded, right? :-) ) should make sure the reply is ready to RX when you "listen".
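
To illustrate the ordering risk, here is a toy sketch. It assumes "flush" means discarding pending input, not draining the output buffer, and LoopbackPort is a made-up stand-in in which the reply arrives immediately. On real hardware this is a race, not a certainty:

```python
class LoopbackPort:
    """Toy port: the 'device' answers as soon as something is written."""

    def __init__(self):
        self.rx = bytearray(b"\x00stale")  # junk left over from earlier

    def flush_input(self):
        self.rx.clear()

    def write(self, data):
        self.rx += b"ACK"                  # reply lands in the RX buffer

    def read_all(self):
        data, self.rx = bytes(self.rx), bytearray()
        return data


def send_then_flush(port):
    port.write(b"\r")
    port.flush_input()   # may throw away the reply that just arrived
    return port.read_all()


def flush_then_send(port):
    port.flush_input()   # throws away only the stale bytes
    port.write(b"\r")
    return port.read_all()


print(send_then_flush(LoopbackPort()))  # b''
print(flush_then_send(LoopbackPort()))  # b'ACK'
```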

Don't change the button to "connect again", that'd be poor style. Just output "connect failed, please try again" in the text output area or a pop-up, preferably just the text area.
(0000492)
sean94z (reporter)
2011-10-27 14:27

Sorry I left something out.

 - Send serial init
 - Flush
 - Send SM sequence
 - Listen for reply with short timeout 3x
 - If receive reply, set connected state and mark load button
 - I could change the text to "try to connect again"
 - Stop

Dave turned me onto *selecting instead of *reading from the serial port. A select will wait until a char comes in, while a *read may return nothing. All *selects are tried three times before the serialCode kicks back a read error.
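
Roughly what that looks like, as a sketch only. It uses a pipe in place of the serial port so it runs without hardware, wait_for_byte is a hypothetical name and not the serialCode function, and select on plain file descriptors assumes a POSIX system:

```python
import os
import select


def wait_for_byte(fd, timeout, attempts=3):
    """Select on fd up to `attempts` times, then give up with an error."""
    for _attempt in range(attempts):
        readable, _, _ = select.select([fd], [], [], timeout)
        if readable:
            return os.read(fd, 1)   # select says a byte is waiting
    raise TimeoutError("no data after %d selects" % attempts)


read_end, write_end = os.pipe()     # stand-in for the serial device
os.write(write_end, b"\xe0")
print(wait_for_byte(read_end, 0.01))

try:
    wait_for_byte(read_end, 0.01)   # nothing left, so all three selects time out
except TimeoutError as err:
    print("read error:", err)

os.close(read_end)
os.close(write_end)
```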
(0000493)
Fred (administrator)
2011-10-27 14:39

So by "send serial init" you don't mean that at all. You mean init the serial port? And by Send SM Sequence you mean send the open connection stuff to the SM?

So the logic on the selects is which:

A do select, if timeout do again, etc
B do select, if wrong data OR timeout do again, etc.

If B that's got to be wrong... but I suspect your wording was vague, not the code wrong, right?

Actually, reading again, "serialCode" is your built-in lib? And you do three selects on all fail to reads? With no ability to override the time out? Hmmm. The time out for a GP read would want to be very short so that it returned quickly, but the timeout on a specific speed port and setup would want to be tuned to the requirements of that setup such that you're sure it should have arrived by now and can confidently issue an error. Hmmm, I'll leave that stuff in your hands :-)

However, what IS clear is that you're only sending the sm sequence once, and only listening for it once (with the low level retry), and it would be good if your high level stuff could also do retries on the sending level too.
(0000553)
Fred (administrator)
2011-11-11 18:26

On the mac it does one fail to connect and then succeeds most times. Just FYI. I think this logic, whatever it is, needs a rework.
(0000554)
sean94z (reporter)
2011-11-11 19:26

I think it just needs a couple sleep()s.
(0000557)
Fred (administrator)
2011-11-11 20:05

How can a sleep call possibly do anything when it's locked up and already in a coma?
(0000558)
sean94z (reporter)
2011-11-11 20:16

hmmm maybe I need a couple sleep()s LOL
(0002271)
Fred (administrator)
2012-10-06 12:51

LOL @ last comment. Rather than making a new issue:

(14:10:47) Fred: first connect blocks for a second or so
(14:10:52) Fred: second one works instantly

This is DEFINITELY solvable. I know because I've done it:

fred@cheetah:~/workspaces/eclipse/serial-monitor$ time java -jar target/serial-monitor-0.0.1-SNAPSHOT-bin.jar /dev/ttyUSB0
Opening serial device: /dev/ttyUSB0
Wrote 0x0DGoing to sleep for 6
Over slept by 4
 Got 3 bytes! Successful open: 3 {0xE0, 0x08, 0x3E, }

real 0m0.295s
user 0m0.228s
sys 0m0.040s

295 milliseconds includes JVM startup, class loading, etc, etc, etc...

The truth of the matter is that after just 2 milliseconds you can have your answer. Something is wrong with the way you're handling serial reads IMO.
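
Rough arithmetic on the wire time, as a sketch. The baud rate is not stated anywhere in this issue, so 115200 with 8N1 framing (10 bits per byte on the wire) is purely an assumption:

```python
def transfer_time_ms(num_bytes, baud, bits_per_byte=10):
    """Wire time in milliseconds, ignoring USB and driver latency."""
    return num_bytes * bits_per_byte * 1000.0 / baud


# 1 byte out (0x0D) plus 3 bytes back, as in the log above
print(round(transfer_time_ms(4, 115200), 3))  # 0.347
```

Under that assumption the bytes themselves account for well under a millisecond; the rest is latency and sleeping.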
(0002274)
Fred (administrator)
2012-10-06 20:31

(21:54:15) Fred: Error: Data read, but it was not a serial monitor ACK
(21:54:15) Fred: Info: Unable to summon SerialMonitor
(21:54:15) Fred: how many retries?
(21:54:34) Fred: Info: closing serial port
(21:54:38) Fred: on second connect...
(21:54:58) Fred: serial monitor already running
(21:55:05) Fred: so the first one DID succeed...
(21:55:28) Fred: flush your buffer(s) before sending/reading?
(21:56:53) Fred: close/reset should return instantly.
(21:57:00) Fred: and always
(21:57:10) Fred: it virtually can't fail to send one byte and close the device
(21:58:10) Fred: holy shit
(21:58:21) Fred: i close the WINDOW and it takes a full second to terminate
(21:58:23) Fred: why
(21:58:25) Fred: not cool
(0002276)
sean94z (reporter)
2012-10-06 21:44

(21:54:15) Fred: how many retries? 0

(21:55:28) Fred: flush your buffer(s) before sending/reading? It's flushed after the port has been opened.

(21:56:53) Fred: close/reset should return instantly. The read() block has to expire before the threads can read the terminate request. The block has been reduced greatly in the serialLib.
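
A sketch of the idea, not the serialLib code: the reader blocks only for a short poll interval, so it notices the terminate request quickly. A pipe stands in for the serial port (POSIX assumed), and reader_loop and the 10 ms interval are hypothetical:

```python
import os
import select
import threading
import time


def reader_loop(fd, stop, received, poll_interval=0.01):
    """Read until `stop` is set, never blocking longer than poll_interval."""
    while not stop.is_set():
        readable, _, _ = select.select([fd], [], [], poll_interval)
        if readable:
            data = os.read(fd, 64)
            if not data:
                break               # EOF: other end closed
            received.append(data)


read_end, write_end = os.pipe()
stop = threading.Event()
received = []
worker = threading.Thread(target=reader_loop, args=(read_end, stop, received))
worker.start()

os.write(write_end, b"\xe0\x08\x3e")
time.sleep(0.05)                    # give the reader time to pick it up

started = time.monotonic()
stop.set()                          # the terminate request
worker.join()
shutdown_s = time.monotonic() - started

os.close(read_end)
os.close(write_end)
print(b"".join(received), "shutdown took %.3f s" % shutdown_s)
```

Shutdown latency is bounded by the poll interval plus scheduling, not by a multi-second read timeout.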

I was unable to repeat this error on my recent work, but let me see if I can figure out what's going on via the clues you gave me.
(0002278)
Fred (administrator)
2012-10-07 00:18

If blocking on a specific quantity of data, it should be for a specific time.

If blocking on single byte reads into a buffer, it should be for a short time.
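
The first case might look something like this. It is a sketch using a pipe in place of the port (POSIX assumed), and read_exact is a made-up helper name:

```python
import os
import select
import time


def read_exact(fd, count, deadline_s):
    """Return exactly `count` bytes, or raise TimeoutError after deadline_s."""
    end = time.monotonic() + deadline_s
    buf = bytearray()
    while len(buf) < count:
        remaining = end - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("got %d of %d bytes" % (len(buf), count))
        readable, _, _ = select.select([fd], [], [], remaining)
        if readable:
            chunk = os.read(fd, count - len(buf))
            if not chunk:
                raise EOFError("port closed")
            buf += chunk
    return bytes(buf)


read_end, write_end = os.pipe()
os.write(write_end, b"\xe0\x08\x3e")
print(read_exact(read_end, 3, 0.1))      # all three bytes are already there

os.write(write_end, b"\xe0\x08")         # one byte short this time
try:
    read_exact(read_end, 3, 0.05)
except TimeoutError as err:
    print("timed out:", err)

os.close(read_end)
os.close(write_end)
```

The deadline covers the whole quantity, so a slow trickle of bytes cannot stretch the wait beyond deadline_s.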

I also just had a rip fail in the middle, that's pretty disappointing as there is no real reason not to keep trying quite hard to complete.
(0002279)
sean94z (reporter)
2012-10-07 00:21

"If blocking on a specific quantity of data, it should be for a specific time." Maybe, but don’t some SM operations take longer to complete than others???

right, read operations are easy to *retry
(0002280)
Fred (administrator)
2012-10-07 00:22

Yes, but you know up front, even if only empirically, how long that is.
(0002281)
sean94z (reporter)
2012-10-07 00:28

I think the generic time-out is currently 2 seconds.
(0002282)
Fred (administrator)
2012-10-07 00:35

Seems like a life time.

For a single byte I wait just 2ms. The worst case is erase all, which takes under 2.6 seconds.

Everything else is fast.

Block is just another way of sleeping, really.
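
One way to express that is a per-operation timeout table in place of a single generic value. This is only a sketch: the operation names and the default are made up, and only the 2 ms and 2.6 s figures come from this note:

```python
# Timeouts in seconds, keyed by operation name (names are hypothetical).
TIMEOUTS_S = {
    "single_byte": 0.002,  # "For a single byte I wait just 2ms"
    "erase_all": 2.6,      # worst case: erase all takes under 2.6 seconds
}
DEFAULT_TIMEOUT_S = 0.05   # assumed placeholder for "everything else is fast"


def timeout_for(operation):
    """Look up the timeout for an operation, falling back to the default."""
    return TIMEOUTS_S.get(operation, DEFAULT_TIMEOUT_S)


print(timeout_for("single_byte"), timeout_for("erase_all"), timeout_for("read_block"))
```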
(0002283)
sean94z (reporter)
2012-10-07 00:37

Yeah, overkill for all but erase all. It was just a quick & dirty way to get it done in a hurry.
(0002284)
sean94z (reporter)
2012-10-07 00:43

quick/dirty/temporary :-p
(0002475)
sean94z (reporter)
2012-12-13 16:20

The connect button has been removed altogether, so this issue can no longer exist.
(0002476)
Fred (administrator)
2012-12-13 16:23

Presumably it still connects, why can't it fail the same way?
(0002535)
Fred (administrator)
2013-01-18 23:04

Still does the first connect fails thing.
(0002536)
sean94z (reporter)
2013-01-18 23:07

I can't reproduce it... Let me try a couple of other RS232 devices.

