
Re: Intermittent timeout problem



Hello Hugues,

> SetError() #6: connect(): A remote host did not respond within the 
> timeout period.

Heh... this is a little hard to explain (and I have a feeling you won't 
believe me), but this message doesn't mean that the connection is timing 
out.  :)

If the connection was actually timing out, the message in the debug file 
would look like this:

SetError() #7: Timeout occurred while trying to connect to server!

The message you reported, however, is HTTPAPI error #6, not #7.  Error #6 
means that the networking code in the operating system reported an error 
to HTTPAPI.  Here, the operating system is sending a CPE3447 (or 
"ETIMEDOUT") to HTTPAPI.  (If you type DSPMSGD CPE3447 you'll see where 
HTTPAPI is getting the error text from!)
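To make the #6 vs. #7 distinction concrete, here is a minimal Python sketch 
(not HTTPAPI's actual code) of how a client could tell the two cases apart. 
A timeout raised by the client's own timer carries no errno, while a 
failure handed back by the OS networking code carries one, such as 
ETIMEDOUT.  The codes 6 and 7 mirror HTTPAPI's numbers; the names 
classify_connect_error() and try_connect() are hypothetical.

```python
import errno
import socket

# Hypothetical codes mirroring the two HTTPAPI errors discussed above.
ERR_OS_REPORTED = 6   # the OS networking code reported a failure
ERR_OWN_TIMEOUT = 7   # the client's own connect timer ran out


def classify_connect_error(exc):
    """Map a failed connect() to an HTTPAPI-style (code, message) pair."""
    if isinstance(exc, socket.timeout) and exc.errno is None:
        # Raised by Python's own timer (settimeout / create_connection
        # timeout), not by the OS, so no errno is attached.
        return ERR_OWN_TIMEOUT, "Timeout occurred while trying to connect to server!"
    # Everything else came from the OS; pass along the OS's own error text.
    detail = exc.strerror or str(exc)
    if exc.errno == errno.ETIMEDOUT:
        # The OS gave up after its retransmissions went unacknowledged.
        detail += " [ETIMEDOUT from the OS]"
    return ERR_OS_REPORTED, "connect(): " + detail


def try_connect(host, port, timeout=10.0):
    """Attempt a TCP connection and classify any failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 0, "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

Note that the wording of the OS message ("timed out") is the same kind of 
text in both cases; only where the error originated tells you which one 
you have.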

I find this message very confusing because, despite the wording, the 
error doesn't actually mean that the connection timed out.  It actually 
means that an excessive number of packets were lost.

The TCP protocol automatically sends and acknowledges packets.  When you 
send data to the Linux machine, the Linux machine automatically 
acknowledges it.  If the i5/OS system doesn't receive an acknowledgement, 
it re-sends the packet.  After it has done this a certain number of times, 
it considers it an "excessive number of lost packets" and gives up.  When 
it gives up for this reason, it returns the ETIMEDOUT error code, and 
that's what you're seeing.
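The retransmit-then-give-up behavior can be modeled in a few lines.  This 
is a deliberately simplified Python simulation, not how the i5/OS or Linux 
TCP stack is written: the retry limit (5) and initial timeout (1 second) 
are illustrative assumptions, and the hypothetical ack_arrives callback 
stands in for the network.  The doubling of the wait between tries follows 
TCP's exponential backoff.  During connect(), the packet being re-sent is 
the initial SYN (connection request).

```python
import errno


def send_with_retransmit(ack_arrives, max_retransmits=5, initial_rto=1.0):
    """Toy model of TCP retransmission; not a real TCP stack.

    ack_arrives(attempt) says whether transmission number `attempt`
    (0 = the original send, 1.. = retransmissions) gets acknowledged.
    Returns (0 or errno.ETIMEDOUT, simulated seconds spent waiting).
    """
    rto = initial_rto          # retransmission timeout for the next wait
    waited = 0.0
    for attempt in range(max_retransmits + 1):
        if ack_arrives(attempt):
            return 0, waited   # acknowledged: the data got through
        waited += rto          # no ACK arrived within the timeout
        rto *= 2               # double the timeout before any re-send
    # Every transmission went unacknowledged: report ETIMEDOUT.
    return errno.ETIMEDOUT, waited
```

With the defaults, a packet that is never acknowledged is sent six times, 
and the model reports ETIMEDOUT after 63 simulated seconds 
(1 + 2 + 4 + 8 + 16 + 32).  No application code is involved at any step.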

All of this happens automatically as part of the TCP protocol.  It's not 
handled by code in HTTPAPI or code on the HTTP server of the Linux 
machine.  It's entirely internal to the operating system's network code.

This means that no bug in HTTPAPI could cause it, and no bug on the Linux 
server's side could cause it either.  A bug in the operating systems 
themselves is possible, but extremely unlikely, because both undergo 
rigorous testing and both have been time-tested by millions of users.

The most likely cause of the problem is a bad network device: a flaky 
(but not yet dead) network cable, network card, hub, switch, router, etc., 
somewhere between the iSeries where HTTPAPI is running and the Linux 
machine where the HTTP server is running.  It's almost certainly a 
hardware error.

If this is only happening occasionally, you could simply try again.  When 
HTTPAPI returns error #6 (you can retrieve that number from the 
HTTP_error() subprocedure), simply call http_url_get() again.  Set up a 
loop to retry up to 5 times.  If the error is only sporadic, that should 
solve the problem.
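A minimal Python sketch of that retry loop follows.  The fetch argument is 
a hypothetical stand-in for calling http_url_get() and then HTTP_error(); 
having it return an (ok, errnum) pair is an assumption for illustration, 
not HTTPAPI's calling convention.  Only error #6 is retried, and the 
one-second pause between tries is also an assumption.

```python
import time

ERR_OS_REPORTED = 6  # HTTPAPI error #6, as discussed above


def get_with_retry(fetch, max_tries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() up to max_tries times, retrying only on error #6.

    fetch is a hypothetical stand-in for http_url_get() + HTTP_error():
    it returns (ok, errnum). Returns (ok, last errnum, attempts made).
    """
    ok, errnum, attempt = False, 0, 0
    for attempt in range(1, max_tries + 1):
        ok, errnum = fetch()
        if ok:
            break                  # success: stop retrying
        if errnum != ERR_OS_REPORTED:
            break                  # a different error; retrying won't help
        if attempt < max_tries:
            sleep(delay)           # short pause before the next try
    return ok, errnum, attempt
```

Retrying only on #6 keeps other errors from being repeated pointlessly, 
and passing sleep=lambda s: None lets you exercise the loop with a fake 
fetch without waiting.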

If the problem is happening too frequently, then you'll want to hunt down 
the device and fix it.

If the problem is happening consistently, every time, then it could be a 
firewall issue.  But it doesn't sound like it's happening all the time.
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------