
Re: [Ftpapi] GSK_WOULD_BLOCK issue after upgrade to 7.4


This is very strange.  Here's what I see:

1) gsk_secure_soc_read() is trying to read data from the network, and getting GSK_WOULD_BLOCK.  This means there isn't any data to read from the network.  ("Would Block" means it would have to wait for data, but I've told it not to wait, so it returns the message instead.)

2) select() returns that there is data available now, so it does not wait until the timeout.

(Steps 1-2 repeat in a loop.)
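
To make that loop concrete, here is a minimal sketch in C of a non-blocking read guarded by select() and a retry cap. It is not the HTTPAPI source: try_read() is a hypothetical stand-in that uses plain read() on a non-blocking socket where the real code calls gsk_secure_soc_read(), and the names read_with_safety_net, READ_WOULD_BLOCK, timeout_secs and safety_net are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical result codes; the real API reports GSK_WOULD_BLOCK instead. */
enum { READ_OK = 0, READ_WOULD_BLOCK = 1, READ_ERROR = 2 };

/* Stand-in for gsk_secure_soc_read(): a read that never waits. */
static int try_read(int fd, char *buf, size_t size, ssize_t *got)
{
    *got = read(fd, buf, size);
    if (*got >= 0)
        return READ_OK;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;
    return READ_ERROR;
}

/* Returns the number of bytes read (0 means the peer closed the connection),
   -1 on error or timeout, -2 if the safety net ran out. */
ssize_t read_with_safety_net(int fd, char *buf, size_t size,
                             int timeout_secs, int safety_net)
{
    assert(safety_net > 0);

    for (int attempt = 0; attempt < safety_net; attempt++) {
        ssize_t got;
        int rc = try_read(fd, buf, size, &got);   /* step 1: try to read */
        if (rc == READ_OK)
            return got;
        if (rc == READ_ERROR)
            return -1;

        /* step 2: nothing to read yet, so wait until the socket is
           readable or the timeout expires */
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        struct timeval tv = { timeout_secs, 0 };
        int ready = select(fd + 1, &readable, NULL, NULL, &tv);
        if (ready <= 0)
            return -1;            /* 0 = timed out, -1 = select() failed */
    }
    return -2;                    /* select() kept saying "ready" */
}
```

On a plain socket the two steps agree: once select() reports the socket readable, the next read succeeds, so the loop normally runs once or twice. The symptom in this thread is that they disagree, so each pass costs no waiting time and the cap is reached quickly.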

My guess is that it needs to receive SSL-related headers and the like, which aren't actually data returned to the program.  Some data is available for these headers (that's why select() tells us immediately that there's data), but not enough for the complete SSL data.  (So gsk_secure_soc_read() would need to block in order to provide data.)
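
As an illustration of that guess only (it says nothing about how GSKit works internally), the sketch below checks whether a buffer holds one complete TLS record. It relies on the standard TLS record layout: a 5-byte header (content type, two version bytes, two-byte big-endian payload length) followed by the payload. The function name tls_record_complete is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TLS_HEADER_LEN 5   /* type (1) + version (2) + length (2) */

/* Returns 1 if buf[0..have) contains at least one complete TLS record,
   0 if more bytes must arrive before the record can be decrypted. */
int tls_record_complete(const unsigned char *buf, size_t have)
{
    assert(buf != NULL);

    if (have < TLS_HEADER_LEN)
        return 0;                       /* header itself is incomplete */

    /* payload length is stored big-endian in header bytes 3 and 4 */
    size_t payload_len = ((size_t)buf[3] << 8) | buf[4];
    return have >= TLS_HEADER_LEN + payload_len;
}
```

If only part of a record has arrived, the socket counts as readable to select(), yet a secure read cannot hand any application data back, which would match the "readable but would block" pattern described above.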

By calling it multiple times, of course, we eventually get the data -- but since this is a tight loop, it could potentially spin for a while, which isn't good.

I'm guessing this is a bug in the OS, and that gsk_secure_soc_read() should really be pulling the data off of the socket and placing it in its internal buffer so that select() works properly.  But I don't know that for certain.  The fact that this occurred following an OS upgrade seems to corroborate that, though.

Make sure you have all of the latest PTFs installed, as I know there were a lot of bugs in SSL in 7.4.

If that doesn't help, I suspect we'll have to report it to IBM.


On 3/24/2020 2:40 AM, stefan@xxxxxxxxxx wrote:
Hi Scott,

The timeout value is not changed, and looking at the attached log we can see that the application is not waiting for 30 seconds in the select loop. To aid in troubleshooting I have added some debug printouts in the refill subproc, around the gsk_secure_soc_read and in the 502-GSK_WOULD_BLOCK loop and the select. They are all marked with *AL* in the log.
Unfortunately this log does not show safetynet = 10, which was our starting point, but the safetynet is increased a couple of times. The application retrieves PDF invoices, and those are removed from the host after being successfully retrieved, so we need to wait for some new invoices before trying to debug again.

As always, thank you for your time and cooperation in helping us out. Much appreciated.

No trees were killed in the sending of this message, but a large number of electrons were terribly upset.

Stefan Tageson
+46 732 369934

On 23 March 2020 at 23:10:17 +01:00, Scott Klement <sk@xxxxxxxxxxxxxxxx> wrote:

Hi Stefan,

What do you have the timeout set to?  That routine is meant to sit and wait in a loop for the TCP channel to be ready. If you have the timeout set to 30 seconds, for example, it should sit on the select() API for 30 seconds, and should only repeat in an unusual circumstance.  It seems strange that a safetyNet of 10 isn't high enough.

Can you please send a debug/trace file?

If possible, can you tell me how to reproduce the problem?


On 3/23/2020 1:29 PM, stefan@xxxxxxxxxx wrote:
Upgrading a well-working HTTPS application from IBM i OS 7.2 to 7.4 gave us problems while receiving the response header, and the application crashed with "< recvresp(): end with err  " in the debug file.
Taking a closer look at the subproc refill, we bumped the safetynet counter from 10 to 100 and the application was happy again. Is anyone else having the same issue? Is there a better approach to keeping the application happy? Bumping the timeout value? If so, where is this timeout value set?

All the best,

Ftpapi mailing list
