
RE: Memory leak in EXPAT?



Here's a post-mortem of the issue, in case anyone else ever has the same problem:

1. Our process had run satisfactorily for months.  We read a web service continuously.

2.  The vendor recently set up a new data center and a "load balancer".  When you request their web service, now you get a response from one of several different servers.

3. The vendor's servers use various settings.  For example, some servers use cookies while others do not.  Significantly, some servers always return data 8192 bytes at a time while other servers return varying (random-sized) packets.

4. As a result, EXPAT experienced frequent parsing errors.  EXPAT thought that the data in odd packet sizes was invalid XML, even though it was actually valid XML which had been broken up into random-sized chunks.

5. The job's "Temporary Storage Used" keeps growing and growing.  Apparently, whenever you pass invalid XML to EXPAT, some amount of memory remains allocated.  

6. After the job's "Temporary Storage Used" exceeds 4352 megabytes, you get a hard error due to max heap size exceeded.

7. I believe that EXPAT eats up (never frees) _all_ memory used by an invalid XML document.  Doing the math, we know approximately how many bytes are sent to EXPAT each minute, approx what percentage of those bytes are from "invalid XML", how many hours the job runs until it croaks, etc.  Based on these measurements, whenever EXPAT gets gobbledy-gook instead of an XML document, _all_ of that memory is used and is nevermore available.  After 4352 megabytes of garbled data, all heap space is exhausted and the job croaks big-time.

8. We solved the problem by using http_url_get_raw, accepting any size buffer that we get, ignoring cookies, and constantly looking for a valid closing tag in the XML document.  We only parse the XML after we're sure that we have a valid opening and closing tag.  Now the program runs 24/7 as per design, and as it did before the vendor instituted a bunch of servers with inconsistent settings.
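
The buffering idea in point 8 can be sketched in a few lines.  Our real program is RPG using HTTPAPI, so the Python below is only an illustration of the logic, not the actual code.  The closing tag `</quotes>` and the names `ResponseBuffer` and `is_well_formed` are made up for the example.  It accepts chunks of any size, hands back a document only once its closing tag has arrived, and parses each complete document with a fresh parser.

```python
import xml.parsers.expat

# Hypothetical closing tag; the real service's root element is not shown here.
CLOSING_TAG = b"</quotes>"


class ResponseBuffer:
    """Collects raw chunks of any size and releases only complete documents."""

    def __init__(self, closing_tag=CLOSING_TAG):
        self.closing_tag = closing_tag
        self.pending = b""

    def add_chunk(self, chunk):
        """Append one chunk; return a list of complete documents (maybe empty)."""
        self.pending += chunk
        documents = []
        while True:
            end = self.pending.find(self.closing_tag)
            if end < 0:
                break  # closing tag not seen yet, keep waiting
            end += len(self.closing_tag)
            documents.append(self.pending[:end].lstrip())
            self.pending = self.pending[end:]
        return documents


def is_well_formed(document):
    """Parse one complete document with a fresh Expat parser."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final piece
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

Because the search runs over the accumulated buffer, a closing tag that is split across two chunks is still found once the second chunk arrives.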


Nasser Shukayr
    I.T. Application Development Team Lead
Heartland Co-op
http://www.heartlandcoop.com
   2829 Westown Parkway, Suite 350
   West Des Moines, Iowa 50266
NShukayr@xxxxxxxxxxxxxxxxx

-----Original Message-----


Message: 3
Date: Tue, 12 Apr 2016 19:00:40 +0000
From: Nasser Shukayr <nshukayr@xxxxxxxxxxxxxxxxx>
To: "ftpapi@xxxxxxxxxxxxxxxxxxxxxx" <ftpapi@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: Memory leak in EXPAT?
	
I would love to set up an environment so that the issue can be easily reproduced on a different system.  Alas, the application is highly proprietary and I just don't know of any way to reproduce it externally.

Remaining realistic, here's a chronology:

1. 	An application program continuously reads a web service.  Each XML response is about 56K bytes.  We believe that the web service occasionally sends malformed XML.  (Sometimes the document stops abruptly, and any open tags are never closed.)  We believe that each time EXPAT gets invalid XML, a small amount of memory is consumed (not freed up).  After 11 to 13 hours, all heap space is exhausted and the program crashes.

2.	Using Advanced Job Scheduler, we stop and then re-start the job every 8 hours.  Now the program runs without crashing (however there are three times per day, approx 15 minutes each time, when we don't monitor the service even though we'd like to monitor it 24/7).

3.	We changed the application to use IBM's XML Parser (XML-INTO).  We still get occasional parsing errors when the XML document is incomplete.

4.	We observed that the XML errors seem more frequent when we receive multiple responses during the same clock-second.  We changed the program so that it waits at least one full second between requests.  This seemed to reduce the number of errors, however the parsing errors still occur.  We did NOT run this program until it croaked, although that kind of experiment could be insightful.

5.	We changed the program to make sure that the very last XML tag is present.  If that last tag is missing, then we ignore the document because we know that the XML is malformed and it will not parse properly.  Yet we still get occasional parsing errors.  The XML appears to contain occasional control characters such as LF (Line Feed) and occasional random < and > characters.

6.	We can filter out the control characters (CR, LF, etc.) and see what happens next.

7.	According to the vendor, we're their only customer having issues with the web service.  Except for the fact that the program crashes after 11 to 13 hours, we would be totally unaware of any issues.

8.	Our infrastructure team is checking to determine if there's any kind of "noise" in the line which might corrupt XML responses from a website.  The program reliably croaks after running continuously for 11 to 13 hours (11 hours on Production, 13 hours on Development).  We believe that the program croaks after a finite number of bad XML responses, and that this number is achieved sooner on Production, because the Production CPU is faster than the Development CPU.

9.	A completely separate application monitors a different web service 24/7, and that application works perfectly.  However, the flawless web service is in JSON while the croaky one is in XML.

10.	The troublesome service offers JSON as an option.  Our next step will be to change the program to request and receive the info in JSON, to see if this solves the issues.
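
Steps 5 and 6 above (check for the last tag, filter out control characters) could look roughly like this.  It is a sketch in Python rather than our RPG code, and the closing tag `</quotes>` and the function name `clean_response` are assumptions for illustration only.  Note that XML 1.0 allows tab, LF and CR as ordinary whitespace, so this version removes only the other C0 control characters and leaves line breaks alone.

```python
import re

# C0 control characters that XML 1.0 does not allow in a document.
# Tab (0x09), LF (0x0A) and CR (0x0D) are legal and are left alone.
ILLEGAL_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_response(text, closing_tag="</quotes>"):
    """Return the cleaned text, or None if the final tag is missing.

    closing_tag is a hypothetical name for the document's last tag.
    """
    cleaned = ILLEGAL_CONTROL_CHARS.sub("", text)
    if not cleaned.rstrip().endswith(closing_tag):
        return None  # truncated document: skip it rather than parse it
    return cleaned
```

A caller would pass the result to the parser only when it is not None.  This does nothing about stray < and > characters inside the data; those would still produce a parsing error.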

Bottom line:  The evidence suggests that invalid XML can cause EXPAT to consume (not free up) a small amount of memory.  After a large number of requests with invalid XML, EXPAT runs out of heap space.

The best solution is probably to correct the incoming data stream, so that the XML is always proper.  Still, it's certainly interesting that a sufficient number of malformed XML messages seems to eventually crash EXPAT.
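
For anyone who wants to try reproducing this without our proprietary application, a stand-alone harness might look something like the sketch below.  It is Python using the Expat build bundled with Python, not the ILE port of EXPAT that our job runs, so it can only show the shape of the experiment and cannot confirm or rule out the leak on IBM i.  The truncated document and the function name `parse_repeatedly` are invented for illustration.

```python
import tracemalloc
import xml.parsers.expat

# Hypothetical stand-in for a response that stops abruptly mid-element.
TRUNCATED = b"<quotes><quote><symbol>CORN</symbol><price>3.5"


def parse_repeatedly(document, iterations):
    """Parse the same document many times, each time with a fresh parser.

    Returns (failures, growth_in_bytes), where growth is the change in
    memory traced by tracemalloc across the whole loop.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    failures = 0
    for _ in range(iterations):
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(document, True)  # True marks the final piece
        except xml.parsers.expat.ExpatError:
            failures += 1
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return failures, after - before
```

Calling `parse_repeatedly(TRUNCATED, 100000)` and comparing the growth figure with a run on a well-formed document would be one way to look for a per-error leak.  tracemalloc only sees memory that goes through Python's own allocator, so it is an assumption that Expat's buffers show up there; watching the process's memory from outside is the more reliable check.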


Nasser Shukayr
   I.T. Application Development Team Lead
Heartland Co-op



-----Original Message-----

Date: Sat, 9 Apr 2016 12:14:53 -0500
From: Scott Klement <sk@xxxxxxxxxxxxxxxx>
To: HTTPAPI and FTPAPI Projects <ftpapi@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Memory leak in EXPAT?

Nasser,

There isn't just one place where it allocates memory; there are many
places.  The little bit of information you provided points to the IBM
QC2ALLOC routine, which is part of the ILE environment used by all ILE
programs on the system, so that isn't specific enough.

I really need to know how to reproduce the problem.

-SK


On 4/8/2016 9:04 AM, Nasser Shukayr wrote:
> Thank you for the rapid reply!
>
> The heap space is exceeded when EXPAT tries to allocate a new buffer.
>
> MCH6903:
>      To module . . . . . . . . . :   QC2ALLOC
>      To procedure  . . . . . . . :   do_malloc_default__FUL
>      Statement . . . . . . . . . :   3
>      Message . . . . :   The heap space has reached its maximum allowable size.
>
> I believe that EXPAT allocates resources when it starts to parse, then frees up those resources after it finishes.
>
> Here are two possible theories for running out of heap space:
>
> 1. Theory:  every call to EXPAT depletes (does not free up) a few bytes.  After many thousands of calls, all available heap space is exhausted.
>
> or
>
> 2. Background:  The XML document (from a web service) could occasionally contain malformed XML.  The document is approx. 65K bytes and has info on about 90 different commodity price quotes.  On rare occasions in debug, I observed documents which end abruptly.  Some closing tags, and sometimes half of a data element, are missing.  Theory:  When an invalid document gets passed to EXPAT (and it errors out), it does not always free up all the allocated memory.  It happens every now and then.  After 13 hours, it happens often enough to exhaust all available heap space.
>
> On our production system, the program runs about 12 and a half hours (give or take 45 minutes) until it croaks.  On Development, it runs about an hour longer, i.e. 13 and a half hours, plus or minus 45 minutes.  Measurements were made during an 18-day period.  I believe that development is less active than production.
>
>
> Nasser Shukayr
>      I.T. Application Development Team Lead
>     http://www.heartlandcoop.com
>     2829 Westown Parkway, Suite 350
>     West Des Moines, Iowa 50266
> NShukayr@xxxxxxxxxxxxxxxxx
> Office: 515.309.3857
>
>
> -----Original Message-----
>
> Message: 1
> Date: Thu, 7 Apr 2016 20:01:22 +0000
> From: Nasser Shukayr <nshukayr@xxxxxxxxxxxxxxxxx>
> To: "ftpapi@xxxxxxxxxxxxxxxxxxxxxx" <ftpapi@xxxxxxxxxxxxxxxxxxxxxx>
> Subject: Memory leak in EXPAT?
> 	
> Setup:  Program queries a web service constantly, receiving XML response.  Response is parsed with EXPAT.  Program runs most hours of the day.
> Issue:  After 13 hours, the program croaks with MCH6903 (The heap space has reached its maximum allowable size).  (It's not always exactly 13 hours; it varies between 11:45 and 15:45.)
> Temporary solution:  We changed the program so that after about 8 hours, it exits and then re-starts as a brand-new copy.
>
> Why this is not ideal:  We really need to monitor the web service continuously, even during the few seconds needed to exit and restart the program.
>
> Has anyone else had a problem with running out of heap space after 13 hours of continuously using EXPAT?
>
> Many thanks,
>
>
> Nasser Shukayr
>      I.T. Application Development Team Lead
>     http://www.heartlandcoop.com
>     West Des Moines, Iowa 50266
> NShukayr@xxxxxxxxxxxxxxxxx
>
> Message: 3
> Date: Thu, 7 Apr 2016 16:33:39 -0500
> From: Scott Klement <sk@xxxxxxxxxxxxxxxx>
> To: HTTPAPI and FTPAPI Projects <ftpapi@xxxxxxxxxxxxxxxxxxxxxx>
> Subject: Re: Memory leak in EXPAT?
>
> Hello Nasser,
>
> This is the first time I can remember seeing this problem.
>
> So to reproduce it, I should create an XML document, and parse it
> repeatedly for 13 hours?   Does it matter what is in the document? Do I
> have to make an HTTP request each time, or is just parsing the XML sufficient?
>
> -SK


------------------------------

-----------------------------------------------------------------------
This is the FTPAPI mailing list digest.  To unsubscribe, go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------


End of Ftpapi Digest, Vol 114, Issue 12
***************************************