Re: http_url_get problem



Hi Mike,

The 8192 allocation is an optimization in the url_encoder routines. Re-allocating the memory every time the encoder wants to add a byte to the end would be rather inefficient. For example, let's say you had to encode 100k of data. Would you rather re-allocate the memory (and potentially, re-copy the data) 100,000 times? Or would you rather do it 12 times? Which do you think would be more efficient? So that's the idea: by allocating 8k at a time, the routine runs a lot faster.
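To make the idea concrete, here is a minimal sketch of chunked growth. It is not the actual HTTPAPI source; the names CHUNK, p_buf, bufSize, used and needed are hypothetical and exist only for this illustration. It assumes the caller sets "needed" to the number of bytes it is about to append.

```rpgle
D CHUNK           c                   8192
D p_buf           s               *   inz(*null)
D bufSize         s             10i 0 inz(0)
D used            s             10i 0 inz(0)
D needed          s             10i 0

 /free
     // grow only when the next append would not fit
     if used + needed > bufSize;
        // round the required size up to the next multiple of CHUNK
        bufSize = %div(used + needed + CHUNK - 1: CHUNK) * CHUNK;
        if p_buf = *null;
           p_buf = %alloc(bufSize);
        else;
           p_buf = %realloc(p_buf: bufSize);
        endif;
     endif;

     // ... copy the new bytes to p_buf + used here, then:
     used += needed;
 /end-free
```

Note that the bytes between "used" and "bufSize" are never initialized by this sketch. They contain whatever the allocator happened to return, which is why only the first "used" bytes mean anything to a caller.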

However, the caller should never rely on this behavior. Heck, if you didn't have the source code for HTTPAPI, you'd have no way to even know this was happening.

In the next release, I might decide to change the way this works -- I might decide to completely revamp the encoder routines... and then where would you be?

The value is not guaranteed to be x'00'. It happens to be x'00' in this case because of the way the operating system works. For security purposes, they don't want users to be able to see memory that was last used by another userid... so when the OS creates a new "space" from spots in the memory banks or disk that previously were used by another process, it zeroes them out. However, when the memory was last used by the same job/user, it may not do that. The OS frequently fills memory with zeroes, but it does not _guarantee_ that it will always do so.

Ultimately, the point is: If you use these techniques, you run the risk of having problems. If not immediately, then potentially down the road.

And there's no reason to use them. Calling http_url_encoder_getstr() is actually SIMPLER for you to code than working with the raw data pointer -- and it's guaranteed to always work. So it saves you work, and it's not a ticking time bomb. Don't you agree that that's better?!
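As a rough illustration of how little code that takes, the sketch below assumes http_url_encoder_getstr() returns the encoded data as a VARYING character string, and that "enc" is an encoder handle that was created and populated earlier (not shown). The variable name postStr is hypothetical.

```rpgle
D postStr         s          32767a   varying

 /free
     // enc: an encoder handle created and filled in earlier (not shown)
     // assumption: the routine returns only the encoded bytes, so
     // %len(postStr) is the true data length and nothing follows it
     postStr = http_url_encoder_getstr( enc );
 /end-free
```

There is no pointer, no based variable and no length to keep track of, so there is nothing past the end of the data to misread.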

If you do want to use the raw pointer, you can... but you need to abide by your "agreement" (so to speak) with HTTPAPI. When you call the HTTP_url_encoder (or "webform") routines, HTTPAPI is promising to perform a task for you. Each routine has particular parameters that it passes you that "promise" to provide you particular pieces of information. You should not ignore that information, and assume that it works differently! If HTTPAPI tells you the dataSize is 217 bytes, you have no business assuming that it will always be at least 1000.

Instead of just assuming that the data will be 1000 long, use the length that it provides you. That's exactly why it provides the length -- so you know exactly how much of the memory you're allowed to use.

So if you must work with the raw pointer (which is really only needed if you need to work with strings longer than 32k), then to do so safely, make sure you observe the length you're passed. For example:

D p_data          s               *
D data            s          32767a   based(p_data)
D size            s             10i 0
D usableData      s          32767a   varying

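 /free
     // HTTPAPI returns the address of the data and its length in bytes
     http_url_encoder_getptr( enc : p_data: size );

     // never read past the end of the based variable
     if size > %size(data);
        size = %size(data);
     endif;

     // copy only the bytes HTTPAPI said are valid
     usableData = %subst(data: 1: size);
 /end-free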

So this code would be okay, because it uses %subst() to make sure it only views data up to the length that HTTPAPI provided.

But again, that example is purely hypothetical. I can't see any reason why you'd want to use the raw pointer, except to simply pass it on to http_url_post() (or other POST functions.) In those cases, you don't need to manipulate it at all.
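A sketch of that pass-through case follows. It assumes http_url_post() takes the URL, the data pointer, the data length and the name of a file for the response, in that order; verify this against the prototype in your copy of HTTPAPI_H. The variables url and respFile are hypothetical and assumed to be defined and set elsewhere, as is the encoder handle "enc".

```rpgle
D p_data          s               *
D size            s             10i 0
D rc              s             10i 0

 /free
     // enc: an encoder handle created and filled in earlier (not shown)
     http_url_encoder_getptr( enc : p_data : size );

     // hand the pointer and the reported length straight to the POST
     // routine; url and respFile are hypothetical for this sketch
     rc = http_url_post( url : p_data : size : respFile );
 /end-free
```

The program never looks at the memory behind p_data itself, so it cannot make any assumption about what follows the encoded data.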

The only reason HTTPAPI gives you raw pointers at all is so that it can handle strings longer than 65535 (which, at the time HTTPAPI was written, were not available in RPG; starting with V6R1, of course, they are available).

Does that help?





On 11/29/2012 5:26 PM, Mike Krebs wrote:
Sorry, Mike... you're not quite correct here.
This turned out to be way long but maybe instructional? At least I hope so!

Thanks for the correction. I was almost there in my thinking... but you can't read what I was thinking, only what I type. :)

Just to help me to understand. Based on the "faulty" way the program was coded...The encoded data was in req and req was 1000A variable (after the pointer was assigned). That would be correct? But it would not be correct to say that the encoded data was 1000 long because it was only as long as needed. Even thinking of req being 1000A is kind of wrong. In debug, req was not a "regular" RPG variable as it was undefined when the program first starts running (hence no initialization of values to blanks). But once the pointer gets a value, it does look at the next 1000 uninitialized bytes (except that the bytes in this program are initialized by the encoding routine)?

When I ran the program through debug several times, I was consistently seeing ALL x'00' after the encoded data in the req field. I actually thought I should see "garbage" or spaces out beyond his data but it always seems to be x'00'. So, I took a SWAG that something in the encoder allocation logic was resetting the memory to x'00' out to some magic number larger than 1000.

I couldn't figure out why this would be true and just now verified that the encoding routines only allocate memory as needed.

Except! When I wasn't paying attention to where I was in the code, I fell into a routine where alloc used 8192 for the size (ascii translation routine?). Initially this allocation has what appears to be some garbage bytes. However, the next piece of code uses memset with 0 as the second parameter and it appears to set the memory for all 8192 bytes to x'00'.

And subsequently this value 8192 seems to be used elsewhere:

                              Display Module Source
Program: HTTPAPIR4 Library: LIBHTTP Module: ENCODERR4
    4018                    callp     url_encode( peEncoder
    4019                                        : p_VarX
    4020                                        : wwVarXLen
    4021                                        : dsEnc_Data + dsEnc_Len
    4022                                        : wwLenVar )
    4023                    eval      dsEnc_Len = dsEnc_Len + wwLenVar
    4024
    4025                    eval      p_deref = dsEnc_Data + dsEnc_Len
    4026                    eval      wwDeref = '='
    4027                    eval      dsEnc_Len = dsEnc_Len + %len('=')
    4028
    4029                    callp     url_encode( peEncoder
    4030                                        : p_DataX
    4031                                        : wwDataXLen
    4032                                        : dsEnc_Data + dsEnc_Len
                                                                         More...
  Debug . . .
F3=End program F6=Add/Clear breakpoint F10=Step F11=Display variable
  F12=Resume       F17=Watch variable   F18=Work with watch   F24=More keys
  DSENC_SIZE = 8192                <============================

So while the url_encode routine is passing short data lengths, someplace it is keeping hold of the size as 8192.

And just before there:

3993                    eval      dsEnc_Data = xrealloc( dsEnc_data
3994                                                   : wwNewSize )

WWNEWSIZE = 8192

EVAL dsEnc_Data:c 1000     <=======(in theory showing more than 32 bytes of data for the pointer)
DSENC_DATA:C 1000 =
           ....5...10...15...20...25...30...3
      1   'locpartenza=T1&NumSped=520113341  <==== I was in the second add_var routine


At this point my brain is mush from following pointers and routines around but I wonder...Is the "cleared" memory subsequently being used by the encoding routines so that based on program flow there will --most likely (but not absolutely)-- be x'00' after the encoded data? The number of bytes of cleared data is unknown and unpredictable but that might be why I consistently saw x'00' after the data?

In theory, the bytes could be anything - as I saw in the routine where 8192 bytes were allocated. The same would be true with the encoding routines. As the encoded string is built, the data following it could have anything in it. But maybe what usually happens is that the encoded string is being built in the memory that was cleared by the routine that used 8192? And since the program allocated the memory, even if it was later deallocated, the memory has become part of the protected program memory. I wonder whether, on a busy system and with the process slowed down by long debug steps, the memory beyond the encoded strings would eventually "corrupt".

At least that in theory would explain the several debug sessions in a row that showed x'00' following the data.

Back in the mainline... I can do this:
EVAL myPointer:c 8000
MYPOINTER:C 8000 =
           ....5...10...15...20...25...30...35...40...45...50...
      1   'locpartenza=T1&NumSped=520113341&CodCli=2583

I don't think I should be able to look at 8000 bytes of data unless it is program storage?

And this:
EVAL myPointer:c  10000
MYPOINTER:C  10000 =

This shows no garbage until about 9730 or so.

I can do this and see x'00' after the encoded data.
EVAL myPointer:x  8192
But adding one more byte causes a problem:
EVAL myPointer:x  8193
   Domain violation occurred. (not sure if this is a display issue or a debug issue or an actual domain violation. (why 10000+ character but only 8192 hex?) There are some old PTFs regarding this error but it could be a red herring. We are v7r1 current on PTFs as of 2 weeks ago.)

So maybe, possibly, the 8192 number that is used at some point is wrong and causing "excess" memory allocations and then being carried forward in the realloc for add_var?

And to anyone not *really* following this thread...assuming x'00' beyond your data is wrong - even if it is there! Use the routines provided to get the string data back!
The encoded data is not in a 1000A variable, therefore the data that's
in his "req" variable is "unpredictable".  It could be x'00': if the data
after the part he's actually using happens to be unused memory, then
this is the most likely case.  But it could contain other byte values
as well.  You're right that the length of URL is 1000A, but the
length of the data that "req" points to is not -- and you should not
assume that it will be padded with any particular character.
-----------------------------------------------------------------------
This is the FTPAPI mailing list.  To unsubscribe, please go to:
http://www.scottklement.com/mailman/listinfo/ftpapi
-----------------------------------------------------------------------

