
Re: which example can i use to access this webpage



Hi Henrik,

What happens when the HTML isn't well-formed (by XML rules)? Does your 
tool have a way to handle that?

That's always been my problem with using an XML parser to read HTML. I'd 
have something like this:

<html>
<body>
<img src="test.jpg">
</body>
</html>

Of course, the <img> tag has no closing tag, so the XML parser reports 
that the document isn't well-formed and gives up.
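
To make the failure concrete, here is a minimal sketch in Python rather than RPG. It is not powerEXT's xmlReader or HTTPAPI code, and the names parse_as_xml, ImgCollector and collect_img_sources are made up for illustration. It feeds the snippet above to a strict XML parser, which rejects it, and then to a lenient HTML parser from the Python standard library, which still finds the <img> tag.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The same document as above: <img> is never closed.
HTML_DOC = """<html>
<body>
<img src="test.jpg">
</body>
</html>"""


def parse_as_xml(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, closed or not."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "img":
            self.sources.extend(value for name, value in attrs if name == "src")


def collect_img_sources(text):
    """Run the lenient HTML parser over text and return all img src values."""
    collector = ImgCollector()
    collector.feed(text)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    print("XML parser error:", parse_as_xml(HTML_DOC))
    print("HTML parser found:", collect_img_sources(HTML_DOC))
```

The strict parser stops at the first tag mismatch, while the HTML parser treats an unclosed <img> as normal. Whether a given RPG tool can do the same depends on whether it has a similar tolerant mode.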


On 5/18/2012 6:23 AM, Henrik Rützou wrote:
>     Hi Tim and Scott,
>
>     May I suggest that you combine Scott's example with the xmlReader in
>     powerEXT Core, which reads HTML as XML?
>
>     I have made a little example program that reads an HTML result page
>     from the search on the site:
>
>     [1]http://89.239.242.111:6382/pextcgiCOR/readhtml.pgm
>
>     The only change needed to Scott's code is to change the second POST
>     so it stores the result in a temp IFS file.
>
>     On Fri, May 18, 2012 at 2:35 AM, Scott Klement <[2]sk@xxxxxxxxxxxxxxxx>
>     wrote:
>
>       Okay. I've attached an example that I hope will point you in the
>       right direction.
>
>       This type of coding is hard, because this site isn't intended to be
>       called by a computer program -- it's intended to be called by a web
>       browser. Accessing a web site (as opposed to a web service)
>       requires you to have a pretty strong knowledge of how a programmer
>       wrote the web page. And figuring out how to read the output is
>       challenging, because the output is designed to dictate a screen
>       layout; it's not designed to identify what each field is and what
>       it's for (as would be the case with a web service). So what
>       you're looking for is possible, but it's hard. Not because of the
>       tool, but because the site just wasn't meant to be used this way.
>
>       But the attached example does work. It's just harder than it
>       would be if it were a web service.
>       1) You connect to the initial web page, and it sets cookies that it
>       uses to identify your browser session. HTTPAPI will manage the
>       cookies for you -- but make sure you're running version 1.24 or
>       newer, because there have been bugs fixed recently in the cookie
>       support.
>       2) You create a web form containing the fields in the <input> tags
>       in the HTML. Web sites can potentially modify this stuff using
>       JavaScript on the page, so the <input> tags are a good starting
>       point, but you shouldn't rely on them 100%. Instead, use a tool
>       like the "Live HTTP Headers" plugin for Firefox to see exactly
>       what's sent/received, then copy that in HTTPAPI.
>       3) After submitting the login form, the site receives your session
>       cookie and your login credentials (user/pass) and validates them.
>       Once that's done, it sets your session ID's status (stored in a
>       file on the server) to "logged in". From here on, you must
>       re-submit the cookie with each request, or it won't know you're
>       logged in. That's okay, though: HTTPAPI manages the cookies and
>       resubmits them as long as you're still running in the same
>       activation group.
>       4) The server redirects you to a new page by sending a 302 HTTP
>       response and a new URL. Your code can call http_redir_loc to get
>       the new URL, and one of the http_get routines to follow the
>       redirect. You'll see that in the sample code. I always like to
>       limit the number of redirects to prevent the program getting stuck
>       in a loop if the redirect points to another redirect, and so on.
>       5) Submit the form containing the zip code query. I coded the
>       program to take the zip code as a parameter and send it as a query.
>       Again, I looked at the <input> HTML tags on the page, and used
>       Live HTTP Headers to make sure I was sending the right things. The
>       only thing that I made a variable is the zip code, and you supply it
>       like this:
>
>           CALL PGM(MYFIRTEST) PARM(71635)
>
>       (where 71635 is the zip code)
>       6) Finally, the response is received (as an HTML document,
>       explaining how to format data on the browser's screen) containing
>       the list of foreclosures. I simply displayed the raw HTML on the
>       screen -- I'll leave it up to you to figure out how to get the data
>       you need out of that page (by %scan, %subst, etc.).
>       Good luck!
>
>     On 5/17/2012 6:00 PM, [3]tim.dclinc@xxxxxxxxx wrote:
>
>       The site in question is [4]http://www.myfir.com/myFir/login.asp
>       You can use [5]tim.dclinc@xxxxxxxxx as user, and "password" as
>       password.
>       It's a public site which anyone can join... I just wanted to
>       programmatically "check" the site.
>
>       -----------------------------------------------------------------------
>       This is the FTPAPI mailing list. To unsubscribe, please go to:
>       [6]http://www.scottklement.com/mailman/listinfo/ftpapi
>       -----------------------------------------------------------------------
>
>     --
>     Regards,
>     Henrik Rützou
>     [7]http://powerEXT.com
>
> References
>
>     1. http://89.239.242.111:6382/pextcgiCOR/readhtml.pgm
>     2. mailto:sk@xxxxxxxxxxxxxxxx
>     3. mailto:tim.dclinc@xxxxxxxxx
>     4. http://www.myfir.com/myFir/login.asp
>     5. mailto:tim.dclinc@xxxxxxxxx
>     6. http://www.scottklement.com/mailman/listinfo/ftpapi
>     7. http://powerext.com/
>