Re: which example can i use to access this webpage
Hi Henrik,
What happens when the HTML isn't well-formed (by XML rules)? Does your
tool have a way to handle that?
That's always been my problem with using an XML parser to read HTML. I'd
have something like this:
<html>
<body>
<img src="test.jpg">
</body>
</html>
Of course, there's no closing tag for the <img> element, so the XML
parser reports that the document isn't well-formed and gives up.
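The failure described above can be reproduced in a few lines. This is a sketch in Python, not RPG, and it does not show how powerEXT's xmlReader behaves. Python's standard `xml.etree.ElementTree` applies XML rules and rejects the snippet above. The standard-library `html.parser.HTMLParser` follows HTML rules instead and accepts an unclosed `<img>`. The helper names `parse_as_xml` and `image_sources` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The snippet from the message above.
PAGE = """<html>
<body>
<img src="test.jpg">
</body>
</html>"""


def parse_as_xml(text):
    """Strict XML parse: return the root element, or None if not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


class ImageSources(HTMLParser):
    """Tolerant HTML parse: record the src of every <img>, closed or not."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def image_sources(text):
    parser = ImageSources()
    parser.feed(text)
    parser.close()
    return parser.sources


print(parse_as_xml(PAGE))   # None: <img> is never closed, so </body> mismatches
print(image_sources(PAGE))  # ['test.jpg']
```

The same markup written XHTML-style (`<img src="test.jpg"/>`) passes the strict parser. A tolerant HTML parser avoids depending on how carefully the page author closed every tag.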
On 5/18/2012 6:23 AM, Henrik Rützou wrote:
> Hi Tim and Scott,
>
> May I suggest that you combine Scott's example with the xmlReader in
> powerEXT Core, which reads HTML as XML?
>
> I have made a little example program that reads an HTML result page
> from the search on the site:
>
> [1]http://89.239.242.111:6382/pextcgiCOR/readhtml.pgm
>
> The only change needed to Scott's code is to make the second POST
> store its result in a temporary IFS file.
>
> On Fri, May 18, 2012 at 2:35 AM, Scott Klement <[2]sk@xxxxxxxxxxxxxxxx> wrote:
>
> Okay. I've attached an example that I hope will point you in the
> right direction.
>
> This type of coding is hard, because this site isn't intended to be
> called by a computer program -- it's intended to be called by a web
> browser. Accessing a web site (as opposed to a web service)
> requires a pretty strong knowledge of how the programmer
> wrote the web page. And figuring out how to read the output is
> challenging, because the output is designed to dictate a screen
> layout; it isn't designed to identify what each field is and what
> it's for (as would be the case with a web service). So what
> you're looking for is possible, but it's hard. Not because of the
> tool, but because the site just wasn't meant to be used this way.
> But the attached example does work. It's just harder than it
> would be if it were a web service.
>
> 1) You connect to the initial web page, and it sets cookies that it
> uses to identify your browser session. HTTPAPI will manage the
> cookie for you -- but make sure you're running version 1.24 or
> newer, because there have been bugs fixed recently in the cookie
> support.
>
> 2) You create a web form containing the fields in the <input> tags
> in the HTML. Web sites can potentially modify this stuff using
> JavaScript on the page, so the <input> tags are a good starting
> point, but you shouldn't rely on them 100%. Instead, use a tool
> like the "Live HTTP Headers" plugin for Firefox to see exactly
> what's sent and received, then replicate that in HTTPAPI.
>
> 3) After you submit the login form, the site receives your session
> cookie and your login credentials (user/pass) and validates them.
> Once that's done, it sets your session ID's status (stored in a
> file on the server) to "logged in". From here on, you must
> re-submit the cookie with each request, or it won't know you're
> logged in. That's okay, though: HTTPAPI manages the cookies and
> resubmits them as long as you're still running in the same
> activation group.
>
> 4) The server redirects you to a new page by sending a 302 HTTP
> response with a new URL. Your code can call http_redir_loc to get
> the new URL, and one of the http_get routines to follow the
> redirect. You'll see that in the sample code. I always like to
> limit the number of redirects to prevent the program from getting
> stuck in a loop if a redirect points to another redirect, and so on.
>
> 5) Submit the form containing the zip code query. I coded the
> program to take the zip code as a parameter and send it as a query.
> Again, I looked at the <input> HTML tags on the page, and used
> Live HTTP Headers to make sure I was sending the right things. The
> only thing I made variable is the zip code, and you supply it
> like this:
>
>     CALL PGM(MYFIRTEST) PARM(71635)
>
> (where 71635 is the zip code)
>
> 6) Finally, the response is received (as an HTML document
> explaining how to format data on the browser's screen) containing
> the list of foreclosures. I simply displayed the raw HTML on the
> screen; I'll leave it up to you to figure out how to get the data
> you need out of that page (with %scan, %subst, etc.).
>
> Good luck!
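The client-side mechanics of steps 2 through 6 can also be sketched outside RPG. This is an illustration under assumptions, not HTTPAPI and not the attached example. `fetch` is a hypothetical callable standing in for the HTTP layer. In Python, `urllib.request` combined with `http.cookiejar.CookieJar` could play HTTPAPI's cookie-handling role, although its default opener follows redirects on its own. The cap of 5 redirects and the field names `sid` and `zip` are arbitrary placeholders. The real field names would come from the site's `<input>` tags or from Live HTTP Headers.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Status codes treated as redirects. The site above is described as sending
# 302; the other codes are handled the same way here (assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


class InputFields(HTMLParser):
    """Step 2: collect name/value pairs from the page's <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if a.get("name"):
                self.fields[a["name"]] = a.get("value") or ""


def form_fields(html):
    parser = InputFields()
    parser.feed(html)
    parser.close()
    return parser.fields


def encode_form(fields, overrides):
    """Steps 2 and 5: start from the page's fields, replace the ones you supply."""
    data = dict(fields)
    data.update(overrides)
    return urlencode(data)


def follow_redirects(fetch, url, max_redirects=5):
    """Step 4: follow redirects, but give up after max_redirects hops.

    fetch is a hypothetical callable: fetch(url) -> (status, location, body).
    A real one would also resend the session cookie on every call (step 3).
    """
    for _ in range(max_redirects + 1):
        status, location, body = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status, body
    raise RuntimeError(f"more than {max_redirects} redirects")


def between(text, start, end, pos=0):
    """Step 6: the text between two markers (like %scan + %subst), or None."""
    i = text.find(start, pos)
    if i < 0:
        return None
    i += len(start)
    j = text.find(end, i)
    return text[i:j] if j >= 0 else None


if __name__ == "__main__":
    # Hypothetical field names; the real ones come from the site's <input> tags.
    page = '<form><input type="hidden" name="sid" value="abc"><input name="zip"></form>'
    print(encode_form(form_fields(page), {"zip": "71635"}))  # sid=abc&zip=71635
```

Passing `fetch` in as a parameter keeps the redirect cap testable without a network connection. It also leaves the choice of HTTP client, and how that client keeps the session cookie, to the caller.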
>
> On 5/17/2012 6:00 PM, [3]tim.dclinc@xxxxxxxxx wrote:
>
> The site in question is [4]http://www.myfir.com/myFir/login.asp
> You can use [5]tim.dclinc@xxxxxxxxx as the user, and "password" as
> the password.
> It's a public site which anyone can join... I just wanted to
> programmatically "check" the site.
>
> -----------------------------------------------------------------------
> This is the FTPAPI mailing list. To unsubscribe, please go to:
> [6]http://www.scottklement.com/mailman/listinfo/ftpapi
> -----------------------------------------------------------------------
>
> --
> Regards,
> Henrik Rützou
> [7]http://powerEXT.com
>
> References
>
> 1. http://89.239.242.111:6382/pextcgiCOR/readhtml.pgm
> 2. mailto:sk@xxxxxxxxxxxxxxxx
> 3. mailto:tim.dclinc@xxxxxxxxx
> 4. http://www.myfir.com/myFir/login.asp
> 5. mailto:tim.dclinc@xxxxxxxxx
> 6. http://www.scottklement.com/mailman/listinfo/ftpapi
> 7. http://powerext.com/
>
>
>