There is an internal site at work that is hosted on a different server than most of our internal sites. It outputs some information that I want to obtain via screen scraping. I've done screen scraping on other internal sites using an ASP.NET (C#) page and an HttpWebRequest, but unlike most, this site requires a username and password. The username and password are not a secret; they are posted alongside the login page, and everyone uses the same login info.

I've seen some examples on the web that accomplish automatic login, but none of them were quite what I need. I want to use an aspx page to log in to the site and retrieve some data from the next page.

The examples I've seen involve generating a cookie and posting the login data to the HttpWebRequest stream. I'm really not sure how to do that in this case.

Is it possible to simply populate the form fields and trigger the submit button programmatically, behind the scenes?

Here is a portion of the code for the login page:

    <script>
    //StartTranslate:NetLanguage

        function window_onload() {
            deleteCookie("BodyURL","/Net",0);

            document.loginform.UserName.focus();
            document.loginform.UserName.value=sUserName;
            document.loginform.UserName.select();
        }

        function doSubmit() {
            var sUserName = SMCookieGetUserName();
            loginform.submit();
        }
    </script>

    <form name="loginform" action="/Net//netportal.dll/SubmitLogin" method="post" >

        <input class="textbox" type="text" name="UserName" id="UserName" maxlength="128" tabindex="1" >
        <input class="textbox" type="password" name="Password" id="Password" maxlength="128" tabindex="2" >
        <img onClick="doSubmit();" src='/net/PortalPages/Images/slogin.gif' onselectstart="return false;" tabindex="3">

        <input type="hidden" value="" name="Timezone">
        <input type="hidden" value="" name="redirect">
        <input type="hidden" value="true" name="ExplicitLogin">
    </form>
Kirk Woll
SharpBarb
  • possible duplicate of [Screen-scraping a site with a asp.net form login in C#?](http://stackoverflow.com/questions/901045/screen-scraping-a-site-with-a-asp-net-form-login-in-c) – Kirk Woll Nov 12 '11 at 00:27

1 Answer

I would think that with an application like this you'd just need to post directly to the server the way a browser would, instead of trying to mess with the HTML. You can just post the expected form values to the action URL and it should just work.

So in your code, just make a POST call to /Net//netportal.dll/SubmitLogin with the UserName and Password fields plus the hidden fields, and start your scraping after the server logs you in. Here is some example code you could use to get started; just change it around a bit. For a form post like this one, pass "application/x-www-form-urlencoded" as the ContentType rather than the default. You might also look into using the Html Agility Pack: http://htmlagilitypack.codeplex.com/

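    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // Sends an HTTP request and returns the response body as a string.
    // Place this method inside a class. For a form login, call it with
    // Method "POST" and ContentType "application/x-www-form-urlencoded".
    private static string Post(string Url, string Method, string Content, string ContentType = "application/json", WebHeaderCollection headers = null)
    {
        var address = new Uri(Url);
        var request = (HttpWebRequest)WebRequest.Create(address);

        request.Method = Method;

        if (headers != null)
            request.Headers.Add(headers);

        // Write the request body only when there is content to send.
        if (!String.IsNullOrEmpty(Content))
        {
            var bytes = Encoding.UTF8.GetBytes(Content);

            request.ContentLength = bytes.Length;
            request.ContentType = ContentType;

            using (var pStream = request.GetRequestStream())
            {
                pStream.Write(bytes, 0, bytes.Length);
            }
        }

        // Dispose both the response and the reader once the body has been read.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }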
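
One thing the method above doesn't handle is the session cookie: if the portal tracks the login with a cookie (an assumption, I can't see the server side), the follow-up request has to send back whatever the login response set. Here is a rough sketch of how that could look, sharing one CookieContainer between the login POST and the scrape. The field names and action path come from the form in the question; the class name PortalScraper, its helper methods, and the baseUrl and dataUrl parameters are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

public static class PortalScraper   // hypothetical name, not part of any library
{
    // Encodes name/value pairs as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends one request through the shared cookie container and returns the body.
    // A null formBody means a plain GET; otherwise the body is POSTed as form data.
    public static string Send(string url, CookieContainer cookies, string formBody = null)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;   // stores and replays cookies between calls

        if (formBody != null)
        {
            var bytes = Encoding.UTF8.GetBytes(formBody);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = bytes.Length;
            using (var stream = request.GetRequestStream())
            {
                stream.Write(bytes, 0, bytes.Length);
            }
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Posts the login form, then requests dataUrl with the same cookies.
    // baseUrl (scheme and host of the portal) and dataUrl are assumptions.
    public static string LoginAndFetch(string baseUrl, string userName, string password, string dataUrl)
    {
        var cookies = new CookieContainer();
        var body = BuildFormBody(new[]
        {
            new KeyValuePair<string, string>("UserName", userName),
            new KeyValuePair<string, string>("Password", password),
            new KeyValuePair<string, string>("Timezone", ""),
            new KeyValuePair<string, string>("redirect", ""),
            new KeyValuePair<string, string>("ExplicitLogin", "true"),
        });

        Send(baseUrl + "/Net//netportal.dll/SubmitLogin", cookies, body);
        return Send(dataUrl, cookies);
    }
}
```

From the aspx page you would call PortalScraper.LoginAndFetch with the portal's host, the shared credentials, and the URL of the page you want, then parse the returned HTML. If the site checks the Timezone or redirect values, or relies on something other than cookies, the sketch would need adjusting.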
Timmerz