[Image: JavaScript sign-in page]

I am trying to scrape a table from a web page at my work using XMLHTTP, for its speed advantage. I have posted the link, but you will not be able to sign in because it is behind a firewall.

I have managed to get the table with regular IE scraping, but once I heard about XMLHTTP, I knew that it was the way to go.

However, when I send credentials and check the response, I keep getting a response saying that I need to send credentials.

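Sub test()

    Dim link As String
    Dim htmlDoc As Object

    ' In-memory HTML document used to parse the response
    Set htmlDoc = CreateObject("htmlfile")
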
    link = "http://w2tsl72/FAB-2_PROD/DmrListResults.asp?area=FAB&location=EPI&status=OP&Lot=ALL&waitTitle=ENG&waitBadge=18352&OpenFrom=ALL&OpenTo=ALL&CloseFrom=ALL&CloseTo=ALL&defect_cat=ALL&defect_group1=ALL&defect_group2=ALL&EightD=ALL&MRB=ALL&dmrType=ALL&EQUIP=ALL&NCMR=ALL&SOLAN=ALL&prod="

    With CreateObject("MSXML2.XMLHttp")

        .Open "get", link, False, "xxxx", "xxxxx"

        .send

        htmlDoc.body.innerHTML = .responseText

        Debug.Print .responseText

    End With

End Sub

Second attempt:

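Sub test23()

Dim link As String
Dim htmlDoc As Object

' In-memory HTML document used to parse the response
Set htmlDoc = CreateObject("htmlfile")
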
link = "http://w2tsl72/FAB-2_PROD/Login.asp"

With CreateObject("MSXML2.XMLHttp")    

    .Open "get", link, False

    .send

    htmlDoc.body.innerHTML = .responseText

    htmlDoc.getElementById("text1").Value = "xxxx"
    htmlDoc.getElementById("password1").Value = "xxxxx"
    htmlDoc.getElementById("submit1").Click




link = "http://w2tsl72/FAB-2_PROD/DmrListResults.asp?area=FAB&location=EPI&status=OP&Lot=ALL&waitTitle=ENG&waitBadge=18352&OpenFrom=ALL&OpenTo=ALL&CloseFrom=ALL&CloseTo=ALL&defect_cat=ALL&defect_group1=ALL&defect_group2=ALL&EightD=ALL&MRB=ALL&dmrType=ALL&EQUIP=ALL&NCMR=ALL&SOLAN=ALL&prod="
End With

End Sub

I assume that I have signed in, and now I want to navigate to the link above with XMLHTTP.

After making corrections suggested by Wayne, is this what you mean?

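Sub test23()

Dim link As String, response As String
Dim UserName As String, Password As String
Dim XMLPage As Object, htmlDoc As Object

' Request object and in-memory HTML document used below
Set XMLPage = CreateObject("MSXML2.XMLHTTP")
Set htmlDoc = CreateObject("htmlfile")

link = "http://w2tsl72/FAB-2_PROD/Login.asp"

UserName = "xxxx"
Password = "xxxx"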
With XMLPage
    .Open "post", link, False
    .setRequestHeader "Cookie", "ASPSESSIONIDQSRASDCT=EPKEOALBCAIKKCHNEGBKJJJG"
    ' Base64Encode is the utility function from the paste linked in the comments
    .setRequestHeader "Authorization", "Basic " & Base64Encode(UserName & ":" & Password)
    .send
    htmlDoc.body.innerHTML = .responseText
End With

    link = "http://w2tsl72/FAB-2_PROD/DmrListResults.asp?area=FAB&location=EPI&status=OP&Lot=ALL&waitTitle=ALL&waitBadge=18352&OpenFrom=ALL&OpenTo=ALL&CloseFrom=ALL&CloseTo=ALL&defect_cat=ALL&defect_group1=ALL&defect_group2=ALL&EightD=ALL&MRB=ALL&dmrType=ALL&EQUIP=ALL&NCMR=ALL&SOLAN=ALL&prod="

With XMLPage

  .Open "GET", link, False
  .setRequestHeader "Cookie", "ASPSESSIONIDQSRASDCT=EPKEOALBCAIKKCHNEGBKJJJG"
  .send
   htmlDoc.body.innerHTML = .responseText
End With

End Sub

New code attempt:

Sub post_frm()

    Dim objIE As Object, xmlhttp As Object
    Dim response As String

    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Navigate "about:blank"
    objIE.Visible = True

    Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")

    '~~> Indicates that page that will receive the request and the type of request being submitted
    xmlhttp.Open "post", "http://w2tsl72/FAB-2_PROD/Login.asp", False
    '~~> Indicate that the body of the request contains form data
    xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

    '~~> Send the data as name/value pairs
    xmlhttp.send "text1=xxxx&password1=xxxx"


    response = xmlhttp.responseText
    objIE.Document.write response

    Set xmlhttp = Nothing

End Sub
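
To see what the server actually returned for any of the requests above, the status line and response headers can be printed alongside the body. The following is only a minimal sketch: `InspectResponse` is a hypothetical helper name, late binding is assumed, and the 500-character preview length is arbitrary. As far as I know the request object follows redirects on its own, so a 302 to the login page would still show up here as a final status of 200.

```vba
' Hypothetical helper: prints status, headers and a body preview for one GET request.
Sub InspectResponse(ByVal url As String)
    Dim http As Object
    Dim headers As String

    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.Open "GET", url, False
    http.send

    ' Status of the final response (any redirects are followed before this point)
    Debug.Print "Status: " & http.Status & " " & http.statusText

    headers = http.getAllResponseHeaders
    Debug.Print headers

    ' HTTP-level authentication (for example Basic) answers 401 with a WWW-Authenticate header
    If http.Status = 401 And InStr(1, headers, "WWW-Authenticate", vbTextCompare) > 0 Then
        Debug.Print "Server is requesting HTTP authentication"
    Else
        Debug.Print "No HTTP authentication challenge seen"
    End If

    ' First part of the body, enough to recognise a login page
    Debug.Print Left$(http.responseText, 500)
End Sub
```

If the output shows status 200 with the login page markup and no `WWW-Authenticate` header, the site is most likely using a login form with a session cookie rather than HTTP authentication.
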
  • I tried using `getElementById` to set the values and click the login button, and then tried to navigate to the main link to get the table, but I don't know how to navigate with XMLHTTP. – Joseph Davidow Dec 15 '18 at 16:14
  • Are you getting a 401 error back? I'm assuming you're sending the credentials as user ID/password pairs, encoded using base64? – Wayne Phipps Dec 15 '18 at 16:21
  • Hi Wayne, how do I check the error returned? – Joseph Davidow Dec 15 '18 at 16:23
  • I can't see you sending the credentials in the headers. You are trying to interact with the DOM document as if using a browser rather than XMLHTTP. See here: https://codingislove.com/http-requests-excel-vba/ – QHarr Dec 15 '18 at 17:06
  • I tried that and got a "Sub or Function not defined" error for `Base64Encode` – Joseph Davidow Dec 15 '18 at 17:51
  • You probably missed the link where it says [Here’s a paste of utility function that helps to encode string to Base64](https://bin.codingislove.com/ojojafoziz) – Wayne Phipps Dec 15 '18 at 18:14
  • There's a similar post here, I'm not sure if it helps: https://stackoverflow.com/questions/20712635/providing-authentication-info-via-msxml2-serverxmlhttp – Wayne Phipps Dec 15 '18 at 18:21
  • Wayne, thanks! I did miss that part, and I have since added the two functions for `Base64Encode` to my code. However, I am still receiving the same response after I send the request. When I `Debug.Print` the response, it shows me the JavaScript behind the login, which suggests that it is still the same page. At least I think it is. – Joseph Davidow Dec 15 '18 at 18:50
  • "Are you getting a 401 error back? I'm assuming you're sending the credentials as user ID/password pairs, encoded using base64?" The status after sending is 200. – Joseph Davidow Dec 15 '18 at 19:01
  • I have downloaded fiddler and I am trying to understand how to determine when the credentials are sent. – Joseph Davidow Dec 15 '18 at 20:13
  • If you're getting status 200, I would expect that the response contains the page content. – Wayne Phipps Dec 15 '18 at 20:23
  • What does that mean? Sorry, I'm really new to this. I have managed nicely with standard IE scraping, but as soon as I discovered XMLHTTP, I was "WOW". I have managed to get simple tables with XMLHTTP, but getting behind the authorization is where I am stuck. – Joseph Davidow Dec 15 '18 at 20:28
  • You would more than likely be getting a 401 status code back if authentication was an issue. If you're getting status 200 then it's unlikely that's the problem. Perhaps I'm not understanding the issue, what do you mean by *getting behind the authorization is where I am stuck*? – Wayne Phipps Dec 15 '18 at 20:35
  • If I am managing to successfully send the credentials, 1: How can I verify this? I feel blind using XMLHTTP. Every time I debug the `responseText`, it always shows the same JavaScript for the sign-in page. 2: How can I then navigate to the next URL, which has the user's request to build the appropriate table? – Joseph Davidow Dec 15 '18 at 20:40
  • If you're using the method from the link which QHarr provided, I believe you should get the page content in `xmlhttp.responseText`. This is assuming the page is using Basic HTTP Authentication and not some other method. – Wayne Phipps Dec 15 '18 at 20:44
  • I am using what QHarr provided. I have added an image of the JavaScript in my original question. This does not change no matter what I do. – Joseph Davidow Dec 15 '18 at 20:53
  • "This is assuming the page is using Basic HTTP Authentication and not some other method" - How can I check this? – Joseph Davidow Dec 15 '18 at 20:56
  • The screenshot is incomplete, but the page is not using basic auth. It could be setting some cookie or session. You would need to make an HTTP POST sending the form data and capture the response. Take a look at this page: https://jigsaw.w3.org/HTTP/Basic/ This is what basic authentication would normally look like. – Wayne Phipps Dec 15 '18 at 21:02
  • To answer your question **How can I then navigate to the next URL** you would simply make a new GET request for the new URL – Wayne Phipps Dec 15 '18 at 21:03
  • I see that there is a cookie. Do I need to set this as a header? – Joseph Davidow Dec 15 '18 at 21:12
  • I tried what you suggested and just ran the code with the authentication and then the next link and I got a 302 status in fiddler. – Joseph Davidow Dec 15 '18 at 21:20
  • status 302 is normally URL redirection, it's probably trying to redirect you to the login page because you've not authenticated – Wayne Phipps Dec 15 '18 at 21:22
  • You probably need to send a POST request to the login URL and include your credentials in the form data. The response should contain the session or cookie data you then need to send on subsequent GET requests – Wayne Phipps Dec 15 '18 at 21:28
  • I have added new code based on what you suggested. I don't know how to add the code after the comments. – Joseph Davidow Dec 15 '18 at 21:45
  • That's not going to work. You either need to look at the form variables of the login form or look at the request and response in Fiddler when you log in. It's likely posting the values you use in the form with specific IDs. Your code will need to mimic that process and capture the response. Take a look at this for example: https://stackoverflow.com/questions/8798661/automate-submitting-a-post-form-that-is-on-a-website-with-vba-and-xmlhttp# – Wayne Phipps Dec 15 '18 at 21:53
  • A similar link from MS about Posting form data: https://support.microsoft.com/en-gb/help/290591/how-to-submit-form-data-by-using-xmlhttp-or-serverxmlhttp-object – Wayne Phipps Dec 15 '18 at 22:08
  • First of all, you have shown me how to debug the XMLHTTP request. THANK YOU for that. Second, I think I understand what you mean. I checked for the IDs in the DOM of the IE page and used them. I have added the new code. However, this still doesn't work. – Joseph Davidow Dec 15 '18 at 22:14
  • You will need to make two requests, the first being a Post of the form data to authenticate making sure to capture the response so the session ID etc can be sent in the second Get request. From the look of your new code, you’re only making one request – Wayne Phipps Dec 15 '18 at 22:48
  • Remember you can use F12 dev tools and network tab to inspect web traffic when logging in. Info about what is sent is captured there. – QHarr Dec 16 '18 at 08:03
  • Wayne - you are right. I do need two requests, but to start, I just wanted to see if I am able to post the information. When the IE page loads and I write to it, I still do not see any information. QHarr - How can I determine what is being sent for the username and password? I know the IDs for each element, but using them hasn't helped. – Joseph Davidow Dec 16 '18 at 14:54
  • Joseph, can you capture/compare the request sent to the server when you login via the browser? You should be able to mimic the same format when you send your credentials via VBA and will hopefully get the same response if you do. – Wayne Phipps Dec 17 '18 at 14:17
  • I did capture it, and I placed the exact format at the `send` line, as shown in my last posted code. Am I wrong in thinking that if I have managed to send the data to the browser through the XMLHTTP request, then when writing the response to the IE object, I should be able to see the username and password according to what I have sent? – Joseph Davidow Dec 17 '18 at 21:20
  • Wayne, Any feedback? – Joseph Davidow Dec 20 '18 at 19:26
  • You should be able to capture the post request made from both to compare. The first step is getting the post you make from your VBA script to match the format sent from the browser. You should then get the same response back from your auth request. The next step would be to capture and use whatever session ID you get back in any further requests you make. It's going to need a lot of experimentation on your part, since nobody but you has access. – Wayne Phipps Dec 24 '18 at 21:34
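
The two-request flow described in the comments (POST the login form, then GET the results page) might look roughly like the sketch below. Everything site-specific is an assumption: `FetchAfterLogin` is a hypothetical helper name, and the form body must match what the browser really sends as seen in Fiddler or the F12 network tab. Note that HTML forms submit each field under its `name` attribute, which is not necessarily the same as the element `id`. My understanding is that `MSXML2.XMLHTTP` is built on WinINet, which stores and resends session cookies itself, so no manual `Cookie` header is set here; that is worth confirming in Fiddler.

```vba
' Hypothetical helper: posts the login form, then requests a second page
' with the same request object, and returns the second page's HTML.
Function FetchAfterLogin(ByVal loginUrl As String, _
                         ByVal formBody As String, _
                         ByVal dataUrl As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP")

    ' Request 1: POST the form fields in the same format the browser sends
    http.Open "POST", loginUrl, False
    http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    http.send formBody
    Debug.Print "Login status: " & http.Status & " " & http.statusText

    ' Request 2: GET the results page (assumes the session cookie is resent automatically)
    http.Open "GET", dataUrl, False
    http.send
    Debug.Print "Data status: " & http.Status & " " & http.statusText

    FetchAfterLogin = http.responseText
End Function
```

It would be called with the `Login.asp` URL, a body such as `text1=xxxx&password1=xxxx` from the last attempt (if those really are the field names), and the `DmrListResults.asp` link. If the returned HTML is still the sign-in page, the login POST probably does not yet match the browser's request.
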

0 Answers