
I've been stuck on this for a while now. I found a few links on SO about it, but they didn't work for me. The answers give the code without comments, and since I'm doing this for the first time I didn't understand it. I couldn't get it to work either: it gives me a 403 Forbidden error.

C# download file from the web with login

C# https login and download file

http://codesimplified.blogspot.hr/2013/11/asynchronous-file-download-from-web.html

This is my code. Part of it is copy-pasted; I did some research and wrote the comments as my way of thinking, so I'm not sure they are correct.

    private void button_Click(object sender, RoutedEventArgs e)
    {
        logtsk.Start(); // logtsk = new Task(Login) first time using async methods too (did research, probs there's a better way)
    }

    private async void Login()
    {
        using (var handler = new HttpClientHandler()) //handler is used for extra options and custom stuff to use with client
        {

            var request = new HttpRequestMessage(HttpMethod.Post, "https://somesite.com/login");
            request.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"); //read this can help with 403 errors
            request.Headers.Add("Accept", "html/txt"); //same, can fix 403

            var CookieJar = new CookieContainer(); //I store cookies from the login request here
            handler.CookieContainer = CookieJar; //bind it to the handler
            var hc = new HttpClient(handler); //create client
            var byteArray = new UTF8Encoding().GetBytes("<user.name@gmail.com>:<password123>"); //creates bytes to send from user:pass pair I guess
            hc.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(byteArray)); //I have no idea what this does... <copied>

            var formData = new List<KeyValuePair<string, string>>(); //don't know why use this after creating the data in the 2 lines above pair O_o... This code is copied...
            formData.Add(new KeyValuePair<string, string>("username", "user.name@gmail.com")); // don't know
            formData.Add(new KeyValuePair<string, string>("password", "password123")); //...
            formData.Add(new KeyValuePair<string, string>("scope", "all")); //nope....

            request.Content = new FormUrlEncodedContent(formData); //creates data again I guess
            var response = await hc.SendAsync(request); //sends request
            MessageBox.Show(response.ToString()); //debug... I like msgbox
            using (FileStream fileStream = new FileStream("c:\\table.xls", FileMode.Create, FileAccess.Write, FileShare.None))
            {
                //copy the content from response to filestream
                var responseFile = await hc.GetAsync("https://somesite.com/subtab/table.xls");
                await responseFile.Content.CopyToAsync(fileStream); //response is gotten by "hc" which has cookie stored so it should be authed and download right?
            }
        }
    }

I know the code is bad, and it probably got mashed up too, but the exceptions thrown are too generic and contain no useful info. Right now it throws when HttpClient sends the request, and when I do get it to run (don't ask how) the server responds with 403.
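One way to get more detail out of a failing request is sketched below. It assumes the generic errors come partly from `Login` being `async void` and started through `new Task(Login)`, which leaves nothing to await and no place for the caller to catch an exception. `LoginDiagnostics` and `TryRequestAsync` are hypothetical names used only for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginDiagnostics
{
    // Returns Task<string> rather than void, so the caller can await it
    // and any failure is reported back instead of being lost.
    public static async Task<string> TryRequestAsync(HttpClient client, string url)
    {
        try
        {
            using (var response = await client.GetAsync(url))
            {
                // A 403 is a normal response, not an exception: it shows up here.
                return $"{(int)response.StatusCode} {response.ReasonPhrase}";
            }
        }
        catch (HttpRequestException ex)
        {
            // The inner exception, when present, usually names the underlying
            // cause (for example a DNS, TLS, or connection failure).
            return $"Request failed: {ex.Message} / {ex.InnerException?.Message}";
        }
    }
}
```

In a WPF handler this could be called by declaring the handler as `private async void button_Click(...)` and showing `await LoginDiagnostics.TryRequestAsync(client, "https://somesite.com/login")` in the message box, with no `Task.Start()` involved.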

Can someone please write out how it's supposed to look and work, with comments, so I can finally understand how to think in HTTP? I'd like to do it with HttpClient, but I'm fine with any other way if it's explained well. Thank you!

Kula
  • A lot of sites will not have login logic on the same page as the file you want to download. You may need to log in at their main page, then determine what their cookie structure is and use it to get to the page that has the download. If they don't make it easy by exposing an API, then you need to emulate the moves of a browser going to the site, logging in, moving to the page you want, and then downloading the file. – Shannon Holsinger Sep 08 '16 at 12:38
  • It has to do with sending the same request stream a browser would and reading and manipulating the response headers as necessary for the next request. See https://www.google.com/search?q=c-sharp+using+httpclient+to+mimic+a+browser&ie=&oe= and https://www.google.com/search?q=httpclient+site+with+cookies&ie=&oe= Of course, I'm guessing because I can't see your first response and I don't know the site's URL – Shannon Holsinger Sep 08 '16 at 12:40
  • Can this be done via a .NET class? I thought web browsers do this stuff under the hood, just with a nice GUI (and scripts, of course). If so, it shouldn't be too hard. Also, I got a cookie here: http://prntscr.com/cfqe7g from the Firefox privacy tab. Should I extract it or something? If I insist on doing this via HttpClient/WebRequest/WebClient in .NET, how would the process go? Basic steps? I'm guessing getting the cookie, maybe getting the login box ids and the submit button? Can I generate a request so that the cookie container gets populated? Thank you @ShannonHolsinger, I'll google browser emulation in .NET too! :) – Kula Sep 08 '16 at 18:54
  • Yeah it's not something I can explain in this forum - it's not terribly complicated, but it does involve some code. There are also browser automatons that might help if you just want to one-off something rather than learning how it's all done. You're right - googling that will help. The main thing to learn is how clients and servers communicate through the querystring. Learn that, capture, manipulate, send - that's it. – Shannon Holsinger Sep 08 '16 at 19:04
  • Watch the URL as you log in and navigate to the download page. Copy and paste each new URL into a notepad. See what changes. Close your browser and start again. What's different? Once you get that worked out, you can swap out the new for the old and visit however many times you'd like – Shannon Holsinger Sep 08 '16 at 19:05
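
The steps described in these comments (log in, keep the cookies, then request the file) could be sketched with HttpClient roughly as follows. This is only a sketch under assumptions: the site is assumed to use a plain form login rather than HTTP Basic authentication, the URLs are the placeholders from the question, and the field names `username` and `password` are guesses that would have to be replaced with the `name` attributes of the real login form. `AuthenticatedDownload` and `DownloadAsync` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class AuthenticatedDownload
{
    // Placeholder URLs taken from the question; substitute the real ones.
    private const string LoginUrl = "https://somesite.com/login";
    private const string FileUrl = "https://somesite.com/subtab/table.xls";

    public static async Task DownloadAsync(string user, string password, string targetPath)
    {
        // One CookieContainer shared by every request sent through this handler,
        // so cookies set by the login response are sent along with the download request.
        var cookies = new CookieContainer();
        using (var handler = new HttpClientHandler { CookieContainer = cookies })
        using (var client = new HttpClient(handler))
        {
            // Some servers reject requests that carry no User-Agent header.
            // TryAddWithoutValidation stores the value without strict parsing.
            client.DefaultRequestHeaders.TryAddWithoutValidation(
                "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

            // Assumed form field names; check the login form's HTML for the real ones.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["username"] = user,
                ["password"] = password
            });

            using (var loginResponse = await client.PostAsync(LoginUrl, form))
            {
                // Assumption: a failed login returns a non-2xx status. Many sites
                // return 200 with an error page instead, so inspect the body
                // if the download below still comes back as 403.
                loginResponse.EnsureSuccessStatusCode();
            }

            // ResponseHeadersRead lets the body be streamed to disk instead of buffered in memory.
            using (var fileResponse = await client.GetAsync(FileUrl, HttpCompletionOption.ResponseHeadersRead))
            {
                fileResponse.EnsureSuccessStatusCode();
                using (var target = new FileStream(targetPath, FileMode.Create, FileAccess.Write, FileShare.None))
                {
                    await fileResponse.Content.CopyToAsync(target);
                }
            }
        }
    }
}
```

If the site adds a hidden token to its login form or redirects through several pages, the same client would also need to fetch the login page first and post back whatever extra fields it finds there; that part depends on the site and is not shown.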

0 Answers