I'm trying to load a CSV file into a pandas DataFrame, but the file is only accessible after logging in.

So far the script downloads the file and prints decoded_content to the screen, but I can't figure out how to load the CSV into a pandas DataFrame:

import requests
import urllib2
import pandas as pd
import csv


headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/133.35 (KHTML, like Gecko) Chrome/20.0.2041.120 Safari/521.14'
}

login_data = {
    'username': 'myusername',
    'password': 'mypassword',
    'stayloggedin': '0',
    'login': 'Login'
}

with requests.Session() as s:
    url = 'https://www.domain.tld/en/login.html'
    r = s.get(url, headers=headers)
    r = s.post(url, data=login_data, headers=headers)

    a = s.get('https://www.domain.tld/path/to/file/data.csv')
    decoded_content = a.content.decode('utf-8')

print(decoded_content)

Output:

Col1;Col2;Col3
0102;120;212
121;122;331
user3200534
  • maybe selenium could help you? – Narcisse Doudieu Siewe Apr 20 '20 at 02:20
  • @user1438644 I'm a total rookie and just began learning python and pandas for data science. I already looked into selenium, but I don't understand it. :/ The above code is fine, but I simply can't figure out how to call the CSV behind the login. – user3200534 Apr 20 '20 at 02:28
  • Could you try this - ```opener = urllib2.URLopener()``` and ```opener.retrieve('https://www.domain.tld/path/to/file/data.csv')``` – tidakdiinginkan Apr 20 '20 at 02:38
  • Try this https://stackoverflow.com/a/41880513/7782271 – Ramin Melikov Apr 20 '20 at 02:44
  • This might help: [Use python requests to download CSV](https://stackoverflow.com/questions/35371043/use-python-requests-to-download-csv) – M.Sqrl Apr 20 '20 at 02:46
  • @Ramin Melikov: That gives HTTP 403. @M.Sqrl: This worked and I updated my question, but I still can't get the CSV into the damn pd DataFrame. – user3200534 Apr 20 '20 at 03:03

1 Answer

2020/04/21 Edit

Solution:

I created TestFile.csv with your data:

Col1;Col2;Col3
0102;120;212
121;122;331

Note that the separator is a semicolon, not the default comma, so it has to be passed to read_csv with sep=';'.

import pandas as pd

df = pd.read_csv('TestFile.csv', sep=';')
print(df)
print(type(df))

Output:

   Col1  Col2  Col3
0   102   120   212
1   121   122   331
<class 'pandas.core.frame.DataFrame'>

Process finished with exit code 0
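
Since the question already has the CSV text in memory as decoded_content, writing it to disk first may not be needed. A minimal sketch, assuming decoded_content holds text like the sample in the question (the string literal below is only a stand-in for the downloaded data, not the real response):

```python
import io

import pandas as pd

# Stand-in for a.content.decode('utf-8') from the question (assumed sample data)
decoded_content = "Col1;Col2;Col3\n0102;120;212\n121;122;331\n"

# io.StringIO wraps the text in a file-like object, which read_csv accepts
df = pd.read_csv(io.StringIO(decoded_content), sep=';')

print(df)
print(type(df))
```

With the session code from the question, the same idea would be pd.read_csv(io.StringIO(decoded_content), sep=';') right after decoded_content is assigned.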

For a description of read_csv, see the pandas documentation. It has a lot of parameters because CSV files are not governed by a strict set of rules.
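
One parameter that may matter for this data is dtype: in the output above, 0102 was parsed as the number 102, so the leading zero is lost. If Col1 is really an identifier rather than a number, something along these lines should keep it as text (the inline sample string is an assumption standing in for the file):

```python
import io

import pandas as pd

# Assumed sample, same content as TestFile.csv
sample = "Col1;Col2;Col3\n0102;120;212\n121;122;331\n"

# dtype maps column names to types; Col1 stays a string, the rest are inferred
df_text = pd.read_csv(io.StringIO(sample), sep=';', dtype={'Col1': str})

print(df_text['Col1'].tolist())
```

Whether that is wanted depends on what Col1 means; if it is a plain number, the default parsing is fine.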

M.Sqrl