State of the Art Web Scraping
September 27, 2014
There are tools that are great for this kind of thing. I like Selenium with the PhantomJS webdriver. The problem is that while Selenium is great for complicated logins, it’s terrible at things like downloading files or fetching XHR JSON data.
I need a way to glue requests together with Selenium so I can use the right tool for the right job: Selenium for login and requests for everything else. Cookie copying to the rescue! I log in with Selenium, then copy its cookies into a requests.Session and make the remaining requests from there. Here’s the snippet to copy the cookies:
def copy_cookies_to_session(driver, session):
    """Copy cookies from selenium webdriver to requests.Session"""
    cookies = driver.get_cookies()
    for cookie in cookies:
        session.cookies.set(
            cookie['name'],
            cookie['value'],
            domain=cookie['domain'],
            path=cookie['path']
        )
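To show roughly how the pieces fit together, here is a minimal sketch that runs without a browser. FakeDriver, the sessionid cookie, and example.com are all made up for illustration: the class only mimics the list of dicts that a webdriver's get_cookies() returns. In real use I would pass the actual driver after the login step instead.

```python
import requests


def copy_cookies_to_session(driver, session):
    """Copy cookies from selenium webdriver to requests.Session"""
    for cookie in driver.get_cookies():
        session.cookies.set(
            cookie['name'],
            cookie['value'],
            domain=cookie['domain'],
            path=cookie['path'],
        )


class FakeDriver:
    """Hypothetical stand-in for a selenium webdriver that has logged in."""

    def get_cookies(self):
        # Same shape as selenium's get_cookies(): a list of cookie dicts.
        return [{
            'name': 'sessionid',       # made-up cookie name
            'value': 'abc123',         # made-up cookie value
            'domain': 'example.com',   # placeholder domain
            'path': '/',
        }]


session = requests.Session()
copy_cookies_to_session(FakeDriver(), session)

# The session now carries the login cookie and would send it on any
# request to the matching domain and path.
copied = session.cookies.get('sessionid', domain='example.com', path='/')
print(copied)
```

After the copy, any session.get(...) call against the same domain sends the login cookie along, so file downloads and JSON endpoints can go through requests rather than the browser.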
Now I can use the right tool for the right job.