Python Web Crawler Tutorial 12

Code for the tutorials can be found at my GitHub repository. Even more code is available for free here as well: http://github.com/creeveshft. Basic web scraping web crawler in Python. To see my…

This entry was posted in WebCrawler.

5 Responses to Python Web Crawler Tutorial 12

  1. shills francis says:

    How can I scrape snapdeal.com using Python…

  2. shills francis says:

    I am trying to spider a website and I am getting the error below:

        Traceback (most recent call last):
          File "C:Python27shillersnapsearch", line 27, in <module>
            br.open(url)
          File "build\bdist.win32\egg\mechanize\_mechanize.py", line 203, in open
            return self._mech_open(url, data, timeout=timeout)
          File "build\bdist.win32\egg\mechanize\_mechanize.py", line 230, in _mech_open
            response = UserAgentBase.open(self, request, data)
          File "build\bdist.win32\egg\mechanize\_opener.py", line 188, in open
            req = meth(req)
          File "build\bdist.win32\egg\mechanize\_urllib2_fork.py", line 1062, in do_request_
            for name, value in self.parent.addheaders:
        ValueError: too many values to unpack

  3. Shripad Deshmukh says:

    Where is Python Web Crawler Tutorial 11? I am unable to find it.

  4. Armagedoom says:

    Hello Chris: Thank you for this awesome tutorial. I would like to ask you something. When I use your script on a Spanish website, I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 9: ordinal not in range(128). This happens because some of the article names contain "ñ", so calling str(b1) raises the error. Is there a multi-purpose solution you would use for that problem? Thanks again!

  5. chris reeves says:

    Tutorial 11 is the page spider tutorial. It does not have 11 in the name.
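
On the `ValueError: too many values to unpack` from comment 2: the last frame of that traceback is the loop `for name, value in self.parent.addheaders`, which needs every entry in `addheaders` to be a two-item `(name, value)` pair. The script that set the headers is not shown, so the cause here is an assumption; one common cause is a header given as a single string rather than a tuple. Below is a minimal sketch that reproduces the unpacking step without mechanize. The `check_headers` helper and the header values are hypothetical.

```python
def check_headers(addheaders):
    """Unpack headers the same way the loop in the traceback does."""
    pairs = []
    for name, value in addheaders:  # each entry must be a 2-item sequence
        pairs.append((name, value))
    return pairs


# Correct shape: a list of (name, value) tuples.
good = [("User-agent", "Mozilla/5.0")]
print(check_headers(good))

# Wrong shape (assumed cause): a list holding one plain string.
# Unpacking the string character by character yields more than two values.
bad = ["User-agent: Mozilla/5.0"]
try:
    check_headers(bad)
except ValueError as err:
    print("ValueError:", err)
```

With mechanize, the matching assignment would be `br.addheaders = [('User-agent', 'Mozilla/5.0')]`. Note that `('User-agent: Mozilla/5.0')` without a comma is still a plain string, not a tuple.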

Comments are closed.
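
On the `UnicodeEncodeError` from comment 4: in Python 2, calling `str()` on a unicode string encodes it with the ASCII codec, which cannot represent "ñ". One general-purpose approach is to keep text as unicode inside the script and encode it explicitly as UTF-8 only when it is written out. Below is a minimal sketch, assuming `b1` is a unicode string as described in the comment. The `to_utf8` helper and the sample article name are hypothetical.

```python
from __future__ import print_function


def to_utf8(text):
    """Encode a unicode string as UTF-8 bytes instead of calling str() on it."""
    return text.encode("utf-8")


# Sample article name containing n-with-tilde (U+00F1).
b1 = u"Espa\xf1a"

# The ASCII codec fails in the same way as the reported error.
try:
    b1.encode("ascii")
except UnicodeEncodeError as err:
    print("ascii failed:", err)

# UTF-8 can represent every Unicode character, so this succeeds.
encoded = to_utf8(b1)
print(repr(encoded))
```

When the names are written to a file, opening it with `io.open(path, 'w', encoding='utf-8')` lets the script write unicode strings directly, without any `str()` call.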