• Scrapy is an application framework for crawling websites and extracting structured data, which can be used for a wide range of applications such as data mining, information processing, or historical archival.

    Installing Scrapy

    pip install Scrapy

    Spider crawling process

    • The spider starts by generating the initial Requests for the starting URLs and setting a callback function on each. When a request has been downloaded, a Response is generated and passed as a parameter to that callback function.
    • The initial requests are obtained by calling start_requests(). By default, start_requests() reads the URLs in start_urls and generates a Request for each of them, with parse as the callback function.
    • The callback function parses the returned (web page) content and returns Item objects, Request objects, or an iterable containing both. Any returned Request is then processed by Scrapy, which downloads the corresponding content and calls the callback function set on it (which may be the same function).
    • In the callback function, you can use selectors (Scrapy's Selector, BeautifulSoup, lxml, etc.) to parse the page content and generate items from the parsed data.
    • Finally, the items returned by the spider are typically saved to a database.

    Example: Spider

    [Screenshot: Spider example]

    Example: CrawlSpider

    [Screenshot: CrawlSpider example]