Simple Web Crawler Developed with Laravel, PHP, Bootstrap, MySQL, Ajax and JavaScript



| Action | URL | Inserted at | Status | HTML Title | External Links | Google Analytics? | Action |
|---|---|---|---|---|---|---|---|
| Delete | http://5pider.com.br | 3 months ago | done | 5pider – Servidores Amazon e Infraestrutura de TI | 103 | n/a | Crawl Url |
| Delete | http://www.sportket.com | 5 months ago | done | 江苏快3形态走势图 | 73 | n/a | Crawl Url |
| Delete | http://linkedin.com | 5 months ago | done | LinkedIn: Log In or Sign Up | 58 | Yes | Crawl Url |
| Delete | http://www.ee.ee | 11 months ago | done | ee.ee | 0 | n/a | Crawl Url |
| Delete | http://www.taniarascia.com/ | 1 year ago | done | Tania Rascia – Web Design and Development | 112 | Yes | Crawl Url |
| Delete | http://thelastcodebender.com | 1 year ago | new | BaeyD International Ltd. | 12 | Yes | Crawl Url |
| Delete | http://oliseglobalagency.org | 1 year ago | done | Olise Global Home | 12 | n/a | Crawl Url |
| Delete | http://jibsengineering.com | 1 year ago | done | JIBS Engineering Service Ltd - Home | 26 | n/a | Crawl Url |
| Delete | http://www.havecv.com/suleiman | 1 year ago | done | Suleiman A Mamman | 25 | n/a | Crawl Url |
| Delete | http://smarbly.com | 1 year ago | done | Smarbly | 14 | n/a | Crawl Url |
| Delete | http://www.havecv.com | 1 year ago | done | HaveCv | 28 | n/a | Crawl Url |
| Delete | http://safsms.com/blog/ | 1 year ago | done | SAFSMS Blog \| by FlexiSAF | 38 | Yes | Crawl Url |

What is a Web Crawler?

A web crawler is a program, typically run by a search engine, that systematically browses a website by following the significant paths or links it finds, starting from the site's index page. This process is called web crawling or spidering.

Web crawlers can be used to gather specific, relevant information from web pages, such as harvesting e-mail addresses (usually for spam) or collecting addresses that link to a specific website or app of your choice. Crawlers can also be used to automate maintenance tasks on a website, such as checking links or validating HTML code.
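As a concrete illustration of the link-harvesting step, external links on a fetched page can be collected in plain PHP with `DOMDocument`. This is a hedged sketch; `extractExternalLinks()` is an illustrative helper, not code from the demo:

```php
<?php
// Sketch of the link-harvesting step: parse fetched HTML and keep only
// absolute links whose host differs from the crawled site's host.
// extractExternalLinks() is a hypothetical helper, not the demo's code.
function extractExternalLinks(string $html, string $baseHost): array
{
    $doc = new DOMDocument();
    // Suppress warnings from malformed real-world HTML.
    @$doc->loadHTML($html);

    $external = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // Relative links have no host; same-host links are internal.
        if ($host !== null && $host !== false && $host !== $baseHost) {
            $external[] = $href;
        }
    }
    return $external;
}
```

Counting the array returned by a helper like this would give the "External Links" figure shown in the table above.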

In the above tutorial demo, I created a simple web crawler that does the following:

The system allows the user to insert URLs via a form. After submission, the URLs are saved to a MySQL table.

Once a URL is saved, the user can see it in a table view and can also delete it. Every URL starts with the default status “new”. Whenever the crawling of a URL is completed, the status changes to “done”.
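In a Laravel setup, the table backing this view might be defined roughly as below. The column names are my assumptions, not taken from the repository; only the status values come from the tutorial. Note the `status` column defaulting to “new”:

```php
<?php
// Hypothetical Laravel migration for the urls table. Column names are
// assumptions; the status lifecycle ("new" -> "crawling" -> "done" or
// "crawling failed") is taken from the tutorial text.
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('urls', function (Blueprint $table) {
            $table->id();
            $table->string('url');
            $table->string('status')->default('new');
            $table->timestamps(); // created_at backs the "Inserted at" column
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('urls');
    }
};
```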

During the crawling process, the status changes to “crawling”. The table shows the status of each URL, and the user can filter by status.

The result of each crawl is stored in the database table “urls_metrics”. When the metrics cannot be fetched (e.g. if the URL is offline), the URL status changes to “crawling failed”. The Google Analytics column shows “n/a” if the URL doesn’t use Google Analytics. The system also allows the user to fetch all URLs with the status “new”, “crawling” or “done”.
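One way to derive the “Google Analytics?” value is to scan the fetched HTML for the usual tracking-snippet markers. This is a sketch under my own assumptions, not the demo's actual implementation; the `hasGoogleAnalytics()` name and the marker list are illustrative:

```php
<?php
// Illustrative Google Analytics detection: look for the classic
// analytics.js loader or the newer gtag.js loader in the page source.
// Function name and marker list are assumptions, not the demo's code.
function hasGoogleAnalytics(string $html): bool
{
    $markers = [
        'google-analytics.com/analytics.js',
        'googletagmanager.com/gtag/js',
    ];
    foreach ($markers as $marker) {
        if (strpos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}
```

A helper like this would return `false` for a page without any tracking snippet, which the table above renders as “n/a”.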

You can download the full code from my GitHub account: https://github.com/suleigolden/webcrawler

NOTE: The GitHub version is developed using my custom PHP MVC framework. Learn more about my custom PHP MVC framework here.

You can send me an email if you need the Laravel version; I will be happy to send it to you for free: suleimamman@gmail.com