Web Scraping

From P2P Foundation

Web scraping, together with Open APIs, allows sites to reuse the information stored in other web pages; it turns unstructured information into structured information and opens up the information stored in websites to new uses.

Read the explanation at http://www.readwriteweb.com/archives/web_30_when_web_sites_become_web_services.php


Definition

"Web Scraping is essentially reverse engineering of HTML pages. It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information. The actual data is mingled with layout and rendering information and is not readily available to a computer. Scrapers are the programs that "know" how to get the data back from a given HTML page. They work by learning the details of the particular markup and figuring out where the actual data is." (http://www.readwriteweb.com/archives/web_30_when_web_sites_become_web_services.php)
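
To make the definition concrete, here is a minimal sketch of a scraper in Python, using only the standard library's html.parser module. The page fragment, the class names (listing, title, price), and the names ListingScraper and scrape_listings are all hypothetical and chosen for illustration; a real scraper would have to be written against the markup of the actual target page.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the data (titles and prices) is
# mingled with layout markup, as the definition describes.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="title">Blue Widget</h2>
  <p>Price: <span class="price">9.99</span> USD</p>
</div>
<div class="listing">
  <h2 class="title">Red Widget</h2>
  <p>Price: <span class="price">12.50</span> USD</p>
</div>
"""


class ListingScraper(HTMLParser):
    """Collects the text of 'title' and 'price' elements, one record per 'listing'."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # field being read, or None outside a data element
        self.records = []          # one dict per listing found so far

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "listing":
            # A new listing starts: open a fresh record.
            self.records.append({})
        elif css_class in ("title", "price") and self.records:
            # This element is assumed to hold data rather than layout.
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field is not None:
            self.records[-1][self.current_field] = data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the current data element.
        self.current_field = None


def scrape_listings(html):
    """Return a list of {'title': ..., 'price': ...} dicts found in the HTML."""
    scraper = ListingScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.records


if __name__ == "__main__":
    for record in scrape_listings(SAMPLE_HTML):
        print(record)
```

The scraper "knows" where the data is only because it was written with this particular markup in mind; if the site changed its class names or nesting, the code would need to be updated.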


Examples

Yahoo Pipes: focuses on remixing RSS feeds

Teqlo: focuses on letting people create mashups and widgets from web services and RSS

Dapper: a generic scraping service for any web site
