The history of Hadoop: From 4 nodes to the future of data

Looking back on it today, early iterations of Nutch were kind of laughable. About a year into their work on it, Cutting and Cafarella thought things were going pretty well because Nutch was already able to crawl and index hundreds of millions of pages …
Read more on GigaOM

From Spiders to Elephants: The History of Hadoop
Cutting and Cafarella dubbed this project Apache Nutch, and deployed a proof of concept of the indexer on a single machine with about 1GB of RAM and about 1TB of disk, Bonaci writes. Nutch ran pretty well on this setup, and could index about 100 web …
Read more on Datanami

Studying polar data with the help of Apache Tika
In mid-April, members of the open source community will gather in Austin for ApacheCon North America where Annie Bryant Burgess, a postdoctoral fellow in the computer science department at the University of Southern California and project assistant at …
Read more on opensource.com
