Apache Nutch

Apache Nutch is a highly extensible and scalable open-source web crawler software project.

It has a highly modular architecture, allowing developers to create plug-ins for media-type parsing, data retrieval, querying and clustering.

In January 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject of Lucene in June of the same year.

Since April 2010, Nutch has been an independent, top-level project of the Apache Software Foundation.[3]

While it was once a goal for the Nutch project to release a global large-scale web search engine, that is no longer the case.

Nutch robot mascot