Apache Nutch is a highly extensible and scalable open-source web crawler software project.
It has a highly modular architecture, allowing developers to create plug-ins for media-type parsing, data retrieval, querying and clustering.
In January 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject of Lucene in June of the same year.
Since April 2010, Nutch has been an independent, top-level project of the Apache Software Foundation.[3]
While the Nutch project once aimed to release a global, large-scale web search engine, that is no longer a goal.