The good, the bad and the ordinary

Two major Java-based projects offer a web crawler implementation: Nutch and Heritrix. Nutch is an Apache Lucene subproject. Heritrix is the Internet Archive’s open source web…
