How To Install Nutch 0 8

  • Uploaded by: Sharjeel Sayed
  • 0
  • 0
  • October 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View How To Install Nutch 0 8 as PDF for free.

More details

  • Words: 271
  • Pages: 3
How to install Nutch 0.8 1) JDK Installation Reference : http://java.sun.com/j2se/1.5.0/install-linux.html cd /usr/local Download JDK 5.0 Update 7 (jdk-1_5_0_07-linux-i586.bin file)from http://java.sun.com/javase/downloads/index.jsp chmod 755 jdk-1_5_0_07-linux-i586.bin ./jdk-1_5_0_07-linux-i586.bin export PATH=/usr/local/jdk1.5.0_07/bin/:$PATH export JAVA_HOME=/usr/local/jdk1.5.0_07 export CLASSPATH=. 2) Tomcat Setup cd /tmp wget http://apache.forbigweb.com/tomcat/tomcat-5/v5.5.17/bin/apache-tomcat5.5.17.tar.gz tar zxvf apache-tomcat-5.5.17.tar.gz mv apache-tomcat-5.5.17 /usr/share/tomcat5 3) Nutch Setup Reference:http://lucene.apache.org/nutch/tutorial8.html http://wiki.apache.org/nutch/FAQ#head-0c5dd359a76f9ac5ed54f9d81d79130e4c9c3302 cd /tmp wget http://mirrors.isc.org/pub/apache/lucene/nutch/nutch-0.8.tar.gz tar zxvf nutch-0.8.tar.gz mv nutch-0.8 /usr/local/nutch cd /usr/local/nutch mkdir urls vi ing #add the line below http://ing.clients.megaesecure.com cp -a crawl-urlfilter.txt crawl-urlfilter.txt.orig vi crawl-urlfilter.txt #Replace *MY.DOMAIN.NAME with your site url cp -a nutch-site.xml nutch-site.xml.orig

vi nutch-site.xml Add the following lines between configuration tag #******************* <property> http.agent.name MES <description>HTTP 'User-Agent' request header. MUST NOT be empty please set this to a single word uniquely related to your organization. NOTE: You should also check other related properties: http.robots.agents http.agent.description http.agent.url http.agent.email http.agent.version and set their values appropriately. <property> http.agent.description MES BOT <description>Further description of our bot- this text is used in the User-Agent header. It appears in parenthesis after the agent name. <property> http.agent.url http://megaesecure.com <description>A URL to advertise in the User-Agent header. This will appear in parenthesis after the agent name. Custom dictates that this should be a URL of a page explaining the purpose and behavior of this crawler. <property> http.agent.email sharjeel at mega dot com <description>An email address to advertise in the HTTP 'From' request header and User-Agent header. A good practice is to mangle this address (e.g. 'info at example dot com') to avoid spamming. #**************** cd /usr/local/nutch bin/nutch crawl urls -dir datacache -depth 2 -topN 50 cd /usr/share/tomcat/webapps

rm -rf ROOT* cp nutch*.war /usr/share/tomcat/webapps/ROOT.war cd /usr/local/nutch/crawl /usr/share/tomcat5/bin/catalina.sh start

Related Documents

How To Install Nutch 0 8
October 2019 19
How To Install Nutch 0 7 2
October 2019 16
How To Install Num2text
November 2019 39
How To Install
November 2019 35
How To Install Lifebars
October 2019 40

More Documents from ""