nutch

Apache Nutch


Apache Nutch README

For the latest information about Nutch, please visit our website at:

https://nutch.apache.org/

and our wiki, at:

https://cwiki.apache.org/confluence/display/NUTCH/Home

To get started using Nutch, read the tutorial:

https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial

Contributing

To contribute a patch, follow these instructions (note that installing
Hub is not strictly required, but is recommended).

0. Download and install hub.github.com
1. File a JIRA issue for your fix at https://issues.apache.org/jira/projects/NUTCH/issues
- you will get an issue ID of the form NUTCH-xxx, where xxx is the issue number.
2. git clone https://github.com/apache/nutch.git
3. cd nutch
4. git checkout -b NUTCH-xxx
5. edit files (please try and include a test case if possible)
6. git status (make sure it shows what files you expected to edit)
7. Make sure that your code complies with the [Nutch code formatting template](https://raw.githubusercontent.com/apache/nutch/master/eclipse-codeformat.xml), which is basically two-space indents
8. git add <files>
9. git commit -m "fix for NUTCH-xxx contributed by <your username>"
10. git fork
11. git push -u <your git username> NUTCH-xxx
12. git pull-request
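
The local steps above (4 through 9) can be sketched as a small shell helper that uses plain git only, so it works whether or not Hub is installed. The function names `commit_message` and `commit_fix` are hypothetical and are not part of Nutch or Hub; the issue ID, username, and file list are placeholders you supply.

```shell
#!/bin/sh
# Sketch only: wraps steps 4-9 of the list above with plain git.
# commit_message and commit_fix are illustrative names, not Nutch tooling.

# Build the commit message in the form used in step 9.
commit_message() {
  printf 'fix for %s contributed by %s' "$1" "$2"
}

# Create the issue branch, show what changed, stage the given files,
# and commit them. Stops at the first failing step.
# Usage: commit_fix NUTCH-xxx <your username> <files...>
commit_fix() {
  issue="$1"
  user="$2"
  shift 2
  git checkout -b "$issue" &&
    git status --short &&
    git add -- "$@" &&
    git commit -m "$(commit_message "$issue" "$user")"
}
```

After sourcing the file inside your clone, a call such as `commit_fix NUTCH-xxx <your username> <files>` covers the branch, stage, and commit steps; forking, pushing, and opening the pull request (steps 10 through 12) still follow the list above.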

IDE setup

Generate Eclipse project files

ant eclipse

and follow the instructions in Importing existing projects.

IntelliJ IDEA users can also import Eclipse projects using the "Eclipser" plugin (https://plugins.jetbrains.com/plugin/7153-eclipser); see also Importing Eclipse Projects into IntelliJ IDEA.
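
Before importing, it can help to confirm that `ant eclipse` actually produced project files. The check below is a minimal sketch that assumes the target writes the standard Eclipse `.project` and `.classpath` files into the checkout root; the function name `has_eclipse_project` is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes "ant eclipse" leaves .project and .classpath
# in the directory passed as the first argument.

# Succeeds when both standard Eclipse project files are present.
has_eclipse_project() {
  [ -f "$1/.project" ] && [ -f "$1/.classpath" ]
}

# Example use from the root of the nutch checkout:
#   ant eclipse && has_eclipse_project . && echo "ready to import"
```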

Export Control

This distribution includes cryptographic software. The country in which you
currently reside may have restrictions on the import, possession, use, and/or
re-export to another country, of encryption software. BEFORE using any encryption
software, please check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to see if this is
permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has
classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which
includes information security software using or performing cryptographic functions with
asymmetric algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception ENC Technology
Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations,
Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Nutch uses the PDFBox API in its parse-tika plugin for extracting textual content
and metadata from encrypted PDF files. See https://pdfbox.apache.org/ for more
details on PDFBox.

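Key metrics

Overview
Name and owner: apache/nutch
Primary language: Java
Languages: XSLT (number of languages: 7)
Platform:
License: Apache License 2.0

Owner activity
Created: 2009-05-21 01:17:48
Last push: 2025-07-22 08:15:38
Last commit: 2025-07-16 12:04:04
Releases: 56
Latest release: release-1.21 (published 2025-07-16 12:05:28)
First release: release-0.7 (published 2010-05-12 03:51:23)

User engagement
Stars: 3.1k
Watchers: 233
Forks: 1.3k
Commits: 3.5k
Issues enabled?
Issues: 0
Open issues: 0
Pull requests: 544
Open pull requests: 13
Closed pull requests: 287

Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?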