UPDATE (answer): NLTK works well with Python 2.7. I had 3.2 installed, so I uninstalled 3.2 and installed 2.7. Now it works!!
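I have installed NLTK and tried to download the NLTK data. I followed the instructions on this site: http://www.nltk.org/data.html
I downloaded NLTK, installed it, and then tried to run the following code: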
>>> import nltk
>>> nltk.download()
It gave me the following error message:
Traceback (most recent call last):
  File "", line 1, in
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
Directory of C:\Python32\Lib\site-packages
I tried both nltk.download() and nltk.downloader(), and both gave me error messages.
Then I used help(nltk) to list the package contents, and it showed the following:
NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk
I do see downloader in there, so I don't know why it doesn't work. Python 3.2.2, Windows Vista.
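A quick way to narrow down an AttributeError like this is to print which interpreter is running and which file `import nltk` actually resolves to. That separates the two causes discussed on this page: a Python version NLTK does not support, and a local file named nltk.py shadowing the real package. A minimal Python 3 sketch using only the standard library; `diagnose_module` is a hypothetical helper name, not part of NLTK.

```python
import importlib
import importlib.util
import sys


def diagnose_module(name, attr):
    """Report the running Python version, where `name` is imported from,
    and whether the imported module exposes `attr`."""
    report = {
        "python": sys.version.split()[0],
        "origin": None,  # file the import system would load
        "has_attr": False,
        "error": None,
    }
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Not importable from this interpreter at all.
        return report
    report["origin"] = spec.origin
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # a broken or shadowing module may fail on import
        report["error"] = repr(exc)
        return report
    report["has_attr"] = hasattr(module, attr)
    return report


if __name__ == "__main__":
    # An origin outside site-packages (e.g. your own nltk.py) is a red flag.
    print(diagnose_module("nltk", "download"))
```

If `origin` points at your own project folder rather than site-packages, see the answers below about renaming nltk.py.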
To download a particular dataset/model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
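If a script should fetch punkt only when it is missing, one option is to probe for it with `nltk.data.find`, which raises `LookupError` when a resource cannot be located, and to call `nltk.download` only in that case. A minimal sketch: `ensure_resources` is a hypothetical helper, the lookup paths such as `tokenizers/punkt` are assumed to follow NLTK's `<category>/<id>` layout, and the `find`/`download` parameters exist only so the logic can be exercised without NLTK installed.

```python
def ensure_resources(resources, find=None, download=None):
    """Download each NLTK resource that cannot be found locally.

    `resources` maps a download id (e.g. "punkt") to the path that
    nltk.data.find() looks up (e.g. "tokenizers/punkt").
    Returns the list of ids that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # only needed when the real NLTK functions are used
        find = find or nltk.data.find
        download = download or nltk.download
    fetched = []
    for package_id, resource_path in resources.items():
        try:
            find(resource_path)
        except LookupError:
            # Not present in any directory on the data path: fetch it.
            download(package_id)
            fetched.append(package_id)
    return fetched
```

Usage: `ensure_resources({"punkt": "tokenizers/punkt"})` downloads punkt on the first run and does nothing afterwards.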
If you're unsure which data/models you need, you can start with the basic list of data + models:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources, which includes:
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
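The collection entry is plain XML, so the ids it bundles can be pulled out with the standard library, for instance to download only some of them one at a time. A sketch that parses a shortened excerpt of the listing above (not the full index); `collection_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the "popular" collection shown above.
POPULAR_XML = """
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="stopwords" />
  <item ref="wordnet" />
  <item ref="punkt" />
</collection>
"""


def collection_items(xml_text):
    """Return (collection id, list of item refs) from a <collection> element."""
    root = ET.fromstring(xml_text.strip())
    refs = [item.get("ref") for item in root.findall("item")]
    return root.get("id"), refs


if __name__ == "__main__":
    cid, refs = collection_items(POPULAR_XML)
    print(cid, refs)  # popular ['cmudict', 'stopwords', 'wordnet', 'punkt']
```

Each ref could then be passed to `nltk.download(ref)` on its own.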
In case anyone wants to avoid errors when downloading the larger datasets with nltk, here is a workaround from /sf/ask/17360801/:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it were already installed.
>>> dler.download('popular')
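The two rm commands remove any existing panlex_lite archive and unpacked folder. On Windows, or anywhere without rm, the same cleanup can be done from Python with pathlib and shutil. A sketch: `remove_partial_package` is a hypothetical helper, and the default location `~/nltk_data/corpora` is an assumption taken from the paths above, so point it at wherever your nltk_data actually lives.

```python
import shutil
from pathlib import Path


def remove_partial_package(package, corpora_dir=None):
    """Delete <package>.zip and the unpacked <package>/ folder if present.

    Returns the paths that were actually removed.
    """
    if corpora_dir is None:
        # Assumed default, mirroring /Users/<your_username>/nltk_data/corpora.
        corpora_dir = Path.home() / "nltk_data" / "corpora"
    corpora_dir = Path(corpora_dir)
    removed = []
    archive = corpora_dir / (package + ".zip")
    folder = corpora_dir / package
    if archive.is_file():
        archive.unlink()  # same as: rm .../panlex_lite.zip
        removed.append(archive)
    if folder.is_dir():
        shutil.rmtree(folder)  # same as: rm -r .../panlex_lite
        removed.append(folder)
    return removed
```

Usage: `remove_partial_package("panlex_lite")`, then continue with the Downloader steps above.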
Since v3.2.5, NLTK gives a more informative error message when an nltk_data resource is not found, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
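Because a missing resource surfaces as an ordinary `LookupError`, a script can also catch it, download the resource, and retry once. A minimal sketch: `with_resource` is a hypothetical wrapper, and its `download` argument falls back to `nltk.download` only when none is passed.

```python
def with_resource(func, package_id, download=None):
    """Call func(); if it raises LookupError, download package_id and retry once."""
    try:
        return func()
    except LookupError:
        if download is None:
            import nltk  # only imported when the real downloader is needed
            download = nltk.download
        download(package_id)
        # A second LookupError propagates: the download did not fix it.
        return func()
```

Usage: `with_resource(lambda: word_tokenize("Hello world."), "punkt")` after `from nltk import word_tokenize`. Retrying only once keeps a bad resource id from looping forever.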
To find the nltk_data directory (auto-magically), see /sf/ask/17360801/
To download nltk_data to a different path, see /sf/ask/17360801/
To configure the nltk_data path (i.e. set a different nltk_data path for NLTK), see /sf/ask/17360801/
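As a rough illustration of the path-related answers: NLTK searches the directories listed in `nltk.data.path` (the "Searched in:" list in the error above), it also reads extra directories from the `NLTK_DATA` environment variable, and `nltk.download` accepts a `download_dir` argument. A sketch that moves a custom directory to the front of such a search list; `prepend_data_dir` is a hypothetical helper.

```python
import os


def prepend_data_dir(search_path, directory):
    """Put `directory` first in an NLTK-style search path list (in place).

    An existing occurrence is moved rather than duplicated.
    """
    directory = os.path.abspath(os.path.expanduser(directory))
    if directory in search_path:
        search_path.remove(directory)
    search_path.insert(0, directory)
    return search_path
```

Usage, with `/opt/nltk_data` as a placeholder location: `prepend_data_dir(nltk.data.path, "/opt/nltk_data")`, then `nltk.download("punkt", download_dir="/opt/nltk_data")`.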
Try:
nltk.download('all')
This downloads all of the data, so you don't need to download each package individually.
Don't name your file nltk.py. I used the same code, named my file nltk, and got the same error as you; after I changed the file name it worked fine.
Install pip: in a terminal, run: sudo easy_install pip
Install NumPy (optional): run: sudo pip install -U numpy
Install NLTK: run: sudo pip install -U nltk
Test the installation: run python, then type: import nltk
Download the corpora: run: python -m nltk.downloader all
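The last step can also be scripted, for example in a setup script, by running the downloader module with the same interpreter that will later import NLTK (`sys.executable`). That avoids installing data with one Python and importing NLTK from another, as happened with 2.7 and 3.2 in the question. A sketch: `downloader_command` and `run_downloader` are hypothetical helpers, and the `-d` (target directory) option follows the NLTK data installation docs; check `python -m nltk.downloader -h` on your version.

```python
import subprocess
import sys


def downloader_command(packages, download_dir=None):
    """Build the argv for `python -m nltk.downloader [-d DIR] PACKAGE...`."""
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        cmd += ["-d", str(download_dir)]  # target directory for the data
    cmd += list(packages)
    return cmd


def run_downloader(packages, download_dir=None):
    """Run the downloader; raises CalledProcessError if it exits non-zero."""
    subprocess.run(downloader_command(packages, download_dir), check=True)
```

Usage: `run_downloader(["all"])` is the scripted equivalent of `python -m nltk.downloader all`.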
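You can't have a saved Python file named nltk.py, because the interpreter reads that file instead of the actual nltk package.
Rename the file that the Python shell is reading, then try what you originally did:
import nltk
and then nltk.download()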
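To check whether one of your own files is shadowing the real package, look next to the script you are running: the script's directory comes first on `sys.path`, ahead of site-packages. A local nltk.py, a leftover compiled nltk.pyc, or a folder named nltk containing `__init__.py` will all be imported instead of the installed NLTK. A minimal sketch; `find_shadowing_files` is a hypothetical helper.

```python
from pathlib import Path


def find_shadowing_files(module_name, directory="."):
    """Return files in `directory` that Python would import instead of the
    installed `module_name` package when a script in that directory runs."""
    directory = Path(directory)
    candidates = [
        directory / module_name / "__init__.py",  # a local package folder
        directory / (module_name + ".py"),  # e.g. your own nltk.py
        directory / (module_name + ".pyc"),  # a leftover compiled copy
    ]
    return [path for path in candidates if path.is_file()]
```

Usage: `print(find_shadowing_files("nltk"))`. An empty list means nothing local is in the way; after renaming nltk.py, also delete any leftover nltk.pyc.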
This worked for me:
nltk.set_proxy('http://user:password@proxy.example.com:8080')
nltk.download()
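If the proxy password contains characters such as @ or :, writing it straight into the URL can break parsing. A sketch that percent-encodes the credentials with `urllib.parse.quote`; `build_proxy_url` is a hypothetical helper, and it assumes the URL ends up in Python's urllib proxy handling, which percent-decodes the user name and password. Test it against your own proxy before relying on it.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return scheme://[user:password@]host:port with credentials percent-encoded."""
    credentials = ""
    if user is not None:
        credentials = quote(user, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return "{}://{}{}:{}".format(scheme, credentials, host, port)
```

Usage: `nltk.set_proxy(build_proxy_url("proxy.example.com", 8080, "user", "p@ss:word"))`, then `nltk.download()`.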