I am trying to run Mallet in Java and am getting the following error.
Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file. Perhaps the 'resources' directories weren't copied into the 'class' directory. Continuing.
I am trying to run this example from Mallet's website (http://mallet.cs.umass.edu/topics-devel.php). Below is my code. Any help is appreciated.
package scriptAnalyzer;

import cc.mallet.util.*;
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;

import java.util.*;
import java.util.regex.*;
import java.io.*;

public class Mallet {

    public static void main(String[] args) throws Exception {

        String filePath = "C:/mallet/ap.txt";

        // Begin by importing documents from text to feature sequences
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

        // Pipes: lowercase, tokenize, remove stopwords, map to features
        pipeList.add( new CharSequenceLowercase() );
        pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
        pipeList.add( new TokenSequenceRemoveStopwords(new File("stoplists/en.txt"), "UTF-8", false, false, false) );
        pipeList.add( new TokenSequence2FeatureSequence() );

        InstanceList instances = new InstanceList (new SerialPipes(pipeList));

        Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
        instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                                               3, 2, 1)); // data, label, name fields

        // Create a model with 100 topics, alpha_t = 0.01, beta_w = 0.01
        // Note that the first parameter is passed as the sum over topics, while
        // the second is the parameter for a single dimension of the Dirichlet prior.
        int numTopics = 5;
        ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);

        model.addInstances(instances);

        // Use two parallel samplers, which each look at one half the corpus and combine
        // statistics after every iteration.
        model.setNumThreads(2);

        // Run the model for 50 iterations and stop (this is for testing only,
        // for real applications, use 1000 to 2000 iterations)
        model.setNumIterations(50);
        model.estimate();

        // Show the words and topics in the first instance

        // The data alphabet maps word IDs to strings
        Alphabet dataAlphabet = instances.getDataAlphabet();

        FeatureSequence tokens = (FeatureSequence) model.getData().get(0).instance.getData();
        LabelSequence topics = model.getData().get(0).topicSequence;

        Formatter out = new Formatter(new StringBuilder(), Locale.US);
        for (int position = 0; position < tokens.getLength(); position++) {
            out.format("%s-%d ", dataAlphabet.lookupObject(tokens.getIndexAtPosition(position)), topics.getIndexAtPosition(position));
        }
        System.out.println(out);

        // Estimate the topic distribution of the first instance,
        // given the current Gibbs state.
        double[] topicDistribution = model.getTopicProbabilities(0);

        // Get an array of sorted sets of word ID/count pairs
        ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords();

        // Show top 5 words in topics with proportions for the first document
        for (int topic = 0; topic < numTopics; topic++) {
            Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator();

            out = new Formatter(new StringBuilder(), Locale.US);
            out.format("%d\t%.3f\t", topic, topicDistribution[topic]);
            int rank = 0;
            while (iterator.hasNext() && rank < 5) {
                IDSorter idCountPair = iterator.next();
                out.format("%s (%.0f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight());
                rank++;
            }
            System.out.println(out);
        }

        // Create a new instance with high probability of topic 0
        StringBuilder topicZeroText = new StringBuilder();
        Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();

        int rank = 0;
        while (iterator.hasNext() && rank < 5) {
            IDSorter idCountPair = iterator.next();
            topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID()) + " ");
            rank++;
        }

        // Create a new instance named "test instance" with empty target and source fields.
        InstanceList testing = new InstanceList(instances.getPipe());
        testing.addThruPipe(new Instance(topicZeroText.toString(), null, "test instance", null));

        TopicInferencer inferencer = model.getInferencer();
        double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5);
        System.out.println("0\t" + testProbabilities[0]);
    }
}
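Mallet looks for the logging file if one isn't specified in the System properties. If you are using Maven, the easiest fix is to put a file at
src/main/resources/cc/mallet/util/resources/logging.properties
The standard Maven build process will then copy it automatically to:
target/classes/cc/mallet/util/resources/logging.properties
So you don't need any special configuration. The file can be empty; it appears to be left out deliberately so that you can configure your own logging.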
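To confirm that the file actually ended up on the classpath after the build, a small check along these lines may help. This is only a sketch: the class name `LoggingResourceCheck` and the helper `locate` are made up for illustration, and the resource path is simply the one given above.

```java
import java.net.URL;

public class LoggingResourceCheck {

    // Classpath location taken from the path given in the answer above.
    static final String RESOURCE = "cc/mallet/util/resources/logging.properties";

    /** Returns the URL of the resource on the classpath, or null if it is absent. */
    static URL locate(String resourcePath) {
        return LoggingResourceCheck.class.getClassLoader().getResource(resourcePath);
    }

    public static void main(String[] args) {
        URL url = locate(RESOURCE);
        if (url == null) {
            System.out.println("Not on the classpath: " + RESOURCE);
        } else {
            System.out.println("Found at: " + url);
        }
    }
}
```

Run it with the same classpath as your application (for example from the same Maven module). If it reports that the file is missing, the file was probably not copied into target/classes.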
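For anyone else using Maven and trying to configure Mallet logging, try the following:
Create a new text file at src/mallet-resources/logging.properties. It doesn't actually need to specify anything; an empty file is enough to silence Mallet.
Then modify your pom.xml file to make sure the file is copied to the location mentioned in the other answer. To do so, add the following to the <build><plugins> section: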
<!-- Mallet logging is horrifically verbose and is not easy to configure -->
<!-- We have to use this complicated process to copy the logging.properties file to the right location -->
<plugin>
  <artifactId>maven-resources-plugin</artifactId>
  <version>2.6</version>
  <executions>
    <execution>
      <id>copy-resources</id>
      <phase>validate</phase>
      <goals>
        <goal>copy-resources</goal>
      </goals>
      <configuration>
        <outputDirectory>${basedir}/target/classes/cc/mallet/util/resources</outputDirectory>
        <resources>
          <resource>
            <directory>src/mallet-resources</directory>
            <filtering>true</filtering>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
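If you would rather keep some output than silence everything, the file can hold ordinary java.util.logging settings. The sketch below uses only the standard JDK `LogManager` API and loads a sample configuration from a string so that its effect can be inspected. The sample keys are an assumption about what you might want (console output at WARNING and above), not anything Mallet requires, and `LoggingPropertiesDemo` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class LoggingPropertiesDemo {

    // Hypothetical contents for logging.properties: console handler, WARNING and above.
    static final String SAMPLE =
            "handlers=java.util.logging.ConsoleHandler\n"
          + ".level=WARNING\n"
          + "java.util.logging.ConsoleHandler.level=WARNING\n";

    /** Applies the given properties text to java.util.logging and returns the root logger level. */
    static Level applyAndGetRootLevel(String properties) {
        try {
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(properties.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The root logger is the one with the empty name.
        return Logger.getLogger("").getLevel();
    }

    public static void main(String[] args) {
        System.out.println("Root logger level: " + applyAndGetRootLevel(SAMPLE));
    }
}
```

An empty file presumably silences the output because it configures no handlers at all, so nothing is attached to print the log records.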
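If you try to run Mallet by downloading version 2.0.8-SNAPSHOT (https://github.com/mimno/Mallet) or by using the currently latest Maven release (2.0.7), you will get this error.
The reason is that Mallet expects to find the file logging.properties in the generated target\classes\cc\mallet\util\resources folder. When the project is built with Maven, this file is not created, so the error is reported from MalletLogger.java.
Someone should configure Maven properly so that the logging.properties file is created in the target folder. A temporary workaround is to modify the Mallet code so that it uses a different path for logging.properties.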
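A possible alternative to patching Mallet is to point java.util.logging at an external file through a system property before any Mallet class is loaded. The sketch below assumes that the property Mallet checks is the standard JDK one, `java.util.logging.config.file`. That is an assumption based on the statement above that Mallet only looks for its own file when none is specified in the System properties. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExternalLoggingConfig {

    // Standard JDK property naming a java.util.logging configuration file.
    // Assumption: this is the property Mallet consults before looking for its own file.
    static final String JUL_CONFIG_PROPERTY = "java.util.logging.config.file";

    /** Creates an empty temporary config file and registers it via the system property. */
    static Path useEmptyConfig() {
        try {
            Path config = Files.createTempFile("mallet-logging", ".properties");
            config.toFile().deleteOnExit();
            System.setProperty(JUL_CONFIG_PROPERTY, config.toString());
            return config;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path config = useEmptyConfig();
        System.out.println(JUL_CONFIG_PROPERTY + " = " + config);
        // Any code that touches Mallet classes would go after this point.
    }
}
```

Passing `-Djava.util.logging.config.file=<path>` on the java command line has the same effect and avoids the ordering concern, since the property is then set before any class is loaded.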