I'm using a mapper that converts binary files (JPEGs) into a Hadoop SequenceFile (HSF):

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String uri = value.toString().replace(" ", "%20");
        Configuration conf = new Configuration();
        FSDataInputStream in = null;
        try {
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            in = fs.open(new Path(uri));
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024 * 1024];
            int bytesRead;
            // Write only the bytes actually read, not the whole buffer
            while ((bytesRead = in.read(buffer, 0, buffer.length)) >= 0) {
                bout.write(buffer, 0, bytesRead);
            }
            context.write(value, new BytesWritable(bout.toByteArray()));

I then have a second mapper that reads the HSF, like this:

    public class ImagePHashMapper extends Mapper {
        public void map(Text key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            // get the PHash for this specific file
            String PHashStr;
            try {
                PHashStr = calculatePhash(value.getBytes());

and calculatePhash is:

    static String calculatePhash(byte[] imageData) throws NoSuchAlgorithmException {
        // get the PHash for this specific data
        // PHash requires an InputStream rather than a byte array
        InputStream is = new ByteArrayInputStream(imageData);
        String ph;
        try {
            ImagePHash ih = new ImagePHash();
            ph = ih.getHash(is);
            System.out.println("file: " + is.toString() + " phash: " + ph);
        } catch (Exception e) {
            e.printStackTrace();
            return "Internal error with ImagePHash.getHash";
        }
        return ph;
    }

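This all works fine, but I'd like calculatePhash to write out each JPEG's last-modified date. I know I can use file.lastModified() to get the last-modified date of a file, but is there any way to get it in either map or calculatePhash? I'm a Java noob. TIA!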
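
For reference, here is a minimal sketch of the file.lastModified() approach mentioned above, assuming the file lives on the local filesystem. The class name `LastModifiedDemo` and the helper `lastModifiedIso` are hypothetical names used only for illustration. A byte array carries no file metadata, so this only works where the path is still known. For paths opened through a Hadoop `FileSystem`, the analogous lookup is probably `fs.getFileStatus(path).getModificationTime()`, which would have to be called in the first mapper and passed along with the record; that part is an untested assumption.

```java
import java.io.File;
import java.io.IOException;
import java.time.Instant;

public class LastModifiedDemo {

    // Hypothetical helper: returns a local file's last-modified time
    // as an ISO-8601 UTC string.
    static String lastModifiedIso(File file) throws IOException {
        if (!file.exists()) {
            // File.lastModified() silently returns 0 for a missing file,
            // so fail loudly instead of reporting 1970-01-01.
            throw new IOException("No such file: " + file);
        }
        // File.lastModified() returns milliseconds since the Unix epoch.
        return Instant.ofEpochMilli(file.lastModified()).toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real JPEG; only the timestamp matters here.
        File tmp = File.createTempFile("sample", ".jpg");
        tmp.deleteOnExit();
        System.out.println(tmp.getName() + " last modified: " + lastModifiedIso(tmp));
    }
}
```

Because the second mapper only sees the key and the image bytes, the timestamp would need to travel inside the SequenceFile record (for example appended to the key) if calculatePhash is to print it.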