In this post, I make notes on the Hadoop Mapper, which takes input key-value pairs and emits output key-value pairs.
The first step in writing a mapper class is to extend Mapper. This class is generic, so supply the appropriate types for the input and output key-value pairs. Then override the map method. The Context object collects the output key-value pairs, so any extraction logic belongs inside map.
An example: the mapper works over a log file whose lines look like the following sample:
01-Apr-12 00:00:00,026 INFO [scheduler_Worker-4] com.xoom.sync.DistributedXipQuartzJobHandler.executeInternal(97) - Lock was acquired by other xip instance Quartz-lock [bacCheckReversals] at 2012-04-01T00:00:00.026-07:00
The map method of MapClass extracts the thread name between the square brackets, which for the line above is scheduler_Worker-4, and writes it to the context as the key, with a value of 1 (wrapped in an IntWritable object).
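The extraction step can be tried outside Hadoop with plain Java. The following is only a minimal sketch: ThreadNameExtractor and extractThreadName are hypothetical names introduced for illustration, and the sample line is shortened. It also assumes the log level is padded to five characters ("INFO" followed by two spaces), which would make the bracketed thread name the fifth element after splitting on a single space. If the level is followed by just one space, the thread name lands at index 3 and the check does not match.

```java
import java.util.Optional;

// Hypothetical helper class, not part of the original mapper.
class ThreadNameExtractor {

    // Mirrors the mapper's check: the fifth token must look like [something].
    static Optional<String> extractThreadName(String line) {
        String[] parts = line.split(" ");
        if (parts.length > 4 && parts[4].matches("^\\[.+\\]$")) {
            return Optional.of(parts[4].replace("[", "").replace("]", ""));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: two spaces after INFO (padded level field); line shortened.
        String padded = "01-Apr-12 00:00:00,026 INFO  [scheduler_Worker-4] Lock was acquired";
        // Same line with a single space after INFO.
        String unpadded = "01-Apr-12 00:00:00,026 INFO [scheduler_Worker-4] Lock was acquired";

        System.out.println(extractThreadName(padded).orElse("(not collected)"));
        System.out.println(extractThreadName(unpadded).orElse("(not collected)"));
    }
}
```

Checking the token positions this way against a few real log lines is a quick sanity test before running the job, since a line that fails the check is silently skipped as a key.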
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MapClass extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws InterruptedException, IOException {
        String valueString = value.toString();
        String[] parts = valueString.split(" ");
        // The fifth element in the parts array should contain the name of
        // the thread being executed. Treat it as the key.
        if (parts.length > 4 && parts[4] != null && parts[4].matches("^\\[.+\\]$")) {
            // Strip the brackets and emit (threadName, 1).
            context.write(new Text(parts[4].replace("[", "").replace("]", "")), new IntWritable(1));
        } else {
            System.out.println("Value not collected by output collector:" + valueString);
        }
    }
}

Next we move on to writing a simple JUnit test case for the mapper class.