
Running MapReduce in a Hadoop environment with Maven

Configure maven-jar-plugin in the POM file (this writes the main class and the classpath into the jar's manifest)

<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <!-- fully qualified name of the class that contains main() -->
                            <mainClass>com.pro.main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

Driver code (the main method)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver class; its fully qualified name is what <mainClass> in the POM should contain
public class Submitter {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: hadoop jar <jar> <input path> <output path>");
            System.exit(-1);
        }
        // Create the configuration first so the job and the FileSystem lookup share it
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 1. Tell Hadoop which jar to ship to the cluster (the jar containing this class)
        job.setJarByClass(Submitter.class);
        // 2. Set the Mapper and Reducer implementation classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordConutReduce.class);
        // 3. Set the key/value types emitted by the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 4. Set the key/value types of the final (reduce) output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Delete the output directory if it already exists; otherwise the job is rejected at submission
        Path outputPath = new Path(args[1]);
        FileSystem fileSystem = outputPath.getFileSystem(conf); // file system that owns this path
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true); // true = recursive: delete the directory and everything in it
        }

        // 5. Set the input path of the data set and the output path for the results
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // 6. Set the number of reduce tasks
        job.setNumReduceTasks(1);
        // 7. Submit the job and wait for it to finish, printing progress (true = verbose)
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
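The driver refers to WordCountMapper and WordConutReduce, whose code is not shown here. As a rough illustration of what a typical word-count mapper and reducer compute, the sketch below simulates the map, shuffle/sort and reduce steps in plain Java. It is an assumption-laden stand-in, not Hadoop's Mapper/Reducer API: the class WordCountSimulation is hypothetical, and it assumes the mapper splits lines on whitespace and emits (word, 1) pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of a word-count MapReduce job (no Hadoop classes involved)
public class WordCountSimulation {

    // "Map" step: emit a (word, 1) pair for every whitespace-separated token of every line
    public static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
            }
        }
        return pairs;
    }

    // "Shuffle/sort" + "reduce" steps: group pairs by key in sorted order and sum their values
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello maven");
        Map<String, Integer> counts = reduce(map(input));
        // Print one "word<TAB>count" line per key
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

With the sample input, main prints hadoop 1, hello 2 and maven 1, one tab-separated pair per line in key order, which mirrors the sorted output a single reducer produces.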

Package the project into a jar with Maven (mvn package writes it to the target directory) and copy the jar to the machine where Hadoop is installed
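Before copying the jar, it can be worth checking that the plugin settings took effect. With the POM configuration above, maven-jar-plugin writes a Main-Class attribute into META-INF/MANIFEST.MF and, because addClasspath is true, a Class-Path attribute listing any runtime dependency jars under lib/. When the manifest names a main class, hadoop jar runs it and passes the remaining arguments to it. The hypothetical ManifestCheck class below reads both attributes with the JDK's java.util.jar API; the default path target/mapreducedemo-1.0-SNAPSHOT.jar is an assumption based on Maven's default output directory and the jar name used later.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: prints the Main-Class and Class-Path entries of a jar's manifest
public class ManifestCheck {

    // Returns one main-section manifest attribute, or null if the manifest or attribute is missing
    public static String readAttribute(String jarPath, Attributes.Name name) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue(name);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the jar built by "mvn package"; pass another path as the first argument
        String jarPath = args.length > 0 ? args[0] : "target/mapreducedemo-1.0-SNAPSHOT.jar";
        System.out.println("Main-Class: " + readAttribute(jarPath, Attributes.Name.MAIN_CLASS));
        System.out.println("Class-Path: " + readAttribute(jarPath, Attributes.Name.CLASS_PATH));
    }
}
```

A null Class-Path simply means the build had no runtime dependencies to list; a null Main-Class means hadoop jar will expect the class name as its first argument.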

Upload the input text file to HDFS

hadoop fs -put xxx.info /input

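On the Hadoop machine, submit the job, passing the input and output paths as arguments (no class name is needed because the manifest already names the main class)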

hadoop jar mapreducedemo-1.0-SNAPSHOT.jar /input /output
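When the job finishes, the reducer's results are written under /output, by default in files named part-r-00000 and so on, one "key<TAB>value" line per word (the default TextOutputFormat layout); hadoop fs -cat /output/part-r-00000 shows them. If you copy such a file locally, for example with hadoop fs -get, the hypothetical OutputParser sketch below turns those lines back into a map. It assumes the default tab separator and integer counts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "key<TAB>value" lines from a word-count result file
public class OutputParser {

    public static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the file's (sorted) order
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Not a key<TAB>value line: " + line);
            }
            // trim() also drops a trailing '\r' if the file has Windows line endings
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed local copy of the result file, e.g. fetched with hadoop fs -get
        String file = args.length > 0 ? args[0] : "part-r-00000";
        parse(Files.readAllLines(Paths.get(file)))
                .forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```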