Running MapReduce in a Hadoop environment with Maven

Introduce the Maven dependency in the POM file
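A minimal dependency block for the POM might look like the following; the version number is an assumption and should match the Hadoop version of your cluster:

```xml
<dependencies>
    <!-- hadoop-client pulls in the MapReduce and HDFS client APIs -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- placeholder version: use your cluster's Hadoop version -->
        <version>3.3.6</version>
    </dependency>
</dependencies>
```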


Code in the main method of the driver class (the class names WordCountDriver, WordCountMapper, and WordCountReducer below are placeholders for your own implementations):

        if (args.length != 2) {
            System.out.println("Usage: <input path> <output path>");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 1. Set the jar that contains this job's classes
        job.setJarByClass(WordCountDriver.class);
        // 2. Set the mapper and reducer implementation classes for this job
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 3. Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 4. Set the key/value types of the reduce (final) output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Delete the output folder if it already exists; otherwise the job fails
        Path path = new Path(args[1]);
        FileSystem fileSystem = path.getFileSystem(conf); // resolve the FileSystem for this path
        if (fileSystem.exists(path)) {
            fileSystem.delete(path, true); // true: delete recursively even if the folder is not empty
        }

        // 5. Set the input and output paths of the dataset this job processes
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 6. Optionally set the number of reduce tasks, e.g. job.setNumReduceTasks(1);
        // 7. Submit the job and wait for completion
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
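To make the map and reduce steps the driver wires together concrete, here is a plain-Java simulation of the word-count logic (this is not the Hadoop Mapper/Reducer API, just a self-contained sketch of what the map, shuffle, and reduce phases compute):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSimulation {
    public static void main(String[] args) {
        List<String> lines = List.of("hello hadoop", "hello mapreduce");

        // Map phase: emit a (word, 1) pair for every word in every input line
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                mapped.add(Map.entry(word, 1));
            }
        }

        // Shuffle phase: group the emitted values by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        // Reduce phase: sum the grouped values for each key
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), e.getValue().stream().mapToInt(Integer::intValue).sum());
        }

        System.out.println(counts); // {hadoop=1, hello=2, mapreduce=1}
    }
}
```

In the real job, the shuffle phase is performed by the framework between the mapper and reducer; the mapper and reducer classes only implement the first and last steps.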

Package the project into a jar with Maven (mvn clean package) and copy the jar to the Hadoop environment

Upload the input text file to HDFS (hadoop fs -put needs both a local source and an HDFS destination):

hadoop fs -mkdir -p /input
hadoop fs -put <local-text-file> /input

In the Hadoop environment, start the job with the following command

hadoop jar mapreducedemo-1.0-SNAPSHOT.jar /input /output
