Principles and Applications of Big Data Technology - Lab 2: Getting Familiar with Common HDFS Operations

2019-03-28 _Binguner

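Lab 2: Getting Familiar with Common HDFS Operations

I. Objectives

(1) Understand the role HDFS plays in the Hadoop architecture.
(2) Become fluent with the common shell commands for working with HDFS.
(3) Become familiar with the common Java APIs for working with HDFS.

II. Platform

Operating system: Linux.
Hadoop version: 2.7.3 or later.
JDK version: 1.7 or later.
Java IDE: IDEA

III. Tasks and requirements

(1) Implement each of the following functions in code, and complete the same task with the shell commands that Hadoop provides.

1. Upload any text file to HDFS. If the specified file already exists in HDFS, let the user choose whether to append to the end of the existing file or to overwrite it.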

shell:

# upload
hadoop fs -put /User/Binguner/Desktop/test.txt /test
# append to the end of the existing file
hadoop fs -appendToFile /User/Binguner/Desktop/test.txt /test/test.txt
# overwrite the existing file
hadoop fs -copyFromLocal -f /User/Binguner/Desktop/test.txt /input/test.txt

    /**
     * Uploads a local file to HDFS. If the destination already exists,
     * the user chooses between overwriting it and appending to it.
     *
     * @param fileSystem the HDFS file system handle
     * @param srcPath    path of the local file
     * @param desPath    path of the destination file in HDFS
     */
    private static void test1(FileSystem fileSystem, Path srcPath, Path desPath) {
        try {
            if (fileSystem.exists(desPath)) {
                System.out.println("Do you want to overwrite the existing file? ( y / n )");
                if (new Scanner(System.in).next().equals("y")) {
                    // delSrc = false, overwrite = true
                    fileSystem.copyFromLocalFile(false, true, srcPath, desPath);
                } else {
                    // Append the bytes of the local file to the end of the HDFS file;
                    // try-with-resources closes both streams even if the copy fails.
                    try (FileInputStream inputStream = new FileInputStream(srcPath.toString());
                         FSDataOutputStream outputStream = fileSystem.append(desPath)) {
                        byte[] bytes = new byte[1024];
                        int read;
                        while ((read = inputStream.read(bytes)) > 0) {
                            outputStream.write(bytes, 0, read);
                        }
                    }
                }
            } else {
                fileSystem.copyFromLocalFile(srcPath, desPath);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
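
The append branch above copies bytes with a read/write loop. As a minimal sketch, that loop can be factored into a helper that works on any pair of streams. `StreamCopy` and `copy` are hypothetical names introduced for illustration; only JDK classes are used, so the sketch runs without a Hadoop cluster. An `FSDataOutputStream` is an `OutputStream`, so it could be passed as `out`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        // read() returns -1 once the end of the stream is reached
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello hdfs".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out) + " bytes copied");
    }
}
```

The helper leaves both streams open, so the caller decides when to close them, for example with try-with-resources.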

Results (screenshots omitted): the original file list in HDFS, the file list after the first run, and the HDFS directory after the second run.

2. Download a specified file from HDFS. If a local file with the same name already exists, rename the downloaded file automatically.

shell:

hadoop fs -copyToLocal /input/test.txt /User/binguner/Desktop/test.txt

    /**
     * Downloads a file from HDFS, choosing a new local name if the
     * requested one is already taken.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the file in HDFS
     * @param localPath  local path to save the file to
     */
    private static void test2(FileSystem fileSystem, Path remotePath, Path localPath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this file in HDFS!");
                return;
            }
            Path target = localPath;
            // Check the local file system explicitly instead of relying on
            // copyToLocalFile to throw FileAlreadyExistsException.
            if (new java.io.File(localPath.toString()).exists()) {
                target = new Path(localPath.toString() + "_" + new Random().nextInt(Integer.MAX_VALUE));
                System.out.println(localPath + " already exists, saving as " + target);
            }
            fileSystem.copyToLocalFile(remotePath, target);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
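
A random suffix works, but the new name is not predictable. As a sketch of an alternative, the helper below picks the first free name of the form `test_1.txt`, `test_2.txt`, and so on. `LocalNames` and `firstFreeName` are hypothetical names, and the code uses `java.nio.file.Path` (the local file system), not Hadoop's `org.apache.hadoop.fs.Path`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocalNames {

    /**
     * Returns target itself if nothing exists there; otherwise the first
     * free sibling named base_1.ext, base_2.ext, ...
     */
    static Path firstFreeName(Path target) {
        if (!Files.exists(target)) {
            return target;
        }
        String fileName = target.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        for (int i = 1; ; i++) {
            Path candidate = target.resolveSibling(base + "_" + i + ext);
            if (!Files.exists(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("download-demo");
        Path existing = Files.createFile(dir.resolve("test.txt"));
        System.out.println(firstFreeName(existing).getFileName()); // prints test_1.txt
    }
}
```

The resulting local path could then be converted with `new org.apache.hadoop.fs.Path(result.toString())` before it is handed to `copyToLocalFile`.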

Results (screenshots omitted): the local directory before execution, after the first run, and after the second run.

3. Print the content of a specified HDFS file to the terminal.

shell:

hadoop fs -cat /test/test.txt

    /**
     * Prints a text file stored in HDFS line by line.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the target file
     */
    private static void test3(FileSystem fileSystem, Path remotePath) {
        // try-with-resources closes the reader and the underlying HDFS stream
        try (FSDataInputStream inputStream = fileSystem.open(remotePath);
             BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
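
An `InputStreamReader` created without a charset decodes with the platform default, which can garble non-ASCII text when the file was written on a machine with a different default. The sketch below assumes the file is UTF-8 and passes the charset explicitly. `Utf8Lines` and `readLines` are hypothetical names; the method accepts any `InputStream`, so the stream returned by `fileSystem.open(...)` would fit.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8Lines {

    /** Reads all lines from in as UTF-8 text and closes the stream. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(content))) {
            System.out.println(line);
        }
    }
}
```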

Result (screenshot omitted).

4. Show the read/write permissions, size, creation time, path and other information of a specified file in HDFS.

shell:

hadoop fs -ls -h /test/test.txt

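    /**
     * Prints the permission, size, modification time and path of a file.
     * FileStatus exposes no creation time, so the modification time is printed instead.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the target file
     */
    private static void test4(FileSystem fileSystem, Path remotePath) {
        try {
            FileStatus[] fileStatus = fileSystem.listStatus(remotePath);
            for (FileStatus status : fileStatus) {
                System.out.println(status.getPermission());
                System.out.println(status.getLen());              // file size in bytes
                System.out.println(status.getModificationTime()); // milliseconds since the epoch
                System.out.println(status.getPath());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }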

Result (screenshot omitted).

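5. Given a directory in HDFS, print the read/write permissions, size, creation time, path and other information of every file in it. If an entry is itself a directory, recursively print the information of all files under it.

shell:

hadoop fs -ls -R -h /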

    /**
     * Recursively prints the path, permission, size and modification time
     * of every file under a directory.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the target directory
     */
    private static void test5(FileSystem fileSystem, Path remotePath) {
        try {
            // recursive = true descends into subdirectories; only files are returned
            RemoteIterator<LocatedFileStatus> iterator = fileSystem.listFiles(remotePath, true);
            while (iterator.hasNext()) {
                FileStatus status = iterator.next();
                System.out.println(status.getPath());
                System.out.println(status.getPermission());
                System.out.println(status.getLen());
                System.out.println(status.getModificationTime());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
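
`getLen()` returns a size in bytes and `getModificationTime()` returns milliseconds since the epoch, so the raw output is hard to read. The sketch below formats both values with plain JDK classes. `StatusFormat`, `humanSize` and `utcTime` are hypothetical names, and the size format only approximates what `-ls -h` prints.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class StatusFormat {

    /** Formats a byte count with binary units (1 KB = 1024 B). */
    static String humanSize(long bytes) {
        if (bytes < 1024) {
            return bytes + " B";
        }
        String units = "KMGTPE";
        double value = bytes;
        int unit = -1;
        while (value >= 1024 && unit < units.length() - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %sB", value, units.charAt(unit));
    }

    /** Formats milliseconds since the epoch as a UTC timestamp. */
    static String utcTime(long epochMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(humanSize(1536)); // prints 1.5 KB
        System.out.println(utcTime(0L));     // prints 1970-01-01 00:00:00
    }
}
```

Inside the loop this would be used as `StatusFormat.humanSize(status.getLen())` and `StatusFormat.utcTime(status.getModificationTime())`.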

Result (screenshot omitted).

6. Given the path of a file in HDFS, create and delete that file. If the directory that contains the file does not exist, create it automatically.

shell:

hadoop fs -mkdir -p /test
hadoop fs -touchz /test/test.txt
hadoop fs -rm /test/test.txt

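    /**
     * Creates or deletes a file. If the file's directory does not exist yet,
     * this call only creates the directory.
     *
     * @param fileSystem     the HDFS file system handle
     * @param remoteDirPath  path of the target directory
     * @param remoteFilePath path of the target file
     */
    private static void test6(FileSystem fileSystem, Path remoteDirPath, Path remoteFilePath) {
        try {
            if (fileSystem.exists(remoteDirPath)) {
                System.out.println("Please choose your option: 1.create. 2.delete");
                int i = new Scanner(System.in).nextInt();
                switch (i) {
                    case 1:
                        // create() returns an open stream; close it to finish the empty file
                        fileSystem.create(remoteFilePath).close();
                        break;
                    case 2:
                        // delete only the file, not the directory that contains it
                        fileSystem.delete(remoteFilePath, false);
                        break;
                }
            } else {
                fileSystem.mkdirs(remoteDirPath);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }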

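Results (screenshots omitted): the state before the first run; the first run, which creates the directory automatically; the second run, choosing to create the file; and the third run, choosing to delete the file.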

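7. Given the path of a directory in HDFS, create and delete that directory. When creating the directory, create its parent directories automatically if they do not exist. When deleting the directory, let the user decide whether to delete it anyway if it is not empty.

shell:

hadoop fs -mkdir -p /test
hadoop fs -touchz /test/test.txt
hadoop fs -rm -R /test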

    /**
     * Creates the directory if it is missing; otherwise offers to delete it,
     * asking for a second confirmation when it is not empty.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the target directory
     */
    private static void test7(FileSystem fileSystem, Path remotePath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this path, it will be created automatically");
                // mkdirs also creates any missing parent directories
                fileSystem.mkdirs(remotePath);
                return;
            }
            Scanner scanner = new Scanner(System.in);
            System.out.println("Do you want to delete this dir? ( y / n )");
            if (!scanner.next().equals("y")) {
                return;
            }
            if (fileSystem.listStatus(remotePath).length != 0) {
                System.out.println("This directory is not empty, are you sure you want to delete everything in it? ( y / n )");
                // Answering "n" here must keep the directory
                if (!scanner.next().equals("y")) {
                    return;
                }
            }
            if (fileSystem.delete(remotePath, true)) {
                System.out.println("Delete successful");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

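Results (screenshots omitted): the HDFS file list before execution; the first run, deleting all files; the HDFS file list afterwards; and a further run, which creates the directory automatically.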

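8. Add content to a specified file in HDFS, letting the user choose whether it goes at the start or at the end of the existing file.

shell:

# add to the start: fetch the remote file, append it to the new content, upload the result
hadoop fs -get text.txt
cat text.txt >> local.txt
hadoop fs -copyFromLocal -f local.txt text.txt
# add to the end
hadoop fs -appendToFile local.txt text.txt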

    /**
     * Adds the content of a local file to the start or the end of an HDFS file.
     * HDFS only supports appending, so adding to the start rebuilds the file.
     *
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the file in HDFS
     * @param localPath  path of the local file whose content is added
     */
    private static void test8(FileSystem fileSystem, Path remotePath, Path localPath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this file");
                return;
            }
            System.out.println("Input 1 or 2 to add the content to the start or the end of the remote file");
            switch (new Scanner(System.in).nextInt()) {
                case 1: {
                    // Move the remote file to a local scratch copy, then rebuild it:
                    // the new local content first, the original content second.
                    Path oldContentPath = new Path(localPath.toString() + ".remote.tmp");
                    fileSystem.moveToLocalFile(remotePath, oldContentPath);
                    try (FSDataOutputStream out = fileSystem.create(remotePath);
                         FileInputStream newContent = new FileInputStream(localPath.toString());
                         FileInputStream oldContent = new FileInputStream(oldContentPath.toString())) {
                        org.apache.hadoop.io.IOUtils.copyBytes(newContent, out, 1024, false);
                        org.apache.hadoop.io.IOUtils.copyBytes(oldContent, out, 1024, false);
                    }
                    new java.io.File(oldContentPath.toString()).delete();
                    break;
                }
                case 2: {
                    // Appending is supported directly
                    try (FileInputStream in = new FileInputStream(localPath.toString());
                         FSDataOutputStream out = fileSystem.append(remotePath)) {
                        org.apache.hadoop.io.IOUtils.copyBytes(in, out, 1024, false);
                    }
                    break;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
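
Adding content to the start of a file means rebuilding it: write the new content, then the old content, then replace the original. The sketch below shows that sequence on the local file system with `java.nio.file`, as an analogue of the HDFS version and not a replacement for it. `PrependDemo` and `prepend` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PrependDemo {

    /** Rewrites target so that it holds the bytes of head followed by its old content. */
    static void prepend(Path target, Path head) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path rebuilt = Files.createTempFile(dir, "prepend", ".tmp");
        try (OutputStream out = Files.newOutputStream(rebuilt)) {
            Files.copy(head, out);   // new content first
            Files.copy(target, out); // then the original content
        }
        // Swap the rebuilt file into place only after it is complete
        Files.move(rebuilt, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("prepend-demo");
        Path target = Files.write(dir.resolve("test.txt"), "world".getBytes(StandardCharsets.UTF_8));
        Path head = Files.write(dir.resolve("head.txt"), "hello ".getBytes(StandardCharsets.UTF_8));
        prepend(target, head);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8)); // prints hello world
    }
}
```

Building the result in a separate file first means a failure halfway through leaves the original untouched.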

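Results (screenshots omitted): the content of the HDFS file before execution; the first run, adding the content to the start of the file; and the second run, adding the content to the end of the file.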

9. Delete a specified file in HDFS.

shell:

hadoop fs -rm /test/test.txt

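    /**
     * @param fileSystem the HDFS file system handle
     * @param remotePath path of the file to delete
     */
    private static void test9(FileSystem fileSystem, Path remotePath) {
        try {
            // recursive = false is enough for a single file and
            // refuses to wipe a non-empty directory passed by mistake
            if (fileSystem.delete(remotePath, false)) {
                System.out.println("Delete success");
            } else {
                System.out.println("Delete failed");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }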

Results (screenshots omitted): the original directory structure in HDFS, and the structure after the delete.

10. Move a file in HDFS from a source path to a destination path.

shell:

hadoop fs -mv /test/test.txt /test2

    /**
     * @param fileSystem
     * @param oldRemotePath old name
     * @param newRemotePath new name
     */
    private static void test10(FileSystem fileSystem, Path oldRemotePath, Path newRemotePath){
        try {
            if (fileSystem.rename(oldRemotePath,newRemotePath)){
                System.out.println("Rename success");
            }else {
                System.out.println("Rename failed");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Results (screenshots omitted): the original name of the file, and the listing after the move.

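(2) Implement a class `MyFSDataInputStream` that extends `org.apache.hadoop.fs.FSDataInputStream`, with the following requirements:

Implement a method readLine() that reads a specified HDFS file line by line: it returns null at the end of the file, otherwise one line of text.
Implement caching: when MyFSDataInputStream is used to read some bytes, it first looks in the cache. If the cache holds the required data, the cache serves it directly; otherwise the data is read from HDFS.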

import org.apache.hadoop.fs.FSDataInputStream;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class MyFSDataInputStream extends FSDataInputStream {

    // BufferedReader keeps an in-memory buffer: reads are served from it first,
    // and the underlying HDFS stream is read only when the buffer runs empty.
    private final BufferedReader reader;

    /**
     * @param in the underlying stream, e.g. the one returned by FileSystem.open();
     *           FSDataInputStream requires it to be Seekable and PositionedReadable
     */
    public MyFSDataInputStream(InputStream in) {
        super(in);
        this.reader = new BufferedReader(new InputStreamReader(in));
    }

    /**
     * Reads one line of text. The method is named readline because
     * DataInputStream.readLine() is final and cannot be overridden.
     *
     * @return the next line, or null at the end of the file
     */
    public String readline() throws IOException {
        return reader.readLine();
    }

}
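
The caching requirement can also be made explicit with a hand-written buffer. The sketch below is independent of Hadoop: it wraps any `InputStream`, serves bytes from a fixed-size cache, and reads from the source only when the cache is empty. `CachedLineReader`, `nextByte` and `refillCount` are hypothetical names, and the text is assumed to be UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class CachedLineReader {

    private final InputStream source;
    private final byte[] cache;
    private int position = 0; // index of the next unread byte in the cache
    private int limit = 0;    // number of valid bytes in the cache
    private int refills = 0;  // how many times the source was read

    CachedLineReader(InputStream source, int cacheSize) {
        this.source = source;
        this.cache = new byte[cacheSize];
    }

    /** Returns the next byte, reading the source only when the cache is empty; -1 at end of stream. */
    private int nextByte() throws IOException {
        if (position >= limit) {
            limit = source.read(cache, 0, cache.length);
            position = 0;
            refills++;
            if (limit <= 0) {
                limit = 0;
                return -1;
            }
        }
        return cache[position++] & 0xFF;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        int b = nextByte();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            line.write(b);
            b = nextByte();
        }
        String text = new String(line.toByteArray(), StandardCharsets.UTF_8);
        // Drop the '\r' left over from a "\r\n" terminator
        return text.endsWith("\r") ? text.substring(0, text.length() - 1) : text;
    }

    int refillCount() {
        return refills;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        CachedLineReader reader = new CachedLineReader(new ByteArrayInputStream(data), 8);
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("source reads: " + reader.refillCount());
    }
}
```

A larger cache means fewer reads of the source for the same data, which is the point of the requirement.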

Result (screenshot omitted).

(3) Consult the Java documentation or other references, and use `java.net.URL` and `org.apache.hadoop.fs.FsUrlStreamHandlerFactory` to print the text of a specified HDFS file to the terminal.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class ShowTheContent {

    static {
        // The JVM accepts a URLStreamHandlerFactory only once, so register it
        // in a static initializer instead of on every call to show().
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    private Path remotePath;
    private FileSystem fileSystem;

    public ShowTheContent(FileSystem fileSystem, Path remotePath){
        this.fileSystem = fileSystem;
        this.remotePath = remotePath;
    }

    public void show(){
        // Host and port must match the address of the NameNode
        try (InputStream inputStream = new URL("hdfs","localhost",9000,remotePath.toString()).openStream();
             BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = bufferedReader.readLine()) != null){
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}
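
`FsUrlStreamHandlerFactory` plugs into `java.net.URL` through the JDK's `URLStreamHandlerFactory` interface. To illustrate the mechanism without a Hadoop cluster, the sketch below implements that interface for a made-up `demo` protocol whose "file content" is simply the path of the URL. `DemoHandlerFactory`, `read` and the `demo` scheme are hypothetical. The handler is passed to the `URL` constructor directly, which avoids the once-per-JVM restriction of `URL.setURLStreamHandlerFactory`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

final class DemoHandlerFactory implements URLStreamHandlerFactory {

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"demo".equals(protocol)) {
            return null; // unknown protocol: let the JVM use its built-in handlers
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        // nothing to connect to in this toy protocol
                    }

                    @Override
                    public InputStream getInputStream() {
                        // Serve the path of the URL back as the content
                        return new ByteArrayInputStream(getURL().getPath().getBytes(StandardCharsets.UTF_8));
                    }
                };
            }
        };
    }

    /** Opens a demo: URL with the handler above and returns everything it serves. */
    static String read(String spec) throws IOException {
        URLStreamHandler handler = new DemoHandlerFactory().createURLStreamHandler("demo");
        URL url = new URL(null, spec, handler);
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[256];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read("demo://localhost:9000/test/test.txt"));
    }
}
```

With Hadoop's factory registered, the same `openStream()` call on an `hdfs://` URL returns a stream over the HDFS file instead.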

Output (screenshot omitted).
