Business in Practice (19): A 25x Speedup for Copying and Compressing Files in Java

2023-09-12, by 后来丶_a24d

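Background

One of our services had a file copy and compression routine, written earlier by a colleague, that took a very long 5 to 7 minutes to finish. This article records how optimizing the copy and compression steps improved performance substantially.
If you would rather skip the article, here is the conclusion. Compression previously used gzip from Apache Commons Compress, which compresses on a single thread: a 1.9 GB file took about 50 seconds in local testing. After switching to pigz, a parallel gzip implementation written in C, the same job took only 1.9 seconds on a 32-core, 160 GB machine, while Commons Compress gzip on that same machine was barely faster than locally (nearly 42 seconds in the run recorded later in this article). For copying, the buffer size was increased, which also gave a large improvement.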

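Compressing files

pigz parallel compression test on a high-spec machine

Note that the machine should run CentOS 7 or later. This machine has 32 cores and 160 GB of RAM. I don't remember whether the disk was an SSD or an HDD, but pigz is mainly bound by the number of CPU cores because it compresses in parallel. An SSD helps, but much less than additional cores do. The file size is shown below.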

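文件大小.png (file size)
The next screenshot shows the pigz execution time: 1.9 seconds of wall-clock time but 27 seconds of CPU time. This is the effect of multiple cores. When a parallel program runs on a multi-core CPU, its user time can exceed its real time, because the work runs on several cores simultaneously and the CPU time of all cores is summed. Here 27 / 1.9 ≈ 14, so roughly 14 cores were busy on average.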

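执行时间.png (execution time)
The output of the time command typically includes:

real: elapsed wall-clock time, from the start of timing until the program finishes
user: CPU time spent in user mode
sys: CPU time spent in kernel mode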

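pigz parallel compression test on a modest machine

This machine has 2 cores and 4 GB of RAM and runs CentOS 7. The file size is shown here.

文件大小_1.png (file size)
The execution time is shown here. You can see that pigz is quite resource-hungry.

执行时间_1.png (execution time)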

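About pigz

pigz is fast mainly because it makes full use of modern multi-core CPUs; the disk matters much less. pigz is a gzip-compatible compression tool built on the zlib library. It splits the input into blocks and compresses them on multiple threads at once, producing standard gzip output. Decompression, by contrast, is essentially single-threaded: pigz only uses extra threads there for reading, writing, and checksum calculation.
pigz streams data from and to disk rather than loading the whole file into memory, so disk read/write speed has some effect, but in most cases it is not the bottleneck. It becomes noticeable mainly with very old or failing disks.
Overall, pigz compression throughput depends chiefly on the number and speed of CPU cores, not on disk speed.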
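The article does not show how the Java service invokes pigz, so the following is only a minimal sketch of one way to do it, assuming pigz is installed and on the PATH. The class name PigzRunner, the timeout, and the choice of flags (-p for the thread count, -k to keep the source file, -f to overwrite an existing .gz) are assumptions for illustration, not the author's actual code.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: shells out to pigz instead of compressing inside the JVM.
final class PigzRunner {

    /** Builds the command line: pigz -p <threads> -k -f <file>. */
    static List<String> buildCommand(Path file, int threads) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pigz");                      // assumes pigz is on the PATH
        cmd.add("-p");
        cmd.add(Integer.toString(threads));   // number of compression threads
        cmd.add("-k");                        // keep the original file
        cmd.add("-f");                        // overwrite an existing .gz
        cmd.add(file.toString());
        return cmd;
    }

    /** Runs pigz and returns the path of the resulting .gz file. */
    static Path compress(Path file, int threads, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(file, threads))
                .inheritIO()                  // pigz messages go to this JVM's stdout/stderr
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("pigz timed out after " + timeoutSeconds + "s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("pigz exited with code " + process.exitValue());
        }
        return Paths.get(file.toString() + ".gz");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: PigzRunner <file>");
            return;
        }
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        Path gz = compress(Paths.get(args[0]), threads, 600);
        System.out.println(gz + " written in " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Running the process with one thread per available core mirrors how pigz was benchmarked above. On a shared host it may be worth passing a smaller thread count, since the 2-core test shows pigz will happily saturate the CPU.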

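Apache Commons Compress gzip: single-threaded

If your test input is already a tar file, use the compression code further down directly. If not, run this code first to pack it into a tar. This is only an example, so resource handling is kept simple.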

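import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// The members below belong inside your own class (GzTest in this article).
// BASE_DIR and BUFFER were not shown in the original snippet; these values are assumptions.
private static final String BASE_DIR = "";
private static final int BUFFER = 5 * 1024 * 1024;

public static void archive(File srcFile, File destFile) throws IOException {
    // try-with-resources finishes and closes the tar stream even if archiving fails
    try (TarArchiveOutputStream taos = new TarArchiveOutputStream(new FileOutputStream(destFile))) {
        archive(srcFile, taos, BASE_DIR);
    }
}

// This dispatcher was not shown in the original; a minimal version that recurses into directories.
private static void archive(File srcFile, TarArchiveOutputStream taos, String dir) throws IOException {
    if (srcFile.isDirectory()) {
        File[] children = srcFile.listFiles();
        if (children != null) {
            for (File child : children) {
                archive(child, taos, dir + srcFile.getName() + "/");
            }
        }
    } else {
        archiveFile(srcFile, taos, dir);
    }
}

private static void archiveFile(File file, TarArchiveOutputStream taos, String dir) throws IOException {
    TarArchiveEntry entry = new TarArchiveEntry(dir + file.getName());
    entry.setSize(file.length());
    taos.putArchiveEntry(entry);
    // Read errors now propagate to the caller instead of being silently swallowed
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        byte[] data = new byte[BUFFER];
        int count;
        while ((count = bis.read(data, 0, BUFFER)) != -1) {
            taos.write(data, 0, count);
        }
    }
    taos.closeArchiveEntry();
}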

Test code: to benchmark compression, time the execution of this method.

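import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.StopWatch;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

// logger and EXT were not shown in the original snippet; these definitions are assumptions.
private static final Logger logger = LoggerFactory.getLogger(GzTest.class);
private static final String EXT = ".gz";

public static void compressFile(File file, boolean delete) throws Exception {
    try (FileInputStream in = new FileInputStream(file);
         FileOutputStream fout = new FileOutputStream(file.getPath() + EXT);
         BufferedOutputStream out = new BufferedOutputStream(fout);
         GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out)) {
        final byte[] buffer = new byte[5 * 1024 * 1024];
        int n;
        while (-1 != (n = in.read(buffer))) {
            gzOut.write(buffer, 0, n);
        }
    }
    // All streams are closed at this point, so the source file can be deleted safely.
    // Note that this StopWatch times only the delete, not the compression itself.
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();
    if (delete && file.delete()) {
        logger.info("deleted {}", file);
    }
    stopWatch.stop();
    System.out.println("delete time ____" + stopWatch.getTotalTimeMillis());
}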

Maven configuration

 <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-compress</artifactId>
            <version>1.21</version>
        </dependency>

        <dependency>
            <groupId>commons-collections</groupId>
            <artifactId>commons-collections</artifactId>
            <version>3.2.1</version>
        </dependency>
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>2.9.9</version>
        </dependency>
 <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.example.demo.GzTest</mainClass> <!-- replace with your main class -->
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>

This article uses GzTest to run the test

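        long start = System.currentTimeMillis();
        // archive() is the method given above; coverFile is the file or directory to pack,
        // and the ".tar" target name is an assumption
        File tarFile = new File(coverFile.getPath() + ".tar");
        archive(coverFile, tarFile);
        System.out.println(" arch time __" + (System.currentTimeMillis() - start));
        long startCompress = System.currentTimeMillis();
        // compressFile() is the method given above; this is the timing to compare against pigz
        compressFile(tarFile, true);
        System.out.println("compress time __" + (System.currentTimeMillis() - startCompress));
        System.out.println("compress all time __" + (System.currentTimeMillis() - start));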

Package with mvn clean compile assembly:single, upload the jar to the test machine, and then run

java -jar your-jar-file.jar

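On the same 32-core, 160 GB machine, compression with Apache Commons Compress took nearly 42 seconds.

执行时间_2.png (execution time)
My local machine has 4 cores, 16 GB of RAM, and an SSD; there the test took nearly 50 seconds.

SSD.png

4核.png (4 cores)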

Other notes

Temporary files can also be placed on an in-memory filesystem such as /dev/shm, which is much faster.
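As a rough illustration of the /dev/shm idea, the sketch below prefers /dev/shm for scratch files when it exists and is writable, and otherwise falls back to the JVM's default temp directory. The class and method names (ScratchDir, choose, newTempFile) are hypothetical. Keep in mind that files on /dev/shm consume RAM, so a 1.9 GB intermediate file needs that much free memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for choosing where temporary files live.
final class ScratchDir {

    /** Prefers /dev/shm when it is usable, otherwise falls back to java.io.tmpdir. */
    static Path choose() {
        Path shm = Paths.get("/dev/shm");
        if (Files.isDirectory(shm) && Files.isWritable(shm)) {
            return shm;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Creates an empty temp file in the chosen scratch directory. */
    static Path newTempFile(String prefix, String suffix) throws IOException {
        return Files.createTempFile(choose(), prefix, suffix);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = newTempFile("archive-", ".tar");
        System.out.println("scratch file: " + tmp);
        Files.deleteIfExists(tmp);   // clean up the demo file
    }
}
```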
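Increasing the buffer used when copying files also improves performance considerably. The buffer was 1024 bytes before the optimization; changing it to 5 * 1024 * 1024 bytes (5 MB) made copying about 10 times faster.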

byte[] data = new byte[BUFFER];
while ((count = bis.read(data, 0, BUFFER)) != -1) {
    out.write(data, 0, count);
}
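To make the loop above easy to try, here is a self-contained sketch of the same copy with a configurable buffer size. The class name BigBufferCopy and the small main method are assumptions added for illustration; only the read/write loop and the 5 MB buffer come from the article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical wrapper around the copy loop shown above.
final class BigBufferCopy {

    /** Buffer size used after the optimization (5 MB). */
    static final int BUFFER = 5 * 1024 * 1024;

    /** Copies src to dest using a buffer of the given size; returns the bytes copied. */
    static long copy(Path src, Path dest, int bufferSize) throws IOException {
        byte[] data = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dest)) {
            int count;
            while ((count = in.read(data)) != -1) {
                out.write(data, 0, count);
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: BigBufferCopy <src> <dest>");
            return;
        }
        long start = System.nanoTime();
        long bytes = copy(Paths.get(args[0]), Paths.get(args[1]), BUFFER);
        System.out.println(bytes + " bytes in " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Calling copy with 1024 and then with BUFFER on the same large file is a quick way to check how much of the reported speedup reproduces on your own disk.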

The optimized code also uses multiple threads for copying and deleting files to improve performance.
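The multi-threaded code itself is not shown in the article. As one possible shape for it, the sketch below copies a list of files into a target directory on a fixed-size thread pool and waits for all copies to finish. The names ParallelCopy and copyAll and the use of Files.copy are assumptions; deletion could be parallelized the same way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one copy task per file, run on a fixed thread pool.
final class ParallelCopy {

    static void copyAll(List<Path> sources, Path targetDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> results = new ArrayList<>();
            for (Path src : sources) {
                Callable<Path> task = () -> Files.copy(
                        src, targetDir.resolve(src.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                results.add(pool.submit(task));
            }
            // Wait for every copy and surface the first failure, if any
            for (Future<Path> result : results) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: ParallelCopy <targetDir> <file>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        copyAll(sources, Paths.get(args[0]), Runtime.getRuntime().availableProcessors());
    }
}
```

Whether extra threads help depends on the storage: several files on a fast SSD or on separate disks can benefit, while a single spinning disk may get slower under concurrent access.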
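Installing pigz on CentOS 7: running yum install pigz directly ran into some problems for me, but following the method in the reference article 详解pigz使用方法 installed it successfully.

References

详解pigz使用方法 (A detailed guide to using pigz)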
