Test tools: FIO --- learning the I/O engines
How does FIO issue I/O to a file? It does so through one of the following I/O engines:
**sync**
Uses the read(), write(), and lseek() functions to read, write, and position within the file.
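As a rough illustration of the position-then-transfer pattern described for this engine, the sketch below uses Python's `os` wrappers around the same system calls. The helper name `sync_write_then_read`, the file name, and the offsets are made up for the example; this is not fio's own code.

```python
import os
import tempfile


def sync_write_then_read(path, offset, payload):
    """Position with lseek(), then transfer with write()/read()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, offset, os.SEEK_SET)          # step 1: position
        os.write(fd, payload)                      # step 2: write at that position
        end_offset = os.lseek(fd, 0, os.SEEK_CUR)  # write() advanced the file offset
        os.lseek(fd, offset, os.SEEK_SET)          # reposition before reading back
        data = os.read(fd, len(payload))
        return data, end_offset
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    sync_data, sync_end = sync_write_then_read(
        os.path.join(tmp, "sync.bin"), 4096, b"fio-sync"
    )
```

Every transfer here costs two system calls, and the file offset is shared state that changes as a side effect of each one.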
**psync**
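Uses pread() and pwrite() for reads and writes. Supported on all systems except Windows.
With separate lseek() and read() calls, the kernel may suspend the process between the two, which causes synchronization problems if the shared file offset is moved in the meantime. Calling pread() is equivalent to calling lseek() followed by read(), except that the two steps are bundled into a single atomic operation.
The same holds for lseek() and write(): the kernel may suspend the process between the two calls. Calling pwrite() is equivalent to calling lseek() followed by write(), again bundled into a single atomic operation.
Neither function can be interrupted between positioning and the read or write, and neither updates the file offset.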
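A small sketch of the two properties described above, using Python's `os.pread` and `os.pwrite` (available on Unix): the offset is passed with the call, and the descriptor's file offset is left untouched. The helper name `positioned_io` and the sample values are assumptions for illustration only.

```python
import os
import tempfile


def positioned_io(path, offset, payload):
    """Write and read at an explicit offset without moving the file offset."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        before = os.lseek(fd, 0, os.SEEK_CUR)      # current file offset (0 for a new fd)
        os.pwrite(fd, payload, offset)             # position + write in one call
        data = os.pread(fd, len(payload), offset)  # position + read in one call
        after = os.lseek(fd, 0, os.SEEK_CUR)       # unchanged by pread/pwrite
        return data, before, after
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    p_data, p_before, p_after = positioned_io(
        os.path.join(tmp, "psync.bin"), 8192, b"fio-psync"
    )
```

Because no shared offset is involved, several threads can issue such calls on the same descriptor without coordinating their seeks.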
**vsync**
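Emulates queuing by coalescing adjacent I/Os into a single submission, keeping the number of submissions as low as possible.
readv()/writev(): read into or write from multiple non-contiguous buffers in a single call. The buffers are described by an array of iovec structures, which reduces the number of system calls.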
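To make the scatter/gather idea concrete, here is a minimal sketch with Python's `os.writev` and `os.readv` (Unix only), where a list of buffers plays the role of the iovec array. The helper name `vectored_io` and the buffer contents are illustrative assumptions.

```python
import os
import tempfile


def vectored_io(path):
    """Write three separate buffers with one call, then read into two buffers."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # One writev() call gathers three non-contiguous buffers.
        written = os.writev(fd, [b"head", b"-", b"tail"])
        os.lseek(fd, 0, os.SEEK_SET)
        # One readv() call scatters the data into two separate buffers.
        first, second = bytearray(4), bytearray(5)
        read = os.readv(fd, [first, second])
        return written, read, bytes(first), bytes(second)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    v_written, v_read, v_first, v_second = vectored_io(os.path.join(tmp, "vsync.bin"))
```

The buffers are filled in order, so the first receives the first four bytes and the second the remaining five.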
**pvsync**
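Uses preadv() and pwritev(). The pwritev() system call combines the functionality of writev() and pwrite(). It performs the same task as writev() but adds a fourth argument, offset, which specifies the file offset at which the output is performed. These system calls do not change the file offset. In other words: atomic positioning and read/write, plus I/O coalescing.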
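The combination can be sketched with Python's `os.pwritev` and `os.preadv`, assuming Python 3.7 or later on Linux. The helper name `positioned_vectored_io` and the sample data are hypothetical.

```python
import os
import tempfile


def positioned_vectored_io(path, offset):
    """Vectored I/O at an explicit offset; the file offset stays where it was."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = os.pwritev(fd, [b"abc", b"defg"], offset)  # gather + explicit offset
        first, second = bytearray(3), bytearray(4)
        read = os.preadv(fd, [first, second], offset)        # scatter + explicit offset
        position = os.lseek(fd, 0, os.SEEK_CUR)              # not moved by either call
        return written, read, bytes(first), bytes(second), position
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    pv_written, pv_read, pv_first, pv_second, pv_pos = positioned_vectored_io(
        os.path.join(tmp, "pvsync.bin"), 512
    )
```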
**pvsync2**
Similar to pvsync, but uses preadv2()/pwritev2(), which take a fifth argument that sets flags for the individual operation, such as:
RWF_DSYNC (since Linux 4.7): Provide a per-write equivalent of the O_DSYNC open(2) flag. This flag is meaningful only for pwritev2(), and its effect applies only to the data range written by the system call. (O_DSYNC: each write waits for the physical I/O to complete, but does not wait for file attribute updates that are not needed to read back the data just written.)
RWF_HIPRI (since Linux 4.6): High priority read/write. Allows block-based filesystems to use polling of the device, which provides lower latency, but may use additional resources. (Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag.)
RWF_SYNC (since Linux 4.7): Provide a per-write equivalent of the O_SYNC open(2) flag. This flag is meaningful only for pwritev2(), and its effect applies only to the data range written by the system call. (O_SYNC: each write waits for the physical I/O to complete, including the file attribute updates caused by the write.)
RWF_NOWAIT (since Linux 4.14): Do not wait for data which is not immediately available. If this flag is specified, the preadv2() system call will return instantly if it would have to read data from the backing storage or wait for a lock. If some data was successfully read, it will return the number of bytes read. If no bytes were read, it will return -1 and set errno to EAGAIN. Currently, this flag is meaningful only for preadv2().
RWF_APPEND (since Linux 4.16): Provide a per-write equivalent of the O_APPEND open(2) flag. This flag is meaningful only for pwritev2(), and its effect applies only to the data range written by the system call. The offset argument does not affect the write operation; the data is always appended to the end of the file. However, if the offset argument is -1, the current file offset is updated.
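The per-call flags argument can be tried from Python, whose `os.pwritev` accepts an optional fourth `flags` parameter that corresponds to the fifth argument of pwritev2(). The sketch below assumes Python 3.7 or later on Linux and falls back to a plain pwritev() when the flag is unavailable or rejected. The helper name `pwritev_with_dsync` is made up, and the example only shows the flag-passing mechanism, not how fio's engine chooses its flags.

```python
import os
import tempfile


def pwritev_with_dsync(path, offset, buffers):
    """Write at an offset, asking for O_DSYNC-like behavior for this call only."""
    flags = getattr(os, "RWF_DSYNC", 0)  # 0 means "no per-call flag"
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            written = os.pwritev(fd, buffers, offset, flags)
        except (OSError, NotImplementedError):
            # Assumption: the kernel, filesystem, or libc rejected the flag,
            # so retry as an ordinary pwritev() without per-call flags.
            flags = 0
            written = os.pwritev(fd, buffers, offset)
        data = os.pread(fd, written, offset)
        return written, data, flags
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    pv2_written, pv2_data, pv2_flags = pwritev_with_dsync(
        os.path.join(tmp, "pvsync2.bin"), 1024, [b"fio-", b"data"]
    )
```

The descriptor itself was opened without O_DSYNC, so only this one write carries the stronger durability requirement.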
**io_uring**
An upgraded successor to Linux native AIO that is both easier to use and more efficient. Supported since Linux kernel 5.1.
**libaio**
Linux native asynchronous, non-blocking I/O, available since the 2.6 kernel. This engine defines engine-specific options.
**posixaio**
The standard asynchronous I/O interface defined by the POSIX 1003.1b real-time extensions, namely the functions aio_read, aio_write, aio_fsync, aio_cancel, aio_error, aio_return, aio_suspend, and lio_listio. This set of APIs is used to perform asynchronous I/O.
**solarisaio**
Uses the native asynchronous I/O interface of Solaris.
**windowsaio**
Uses the native Windows asynchronous I/O interface.
**mmap**
The file is memory-mapped into user space, and data is written and read with memcpy().
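A minimal sketch of memory-mapped I/O using Python's `mmap` module, where slice assignment and slicing stand in for the memcpy() calls. The helper name `mmap_copy`, the file size, and the offset are illustrative assumptions.

```python
import mmap
import os
import tempfile


def mmap_copy(path, size, offset, payload):
    """Map a file into memory, copy data in, and copy it back out."""
    with open(path, "w+b") as f:
        f.truncate(size)  # a mapping needs a file of non-zero length
        with mmap.mmap(f.fileno(), size) as mapping:
            end = offset + len(payload)
            mapping[offset:end] = payload     # "write": memory copy into the mapping
            mapping.flush()                   # push dirty pages back to the file
            return bytes(mapping[offset:end]) # "read": memory copy out of the mapping


with tempfile.TemporaryDirectory() as tmp:
    mmap_path = os.path.join(tmp, "mmap.bin")
    mmap_data = mmap_copy(mmap_path, 4096, 100, b"fio-mmap")
    # The same bytes are visible through ordinary file I/O afterwards.
    with open(mmap_path, "rb") as f:
        mmap_on_disk = f.read()[100:108]
```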
**splice**
Uses splice() and vmsplice() to transfer data between user space and the kernel.
**sg**
SCSI generic sg v3 I/O. It can either be synchronous, using the SG_IO ioctl, or, if the target is an sg character device, use read() and write() for asynchronous I/O.
**null**
Does not transfer any data; it only pretends to. Mainly used to exercise fio itself and for basic debugging and testing.
**net**
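Transfers data over the network to the given host:port. Depending on the protocol, the hostname, port, listen, and filename options specify what kind of connection to establish, while the protocol option determines which protocol is used.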
**netsplice**
Like net, but uses splice()/vmsplice() to map data and to send and receive it.
**cpuio**
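Does not transfer any data, but burns CPU cycles according to the cpuload= and cpucycle= options (the cycle-length option is listed as cpuchunks in fio's option documentation). For example, cpuload=85 makes the job do no real I/O while burning 85% of the CPU cycles. On an SMP machine, use numjobs=<no_of_cpu> to get the desired CPU usage, because cpuload only loads a single CPU at the desired rate.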
**guasi**
The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface approach to asynchronous I/O.
**rdma**
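The RDMA I/O engine supports both RDMA memory semantics (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the InfiniBand, RoCE, and iWARP protocols.
A separate prefix, external, tells fio to load an external I/O engine from an object file. For example, ioengine=external:/tmp/foo.o loads the engine foo.o from /tmp (see the **external** entry below).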
**falloc**
I/O engine that does regular fallocate to simulate data transfer as fio ioengine.
DDIR_READ: does fallocate(,mode = FALLOC_FL_KEEP_SIZE,).
DDIR_WRITE: does fallocate(,mode = 0).
DDIR_TRIM: does fallocate(,mode = FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
**ftruncate**
I/O engine that sends ftruncate(2) operations in response
to write (DDIR_WRITE) events. Each ftruncate issued sets the file's
size to the current block offset. blocksize is ignored.
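The behavior described above can be imitated in a few lines with Python's `os.ftruncate`: for each simulated write event, the file is resized to that event's block offset and no data is transferred. The helper name `truncate_per_write` and the list of offsets are assumptions for the example, not values taken from fio.

```python
import os
import tempfile


def truncate_per_write(path, block_offsets):
    """For each simulated write event, set the file size to its block offset."""
    sizes = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for offset in block_offsets:
            os.ftruncate(fd, offset)             # grow or shrink; no data is written
            sizes.append(os.fstat(fd).st_size)   # size now equals the block offset
    finally:
        os.close(fd)
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    truncate_sizes = truncate_per_write(
        os.path.join(tmp, "ftruncate.bin"), [4096, 16384, 8192]
    )
```

The offsets need not increase, so the file can shrink as well as grow, which is why only the offset matters and a block size would have no effect.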
**e4defrag**
I/O engine that does regular EXT4_IOC_MOVE_EXT ioctls to simulate
defragment activity in request to DDIR_WRITE event.
**rados**
I/O engine supporting direct access to Ceph Reliable Autonomic
Distributed Object Store (RADOS) via librados. This ioengine
defines engine specific options.
**rbd**
I/O engine supporting direct access to Ceph Rados Block Devices
(RBD) via librbd without the need to use the kernel rbd driver. This
ioengine defines engine specific options.
**http**
I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
a WebDAV or S3 endpoint. This ioengine defines engine specific options.
This engine only supports direct IO of iodepth=1; you need to scale this
via numjobs. blocksize defines the size of the objects to be created.
TRIM is translated to object deletion.
**gfapi**
Uses the GlusterFS libgfapi sync interface for direct access to
GlusterFS volumes without having to go through FUSE. This ioengine
defines engine specific options.
**gfapi_async**
Uses the GlusterFS libgfapi async interface for direct access to
GlusterFS volumes without having to go through FUSE. This ioengine
defines engine specific options.
**libhdfs**
Read and write through Hadoop (HDFS). The filename option
is used to specify host,port of the hdfs name-node to connect. This
engine interprets offsets a little differently. In HDFS, files once
created cannot be modified so random writes are not possible. To
imitate this the libhdfs engine expects a bunch of small files to be
created over HDFS and will randomly pick a file from them
based on the offset generated by fio backend (see the example
job file to create such files, use the rw=write option). Please
note, it may be necessary to set environment variables to work
with HDFS/libhdfs properly. Each job uses its own connection to
HDFS.
**mtd**
Read, write and erase an MTD character device (e.g.,
/dev/mtd0). Discards are treated as erases. Depending on the
underlying device type, the I/O may have to go in a certain pattern,
e.g., on NAND, writing sequentially to erase blocks and discarding
before overwriting. The trimwrite mode works well for this
constraint.
**pmemblk**
Read and write using filesystem DAX to a file on a filesystem
mounted with DAX on a persistent memory device through the PMDK
libpmemblk library.
**dev-dax**
Read and write using device DAX to a persistent memory device (e.g.,
/dev/dax0.0) through the PMDK libpmem library.
**external**
Prefix to specify loading an external I/O engine object file. Append
the engine filename, e.g. ioengine=external:/tmp/foo.o to load
ioengine foo.o in /tmp. The path can be either
absolute or relative. See engines/skeleton_external.c for
details of writing an external I/O engine.
**filecreate**
Simply create the files and do no I/O to them. You still need to
set `filesize` so that all the accounting still occurs, but no
actual I/O will be done other than creating the file.
**filestat**
Simply do stat() and do no I/O to the file. You need to set 'filesize'
and 'nrfiles', so that files will be created.
This engine is to measure file lookup and meta data access.
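As a loose illustration of what measuring file lookup and metadata access means here, the sketch below creates a set of files and then times one `os.stat` call per file. The helper name `stat_latencies` and its parameters `nrfiles` and `filesize` merely echo the option names mentioned above; the timing method is an assumption and is not how fio does its accounting.

```python
import os
import tempfile
import time


def stat_latencies(directory, nrfiles, filesize):
    """Create nrfiles files of filesize bytes, then time one stat() per file."""
    paths = []
    for index in range(nrfiles):
        path = os.path.join(directory, f"file.{index}")
        with open(path, "wb") as f:
            f.truncate(filesize)  # the files must exist before they can be stat()ed
        paths.append(path)

    results = []
    for path in paths:
        start = time.perf_counter_ns()
        info = os.stat(path)      # metadata lookup only; no file data is read
        elapsed = time.perf_counter_ns() - start
        results.append((info.st_size, elapsed))
    return results


with tempfile.TemporaryDirectory() as tmp:
    stat_results = stat_latencies(tmp, nrfiles=4, filesize=4096)
```

Each result pairs the reported file size with the elapsed time in nanoseconds for that lookup.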
**libpmem**
Read and write using mmap I/O to a file on a filesystem
mounted with DAX on a persistent memory device through the PMDK
libpmem library.
**ime_psync**
Synchronous read and write using DDN's Infinite Memory Engine (IME).
This engine is very basic and issues calls to IME whenever an IO is
queued.
**ime_psyncv**
Synchronous read and write using DDN's Infinite Memory Engine (IME).
This engine uses iovecs and will try to stack as many IOs as possible
(if the IOs are "contiguous" and the IO depth is not exceeded)
before issuing a call to IME.
**ime_aio**
Asynchronous read and write using DDN's Infinite Memory Engine (IME).
This engine will try to stack as many IOs as possible by creating
requests for IME. FIO will then decide when to commit these requests.
**libiscsi**
Read and write an iSCSI LUN with libiscsi.
**nbd**
Read and write a Network Block Device (NBD).
References: