How to find out the maximum FLOPS

2024-06-19  小怪兽狂殴奥特曼

Maximum FLOPS

The maximum FLOPS (floating-point operations per second) of a GPU can be calculated with the following formula:

maximum_flop = CUDA_core_number * clock_speed * 2

Let's take the RTX3070 as an example.
The RTX3070 has two clock speeds:
base clock speed: 1500 MHz
boost clock speed: 1725 MHz

The RTX3070 has 5888 CUDA cores.

For single-precision (float32) arithmetic at the boost clock, its maximum_flop = 1725 MHz * 5888 * 2 ≈ 20.31 TFLOP/s.
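The formula above can be wrapped in a small helper to compare the base and boost clocks. This is a minimal sketch: the function name `peak_flops` and its default of 2 FLOPs per core per cycle are illustrative assumptions matching the FP32 case discussed here, and the RTX3070 figures are the ones quoted above.

```python
def peak_flops(cuda_cores: int, clock_hz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in FLOP/s (hypothetical helper).

    Assumes every CUDA core issues one FP32 FMA (counted as 2 FLOPs) on every cycle.
    """
    return cuda_cores * clock_hz * flops_per_core_per_cycle


RTX3070_CUDA_CORES = 5888
BASE_CLOCK_HZ = 1500e6
BOOST_CLOCK_HZ = 1725e6

base = peak_flops(RTX3070_CUDA_CORES, BASE_CLOCK_HZ)
boost = peak_flops(RTX3070_CUDA_CORES, BOOST_CLOCK_HZ)
print(f"base clock:  {base / 1e12:.2f} TFLOP/s")   # 17.66 TFLOP/s
print(f"boost clock: {boost / 1e12:.2f} TFLOP/s")  # 20.31 TFLOP/s
```

Both results are theoretical upper bounds; the clock a card actually sustains under load varies, so real kernels land below these numbers.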

Why multiply by 2?
Each CUDA core can issue one single-precision fused multiply-add (FMA) instruction per clock cycle. An FMA computes a * b + c, which counts as two floating-point operations: one multiplication and one addition.
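To make this counting convention concrete, here is a sketch of a dot product written as a chain of multiply-add steps that tallies 2 FLOPs per step. The function `dot_with_flop_count` is hypothetical, and Python evaluates `x * y + acc` with two roundings rather than the single rounding of a hardware FMA; only the operation count matters here.

```python
def dot_with_flop_count(xs: list[float], ys: list[float]) -> tuple[float, int]:
    """Dot product built from multiply-add steps, plus the FLOP tally."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0.0
    flops = 0
    for x, y in zip(xs, ys):
        # One multiply-add step (acc = x * y + acc): counted as 2 FLOPs,
        # the same way a single FMA instruction is counted on the GPU.
        acc = x * y + acc
        flops += 2
    return acc, flops


value, flops = dot_with_flop_count([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
print(value, flops)  # 70.0 8
```

A length-n dot product therefore costs n FMAs, or 2n FLOPs, which is the same convention the peak-FLOPS formula relies on.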

Maximum Bandwidth

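The maximum bandwidth of a GPU can be calculated with the following formula:

maximum_bandwidth = memory_data_rate * memory_interface_width / 8

where memory_data_rate is the effective data rate per pin (in Gbps), memory_interface_width is the bus width in bits, and dividing by 8 converts bits to bytes.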

The RTX3070 has the following memory specification:
memory speed: 14 Gbps
memory interface width: 256-bit

Then maximum_bandwidth = 14 Gbps * 256 / 8 = 448 GB/s.
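The same arithmetic as a helper, as a minimal sketch; the name `peak_bandwidth_gb_per_s` is hypothetical, and the inputs are the 14 Gbps and 256-bit figures used above.

```python
def peak_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (hypothetical helper).

    data_rate_gbps: effective data rate per pin, in Gbit/s
    bus_width_bits: memory interface width, in bits
    Dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps * bus_width_bits / 8


bw = peak_bandwidth_gb_per_s(14, 256)
print(f"{bw:.0f} GB/s")  # 448 GB/s
```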

Computing-to-Memory Ratio

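computing_memory_ratio = maximum_flop / (maximum_bandwidth / bytes_per_element)

Dividing the bandwidth by the element size turns bytes per second into elements per second, so the ratio counts FLOPs per element moved.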

For the RTX3070 working on 32-bit floating-point data, computing_memory_ratio = (20.31 TFLOP/s) / (0.448 TB/s / 4 B) ≈ 181.4.
This means that for each 32-bit value read from or written to memory, the GPU can perform about 181 floating-point operations.
Any operation that performs more than 181.4 FLOPs per 32-bit value accessed is compute-bound; otherwise it is memory-bound.
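The ratio and the compute-bound/memory-bound test can be sketched as follows. The function names are hypothetical, the peak values are the theoretical RTX3070 figures derived above, and the matrix-multiply element count assumes ideal data reuse (each element of A, B and C crosses the memory bus exactly once), which real kernels only approach through tiling.

```python
PEAK_FLOPS = 5888 * 1725e6 * 2       # FP32 FLOP/s at the boost clock
PEAK_BYTES_PER_S = 14e9 * 256 / 8    # 448e9 bytes/s
BYTES_PER_FP32 = 4


def computing_memory_ratio(peak_flops: float, peak_bytes_per_s: float, bytes_per_element: int) -> float:
    """FLOPs the GPU can execute per element moved to or from memory."""
    return peak_flops / (peak_bytes_per_s / bytes_per_element)


def classify(flops: float, elements_accessed: float, ratio: float) -> str:
    """Compare an operation's FLOPs per element accessed against the GPU ratio."""
    intensity = flops / elements_accessed
    return "compute-bound" if intensity > ratio else "memory-bound"


ratio = computing_memory_ratio(PEAK_FLOPS, PEAK_BYTES_PER_S, BYTES_PER_FP32)
print(f"ratio: {ratio:.1f} FLOPs per fp32 element")  # 181.4

n = 1 << 20
# c[i] = a[i] + b[i]: 1 FLOP per index, 3 elements moved (2 loads, 1 store)
print("vector add:", classify(n, 3 * n, ratio))  # memory-bound

m = 4096
# C = A @ B for m x m matrices: 2 * m**3 FLOPs; with ideal reuse only
# 3 * m**2 elements cross the memory bus
print("matmul:", classify(2 * m**3, 3 * m**2, ratio))  # compute-bound
```

The vector add does about 0.33 FLOPs per element, far below 181.4, while the idealized matrix multiply does about 2731, which is why element-wise kernels are usually limited by bandwidth and large matrix multiplies by arithmetic throughput.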
Reference: CUDA: From Correctness to Performance
