The Difference Between SIMD and SIMT
Midgard is also a Single Instruction Multiple Data (SIMD) architecture, such that most instructions operate on multiple data elements packed in 128-bit vector registers.
The example below shows how a vec3 arithmetic operation may map onto a pure SIMD unit (pipeline executes one thread per clock):

... vs a quad-based unit (pipeline executes one lane per thread for four threads per clock):

These diagrams clearly highlight the advantage of the quad-based unit: it keeps the hardware units full of useful work, irrespective of the vector length used in the program.
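To put rough numbers on that claim, here is a minimal Python model of lane utilisation. It is only a sketch under stated assumptions: the 128-bit register is taken to hold four 32-bit lanes, the quad is assumed to always have four active threads, and the function names `simd_utilisation` and `quad_utilisation` are made up for this illustration.

```python
VECTOR_LANES = 4  # assumption: a 128-bit register split into four 32-bit lanes


def simd_utilisation(vector_length, lanes=VECTOR_LANES):
    """Fraction of lanes doing useful work when ONE thread issues vector ops.

    A vector of `vector_length` components needs ceil(vector_length / lanes)
    issues, and any unused lanes in the last issue are wasted.
    """
    issues = -(-vector_length // lanes)  # ceiling division
    return vector_length / (issues * lanes)


def quad_utilisation(vector_length, threads=VECTOR_LANES):
    """Fraction of lanes doing useful work in a quad-based unit.

    Each clock processes one scalar component for all `threads` threads,
    so a vector of any length simply takes `vector_length` clocks.
    Assumes every thread in the quad is active.
    """
    clocks = vector_length
    useful_lane_ops = vector_length * threads
    return useful_lane_ops / (clocks * threads)


print(simd_utilisation(3))  # vec3 on the pure SIMD unit
print(quad_utilisation(3))  # vec3 on the quad-based unit
```

Under these assumptions a vec3 operation keeps three of the four lanes busy on the pure SIMD unit, while the quad-based unit keeps all four busy for three clocks.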
In SIMD, you need to specify three things: the data arrays, the instruction to apply to them, and the instruction width.
E.g., to add two integer arrays of length 16, a SIMD instruction might look like this (I made the instruction up for the demo):
add.16 arr1 arr2
SIMT, however, doesn't care about the instruction width. So, essentially, you could write the above example as:
arr1[i] + arr2[i]
and then launch as many threads as you need, one per array element.
Note that if the array size were, say, 32, SIMD expects you to explicitly issue two such 'add.16' instructions, whereas SIMT just needs 32 threads running the same code.
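The contrast can be sketched in plain Python. This is only a model, not real SIMD or GPU code: `add_16` stands in for the made-up add.16 instruction, and `simd_add`, `simt_thread` and `simt_launch` are hypothetical names chosen for this illustration. The loop in `simt_launch` stands in for the hardware running the threads in parallel.

```python
SIMD_WIDTH = 16  # the width baked into the made-up add.16 instruction


def add_16(a, b):
    """Model of add.16: adds exactly 16 element pairs in one instruction."""
    assert len(a) == SIMD_WIDTH and len(b) == SIMD_WIDTH
    return [x + y for x, y in zip(a, b)]


def simd_add(arr1, arr2):
    """SIMD style: the program itself issues one add.16 per 16-element chunk."""
    assert len(arr1) == len(arr2) and len(arr1) % SIMD_WIDTH == 0
    out = []
    for start in range(0, len(arr1), SIMD_WIDTH):
        end = start + SIMD_WIDTH
        out += add_16(arr1[start:end], arr2[start:end])
    return out


def simt_thread(i, arr1, arr2, out):
    """SIMT style: the code one thread runs; no width appears anywhere."""
    out[i] = arr1[i] + arr2[i]


def simt_launch(arr1, arr2):
    """Launch one thread per element (sequential stand-in for parallel threads)."""
    out = [0] * len(arr1)
    for i in range(len(arr1)):
        simt_thread(i, arr1, arr2, out)
    return out


arr1 = list(range(32))
arr2 = [10] * 32
print(simd_add(arr1, arr2) == simt_launch(arr1, arr2))
```

With 32 elements, `simd_add` has to call `add_16` twice, while the per-thread code in `simt_thread` is unchanged and only the number of launched threads grows.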
Source: http://www.gpgpu-sim.org/micro2012-tutorial/4-Microarchitecture.pptx