22.opengl高级-实例化
一、原理
绘制有共同特征,或者按照一定规则变化的图形阵列,如果挨个按照普通流程来绘制:绑定VAO、绑定纹理、设置uniform-->调用glDrawArrays(GL_TRIANGLES, 0, amount_of_vertices)性能上会比较差,opengl渲染管线流程中,CPU<-->GPU数据通信是很大的开销。
所以,引入了实例化方案,“实例化”听起来并不能见文知意,本质是设计一个新的API接口,可以一次性把数据从CPU传输到GPU,提升性能。
二、demo1-绘制100个方形的阵列
直接上demo代码,一看便知
2.1. 添加方形位置和颜色数据
float quadVertices[] = {
// 位置 // 颜色
-0.05f, 0.05f, 1.0f, 0.0f, 0.0f,
0.05f, -0.05f, 0.0f, 1.0f, 0.0f,
-0.05f, -0.05f, 0.0f, 0.0f, 1.0f,
-0.05f, 0.05f, 1.0f, 0.0f, 0.0f,
0.05f, -0.05f, 0.0f, 1.0f, 0.0f,
0.05f, 0.05f, 0.0f, 1.0f, 1.0f
};
2.2. 片段着色器,很普通,没有额外工作
#version 330 core
out vec4 FragColor;
in vec3 fColor;
void main()
{
FragColor = vec4(fColor, 1.0);
}
2.3.顶点着色器有差别,通过gl_InstanceID递增来控制position的偏移
gl_InstanceID和后面主程序中的代码有管理
#version 330 core
layout (location = 0) in vec2 aPos;
layout (location = 1) in vec3 aColor;
out vec3 fColor;
uniform vec2 offsets[100];
void main()
{
vec2 offset = offsets[gl_InstanceID];
gl_Position = vec4(aPos + offset, 0.0, 1.0);
fColor = aColor;
}
2.4. 主程序中增加偏移数组,并传递给顶点着色器
两个for循环,简单的生成偏移数组,这里有点意思,index x和y定义从int -10开始,猜测是因为int 值的计算性能高一些,也可能是for循环中一般都是int 值,很少看到有float类型的,可能是习惯吧。
glm::vec2 translations[100];
int index = 0;
float offset = 0.1f;
for(int y = -10; y < 10; y += 2)
{
for(int x = -10; x < 10; x += 2)
{
glm::vec2 translation;
translation.x = (float)x / 10.0f + offset;
translation.y = (float)y / 10.0f + offset;
translations[index++] = translation;
}
}
通过字符串拼接把数组里的值挨个传递给顶点着色器中的统一变量,这里看起来有点啰嗦,看起来也没办法,着色器设置值,没有向量数组
shader.use();
for(unsigned int i = 0; i < 100; i++)
{
stringstream ss;
string index;
ss << i;
index = ss.str();
shader.setVec2(("offsets[" + index + "]").c_str(), translations[i]);
}
最后调用glDrawArraysInstanced方法绘制,注意最后一个参数设置成100,和前面顶点着色器中的gl_InstanceID值会绑定,opengl帮我们干了这个事。
注意,drawArrays一次是读取6个顶点,所以gl_InstanceID的值会在顶点着色器的main执行6次之后 +1,即步幅是6。
glBindVertexArray(quadVAO);
glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 100);

主程序完整代码在文末
三、 demo2 实例化数组
如果渲染实例远超过100,最终会超过着色器中uniform变量传递数据的上限,替代方案是实例化数组,这是一个顶点数组,在内存中单独开辟一块区域能够存储更多的数据,而且可以自动更新
跟着代码理解
3.1. 片段着色器不变。更新顶点着色器,增加aOffset,实例化数组会自动给aOffset赋值,赋值的节奏可以自己设置
#version 330 core
layout (location = 0) in vec2 aPos;
layout (location = 1) in vec3 aColor;
layout (location = 2) in vec2 aOffset;
out vec3 fColor;
void main()
{
gl_Position = vec4(aPos + aOffset, 0.0, 1.0);
fColor = aColor;
}
3.2. 主程序中增加实例化数组逻辑
生成一个buffer
unsigned int instanceVBO;
glGenBuffers(1, &instanceVBO);
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(glm::vec2) * 100, &translations[0], GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
绑定到前面声明的VAO上
// 顶点着色器中aOffset变量的location = 2
glEnableVertexAttribArray(2);
// 绑定顶点数组 instanceVBO像是一个队列,不断取出数据赋值给aOffset
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), (void*)0);
// 养成好习惯,用完了buffer,切回到0,避免数据绑定出错
glBindBuffer(GL_ARRAY_BUFFER, 0);
// 制定切割策略,第一个参数表示一次读取两个参数,vec2是两个数据
// 第二个参数表示循环频次,1表示draw一次更新数组里的数据,0表示取一个点更新一次,2表示draw两个更新数据
glVertexAttribDivisor(2, 1);
最后时限效果同demo1
3.3. demo2 增加点趣味性,结合gl_InstanceID,从右到左下逐渐缩小四边形
只需要修改顶点着色器
void main()
{
vec2 pos = aPos * (gl_InstanceID / 100.0);
gl_Position = vec4(pos + aOffset, 0.0, 1.0);
fColor = aColor;
}

四、 demo3-小行星带
4.1. 不使用实例化数组实现,自己实现的话,也并不是很简单,需要把思路捋顺,顶点着色器和片段着色器没有额外的工作,主要看主程序
测试绘制5000个石头开始卡顿,10 000个石头非常明显的卡顿,移动鼠标变化坐标有非常明显的丢帧
加载行星和石头模型
Shader shader("1.colors.vs", "1.colors.fs");
Model rock("resource/rock/rock.obj");
Model planet("resource/planet/planet.obj");
行星只有一个,绘制比较简单。石头需要沿行星环绕轨道模拟随机位置,下面说明关键的算法实现
-
x-z平面 360°划分成1000份,每次画一个石头
-
分别计算x y z方向的偏移,模拟石头随机的分布在环状区域里
1)x 轴偏移量控制在[-2.5f, 2.5f]范围内,重点看随机数的实现//实现[0, 5.0f]的随机数
// rand()返回的是int值,int值映射到浮点值,通过取模,对2offset100(500)取模得到0--500的整数,除以100.0f得到[0.0f-5.0f]的随机数
// 这里 100 还是1000 或是10,主要看想精确到几位,100精确到小数点后2位
(rand() % (int)(2 * offset * 100)) / 100.0f
2) y轴偏移算法和x相似,区别是x轴是sin(angle),y轴向上,displacement * 0.4取值范围[-1.0f,1.0f]
3) z轴和x轴相同,区别是z轴坐标是cos(angle) -
每个石头相对于自身的角度随机取一个旋转值,很好理解
float rotAngle = (rand() % 360);
model = glm::rotate(model, rotAngle, glm::vec3(0.4f, 0.6f, 0.8f));
主程序生成随机石头代码完整实现
unsigned int amount = 1000;
glm::mat4* modelMatrices;
modelMatrices = new glm::mat4[amount];
srand(glfwGetTime());
float radius = 50.0;
float offset = 2.5f;
// x-z平面 360划分成1000份,每次画一个石头
for (unsigned int i = 0; i < amount; i++) {
glm::mat4 model = glm::mat4(1.0f);
// 1. translation:displace along circle with ‘radius’ in range[-offset, offset]
float angle = (float)i / (float)amount * 360.0f;
float displacement = (rand() % (int)(2 * offset * 100)) / 100.0f - offset;
float x = sin(angle) * radius + displacement;
displacement = (rand() % (int)(2 * offset * 100))/ 100.0f - offset;
float y = displacement * 0.4f;
displacement = (rand() % (int)(2 * offset * 100)) / 100.0f - offset;
float z = cos(angle) * radius + displacement;
model = glm::translate(model, glm::vec3(x, y, z));
// 2. scale Scale between 0.05 and 0.25f
float scale = (rand() % 20) / 100.0f + 0.05;
model = glm::scale(model, glm::vec3(scale));
// 3. rotation:add random rotaion around a (semi)randomly picked rotation axis vector
float rotAngle = (rand() % 360);
model = glm::rotate(model, rotAngle, glm::vec3(0.4f, 0.6f, 0.8f));
// 4. now add to list of matrices
modelMatrices[i] = model;
}

4.2 使用实例化数组来绘制石头
在我的机器上使用实例化数组并没有提升性能,绘制5w以上颗石头,和常规方法绘制一样卡,会不会是mac上对opengl的支持不太好呢?
代码实现如下:
原来的着色器不变,用来绘制行星,新增石头着色器,为一个mat4本质上是4个vec4,我们需要为这个矩阵预留4个顶点属性。因为我们将它的位置值设置为3,矩阵每一列的顶点属性位置值就是3、4、5和6。
#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 2) in vec2 aTexCoords;
layout (location = 3) in mat4 instanceMatrix;
out vec2 TexCoords;
uniform mat4 projection;
uniform mat4 view;
void main()
{
gl_Position = projection * view * instanceMatrix * vec4(aPos, 1.0);
TexCoords = aTexCoords;
}
绑定数据到顶点缓冲对象,顶点数组对象在Mesh中
// 顶点缓冲对象
unsigned int buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER, amount * sizeof(glm::mat4), &modelMatrices[0], GL_STATIC_DRAW);
for(unsigned int i = 0; i < rock.meshes.size(); i++)
{
unsigned int VAO = rock.meshes[i].VAO;
glBindVertexArray(VAO);
// 顶点属性
GLsizei vec4Size = sizeof(glm::vec4);
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 4 * vec4Size, (void*)0);
glEnableVertexAttribArray(4);
glVertexAttribPointer(4, 4, GL_FLOAT, GL_FALSE, 4 * vec4Size, (void*)(vec4Size));
glEnableVertexAttribArray(5);
glVertexAttribPointer(5, 4, GL_FLOAT, GL_FALSE, 4 * vec4Size, (void*)(2 * vec4Size));
glEnableVertexAttribArray(6);
glVertexAttribPointer(6, 4, GL_FLOAT, GL_FALSE, 4 * vec4Size, (void*)(3 * vec4Size));
glVertexAttribDivisor(3, 1);
glVertexAttribDivisor(4, 1);
glVertexAttribDivisor(5, 1);
glVertexAttribDivisor(6, 1);
glBindVertexArray(0);
}
绘制小行星,注意,一个石头实际上有多个面的网格,每一个网格有VAO,遍历石头的网格,每次绘制所有石头的一个面,最后拼成行星带。每次for循环都得绘制10w次
instanceShader.use();
for(unsigned int i = 0; i < rock.meshes.size(); i++)
{
glBindVertexArray(rock.meshes[i].VAO);
glDrawElementsInstanced(
GL_TRIANGLES, rock.meshes[i].indices.size(), GL_UNSIGNED_INT, 0, amount
);
}
注意调整行星带的半径,让石头分散一点,半径为150.0f
4.完整代码
1. demo1主程序完整代码
#include <glad/glad.h>
#include <GLFW/glfw3.h>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
#include "Shader.h"
#include "camera.h"
#include "model.h"
#include <iostream>
void framebuffer_size_callback(GLFWwindow* window, int width, int height);
void mouse_callback(GLFWwindow* window, double xpos, double ypos);
void scroll_callback(GLFWwindow* window, double xoffset, double yoffset);
void processInput(GLFWwindow *window);
unsigned int loadTexture(const char *path);
unsigned int loadCubemap(vector<std::string> faces);
// settings
const unsigned int SCR_WIDTH = 800;
const unsigned int SCR_HEIGHT = 600;
// camera
Camera camera(glm::vec3(0.0f, 0.5f, 30.0f));
float lastX = (float)SCR_WIDTH / 2.0;
float lastY = (float)SCR_HEIGHT / 2.0;
bool firstMouse = true;
// timing
float deltaTime = 0.0f;
float lastFrame = 0.0f;
int main()
{
// glfw: initialize and configure
// ------------------------------
glfwInit();
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
#ifdef __APPLE__
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
#endif
// glfw window creation
// --------------------
GLFWwindow* window = glfwCreateWindow(SCR_WIDTH, SCR_HEIGHT, "天哥学opengl", NULL, NULL);
if (window == NULL)
{
std::cout << "Failed to create GLFW window" << std::endl;
glfwTerminate();
return -1;
}
glfwMakeContextCurrent(window);
glfwSetFramebufferSizeCallback(window, framebuffer_size_callback);
glfwSetCursorPosCallback(window, mouse_callback);
glfwSetScrollCallback(window, scroll_callback);
// tell GLFW to capture our mouse
// glfwSetInputMode(window, GLFW_CURSOR, GLFW_CURSOR_DISABLED);
// glad: load all OpenGL function pointers
// ---------------------------------------
if (!gladLoadGLLoader((GLADloadproc)glfwGetProcAddress))
{
std::cout << "Failed to initialize GLAD" << std::endl;
return -1;
}
// glPolygonMode(GL_FRONT_AND_BACK ,GL_LINE );
// configure global opengl state
// -----------------------------
glEnable(GL_DEPTH_TEST);
float quadVertices[] = {
// 位置 // 颜色
-0.05f, 0.05f, 1.0f, 0.0f, 0.0f,
0.05f, -0.05f, 0.0f, 1.0f, 0.0f,
-0.05f, -0.05f, 0.0f, 0.0f, 1.0f,
-0.05f, 0.05f, 1.0f, 0.0f, 0.0f,
0.05f, -0.05f, 0.0f, 1.0f, 0.0f,
0.05f, 0.05f, 0.0f, 1.0f, 1.0f
};
// VAO
unsigned int VAO, VBO;
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(quadVertices), &quadVertices, GL_STATIC_DRAW);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 5 * sizeof(float), (void*)(2 * sizeof(float)));
glEnableVertexAttribArray(1);
glBindVertexArray(0);
// build and compile shaders
// -------------------------
Shader shader("1.colors.vs", "1.colors.fs");
// set up vertex data (and buffer(s)) and configure vertex attributes
// ------------------------------------------------------------------
glm::vec2 translations[100];
int index = 0;
float offset = 0.1f;
for (int y = -10; y < 10; y += 2) {
for (int x = -10; x < 10; x += 2) {
glm::vec2 translation;
translation.x = (float)x / 10.0f + offset;
translation.y = (float)y / 10.0f + offset;
translations[index++] = translation;
}
}
shader.use();
for (unsigned int i = 0; i < 100; i++) {
stringstream ss;
string index;
ss << i;
index = ss.str();
shader.setVec2(("offsets[" + index + "]").c_str(), translations[i]);
}
// render loop
// -----------
while (!glfwWindowShouldClose(window))
{
glClearColor(0.1f, 0.1f, 0.1f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
processInput(window);
glm::mat4 projection = glm::perspective(glm::radians(45.0f), (float)SCR_WIDTH / (float)SCR_HEIGHT, 1.0f, 100.0f);
glm::mat4 view = camera.GetViewMatrix();
glm::mat4 model = glm::mat4(1.0f);
shader.use();
// shader.setMat4("projection", projection);
// shader.setMat4("view", view);
// shader.setMat4("model", model);
glBindVertexArray(VAO);
glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 100);
// glfw: swap buffers and poll IO events (keys pressed/released, mouse moved etc.)
// -------------------------------------------------------------------------------
glfwSwapBuffers(window);
glfwPollEvents();
}
// optional: de-allocate all resources once they've outlived their purpose:
// ------------------------------------------------------------------------
glfwTerminate();
return 0;
}
// process all input: query GLFW whether relevant keys are pressed/released this frame and react accordingly
// ---------------------------------------------------------------------------------------------------------
bool startRecord = false;
void processInput(GLFWwindow *window)
{
if (glfwGetKey(window, GLFW_KEY_Y))
{
std::cout << "Y" << std::endl;
startRecord = true;
}
if (glfwGetKey(window, GLFW_KEY_N))
{
std::cout << "N" << std::endl;
startRecord = false;
}
if (startRecord) {
return;
}
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS)
glfwSetWindowShouldClose(window, true);
if (glfwGetKey(window, GLFW_KEY_W) == GLFW_PRESS)
camera.ProcessKeyboard(FORWARD, deltaTime);
if (glfwGetKey(window, GLFW_KEY_S) == GLFW_PRESS)
camera.ProcessKeyboard(BACKWARD, deltaTime);
if (glfwGetKey(window, GLFW_KEY_A) == GLFW_PRESS)
camera.ProcessKeyboard(LEFT, deltaTime);
if (glfwGetKey(window, GLFW_KEY_D) == GLFW_PRESS)
camera.ProcessKeyboard(RIGHT, deltaTime);
}
// glfw: whenever the window size changed (by OS or user resize) this callback function executes
// ---------------------------------------------------------------------------------------------
void framebuffer_size_callback(GLFWwindow* window, int width, int height)
{
// make sure the viewport matches the new window dimensions; note that width and
// height will be significantly larger than specified on retina displays.
glViewport(0, 0, width, height);
}
// glfw: whenever the mouse moves, this callback is called
// -------------------------------------------------------
void mouse_callback(GLFWwindow* window, double xpos, double ypos)
{
// std::cout << "xpos : " << xpos << std::endl;
// std::cout << "ypos : " << ypos << std::endl;
if (startRecord) {
return;
}
if (firstMouse)
{
lastX = xpos;
lastY = ypos;
firstMouse = false;
}
float xoffset = xpos - lastX;
float yoffset = lastY - ypos; // reversed since y-coordinates go from bottom to top
lastX = xpos;
lastY = ypos;
// std::cout << "xoffset : " << xoffset << std::endl;
// std::cout << "yoffset : " << yoffset << std::endl;
camera.ProcessMouseMovement(xoffset, yoffset);
}
// glfw: whenever the mouse scroll wheel scrolls, this callback is called
// ----------------------------------------------------------------------
void scroll_callback(GLFWwindow* window, double xoffset, double yoffset)
{
camera.ProcessMouseScroll(yoffset);
}
// utility function for loading a 2D texture from file
// ---------------------------------------------------
unsigned int loadTexture(char const * path)
{
unsigned int textureID;
glGenTextures(1, &textureID);
int width, height, nrComponents;
unsigned char *data = stbi_load(path, &width, &height, &nrComponents, 0);
if (data)
{
GLenum format;
if (nrComponents == 1)
format = GL_RED;
else if (nrComponents == 3)
format = GL_RGB;
else if (nrComponents == 4)
format = GL_RGBA;
glBindTexture(GL_TEXTURE_2D, textureID);
glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, format, GL_UNSIGNED_BYTE, data);
glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
stbi_image_free(data);
}
else
{
std::cout << "Texture failed to load at path: " << path << std::endl;
stbi_image_free(data);
}
return textureID;
}
unsigned int loadCubemap(vector<std::string> faces)
{
unsigned int textureID;
glGenTextures(1, &textureID);
glBindTexture(GL_TEXTURE_CUBE_MAP, textureID);
int width, height, nrChannels;
for (unsigned int i = 0; i < faces.size(); i++) {
unsigned char *data = stbi_load(faces[i].c_str(), &width, &height, &nrChannels, 0);
if (data)
{
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, data);
stbi_image_free(data);
}
else
{
std::cout << "Cubemap texture failed to load at path: " << faces[i] << std::endl;
stbi_image_free(data);
}
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
}
return textureID;
}