C++ Is Fast
2021-12-27
疾风2018
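Memory-access-intensive test
Summing 1 billion integers (stored as 16-bit short values in the code below) took 700 ms with OpenMP enabled; with OpenMP disabled it was faster, at 150 ms. Memory access dominates the running time here, and spreading the work across several cores does not make memory access any faster.
One more data point: Java took 300 ms to sum 100 million integers, which puts the efficiency of C++ in perspective.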
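A rough sanity check on the memory-bound explanation is to turn the reported timings into an effective read bandwidth. The sketch below assumes 2-byte elements (the listing allocates `short`, not `int`) and counts 1 GB as 10^9 bytes; `effective_bandwidth_gb_per_s` is a hypothetical helper name, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>

// Effective read bandwidth in GB/s (1 GB = 1e9 bytes) for a scan that
// touches `elements` items of `bytes_per_element` bytes in `elapsed_ms`.
double effective_bandwidth_gb_per_s(double elements,
                                    std::size_t bytes_per_element,
                                    double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    const double bytes = elements * static_cast<double>(bytes_per_element);
    const double seconds = elapsed_ms / 1000.0;
    return bytes / seconds / 1e9;
}
```

With the figures reported above, `effective_bandwidth_gb_per_s(1e9, 2, 150)` comes to roughly 13.3 GB/s for the single-threaded run and `effective_bandwidth_gb_per_s(1e9, 2, 700)` to roughly 2.9 GB/s for the OpenMP run. The machine is not described, so these numbers are only an indication of how close the serial loop may already be to what the memory system delivers.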
Code
The code is pasted below.
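// cpp2.cpp : This file contains the "main" function. Program execution begins and ends here.
//
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <iostream>
int main()
{
    using namespace std::chrono;
    const long n = 1000000000;
    // 1 billion 16-bit values, about 2 GB of memory
    short* data = new short[n];
    for (long i = 0; i < n; i++)
    {
        data[i] = (short)std::rand();
    }
    // steady_clock is monotonic, which suits interval timing
    auto now = steady_clock::now();
    // long is only 32 bits on Windows, so long long avoids overflow
    long long sum = 0;
#ifdef _OPENMP
    std::printf("Compiled by an OpenMP-compliant implementation.\n");
#endif
    // reduction(+:sum) gives each thread a private partial sum; updating a
    // shared sum from several threads without it is a data race
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
    {
        sum += data[i];
    }
    auto now2 = steady_clock::now();
    std::cout << "Sum:" << sum << std::endl << "time: " << duration_cast<milliseconds>(now2 - now).count();
    // Memory from new[] must be released with delete[]
    delete[] data;
}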
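For trying the same measurement on a smaller array, the summation and the timing can be pulled into two small functions. This is a minimal sketch, not the author's code: `sum_shorts` and `time_ms` are hypothetical names, `std::chrono::steady_clock` is my choice of clock, and the `#ifdef _OPENMP` guard is there only so the file builds the same way with or without OpenMP.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Sum `count` shorts starting at `data`. With OpenMP enabled, each thread
// accumulates a private partial sum and the partial sums are combined at the end.
long long sum_shorts(const short* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    long long sum = 0;
    const long long n = static_cast<long long>(count);
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (long long i = 0; i < n; ++i)
    {
        sum += data[i];
    }
    return sum;
}

// Run `work` once and return the elapsed wall time in whole milliseconds.
template <typename Work>
long long time_ms(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call such as `time_ms([&] { total = sum_shorts(data, n); })` times one pass; building once with OpenMP and once without gives the two figures to compare.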
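Compute-intensive test
Generating random values for 1 billion integers took 9 seconds with OpenMP enabled. With OpenMP disabled it took 4 times as long, 36 seconds (the program ran on a 4-core CPU). Only compute-intensive programs can take full advantage of multiple cores.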
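The comparison above is the usual speedup and parallel-efficiency arithmetic. A small sketch, with `speedup` and `parallel_efficiency` as hypothetical helper names:

```cpp
#include <cassert>

// Speedup: how many times faster the parallel run is than the serial run.
double speedup(double serial_time, double parallel_time)
{
    assert(parallel_time > 0.0);
    return serial_time / parallel_time;
}

// Parallel efficiency: speedup divided by the number of cores used.
// 1.0 means every core contributed its full share.
double parallel_efficiency(double serial_time, double parallel_time, int cores)
{
    assert(cores > 0);
    return speedup(serial_time, parallel_time) / cores;
}
```

For the random-number test, `speedup(36, 9)` is 4 and `parallel_efficiency(36, 9, 4)` is 1.0. For the summation test, `speedup(150, 700)` is about 0.21, which is a slowdown rather than a speedup.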
Code
// cpp2.cpp : This file contains the "main" function. Program execution begins and ends here.
//
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <iostream>
int main()
{
    using namespace std::chrono;
    const long n = 1000000000;
    short* data = new short[n];
    // Untimed first pass: fills the array, so its pages have already been
    // touched when the timed pass starts
#pragma omp parallel for
    for (long i = 0; i < n; i++)
    {
        data[i] = (short)std::rand();
    }
    // steady_clock is monotonic, which suits interval timing
    auto now = steady_clock::now();
#ifdef _OPENMP
    std::printf("Compiled by an OpenMP-compliant implementation.\n");
#endif
    // Timed pass. Whether std::rand is thread-safe is implementation-defined,
    // so calling it from a parallel loop relies on the runtime library in use
#pragma omp parallel for
    for (long i = 0; i < n; i++)
    {
        data[i] = (short)std::rand();
    }
    auto now2 = steady_clock::now();
    std::cout << "time: " << duration_cast<milliseconds>(now2 - now).count();
    delete[] data;
}
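Since the thread safety of `std::rand` is implementation-defined, one portable alternative is to give each thread its own engine from `<random>`. The sketch below is a swapped-in technique, not what the listing above does: `fill_random`, the `base_seed` parameter, and the 0 to 32767 range are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Fill a vector of `n` shorts with random values in [0, 32767].
// Each thread owns a separate std::mt19937, so no generator state is shared.
std::vector<short> fill_random(std::size_t n, unsigned base_seed)
{
    std::vector<short> data(n);
    const long long count = static_cast<long long>(n);
#ifdef _OPENMP
#pragma omp parallel
#endif
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Seed differs per thread so threads do not produce identical streams
        std::mt19937 engine(base_seed + static_cast<unsigned>(tid));
        std::uniform_int_distribution<int> dist(0, 32767);
#ifdef _OPENMP
#pragma omp for
#endif
        for (long long i = 0; i < count; ++i)
        {
            data[static_cast<std::size_t>(i)] = static_cast<short>(dist(engine));
        }
    }
    return data;
}
```

Without OpenMP the braces form an ordinary block and the function runs on one thread with a single engine. Timings with `std::mt19937` will not match the `std::rand` figures reported above, since the generators do different amounts of work.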
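File mapping test
First create a file sized for 100 million int values (the code does this by writing a single byte at the last offset), then load the file's contents through file mapping and compute the sum over the data. With OpenMP enabled and every optimization option turned all the way up, this took 200 ms. By that comparison, it is roughly on par with Java.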
Code
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
int main()
{
    using namespace std::chrono;
    using namespace boost::interprocess;
    const unsigned long data_size = 100000000;
    const std::string file_path = "C:\\Users\\DELL\\cx1.bin";
    constexpr unsigned long file_size = data_size * sizeof(int);
    // Create a file of file_size bytes by writing one byte at the last offset
    std::filebuf fbuf;
    fbuf.open(file_path, std::ios_base::in | std::ios_base::out | std::ios_base::trunc | std::ios_base::binary);
    fbuf.pubseekoff(file_size - 1, std::ios_base::beg);
    fbuf.sputc(0x88);
    fbuf.close();
    auto now = steady_clock::now();
    file_mapping m_file(file_path.c_str(), read_write);
    //Map the whole file with read-write permissions in this process
    mapped_region region(m_file, read_write, 0, file_size);
    const int* data = static_cast<const int*>(region.get_address());
    const long count = static_cast<long>(region.get_size() / sizeof(int));
    long long sum = 0;
#ifdef _OPENMP
    std::printf("Compiled by an OpenMP-compliant implementation.\n");
#endif
    // reduction(+:sum) avoids a data race on the shared accumulator
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < count; i++)
    {
        sum += data[i];
    }
    auto now2 = steady_clock::now();
    std::cout << "Sum:" << sum << std::endl << "time: " << duration_cast<milliseconds>(now2 - now).count();
    return 0;
}
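To make the mapped sum non-trivial, the file can be filled with real int values first, and a plain stream-based reader gives a baseline to compare the mapped version against. The sketch below uses ordinary `<fstream>` I/O rather than file mapping; `write_ints`, `sum_ints_from_file`, the `i % 100` values, and the 4096-element chunk size are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write `count` int values (i % 100) to `path` in the machine's native layout.
void write_ints(const std::string& path, std::size_t count)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    assert(out.is_open());
    std::vector<int> chunk;
    chunk.reserve(4096);
    for (std::size_t i = 0; i < count; ++i)
    {
        chunk.push_back(static_cast<int>(i % 100));
        // Flush when the chunk is full or the last value has been added
        if (chunk.size() == 4096 || i + 1 == count)
        {
            out.write(reinterpret_cast<const char*>(chunk.data()),
                      static_cast<std::streamsize>(chunk.size() * sizeof(int)));
            chunk.clear();
        }
    }
}

// Read the file back in chunks and return the sum of its int values.
long long sum_ints_from_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    assert(in.is_open());
    std::vector<int> chunk(4096);
    long long sum = 0;
    while (true)
    {
        in.read(reinterpret_cast<char*>(chunk.data()),
                static_cast<std::streamsize>(chunk.size() * sizeof(int)));
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(int);
        for (std::size_t i = 0; i < got; ++i)
        {
            sum += chunk[i];
        }
        if (got < chunk.size())
        {
            break;  // short read: end of file reached
        }
    }
    return sum;
}
```

Calling `write_ints` on the path before mapping the file would give the mapped loop real data to add up, and `sum_ints_from_file` on the same path should return the same total as the mapped version.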