C++ | Why is emplace_back faster?


2022.10.20

2022.11.19


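I thought I had a decent grasp of push_back and emplace_back, until my roommate Lunlun asked me a question about them and long-buried memories of Effective Modern C++ came back to haunt me... So I buckled down, read a good amount of material, and summarized what I learned in this article.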

This article is about 1,200 words long and takes roughly 8 minutes to read.

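I wrote this article for two reasons:

The question comes up frequently in interviews
Explanations online vary widely and lack a systematic, complete treatment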

What is push_back? What is emplace_back?

push_back

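When using STL containers (take std::vector as an example), we often call push_back to append a new element to the end. That is exactly what push_back does: it adds a new element at the end of the container.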

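If you are comfortable with C++ template programming and curious about the underlying implementation, take a look at its definition in Microsoft's STL (MSVC); if not, feel free to skip to the next section: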

_CONSTEXPR20 void push_back(const _Ty& _Val) { // insert element at end, provide strong guarantee
    _Emplace_one_at_back(_Val);
}

_CONSTEXPR20 void push_back(_Ty&& _Val) {
    // insert by moving into element at end, provide strong guarantee
    _Emplace_one_at_back(_STD move(_Val));
}

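Apart from vector<bool>, push_back has only the two overloads above.
They take an lvalue and an rvalue respectively (lvalue/rvalue semantics are not covered here for reasons of space).
The parameter type is _Ty, so the argument we pass must be a _Ty or something that can be implicitly converted to one.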

What is _Ty: the element type passed as the template argument when the container is instantiated; for vector<int>, _Ty is int.

emplace_back

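Functionally, emplace_back does the same thing as push_back, and in almost all code you can safely replace push_back with emplace_back (but not the other way around, since emplace_back can also take constructor arguments). One exception: a braced initializer list such as push_back({1, 2}) cannot be passed to emplace_back, because a template parameter cannot be deduced from it.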

The definition of emplace_back is a bit more involved:

template <class... _Valty>
_CONSTEXPR20 decltype(auto) emplace_back(_Valty&&... _Val) {
    _Ty& _Result = _Emplace_one_at_back(_STD forward<_Valty>(_Val)...);
    return _Result;
}

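Apart from vector<bool>, emplace_back has only this one version.
Unlike push_back, it takes a variadic pack of arguments (not expanded on here for reasons of space; if you are interested, look up universal references and variadic templates).
Internally the two end up in the same place: both call _Emplace_one_at_back(). The difference is what they hand to it: push_back passes an already-constructed _Ty, while emplace_back forwards the raw constructor arguments.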

Why do we need emplace_back?

At this point someone will ask: if push_back already gets the job done, why invent emplace_back? Consider the following example:

#include <string>
#include <vector>

int main() {
    std::vector<std::string> sentences;
    sentences.push_back("hello, world");
    return 0;
}

// compile success!

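The code compiles and everything looks fine, but there is a hidden cost. From the source we can see that push_back only accepts a _Ty (here std::string), yet "hello, world" has type const char[13], which clearly doesn't match. It still compiles because of implicit class conversion: std::string has a non-explicit constructor taking a const char*, so the compiler inserts the conversion for us (see the Appendix). The code above is therefore equivalent to: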

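#include <string>
#include <vector>

int main() {
    std::vector<std::string> sentences;
    // what the compiler effectively does: build a temporary std::string first
    sentences.push_back(std::string("hello, world"));
    return 0;
}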

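As you can see, this code actually constructs a temporary std::string object in main, which is then moved into the vector (through the push_back(_Ty&&) overload) and destroyed. That is something we would rather avoid (imagine an object whose construction is very expensive). emplace_back solves this problem neatly: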

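#include <string>
#include <vector>

int main() {
    std::vector<std::string> sentences;
    // the const char* is forwarded; the std::string is built inside the vector
    sentences.emplace_back("hello, world");
    return 0;
}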

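Now no implicit conversion is needed: because emplace_back accepts arbitrary arguments, the string literal is forwarded straight into emplace_back and the std::string is constructed directly in the vector's storage (the remaining steps are the same as for push_back).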

Finally, a more complete example:

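#include <iostream>
#include <string>
#include <vector>
using namespace std;

class Person
{
    string name;

public:
    // converting constructor: lets a const char* turn into a Person
    Person(const char *p) : name(p)
    {
        cout << "construct" << endl;
    }

    // user-declared copy constructor; it also suppresses the implicit move constructor
    Person(const Person &p) : name(p.name)
    {
        cout << "Person(const Person&)" << endl;
    }
};

int main()
{
    std::vector<Person> persons;
    persons.reserve(2); // reserve room so no reallocation copies show up in the output
    cout << "---------emplace back---------" << endl;
    persons.emplace_back("John"); // "John" is forwarded; Person is built in place
    cout << "---------push back---------" << endl;
    persons.push_back("mike");    // temporary Person is built, then copied into the vector
    return 0;
}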

// Output:
// ---------emplace back---------
// construct
// ---------push back---------
// construct
// Person(const Person&)

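As you can see, emplace_back calls the constructor only once, building the object in place. push_back, by contrast, first constructs a temporary Person and then copy-constructs the element from it (Person has a user-declared copy constructor and therefore no implicit move constructor, so the copy constructor is used).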

Advantages of emplace_back

It avoids creating temporary objects and the performance cost they bring

Afterword

Go ahead and replace push_back with emplace_back~

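Although emplace_back appears to be clearly better than push_back, that does not mean emplacement is always better than insertion: whether it helps depends on the concrete object type and on how the container organizes its elements internally. So think carefully before replacing insert with emplace. (See Effective Modern C++, Item 42.)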

Appendix

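Implicit class conversion: if a class has a constructor that can be called with a single argument (and is not marked explicit), a value of that argument's type can be implicitly converted into an object of the class. For example: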

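#include <cstring>
#include <iostream>
#include <string>
using namespace std;

class Person
{
    string name;

public:
    Person(const char *p) {
        for (std::size_t i = 0; i < strlen(p); ++i)
        {
            name.push_back(p[i]);
        }
    }
    string getName() {
        return name;
    }
};

int main() {
    Person mike = "mike"; // copy-initialization: "mike" is implicitly converted to a Person
    cout << mike.getName() << endl;
    return 0;
}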

// output:
// mike

If we don't want this implicit conversion to happen, we can mark the constructor explicit:

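#include <cstring>
#include <iostream>
#include <string>
using namespace std;

class Person
{
    string name;

public:
    // explicit: no implicit conversion from const char* to Person
    explicit Person(const char *p) {
        for (std::size_t i = 0; i < strlen(p); ++i)
        {
            name.push_back(p[i]);
        }
    }
    string getName() {
        return name;
    }
};

int main() {
    Person mike = "mike"; // copy-initialization now requires an implicit conversion
    cout << mike.getName() << endl;
    return 0;
}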

// compile error
// vector_push_back_test.cpp:62:19: error: conversion from ‘const char [5]’ to non-scalar type ‘Person’ requested