How do I append one string to another in Python?

  • Question:
  • I want an efficient way to append one string to another in Python, other than the following:

    var1 = "foo"
    var2 = "bar"
    var3 = var1 + var2

    Is there a good built-in method to use?

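  • Answer:
  • If you only have one reference to a string and you concatenate another string to the end, CPython now special-cases this and tries to extend the string in place.

    The end result is that the operation is amortized O(n).

    e.g.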

    s = ""
    for i in range(n):
        s += str(i)

    used to be O(n^2), but now it is O(n).
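Extending the string in place is only safe because no other reference can observe the old value. A minimal sketch of that invariant as seen from Python code (the names `append_with_alias`, `first` and `alias` are illustrative and not part of the original answer): when a second reference exists, `+=` rebinds the name to the new value and the other reference keeps the old one.

```python
def append_with_alias():
    first = "foo"
    alias = first      # a second reference to the same string object
    first += "bar"     # rebinds `first`; the string seen via `alias` must not change
    return first, alias


first, alias = append_with_alias()
print(first)  # foobar
print(alias)  # foo
```

Strings therefore still behave as immutable from the program's point of view, whether or not the interpreter reuses the buffer internally.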

    From the source (bytesobject.c):

    void
    PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
    {
        PyBytes_Concat(pv, w);
        Py_XDECREF(w);
    }


    /* The following function breaks the notion that strings are immutable:
    it changes the size of a string. We get away with this only if there
    is only one module referencing the object. You can also think of it
    as creating a new string object and destroying the old one, only
    more efficiently. In any case, don't use this if the string may
    already be known to some other part of the code...
    Note that if there's not enough memory to resize the string, the original
    string object at *pv is deallocated, *pv is set to NULL, an "out of
    memory" exception is set, and -1 is returned. Else (on success) 0 is
    returned, and the value in *pv may or may not be the same as on input.
    As always, an extra byte is allocated for a trailing \0 byte (newsize
    does *not* include that), and a trailing \0 byte is stored.
    */

    int
    _PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
    {
        register PyObject *v;
        register PyBytesObject *sv;
        v = *pv;
        if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
            *pv = 0;
            Py_DECREF(v);
            PyErr_BadInternalCall();
            return -1;
        }
        /* XXX UNREF/NEWREF interface should be more symmetrical */
        _Py_DEC_REFTOTAL;
        _Py_ForgetReference(v);
        *pv = (PyObject *)
            PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
        if (*pv == NULL) {
            PyObject_Del(v);
            PyErr_NoMemory();
            return -1;
        }
        _Py_NewReference(*pv);
        sv = (PyBytesObject *) *pv;
        Py_SIZE(sv) = newsize;
        sv->ob_sval[newsize] = '\0';
        sv->ob_shash = -1; /* invalidate cached hash value */
        return 0;
    }

    It's easy enough to verify empirically.


    $ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
    1000000 loops, best of 3: 1.85 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
    10000 loops, best of 3: 16.8 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
    10000 loops, best of 3: 158 usec per loop
    $ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
    1000 loops, best of 3: 1.71 msec per loop
    $ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
    10 loops, best of 3: 14.6 msec per loop
    $ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
    10 loops, best of 3: 173 msec per loop
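The commands above use `xrange`, which exists only in Python 2. A rough sketch of the same measurement for Python 3, using the standard `timeit` module from inside a script, might look like the following. The helper name `time_appends` and the chosen `repeat`/`number` values are assumptions for illustration, and unlike the commands above this version resets `s` on every run; absolute timings will differ by machine and interpreter.

```python
import timeit


def time_appends(n, repeat=3, number=10):
    """Return the best observed time, in seconds, for one run of n appends."""
    # The statement is compiled by timeit; `n` is supplied through `globals`.
    stmt = "s = ''\nfor i in range(n):\n    s += 'a'"
    timings = timeit.repeat(stmt, globals={"n": n}, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000, 100000):
        print(n, time_appends(n))
```

If the per-run time grows roughly tenfold each time `n` grows tenfold, the behaviour is consistent with linear scaling.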

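    It's important, however, to note that this optimisation isn't part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical test on pypy or jython, for example, might show the older O(n**2) performance.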


    $ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
    10000 loops, best of 3: 90.8 usec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
    1000 loops, best of 3: 896 usec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
    100 loops, best of 3: 9.03 msec per loop
    $ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
    10 loops, best of 3: 89.5 msec per loop

    So far so good, but then


    $ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
    10 loops, best of 3: 12.8 sec per loop

    Ouch, even worse than quadratic. So pypy is doing something that works well with short strings, but performs poorly for larger strings.
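Because the in-place optimisation is not guaranteed outside CPython, one commonly used alternative that does not depend on it is to collect the pieces in a list and join them once at the end. This is a sketch of that alternative rather than something taken from the answer above, and `build_with_join` is a hypothetical name.

```python
def build_with_join(n):
    """Build the string "0123...(n-1)" without repeated string concatenation."""
    parts = []
    for i in range(n):
        parts.append(str(i))   # appending to a list is amortized O(1)
    return "".join(parts)      # a single pass copies every piece once


print(build_with_join(5))  # 01234
```

`str.join` computes the final size up front, so the total work stays linear in the length of the result regardless of which interpreter runs it.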