Cysharp/ZString

ZString.Join is slower than String.Join

udaken opened this issue · 4 comments

ZString.Join is slower than String.Join when joining a collection of string elements.
I think this case deserves optimization, because joining strings is common in many workloads.


Below is the benchmark code.

using BenchmarkDotNet.Attributes;
using Cysharp.Text;
using System;
using System.Collections.Generic;
using System.Linq;
namespace PerfBenchmark.Benchmarks
{
    [Config(typeof(BenchmarkConfig))]
    public class StringListJoinBenchmark
    {
        //private const char Separator = ',';
        private const string Separator = ",";

        List<string> _emptyList;
        IEnumerable<string> _enum1;
        string[] _array1;
        List<string> _list1;
        IEnumerable<string> _enum2;
        string[] _array2;
        List<string> _list2;
        IEnumerable<string> _enum10;
        List<string> _list10;
        string[] _array10;

        public StringListJoinBenchmark()
        {
            _emptyList = new List<string>();
            _enum1 = Enumerable.Repeat(Guid.NewGuid().ToString(), 1);
            _list1 = _enum1.ToList();
            _array1 = _enum1.ToArray();
            _enum2 = Enumerable.Repeat(Guid.NewGuid().ToString(), 2);
            _list2 = _enum2.ToList();
            _array2 = _enum2.ToArray();
            _enum10 = Enumerable.Repeat(Guid.NewGuid().ToString(), 10);
            _list10 = _enum10.ToList();
            _array10 = _enum10.ToArray();
        }

        [Benchmark]
        public string JoinEmptyList() => String.Join(Separator, _emptyList);

        [Benchmark]
        public string ZJoinEmptyList() => ZString.Join(Separator, _emptyList);

        [Benchmark]
        public string JoinList1() => String.Join(Separator, _list1);

        [Benchmark]
        public string ZJoinList1() => ZString.Join(Separator, _list1);

        [Benchmark]
        public string JoinArray1() => String.Join(Separator, _array1);

        [Benchmark]
        public string ZJoinArray1() => ZString.Join(Separator, _array1);

        [Benchmark]
        public string JoinEnumerable1() => String.Join(Separator, _enum1);

        [Benchmark]
        public string ZJoinEnumerable1() => ZString.Join(Separator, _enum1);

        [Benchmark]
        public string JoinList2() => String.Join(Separator, _list2);

        [Benchmark]
        public string ZJoinList2() => ZString.Join(Separator, _list2);

        [Benchmark]
        public string JoinArray2() => String.Join(Separator, _array2);

        [Benchmark]
        public string ZJoinArray2() => ZString.Join(Separator, _array2);

        [Benchmark]
        public string JoinEnumerable2() => String.Join(Separator, _enum2);

        [Benchmark]
        public string ZJoinEnumerable2() => ZString.Join(Separator, _enum2);

        [Benchmark]
        public string JoinList10() => String.Join(Separator, _list10);

        [Benchmark]
        public string ZJoinList10() => ZString.Join(Separator, _list10);

        [Benchmark]
        public string JoinArray10() => String.Join(Separator, _array10);

        [Benchmark]
        public string ZJoinArray10() => ZString.Join(Separator, _array10);

        [Benchmark]
        public string JoinEnumerable10() => String.Join(Separator, _enum10);

        [Benchmark]
        public string ZJoinEnumerable10() => ZString.Join(Separator, _enum10);

    }
}

The following are the results of the run. (.NET Framework gives similar results.)

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.1.400-preview-015151
  [Host]   : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  ShortRun : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

Job=ShortRun  IterationCount=1  LaunchCount=1  
WarmupCount=1  
| Method | Mean | Error | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|
| JoinEmptyList | 19.697 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEmptyList | 11.102 ns | NA | - | - | - | - |
| JoinList1 | 29.702 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinList1 | 51.796 ns | NA | 0.0114 | - | - | 96 B |
| JoinArray1 | 3.841 ns | NA | - | - | - | - |
| ZJoinArray1 | 51.001 ns | NA | 0.0114 | - | - | 96 B |
| JoinEnumerable1 | 23.128 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEnumerable1 | 81.611 ns | NA | 0.0162 | - | - | 136 B |
| JoinList2 | 93.200 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinList2 | 97.387 ns | NA | 0.0200 | - | - | 168 B |
| JoinArray2 | 25.569 ns | NA | 0.0201 | - | - | 168 B |
| ZJoinArray2 | 92.378 ns | NA | 0.0200 | - | - | 168 B |
| JoinEnumerable2 | 77.429 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinEnumerable2 | 128.068 ns | NA | 0.0248 | - | - | 208 B |
| JoinList10 | 526.850 ns | NA | 0.2842 | 0.0019 | - | 2384 B |
| ZJoinList10 | 400.351 ns | NA | 0.0906 | - | - | 760 B |
| JoinArray10 | 97.722 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| ZJoinArray10 | 411.841 ns | NA | 0.0906 | - | - | 760 B |
| JoinEnumerable10 | 426.296 ns | NA | 0.2847 | 0.0024 | - | 2384 B |
| ZJoinEnumerable10 | 483.942 ns | NA | 0.0954 | - | - | 800 B |

Proposed solution:

In the generic method, add a branch for the case where T is string.

Change sb.Append(values[i]); to the following:

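if (typeof(T) == typeof(string))
{
    // T is string here, so reinterpret the reference without a cast check
    // and call the Append(string) overload instead of the generic Append<T>.
    sb.Append(Unsafe.As<string>(values[i]));
}
else
{
    // Any other T keeps using the generic Append<T> path.
    sb.Append(values[i]);
}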
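
The branch can also be tried in isolation. Below is a minimal sketch, assuming a plain System.Text.StringBuilder in place of ZString's builder; JoinSketch and its Join method are hypothetical names used only for illustration and are not ZString's actual implementation. On .NET Framework, Unsafe comes from the System.Runtime.CompilerServices.Unsafe package.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical illustration only; not part of ZString.
public static class JoinSketch
{
    public static string Join<T>(string separator, IList<T> values)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < values.Count; i++)
        {
            if (i != 0)
            {
                sb.Append(separator);
            }

            if (typeof(T) == typeof(string))
            {
                // T is string in this branch, so the reference can be
                // reinterpreted without a cast check and appended directly.
                sb.Append(Unsafe.As<string>(values[i]));
            }
            else
            {
                // Fallback for every other T: binds to Append(object),
                // which boxes value types and appends ToString().
                sb.Append(values[i]);
            }
        }
        return sb.ToString();
    }
}
```

Calling JoinSketch.Join(",", new List<string> { "a", "b", "c" }) takes the string branch for every element, while JoinSketch.Join("-", new[] { 1, 2, 3 }) takes the fallback branch.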

The benchmark results are detailed below.

| Method | Mean | Error | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|
| JoinEmptyList | 21.415 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEmptyList | 15.892 ns | NA | - | - | - | - |
| JoinList1 | 27.790 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinList1 | 34.238 ns | NA | 0.0114 | - | - | 96 B |
| JoinArray1 | 3.721 ns | NA | - | - | - | - |
| ZJoinArray1 | 39.208 ns | NA | 0.0114 | - | - | 96 B |
| JoinEnumerable1 | 23.032 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEnumerable1 | 54.425 ns | NA | 0.0162 | - | - | 136 B |
| JoinList2 | 91.815 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinList2 | 54.395 ns | NA | 0.0200 | - | - | 168 B |
| JoinArray2 | 26.446 ns | NA | 0.0201 | - | - | 168 B |
| ZJoinArray2 | 63.063 ns | NA | 0.0200 | - | - | 168 B |
| JoinEnumerable2 | 75.099 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinEnumerable2 | 76.057 ns | NA | 0.0248 | - | - | 208 B |
| JoinList10 | 514.121 ns | NA | 0.2842 | 0.0019 | - | 2384 B |
| ZJoinList10 | 177.401 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| JoinArray10 | 106.759 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| ZJoinArray10 | 202.832 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| JoinEnumerable10 | 413.451 ns | NA | 0.2847 | 0.0024 | - | 2384 B |
| ZJoinEnumerable10 | 218.206 ns | NA | 0.0956 | 0.0002 | - | 800 B |

I edited the first comment to change the amount of data and add an array case.

String.Join seems to be optimized for arrays.
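
One way an array overload can be made this cheap is sketched below. This is an assumption about the general technique, not a copy of the runtime's code: because an array's length and elements are known up front, the exact result length can be computed in a first pass, the result allocated once, and the pieces copied in. ArrayJoinSketch is a hypothetical name. The single-element shortcut, which returns the element itself, would be consistent with the JoinArray1 rows above showing no allocation.

```csharp
using System;

// Hypothetical illustration only; not the actual String.Join source.
public static class ArrayJoinSketch
{
    public static string Join(string separator, string[] values)
    {
        separator = separator ?? string.Empty;

        // Zero or one element: nothing to concatenate, so nothing to allocate.
        if (values.Length == 0) return string.Empty;
        if (values.Length == 1) return values[0] ?? string.Empty;

        // First pass: compute the exact length of the result.
        int total = checked(separator.Length * (values.Length - 1));
        foreach (var v in values)
        {
            total = checked(total + (v?.Length ?? 0));
        }

        // Second pass: allocate the result once and copy every piece into it.
        return string.Create(total, (separator, values), (span, state) =>
        {
            var (sep, items) = state;
            int pos = 0;
            for (int i = 0; i < items.Length; i++)
            {
                if (i != 0)
                {
                    sep.AsSpan().CopyTo(span.Slice(pos));
                    pos += sep.Length;
                }

                var item = items[i];
                if (item != null)
                {
                    item.AsSpan().CopyTo(span.Slice(pos));
                    pos += item.Length;
                }
            }
        });
    }
}
```

With an IEnumerable<string> the element count and total length are not known in advance, so this two-pass approach is not directly available; that may be part of why the List and Enumerable rows allocate more than the Array rows for String.Join.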

With #18, ZString.Join is consistently faster than in version 2.1.2 when string is the type argument.

However, String.Join is faster in some cases.

Ratio (ZString ver. 2.1.2, base is CLR) Ratio (udaken/ZString@6b94c72, base is CLR)
Join(string, List<string>) with empty List 56.36% 23.08%
Join(string, List<string>) with 1 element 174.39% ⚠ 23.35%
Join(string, string[]) with 1 element 1327.81% ⚠ 754.88%
Join(string, IEnumerable<string>) with 1 element 352.87% 226.50%
Join(string, List<string>) with 2 elements 104.49% 59.39%
Join(string, string[]) with 2 elements 361.29% 297.61%
Join(string, IEnumerable<string>) with 2 elements 165.40% 101.47%
Join(string, List<string>) with 10 elements 75.99% 34.03%
Join(string, string[]) with 10 elements 421.44% 196.26%
Join(string, IEnumerable<string>) with 10 elements 113.52% 49.99%

⚠ indicates that allocation will occur.
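
For reference, the ratio columns appear to be the ZString mean divided by the String.Join mean for the same case, so values below 100% mean ZString is faster. The version 2.1.2 column can be reproduced from the first results table in this thread. A small sketch of that arithmetic follows; RatioSketch is a hypothetical helper name.

```csharp
using System.Globalization;

// Hypothetical helper for reproducing the ratio column.
public static class RatioSketch
{
    // ZString mean relative to the String.Join (CLR) mean, as a percentage.
    public static string Ratio(double zstringMeanNs, double clrMeanNs)
    {
        double percent = zstringMeanNs / clrMeanNs * 100.0;
        return percent.ToString("F2", CultureInfo.InvariantCulture) + "%";
    }
}
```

For example, RatioSketch.Ratio(51.796, 29.702) gives 174.39% (ZJoinList1 against JoinList1), and RatioSketch.Ratio(11.102, 19.697) gives 56.36% (the empty-list case).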

The benchmark results for udaken/ZString@6b94c72 are detailed below.

| Method | Mean | Error | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--- |---:|---:|---:|---:|---:|---:|
| JoinEmptyList | 17.441 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEmptyList | 4.136 ns | NA | - | - | - | - |
| JoinList1 | 25.531 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinList1 | 6.585 ns | NA | - | - | - | - |
| JoinArray1 | 3.665 ns | NA | - | - | - | - |
| ZJoinArray1 | 27.358 ns | NA | - | - | - | - |
| JoinEnumerable1 | 23.882 ns | NA | 0.0048 | - | - | 40 B |
| ZJoinEnumerable1 | 56.520 ns | NA | 0.0162 | - | - | 136 B |
| JoinList2 | 95.549 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinList2 | 61.030 ns | NA | 0.0200 | - | - | 168 B |
| JoinArray2 | 26.032 ns | NA | 0.0201 | - | - | 168 B |
| ZJoinArray2 | 75.796 ns | NA | 0.0200 | - | - | 168 B |
| JoinEnumerable2 | 72.158 ns | NA | 0.0248 | - | - | 208 B |
| ZJoinEnumerable2 | 82.503 ns | NA | 0.0248 | - | - | 208 B |
| JoinList10 | 519.987 ns | NA | 0.2842 | 0.0019 | - | 2384 B |
| ZJoinList10 | 193.093 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| JoinArray10 | 108.597 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| ZJoinArray10 | 184.746 ns | NA | 0.0908 | 0.0002 | - | 760 B |
| JoinEnumerable10 | 426.553 ns | NA | 0.2847 | 0.0024 | - | 2384 B |
| ZJoinEnumerable10 | 217.014 ns | NA | 0.0956 | 0.0002 | - | 800 B |

I've released 2.1.3, thanks for your great contribution.