Quansight-Labs/numpy.net

[Requirement] Support serialization

ChengYen-Tang opened this issue · 14 comments

I would like ndarray, shape, np.random, and dtype to be serializable using System.Text.Json:
https://learn.microsoft.com/en-us/dotnet/api/system.text.json?view=net-7.0

Thanks

I like the idea of being able to serialize the ndarray objects, but it is going to take some work. I have attached a quick sample application that serializes the shape class. I am testing it with both System.Text.Json and Newtonsoft.Json.

Note: System.Text.Json is built into only the newer versions of .NET. I build NumpyDotNet for .NET Standard 2.0, which does not include it. Some users of the library cannot move to a newer version of .NET, so I can't build System.Text.Json support into the library itself.

What I propose is to modify the various objects so that an application can serialize them with at least these two most commonly used serializers. We should probably make sure that XML serialization works as well.

Does this seem like a good approach for your needs?

NumpSerializationTest.zip

Let me think about it, because another framework I use relies on System.Text.Json for serialization.

You can set `<LangVersion>latest</LangVersion>` in the .csproj:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version

I pushed up a new release with support for serialization. It seems to work well with Newtonsoft.Json, XML, and System.Text.Json (the one you asked for), as shown in the sample below.

```csharp
using System;
using NumpyDotNet;

namespace ConsoleApp2
{
    internal class Program
    {
        static void Main(string[] args)
        {
            shape a = new shape(2, 3, 4, 5);

            // System.Text.Json options: include public fields and skip null members.
            System.Text.Json.JsonSerializerOptions options = new System.Text.Json.JsonSerializerOptions();
            options.IncludeFields = true;
            options.DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull;

            // Serialize the shape with default options, then with the options above.
            string jsonString = System.Text.Json.JsonSerializer.Serialize(a);
            Console.WriteLine(jsonString);

            jsonString = System.Text.Json.JsonSerializer.Serialize(a, options);
            Console.WriteLine(jsonString);

            // The same shape through Newtonsoft.Json.
            string jsonString2 = Newtonsoft.Json.JsonConvert.SerializeObject(a);
            Console.WriteLine(jsonString2);

            // Deserialize with the same options that were used to serialize.
            shape b = System.Text.Json.JsonSerializer.Deserialize<shape>(jsonString, options);
            shape c = Newtonsoft.Json.JsonConvert.DeserializeObject<shape>(jsonString2);

            // An ndarray is serialized through its serializable form.
            ndarray aa = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
            var x = System.Text.Json.JsonSerializer.Serialize(aa.ToSerializable(), options);
            Console.WriteLine("AA");
            Console.WriteLine(x);

            var x1 = System.Text.Json.JsonSerializer.Deserialize<ndarray_serializable>(x, options);

            // Rebuild the array and serialize it again; the output should match x.
            ndarray bb = np.FromSerializable(x1);
            var y = System.Text.Json.JsonSerializer.Serialize(bb.ToSerializable(), options);
            Console.WriteLine("\n\nBB");
            Console.WriteLine(y);

            if (x != y)
            {
                Console.WriteLine("Bad conversion");

                // Report differing positions, stopping at the shorter string so
                // unequal lengths cannot throw IndexOutOfRangeException.
                int n = Math.Min(x.Length, y.Length);
                for (int i = 0; i < n; i++)
                {
                    if (x[i] != y[i])
                    {
                        Console.WriteLine(i.ToString());
                    }
                }
                if (x.Length != y.Length)
                {
                    Console.WriteLine($"Length differs: {x.Length} vs {y.Length}");
                }
            }

            Console.Read();
        }
    }
}
```
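The sample serializes `a` twice, once with default options and once with `IncludeFields` and `WhenWritingNull`. The difference only shows up for types that keep their state in public fields, because System.Text.Json serializes public properties only unless `IncludeFields` is set. The standalone sketch below uses a hypothetical `FieldHolder` class (not a NumpyDotNet type; whether `shape` stores its data in fields is an assumption here) to show both outputs.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type that exposes its state as public fields, not properties.
public class FieldHolder
{
    public long[] Dims = { 2, 3 };
    public string Label;  // left null, so WhenWritingNull will omit it
}

public static class FieldOptionsDemo
{
    // Default options: only public properties are written, so the
    // fields above are skipped and the result is "{}".
    public static string WithDefaults() =>
        JsonSerializer.Serialize(new FieldHolder());

    // The options from the sample: include fields, drop null members.
    // The result is {"Dims":[2,3]}.
    public static string WithSampleOptions()
    {
        var options = new JsonSerializerOptions
        {
            IncludeFields = true,
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };
        return JsonSerializer.Serialize(new FieldHolder(), options);
    }
}
```

The same options object must also be passed when deserializing; otherwise the fields are silently left at their defaults.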

Here are some unit tests showing serialization via Newtonsoft.Json and XML.

```csharp
[TestMethod]
public void test_dtype_serialization_newtonsoft()
{
    var a = np.arange(9).reshape(3, 3);
    AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

    var A_DtypeSerializedFormat = a.Dtype.ToSerializable();

    var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_DtypeSerializedFormat);
    var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<dtype_serializable>(A_Serialized);

    dtype b = new dtype(A_Deserialized);

    Assert.AreEqual(a.Dtype.TypeNum, b.TypeNum);
    Assert.AreEqual(a.Dtype.str, b.str);
    Assert.AreEqual(a.Dtype.alignment, b.alignment);
    Assert.AreEqual(a.Dtype.ElementSize, b.ElementSize);
    Assert.AreEqual(a.Dtype.Kind, b.Kind);
}

[TestMethod]
public void test_dtype_serialization_XML()
{
    var a = np.arange(9).reshape(3, 3);
    AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

    dtype_serializable A_DtypeSerializedFormat = np.ToSerializable(a.Dtype);

    var A_Serialized = SerializationHelper.SerializeXml(A_DtypeSerializedFormat);
    var A_Deserialized = SerializationHelper.DeserializeXml<dtype_serializable>(A_Serialized);

    dtype b = np.FromSerializable(A_Deserialized);

    Assert.AreEqual(a.Dtype.TypeNum, b.TypeNum);
    Assert.AreEqual(a.Dtype.str, b.str);
    Assert.AreEqual(a.Dtype.alignment, b.alignment);
    Assert.AreEqual(a.Dtype.ElementSize, b.ElementSize);
    Assert.AreEqual(a.Dtype.Kind, b.Kind);
}

[TestMethod]
public void test_ndarray_serialization_newtonsoft()
{
    var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
    AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

    var A_ArraySerializedFormat = a.ToSerializable();
    var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_ArraySerializedFormat);
    var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(A_Serialized);

    Console.WriteLine("AA");
    print(A_Serialized);

    var b = new ndarray(A_Deserialized);

    var B_ArraySerializedFormat = b.ToSerializable();
    var B_Serialized = SerializationHelper.SerializeNewtonsoftJSON(B_ArraySerializedFormat);
    var B_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(B_Serialized);
    Console.WriteLine("\n\nBB");
    print(B_Serialized);

    Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));
    Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
    Assert.AreEqual(a.Dtype.str, b.Dtype.str);
    Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
    Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
    Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);
}

[TestMethod]
public void test_ndarray_serialization_XML()
{
    var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
    AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

    var A_ArraySerializedFormat = a.ToSerializable();
    var A_Serialized = SerializationHelper.SerializeXml(A_ArraySerializedFormat);
    var A_Deserialized = SerializationHelper.DeserializeXml<ndarray_serializable>(A_Serialized);

    Console.WriteLine("AA");
    print(A_Serialized);

    var b = new ndarray(A_Deserialized);

    var B_ArraySerializedFormat = b.ToSerializable();
    var B_Serialized = SerializationHelper.SerializeXml(B_ArraySerializedFormat);
    var B_Deserialized = SerializationHelper.DeserializeXml<ndarray_serializable>(B_Serialized);
    Console.WriteLine("\n\nBB");
    print(B_Serialized);

    //Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));
    Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
    Assert.AreEqual(a.Dtype.str, b.Dtype.str);
    Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
    Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
    Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);
}
```
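The tests call a `SerializationHelper` class whose source is not shown in this thread. As a rough guess at what its XML half might look like, here is a sketch built on the in-box `XmlSerializer`; `SerializationHelperSketch` and the `DtypeInfo` payload are hypothetical names, not the test project's actual code. A Newtonsoft.Json counterpart would presumably wrap `JsonConvert.SerializeObject` and `JsonConvert.DeserializeObject<T>`.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the tests' SerializationHelper (XML half only).
public static class SerializationHelperSketch
{
    // Serialize a public type with a parameterless constructor to an XML string.
    public static string SerializeXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }

    // Rebuild an object of type T from XML produced by SerializeXml.
    public static T DeserializeXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var reader = new StringReader(xml);
        return (T)serializer.Deserialize(reader);
    }
}

// Hypothetical payload loosely modeled on a dtype description. XmlSerializer
// needs a public type, a public parameterless constructor, and public
// read/write properties or fields.
public class DtypeInfo
{
    public int TypeNum { get; set; }
    public string Str { get; set; }
    public int ElementSize { get; set; }
}
```

These requirements are presumably why the library exposes separate `*_serializable` types instead of serializing `dtype` or `ndarray` directly.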

However, if the ndarray is a property of another object and I want to serialize that object, how can I do it?

```csharp
public class Test
{
    public ndarray Array { get; set; }
}

// options is the JsonSerializerOptions instance from the earlier sample.
Test test = new();
string jsonString = System.Text.Json.JsonSerializer.Serialize(test, options);
```

Unfortunately, I could not come up with a way to serialize the ndarray object that would allow me to recreate it on deserialization. The data structures inside the library are very complex and intertwined, and there are many pointers to functions that get set up based on the data type. That is how NumPy is implemented in Python, and I ported that implementation.
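One possible workaround, not something confirmed in this thread, is to register a System.Text.Json converter that writes the wrapped object through a serializable surrogate and rebuilds it on read. The sketch below is a generic "surrogate" converter demonstrated on hypothetical `Opaque`/`OpaqueDto` types so it runs without NumpyDotNet; `Opaque` has no public properties, so without the converter System.Text.Json would write it as `{}` and lose its data.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Generic converter: writes T via a DTO and converts the DTO back on read.
public sealed class SurrogateConverter<T, TSurrogate> : JsonConverter<T>
{
    private readonly Func<T, TSurrogate> _toSurrogate;
    private readonly Func<TSurrogate, T> _fromSurrogate;

    public SurrogateConverter(Func<T, TSurrogate> toSurrogate, Func<TSurrogate, T> fromSurrogate)
    {
        _toSurrogate = toSurrogate;
        _fromSurrogate = fromSurrogate;
    }

    public override T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Read the DTO with the same options, then rebuild the real object.
        TSurrogate surrogate = JsonSerializer.Deserialize<TSurrogate>(ref reader, options);
        return _fromSurrogate(surrogate);
    }

    public override void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options)
    {
        JsonSerializer.Serialize(writer, _toSurrogate(value), options);
    }
}

// Hypothetical type that System.Text.Json cannot round-trip on its own.
public sealed class Opaque
{
    private readonly int[] _data;
    public Opaque(int[] data) { _data = (int[])data.Clone(); }
    public int[] ToArray() => (int[])_data.Clone();
}

// Hypothetical serializable surrogate for Opaque.
public class OpaqueDto
{
    public int[] Data { get; set; }
}

// Mirrors the Test class from the question: an object holding the value.
public class Wrapper
{
    public Opaque Array { get; set; }
}

public static class SurrogateDemo
{
    public static JsonSerializerOptions CreateOptions()
    {
        var options = new JsonSerializerOptions();
        options.Converters.Add(new SurrogateConverter<Opaque, OpaqueDto>(
            o => new OpaqueDto { Data = o.ToArray() },
            d => new Opaque(d.Data)));
        return options;
    }
}
```

With NumpyDotNet, the registration would presumably be `new SurrogateConverter<ndarray, ndarray_serializable>(a => a.ToSerializable(), s => np.FromSerializable(s))`, reusing the `ToSerializable`/`np.FromSerializable` calls from the earlier sample, together with `IncludeFields = true`. This has not been tested against the library.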

Can np.random support serialization?

I just pushed up a new version that supports serialization of np.random. Note: it works for the built-in RandomState algorithm. If you are using a custom random generator, you will need to implement a couple of new APIs to enable serialization of that algorithm's state. The sample custom random generator in my unit tests does not support serialization, because I don't have access to the internal state of the .NET random generator.

Here are some sample unit tests with the new functionality:

```csharp
[TestMethod]
public void test_nprandom_serialization_newtonsoft()
{
    var Rand1 = new np.random();
    Rand1.seed(1234);

    var Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
    print(Rand1Serialized);

    double fr = Rand1.randn();
    print(fr);
    Assert.AreEqual(0.47143516373249306, fr);
    fr = Rand1.randn();
    print(fr);
    Assert.AreEqual(-1.1909756947064645, fr);

    var Rand1Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<np.random_serializable>(Rand1Serialized);
    var Rand2 = new np.random();
    Rand2.FromSerialization(Rand1Deserialized);
    fr = Rand2.randn();
    print(fr);
    Assert.AreEqual(0.47143516373249306, fr);
    fr = Rand2.randn();
    print(fr);
    Assert.AreEqual(-1.1909756947064645, fr);

    Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
    print(Rand1Serialized);

    var Rand2Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand2.ToSerialization());
    print(Rand2Serialized);

    Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));
}

[TestMethod]
public void test_nprandom_serialization_xml()
{
    var Rand1 = new np.random();
    Rand1.seed(1234);

    var Rand1Serialized = SerializationHelper.SerializeXml(Rand1.ToSerialization());
    print(Rand1Serialized);

    double fr = Rand1.randn();
    print(fr);
    Assert.AreEqual(0.47143516373249306, fr);
    fr = Rand1.randn();
    print(fr);
    Assert.AreEqual(-1.1909756947064645, fr);

    var Rand1Deserialized = SerializationHelper.DeserializeXml<np.random_serializable>(Rand1Serialized);
    var Rand2 = new np.random();
    Rand2.FromSerialization(Rand1Deserialized);
    fr = Rand2.randn();
    print(fr);
    Assert.AreEqual(0.47143516373249306, fr);
    fr = Rand2.randn();
    print(fr);
    Assert.AreEqual(-1.1909756947064645, fr);

    Rand1Serialized = SerializationHelper.SerializeXml(Rand1.ToSerialization());
    print(Rand1Serialized);

    var Rand2Serialized = SerializationHelper.SerializeXml(Rand2.ToSerialization());
    print(Rand2Serialized);

    Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));
}

[TestMethod]
public void test_nprandom_serialization_newtonsoft_2()
{
    var Rand1 = new np.random();
    Rand1.seed(701);
    ndarray arr1 = Rand1.randint(2, 3, new shape(4), dtype: np.Int32);

    var Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
    var Rand1Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<np.random_serializable>(Rand1Serialized);
    var Rand2 = new np.random();
    Rand2.FromSerialization(Rand1Deserialized);

    ndarray arr = Rand1.randint(9, 128000, new shape(5000000), dtype: np.Int32);
    Assert.AreEqual(arr.TypeNum, NPY_TYPES.NPY_INT32);
    var amax = np.amax(arr);
    Assert.AreEqual((Int32)127999, amax.GetItem(0));

    arr = Rand2.randint(9, 128000, new shape(5000000), dtype: np.Int32);
    Assert.AreEqual(arr.TypeNum, NPY_TYPES.NPY_INT32);
    amax = np.amax(arr);
    Assert.AreEqual((Int32)127999, amax.GetItem(0));

    Rand1Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand1.ToSerialization());
    print(Rand1Serialized);

    var Rand2Serialized = SerializationHelper.SerializeNewtonsoftJSON(Rand2.ToSerialization());
    print(Rand2Serialized);

    Assert.AreEqual(0, string.Compare(Rand1Serialized, Rand2Serialized));
}
```
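Serialization works for the built-in RandomState, while a custom generator has to expose its own state. The idea behind these tests can be sketched without NumpyDotNet: capture the generator's complete internal state, restore it into a fresh instance, and both instances then produce the same sequence. The `XorShift64` generator and its `SaveState`/`RestoreState` methods are hypothetical, loosely modeled on `ToSerialization`/`FromSerialization`; they are not the interface NumpyDotNet requires for custom generators.

```csharp
using System.Text.Json;

// Hypothetical serializable snapshot of the generator's whole state.
public class XorShiftState
{
    public ulong S { get; set; }
}

// Minimal xorshift64 generator (shift triple 13, 7, 17); its entire
// state is a single 64-bit word, which makes it easy to serialize.
public class XorShift64
{
    private ulong _s;

    // xorshift must never hold a zero state, so replace a zero seed.
    public XorShift64(ulong seed) { _s = seed == 0 ? 0x9E3779B97F4A7C15UL : seed; }

    public ulong Next()
    {
        _s ^= _s << 13;
        _s ^= _s >> 7;
        _s ^= _s << 17;
        return _s;
    }

    public XorShiftState SaveState() => new XorShiftState { S = _s };

    public void RestoreState(XorShiftState state) { _s = state.S; }
}
```

A generator that hides its state (as `System.Random` does) cannot offer `SaveState`, which matches the limitation described for the sample custom generator.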

OK, thank you.

I get the following error when I serialize an ndarray.
My array has shape (2, 2) and all of its values are np.inf.

```csharp
JsonSerializer.Serialize(value.ToSerializable(), options)
```

```
System.ArgumentException: '.NET number values such as positive and negative infinity cannot be written as valid JSON. To make it work when using 'JsonSerializer', consider specifying 'JsonNumberHandling.AllowNamedFloatingPointLiterals' (see https://docs.microsoft.com/dotnet/api/
```

This is not a NumpyDotNet issue. It is a System.Text.Json serialization issue that can be resolved by enabling AllowNamedFloatingPointLiterals, as shown below. Note that this seems to work without any changes in Newtonsoft.Json. Be aware that standard JSON has no representation for infinity, so other JSON implementations may not accept these values.
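To see what the option changes independently of NumpyDotNet, here is a standalone sketch on a plain `double[]`. With `AllowNamedFloatingPointLiterals`, System.Text.Json writes non-finite values as the JSON strings `"Infinity"`, `"-Infinity"` and `"NaN"` and reads them back; with default options, writing them throws the `ArgumentException` reported above. The `InfinityJson` class name is illustrative only.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public static class InfinityJson
{
    // Allow "Infinity", "-Infinity" and "NaN" string tokens for floating-point values.
    public static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals
    };

    public static string Write(double[] values) => JsonSerializer.Serialize(values, Options);

    // The same options are needed on the way back in.
    public static double[] Read(string json) => JsonSerializer.Deserialize<double[]>(json, Options);

    // With default options, non-finite values cannot be written.
    public static bool DefaultOptionsThrow(double[] values)
    {
        try
        {
            JsonSerializer.Serialize(values);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

For example, `InfinityJson.Write(new[] { double.NegativeInfinity, double.PositiveInfinity, 1.5 })` produces `["-Infinity","Infinity",1.5]`, which is exactly the kind of output other JSON consumers may reject.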

```csharp
System.Text.Json.JsonSerializerOptions options = new System.Text.Json.JsonSerializerOptions();
options.IncludeFields = true;
options.DefaultIgnoreCondition = System.Text.Json.Serialization.JsonIgnoreCondition.WhenWritingNull;
// Write and read infinity/NaN as the string tokens "Infinity", "-Infinity", "NaN".
options.NumberHandling = System.Text.Json.Serialization.JsonNumberHandling.AllowNamedFloatingPointLiterals;

ndarray aa = np.array(new double[] { double.NegativeInfinity, double.PositiveInfinity, double.NegativeInfinity,
                                     double.PositiveInfinity, double.NegativeInfinity, double.PositiveInfinity,
                                     double.NegativeInfinity, double.PositiveInfinity, double.NegativeInfinity }).reshape(3, 3);
var x = System.Text.Json.JsonSerializer.Serialize(aa.ToSerializable(), options);

// Deserialize with the same options so the string tokens are read back as infinities.
var x1 = System.Text.Json.JsonSerializer.Deserialize<ndarray_serializable>(x, options);
```

Ooh!! Thanks for helping me out