SOCI/soci

SOCI doesn't support automatic type conversion

Sildra opened this issue · 14 comments

Sildra commented

Currently, SOCI doesn't support any kind of automatic conversion between integrals.

This issue forces developpers to perform database-specific conversions notably when using sqlite and another db simultaneously.

Providing a feature performing this automatic conversion between int and long long will enable sqlite to change it's default type from 32bit to 64bit and use dates without breaking existing current user code.

// Required in the user code to perform type safe conversions when using multiple backends
// Note: the name lookup is performed twice
template<typename T>
T get_safe_integer(const soci::values& v, const std::string& name) {
    switch (v.get_properties(name).get_data_type()) {
        case soci::dt_integer: return (T)v.get<int>(name);
        case soci::dt_unsigned_long_long: return (T)v.get<unsigned long long>(name);
        case soci::dt_long_long: return (T)v.get<long long>(name);
        default: return T();
    }
}

I kind of attempted to add this feature by performing a dynamic cast + virtual call and it is working as intended.

std::string to std::string: OK
std::string to int: Got a bad cast
int to int: OK
int to float: Got a bad cast
double to float: OK
float to double: Got a bad cast
int to long long: Got a bad cast
long long to int: OK
//
// Copyright (C) 2004-2008 Maciej Sobczak, Stephen Hutton
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//

#ifndef SOCI_TYPE_HOLDER_H_INCLUDED
#define SOCI_TYPE_HOLDER_H_INCLUDED

// std
#include <typeinfo>
#include <string>
#include <memory>
#include <iostream>


// Base class holder + derived class type_holder for storing type data
// instances in a container of holder objects
template <typename T>
class type_holder;

template <typename T>
class type_getter
{
public:
    virtual T value(T*) const = 0;
};


class holder
{
public:
    holder() {}
    virtual ~holder() {}

    template<typename T>
    T get()
    {
        if (auto* p = dynamic_cast<type_getter<T>*>(this))
            return p->value((T*)nullptr);
        throw std::bad_cast();
    }
};

template <typename T>
class type_holder : public holder, public type_getter<T>
{
public:
    type_holder(T * t) : t_(t) {}
    ~type_holder() override { delete t_; }

    T value(T*) const override { return *t_; }

private:
    T * t_;
};

#define GET_VALUE_OVERRIDE(type) type value(type*) const override \
    { return static_cast<type>(*t_); }

template <>
class type_holder<double> : public holder, public type_getter<float>, public type_getter<double>
{
public:
    typedef double T;
    type_holder(T * t) : t_(t) {}
    ~type_holder() override { delete t_; }

    T value(T*) const override { return *t_; }
    GET_VALUE_OVERRIDE(float);

private:
    T * t_;
};

template <>
class type_holder<unsigned long long> : public holder, public type_getter<unsigned long long>, public type_getter<long long>, public type_getter<int>
{
public:
    typedef unsigned long long T;
    type_holder(T * t) : t_(t) {}
    ~type_holder() override { delete t_; }

    T value(T*) const override { return *t_; }
    GET_VALUE_OVERRIDE(long long);
    GET_VALUE_OVERRIDE(int);

private:
    T * t_;
};

template <>
class type_holder<long long> : public holder, public type_getter<unsigned long long>, public type_getter<long long>, public type_getter<int>
{
public:
    typedef long long T;
    type_holder(T * t) : t_(t) {}
    ~type_holder() override { delete t_; }

    T value(T*) const override { return *t_; }
    GET_VALUE_OVERRIDE(unsigned long long);
    GET_VALUE_OVERRIDE(int);

private:
    T * t_;
};

#endif // SOCI_TYPE_HOLDER_H_INCLUDED

template<typename S, typename D>
void tryGet(const std::string& info) {
    std::cout << info;
    try {
        std::unique_ptr<holder> h = (std::unique_ptr<holder>)std::make_unique<type_holder<S>>(new S());
        h->get<D>();
        std::cout << ": OK\n";
    } catch (std::bad_cast&) {
        std::cout << ": Got a bad cast\n";
    }
}

int main() {
    tryGet<std::string, std::string>("std::string to std::string");
    tryGet<std::string, int>("std::string to int");
    tryGet<int, int>("int to int");
    tryGet<int, float>("int to float");
    tryGet<double, float>("double to float");
    tryGet<float, double>("float to double");
    tryGet<int, long long>("int to long long");
    tryGet<long long, int>("long long to int");

    return 0;
}
vadz commented

Currently, SOCI doesn't support any kind of automatic conversion between integrals.

Before discussing this further, I'd like to understand whether this is really a problem. What conversions exactly do you think should exist and why?

Sildra commented

column type int, target type long long :

  • you put a dt_integer on oracle, you get a long long,
  • you put it on sqlite you get an int, without automatic conversion you get a bad_cast if you use the same code as the oracle one

column type long long, target type int :

  • soci_sqlite doesn't support 64bit and every user code is already in int, changing the type to long long will break existing code
  • basically the receiver of the get should always be of the same type as the get so performing the narrowing in the get or the assignment shoud hold the same result
Sildra commented

All of the types are really inconsistent between backends with a lot of data truncation or miss-handling :

Backend SourceType DestinationType SourceMax DestinationMax
sqlite (2)double (2)double 1.79769e+308 1.79769e+308
oracle (2)double Error 1.79769e+308 Error
sqlite (3)integer (3)integer 2147483647 2147483647
oracle (3)integer (4)long_long 2147483647 2147483647
sqlite (4)long_long (3)integer 9223372036854775807 -1
oracle (4)long_long (2)double 9223372036854775807 -9223372036854775808
sqlite (5)ulong_long (3)integer 18446744073709551615 18446744073709551615
oracle (5)ulong_long (2)double 18446744073709551615 0
#include "soci/soci.h"
#include "soci/sqlite3/soci-sqlite3.h"
#include "soci/oracle/soci-oracle.h"
#include <iostream>
#include <istream>
#include <ostream>
#include <string>
#include <exception>
#include <chrono>
#include <cstdio>

template<typename T>
struct myType {
    typedef T val_type;
    soci::data_type inType;
    soci::data_type outType;
    T inMax = std::numeric_limits<T>::max();
    T outMax;
};

namespace soci {
template<typename T>
void get_safe_data(T& t, const soci::values& v) {
    t.outType = v.get_properties("MAX_").get_data_type();
    switch (t.outType) {
        case soci::dt_string: break;
        case soci::dt_double:
            t.outMax = static_cast<typename T::val_type>(v.get<double>("MAX_"));
            break;
        case soci::dt_integer:
            t.outMax = static_cast<typename T::val_type>(v.get<int>("MAX_"));
            break;
        case soci::dt_unsigned_long_long:
            t.outMax = static_cast<typename T::val_type>(v.get<unsigned long long>("MAX_"));
            break;
        case soci::dt_long_long:
            t.outMax = static_cast<typename T::val_type>(v.get<long long>("MAX_"));
            break;
        default: break;
    }
}

template<typename T>
struct type_conversion<myType<T>> {
    typedef soci::values base_type;
    static void from_base(const soci::values& v, soci::indicator ind, myType<T>& t)
    { get_safe_data(t, v); }
    static void to_base(const myType<T>& t, soci::values& v, soci::indicator& ind)
    { v.set("MAX_", t.inMax); }
};
}

std::string to_str(soci::data_type t) {
    switch (t) {
        case soci::dt_string:    return "(" + std::to_string((int)t) + ")string    ";
        case soci::dt_double:    return "(" + std::to_string((int)t) + ")double    ";
        case soci::dt_integer:   return "(" + std::to_string((int)t) + ")integer   ";
        case soci::dt_long_long: return "(" + std::to_string((int)t) + ")long_long ";
        case soci::dt_unsigned_long_long: return "(" + std::to_string((int)t) + ")ulong_long";
        default: return "(" + std::to_string((int)t) + ")unknown";
    }
}

template<typename T>
void testType(soci::data_type t) {
    myType<T> val;
    val.inType = t;
    for (const auto& str : { "sqlite3://db=:memory:", "oracle:" /*???*/}) {
        try {
            std::cout << (str[0] == 's' ? "sqlite" : "oracle");
            std::string table = "TEST_";

            soci::session sql(str);
            try { sql << "DROP TABLE " << table; } catch(...) {}

            sql.create_table(table).column("MAX_", val.inType);
            sql << "INSERT INTO " << table << "(MAX_) VALUES (:MAX_)", soci::use(val.inMax);
            soci::statement stmt = (sql.prepare << "SELECT * FROM " << table);
            stmt.exchange(soci::into(val));
            stmt.define_and_bind();
            stmt.execute();
            stmt.fetch();

            std::cout << "  " << to_str(val.inType)
                << " " << to_str(val.outType)
                << " " << val.inMax
                << " " << val.outMax
                << "\n";
        } catch (std::exception& e) {
            std::cout << "  Error: " << e.what() << "\n";
        }
    }
}

int main()
{
    soci::register_factory_sqlite3();
    soci::register_factory_oracle();

    testType<double>(soci::dt_double);
    testType<int>(soci::dt_integer);
    testType<long long>(soci::dt_long_long);
    testType<unsigned long long>(soci::dt_unsigned_long_long);
    return 0;
}
vadz commented

Are you testing all this with or without #954?

column type int, target type long long :

* you put a dt_integer on oracle, you get a long long,

* you put it on sqlite you get an int, without automatic conversion you get a bad_cast if you use the same code as the oracle one

As SQLite INTEGER is polymorphic, we probably do need to perform some dynamic type conversions for it, but it's an exception, all the other databases use fixed types.

IOW maybe the real title of this issue should be "SQLite backend doesn't support automatic type conversions"?

Sorry but I don't have time to dive into this now, but I'd just like to at least formulate the problem exactly before -- hopefully -- fixing it.

Sildra commented

Unfortunately, github is not previewing the code but it actually the same mess that is used here. Introducing the new types will actually make the conversion even more complicated.

The current behavior is either you get the data in the native format or you get a bad_cast. Maybe introducing a safe_get would be better but in this case you will never have a reason to use the unsafe get.

https://github.com/SOCI/soci/pull/954/files#r1355873352

Regarding sqlite having a polymorphic integer is great but the underlying type_holder is int, meaning soci is locking you in 32 bit and there is no alternative.

My previous comment was actually not accurate, with oracle, long long type is treated as double meaning that when you get data from the database, you loose precision in soci.

Sildra commented

This commit is an implementation of automatic conversion : Sildra@10d2f77

It works with all the c++ numeric types by performing the appropriate conversions and will reject unknown types with a static assert.

Strings are not implicitly converted and will throw the usual bad_cast + some details.

The implem is faster than the 4.0.3 implementation with the dynamic cast and should be no less safe than the master implementation with the typeid vodoo.

vadz commented

I'm sorry if I missed something because there are several semi-related or at least overlapping issues/PRs, but I still don't understand what are we even trying to do here. Could you please explain what is your desired behaviour?

Sildra commented

Actually, the commit doesn't contain unrelated issue. It is all about not having useless bad_cast when retrieving data from the db.

If I do this :

sql.create_table("TEST").column("VAL", soci::dt_long_long);
sql << "INSERT INTO TEST VALUES (1234)";

In oracle :

soci::row r;
sql << "SELECT VAL FROM TEST", soci::into(r);
r.get<long long>(0); // std::bad_cast -> why ? I have declared my column as a long long in the DDL
r.get<double>(0); // OK -> why ? It is not event an integral type ?

In sqlite :

soci::row r;
sql << "SELECT VAL FROM TEST", soci::into(r);
r.get<long long>(0); // std::bad_cast -> why ? I have declared my column as a long long in the DDL
r.get<int>(0); // OK -> why ? SOCI is truncating my data by itself

After my proposal, whatever the DB, because I don't really care about my source type :

soci::row r;
sql << "SELECT VAL FROM TEST", soci::into(r);
r.get<long long>(0); // OK
r.get<int>(0); // OK
r.get<double>(0); // OK
r.get<int32_t>(0); // OK
r.get<int64_t>(0); // OK
r.get<bool>(0); // OK
r.get<std::string>(0); // bad_cast: source type is <depedent on the DB>, target type is std::string

And if I still want to perform checks :

soci::row r;
sql << "SELECT VAL FROM TEST", soci::into(r);
auto props = r.get_properties(0); // OK, the source type from the DB is <depedent on the DB> and I may want to check overflow/underflow/signedness/whatever, but everyone know that I will just perform a static_cast
switch (props.get_data_type()) {
// Usual code that is currently repeated in every code that is using SOCI just to perform a static_cast
}
vadz commented

Thanks, so your idea is that creating a column with dt_xxx should allow retrieving its value with get<xxx>(), which is something I can agree with. But then you also put a "why?" comment on get<double>() working for a dt_long_long column (which is something I can get behind, again), and yet as the result of your changes, it's going to always work even when the column was created using something different from dt_double, which seems contradictory (and to me, actively undesirable).

I don't really know what is the right way to map dt_xxx to column types, but I definitely need to merge #954 first before even thinking about it.

Sildra commented

Personally, I think merging #954 before this is actually worse because you will have a bunch of types that will never have any equivalent in major DB. You can distinguish uint64 and int64 because they don't have the same number of digits, but you won't be able to do this with int32 and uint32.

As a result, you will always get a bad_cast when performing get<uint32>.

What I did is the most compliant between numeric types buy I can separate floating and integral types and keep the bad_cast.

The reason why I am proposing this is that integrating #1087 or #1085 will break existing code that don't use the properties to get the values.

Remember that dt_long_long is currently mapped to get<double> for Oracle.

vadz commented

I think what you're proposing is an alternative solution to the problem with #954 solves and at least so far I don't see why is it better. C++ tries to be a strongly typed language and so it makes much more sense to me to insist on strict mapping between C++ and database types rather than converting between them willy-nilly.

I do hope to merge #954 soon and I believe get<xxx> should work with it when you use a column of type db_xxx, but perhaps I'm missing something.

Sildra commented

Actually, I am proposing a fix for #954 that is likely to break everything.

When you are using the rowset functionality, it will use the prepare_for_describe function that get the description for both data_type and db_type. The db_type is taking priority while constructing the type_holder and will be used to perform the right get<>.

Current code uses a switch over the data_type :

switch (row.describe_column(0).get_data_type()) {
  case dt_integer: row.get<int>(0);
}

New code need to use the db_type :

switch (row.describe_column(0).get_db_type()) {
  case db_int8: row.get<int8_t>(0);
}

See the problem here ? One of them will throw a bad_cast.

vadz commented

OK, I do see a problem. It seems like we need to make both of them work, which could be achieved by allowing promotion to the bigger type, i.e. get<int>() should basically do (lossless) cast of get<int8_t>() to int.

Is this problem specific to soci::row API or does it affect the basic API too? I.e. will into(int_var) still work even if the column type is (the equivalent of) int8_t? It should, ideally...

Sildra commented

I need to check the code as I am not familiar with 'normal' binding but I think it is using the get_from_uses which performs the type_conversion<T>::from_base(baseValue, ind, val);