HPI-Information-Systems/Metanome

Allow to reduce number of database connections when using DatabaseConnectionGenerator

Opened this issue · 1 comment

For instance, IND discovery algorithms typically read multiple tables. When profiling a database, this means that we deal with multiple DatabaseConnectionGenerators, each of which potentially opens its own connection to the database. This can be problematic, e.g., when there is a limit on the number of connections for a (possibly shared) database.

Therefore, it is crucial to control the number of connections. This can be done either by the algorithm, through an appropriate API, or by the Metanome backend itself, which could avoid opening duplicate connections. Clearly, the second option is more desirable, as it (i) does not require adapting existing algorithms and (ii) entails less overhead for establishing and tearing down connections. However, it is trickier to implement. In particular, it must be ensured that connections are local to a single algorithm for security and isolation reasons.

Each InputGenerator opens only one connection, so each execution uses exactly as many connections as it has input sources, which should be no problem. However, an InputGenerator can open multiple statements, depending on how often the algorithm queries an input. These statements are wrapped in AutoCloseable objects, so they should create no resource leaks. To also close the connections appropriately, I made them AutoCloseable, too.
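
To illustrate the pattern, here is a minimal sketch of one connection with several statements, all released through try-with-resources. TrackedInputGenerator and TrackedStatement are hypothetical stand-ins that only record events in a log; they are not Metanome's actual classes and do not touch JDBC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a statement opened on the generator's connection.
class TrackedStatement implements AutoCloseable {
    private final String name;
    private final List<String> log;

    TrackedStatement(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

// Hypothetical stand-in for an input generator that owns exactly one connection.
class TrackedInputGenerator implements AutoCloseable {
    private final List<String> log;
    private int statementCount = 0;

    TrackedInputGenerator(List<String> log) {
        this.log = log;
        log.add("open connection");
    }

    // Every query gets its own statement, but all statements share the one connection.
    TrackedStatement newStatement() {
        statementCount++;
        return new TrackedStatement("statement " + statementCount, log);
    }

    @Override
    public void close() {
        log.add("close connection");
    }
}

class AutoCloseDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (TrackedInputGenerator generator = new TrackedInputGenerator(log)) {
            try (TrackedStatement first = generator.newStatement()) {
                // the algorithm would read its input here
            }
            try (TrackedStatement second = generator.newStatement()) {
                // a second pass over the same input
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch each statement is closed before the next one is opened, and the connection is closed last, when the outer try block exits.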

Still, being more thoughtful with the number of connections is a good idea. The DefaultTableInputGenerator could get a constructor that injects the DefaultDatabaseConnectionGenerator, which holds the connection. Then, Metanome could initialize a single DefaultDatabaseConnectionGenerator and use this to construct all DefaultTableInputGenerators for one algorithm execution.
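
A rough sketch of what that constructor injection could look like follows. All classes here are simplified, hypothetical stand-ins (SharedConnectionGenerator for DefaultDatabaseConnectionGenerator, SharedTableInput for DefaultTableInputGenerator, StubConnection for a JDBC connection), and the table names are made up; the real Metanome signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JDBC connection.
class StubConnection implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Simplified stand-in for DefaultDatabaseConnectionGenerator:
// it holds a single connection and opens it lazily.
class SharedConnectionGenerator implements AutoCloseable {
    private StubConnection connection;
    private int openedCount = 0;

    StubConnection getConnection() {
        if (connection == null || connection.isClosed()) {
            connection = new StubConnection();
            openedCount++;
        }
        return connection;
    }

    int openedCount() {
        return openedCount;
    }

    @Override
    public void close() {
        if (connection != null) {
            connection.close();
        }
    }
}

// Simplified stand-in for DefaultTableInputGenerator with the proposed constructor
// that receives the connection generator instead of creating its own.
class SharedTableInput {
    private final SharedConnectionGenerator generator;
    private final String table;

    SharedTableInput(SharedConnectionGenerator generator, String table) {
        this.generator = generator;
        this.table = table;
    }

    StubConnection connection() {
        return generator.getConnection();
    }

    String selectAllQuery() {
        return "SELECT * FROM " + table;
    }
}

class SharedConnectionDemo {
    public static void main(String[] args) {
        // One generator per algorithm execution, shared by all table inputs.
        try (SharedConnectionGenerator generator = new SharedConnectionGenerator()) {
            List<SharedTableInput> inputs = new ArrayList<>();
            for (String table : new String[] {"orders", "customers", "items"}) {
                inputs.add(new SharedTableInput(generator, table));
            }
            for (SharedTableInput input : inputs) {
                input.connection(); // reuses the shared connection
                System.out.println(input.selectAllQuery());
            }
            System.out.println("connections opened: " + generator.openedCount());
        }
    }
}
```

Creating the generator once per execution and closing it when the execution ends would also keep the connection local to a single algorithm, which matches the isolation requirement from the issue description.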