deneb-viz/deneb

Feature request: Ability to turn off _formatted and other generated columns


When adding a column to the Deneb visual, e.g. amount, extra columns such as amount__formatted are generated. These are useful, but as data sizes grow it would be good to keep the dataset trim to help performance and make development easier (by not having to scroll so much in the data viewer). I couldn't find a way to turn off adding these, so I use the 'project' transform in Vega to keep only the columns I need from the dataset. Please add an option to import only the columns that were explicitly added.
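For reference, a minimal sketch of that workaround, assuming Deneb's standard "dataset" data source; field names other than amount are placeholders:

```json
{
  "data": [
    {"name": "dataset"},
    {
      "name": "trimmed",
      "source": "dataset",
      "transform": [
        {"type": "project", "fields": ["date", "amount"]}
      ]
    }
  ]
}
```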

dm-p commented

The v2 visual (currently being scoped and prototyped; no ETA) will include this as part of its dataset implementation, and the template definition will be extended so that this configuration carries over to any visuals created from a template.

dm-p commented

Just to round out the above comment: Deneb's creation of the formatted columns and their addition to the dataset does not affect query performance. I profiled the difference between versions when we introduced them, and the overhead for a 200K-row dataset with ten measures was negligible in terms of in-visual processing (< 10ms). There are more glaring performance issues in the dataset generation itself. We will be able to address these in the v2 design (and we already have an optimized approach for the core query data that is up to 3x faster for some of the largest real-world datasets we know about).

I really appreciate that the columns add more to the dataset than some folks need, and I wanted to provide this info proactively. In v2, "support columns" will be configurable by function (although some will be mandatory based on what features you enable, e.g. we will always need the __selected__ column if cross-filtering is enabled, but you may only want some of the columns for cross-highlight). Once we get through prototyping, we will know more about how this is going to work.

Thanks, that sounds wonderful. At the moment I hit the 10K row limit regularly, and when I turn that limit off, performance really suffers. I need so much data because I give users the ability to view multi-year time series for multiple objects, zoomable via an overview chart. The issue is that 50-100 objects can be loaded, and with 3 years of data that makes 50-100K rows! I built a custom Vega selector so users can choose which objects to display; doing it in Power BI resets the Deneb visual on each change and is much slower than keeping it all in Deneb. So the current workaround is telling users they have to select smaller time periods when loading more objects, which is a UX pain.
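For illustration, a stripped-down sketch of that in-spec selector; the object field, option values, and signal name are placeholders, and it again assumes Deneb's standard "dataset":

```json
{
  "signals": [
    {
      "name": "selectedObject",
      "value": "All",
      "bind": {"input": "select", "options": ["All", "Object A", "Object B"]}
    }
  ],
  "data": [
    {"name": "dataset"},
    {
      "name": "filtered",
      "source": "dataset",
      "transform": [
        {
          "type": "filter",
          "expr": "selectedObject === 'All' || datum.object === selectedObject"
        }
      ]
    }
  ]
}
```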

dm-p commented

Just to clarify: loading a lot of data into a Power BI visual goes against Power BI's intended way of working with data (aggregation). The row limit and the performance of fetching more data will not change in a future state without MS making improvements, which is why the core visuals are artificially very limited in how many rows they allow. We cannot do much from the visual's perspective, as we have no control over query performance; that is the trade-off of raising the limit in Deneb beyond what visuals typically allow. Processing the data once we receive it will be faster in Deneb's future state, but the loading lies squarely with Power BI/MS. Apologies if I didn't make this clear.