jupyterlab/lumino

DataGrid is slow when cells contain a lot of (long) text

krassowski opened this issue · 13 comments

Description

Rendering and scrolling within DataGrid are very laggy when large text is present in cells. This is due to repeated CanvasRenderingContext2D.measureText invocations.

  • Firefox: 8.5 seconds
  • Chrome: 11.5 seconds

Reproduce

https://github.com/jupyterlab/jupyterlab-plugin-playground with:

import {
  JupyterFrontEnd,
  JupyterFrontEndPlugin,
} from '@jupyterlab/application';
import { MainAreaWidget } from '@jupyterlab/apputils';
import { DataGrid, DataModel, BasicKeyHandler, BasicMouseHandler, BasicSelectionModel } from '@lumino/datagrid';

class LargeDataModel extends DataModel {
  rowCount(region: DataModel.RowRegion): number {
    return region === 'body' ? 1000 : 1;
  }

  columnCount(region: DataModel.ColumnRegion): number {
    return region === 'body' ? 10 : 1;
  }

  data(region: DataModel.CellRegion, row: number, column: number): any {
    if (region !== 'body') {
      return `${row}, ${column}`;
    }
    return 'x'.repeat(10*1024);
  }
}

const extension: JupyterFrontEndPlugin<void> = {
  id: 'datagrid',
  autoStart: true,
  activate: (
    app: JupyterFrontEnd
  ) => {
    const model = new LargeDataModel();
    const content = new DataGrid();
  
    content.dataModel = model;
    // (handlers are not required for reproduction)
    content.keyHandler = new BasicKeyHandler();
    content.mouseHandler = new BasicMouseHandler();
    content.selectionModel = new BasicSelectionModel({
      dataModel: model
    });
    const widget = new MainAreaWidget({content});
    widget.id = 'datagrid-example';
    widget.title.label = 'Slow Datagrid Example';
    app.shell.add(widget, 'main');
  },
};

export default extension;

This is a bit pathological, but I got worse results with just one column of long strings (the CSS content of JupyterLab, one stylesheet per row).

Expected behavior

DataGrid loads fast (<2 s for a dataset of this size) and scrolls smoothly. For scrolling, the measured text sizes could be cached or a similar optimization applied.

Context

Only tested with lumino 1.x.

Firefox:

Screenshot from 2022-10-27 00-15-09

Chrome:

Screenshot from 2022-10-27 00-15-55

Chrome, zoom-in:

Screenshot from 2022-10-27 00-16-16

friendly ping to @martinRenou and @ibdafna

Thanks for the ping.

@krassowski do you need text wrapping (line breaks for long text in cells) in your case? I see there is one measureText call that we could probably get rid of if you don't need text wrapping.

When text wrapping is enabled, we can perhaps come up with a better algorithm that uses fewer measureText calls.
We could also use some tricks to work around measureText slowness in the case of monospaced fonts.
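The monospace trick could look something like this sketch (a hypothetical helper, not part of Lumino; `measure` stands in for `gc.measureText(...).width`): measure a single reference character once, then estimate every string's width by multiplication, with zero further measureText calls.

```typescript
// Sketch (not Lumino API): estimate text width for monospaced fonts with a
// single measurement instead of one measureText call per candidate substring.
function makeMonoWidthEstimator(
  measure: (text: string) => number
): (text: string) => number {
  // Measure one reference character once; every glyph in a mono
  // font has the same advance width.
  const charWidth = measure('M');
  return (text: string) => charWidth * text.length;
}

// Usage with a mock measurer where every character is 8px wide:
const estimate = makeMonoWidthEstimator(s => 8 * s.length);
console.log(estimate('hello')); // 40
```

This would only be safe when the grid knows the font is truly monospaced; proportional fonts would still need real measurements.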

Thanks for reporting! I'd argue this is an extreme case when each string has a length of 10240 characters - not justifying the performance, but just putting it out there. One simple fix would be to put a cap on the maximum string length we apply wrapping to and just disable the functionality if it's longer than the max.

EDIT: I don't think there's text wrapping enabled in this case - just eliding (which also measures the text).

We measure the text length in all cases - eliding or wrapping. textWrapping is optional, but eliding isn't. We should probably add a boolean for eliding; if both wrapping and eliding are set to false, then no text measurement calls need to be made. That's a useful but not sufficient approach. We can also apply some caching for {string: width}, although if the n * m strings are all unique, we would end up with n * m entries, which is not good for the memory footprint.
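The capped cache idea can be sketched as follows (hypothetical helper, not Lumino's API; `measure` stands in for `gc.measureText(...).width`). A hard cap on entries bounds the memory footprint for the unique-strings worst case; note the key would also need to include the font, since widths change with it.

```typescript
// Sketch: memoize text measurements with a hard cap so n * m unique
// strings cannot blow up the memory footprint.
function makeCachedMeasurer(
  measure: (text: string) => number,
  maxEntries = 10_000
): (text: string) => number {
  const cache = new Map<string, number>();
  return (text: string) => {
    const hit = cache.get(text);
    if (hit !== undefined) {
      return hit;
    }
    const width = measure(text);
    if (cache.size >= maxEntries) {
      // Evict the oldest entry (Map preserves insertion order).
      cache.delete(cache.keys().next().value!);
    }
    cache.set(text, width);
    return width;
  };
}

// Usage: count how many raw measurements repeated lookups need.
let calls = 0;
const cached = makeCachedMeasurer(s => { calls++; return 8 * s.length; }, 100);
cached('abc'); cached('abc'); cached('abc');
console.log(calls); // 1
```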

let textWidth = gc.measureText(text).width;

I see there is one measureText call that we could probably get rid of if you don't need text wrapping.

Indeed, this one. No, maybe not? Sorry, I don't see it anymore!

EDIT: I don't think there's text wrapping enabled in this case - just eliding (which also measures the text).

I agree, this looks like an issue in eliding, but when I skimmed the code I saw that wrapping is implemented with a naive linear search; it could use binary search to reduce the cost to O(log n):

if (wordsInColumn.length === 0) {
  let curLineTextWidth = gc.measureText(textInCurrentLine).width;
  while (curLineTextWidth > boxWidth && textInCurrentLine !== '') {
    // Iterating from the end of the string until we find a
    // substring (0, i) which has a width less than the box width.
    for (let i = textInCurrentLine.length; i > 0; i--) {
      const curSubString = textInCurrentLine.substring(0, i);
      const curSubStringWidth = gc.measureText(curSubString).width;
      if (curSubStringWidth < boxWidth || curSubString.length === 1) {
        // Found a substring which has a width less than the current
        // box width. Rendering that substring on the current line
        // and setting the remainder of the parent string as the next
        // string to iterate on for the next line.
        const nextLineText = textInCurrentLine.substring(
          i,
          textInCurrentLine.length
        );
        textInCurrentLine = nextLineText;
        curLineTextWidth = gc.measureText(textInCurrentLine).width;
        gc.fillText(curSubString, textX, curY);
        curY += textHeight;
        // No need to continue iterating after we identified
        // an index to break the string on.
        break;
      }
    }
  }
}
// Multiple words in column header. Fitting maximum
// number of words possible per line (box width).
else {
  while (wordsInColumn.length !== 0) {
    // Processing the next word in the queue.
    const curWord = wordsInColumn.shift();
    // Joining that word with the existing text for
    // the current line.
    const incrementedText = [textInCurrentLine, curWord].join(' ');
    const incrementedTextWidth = gc.measureText(incrementedText).width;
    if (incrementedTextWidth > boxWidth) {
      // If the newly combined text has a width larger than
      // the box width, we render the line before the current
      // word was added. We set the current word as the next
      // line.
      gc.fillText(textInCurrentLine, textX, curY);
      curY += textHeight;
      textInCurrentLine = curWord!;
    } else {
      // The combined text has a width less than the box width. We
      // set the current line text to be the new combined text.
      textInCurrentLine = incrementedText;
    }
  }
}
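The inner linear scan above could be replaced by a binary search over the break index, reducing the measureText calls per line from O(n) to O(log n). A sketch (generic `measure` stands in for `gc.measureText(...).width`; assumes width is non-decreasing in prefix length):

```typescript
// Sketch: find the longest prefix of `text` whose rendered width fits in
// `boxWidth` using binary search instead of scanning from the end.
function longestFittingPrefix(
  text: string,
  boxWidth: number,
  measure: (text: string) => number
): number {
  let lo = 1;            // always render at least one character
  let hi = text.length;  // inclusive upper bound
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (measure(text.substring(0, mid)) <= boxWidth) {
      lo = mid; // prefix fits; try a longer one
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

// With 8px-wide glyphs and a 100px box, 12 characters (96px) fit:
console.log(longestFittingPrefix('x'.repeat(1024), 100, s => 8 * s.length)); // 12
```

Returning at least 1 mirrors the `curSubString.length === 1` guard in the original code, so a too-narrow box still renders one character per line.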

We should look to improve the wrapping algorithm for sure, but I guess my point here is that we should eliminate or significantly speed up the measureText call which happens regardless of wrapping/eliding.

Not sure how much we can speed up measureText as it is pretty close to the metal.

Eliding a sentence with 1024 characters down to a 1-character-wide column with the current binary search takes 10 measureText calls. We could reduce it to 1 call (on the happy path) with:

// If dealing with a very large string
if (textWidth > 4 * boxWidth) {
  const approximateLengthToKeep = Math.ceil(text.length * boxWidth / textWidth);
  // It is probably wise to add a small margin (increasing the number of calls
  // on the happy path to two) in order to reduce the chance of going into the
  // worst-case scenario of the old algorithm.
  const margin = 1;
  const candidateText = text.substring(0, approximateLengthToKeep + margin) + elide;
  const candidateWidth = gc.measureText(candidateText).width;
  // Keep the result if we did not overshoot because of unfavourable string
  // composition; the old algorithm will fine-tune it to remove the few
  // remaining extra characters.
  if (candidateWidth >= boxWidth) {
    text = candidateText;
    textWidth = candidateWidth;
  } else {
    // no-op
    // We would end up here for strings like `iiiiimmmmmm` if we are unlucky.
    // Of course we could try to continue with a new best guess, but we should
    // bail after a few trials to avoid particularly nasty strings (the worst
    // case is alternating between overshoot and undershoot).
  }
}
// Keep the old algorithm below as it was:
while (textWidth > boxWidth && text.length > 1) {
  if (text.length > 4 && textWidth >= 2 * boxWidth) {
    // If text width is substantially bigger, take half the string
    text = text.substring(0, text.length / 2 + 1) + elide;
  } else {
    // Otherwise incrementally remove the last character
    text = text.substring(0, text.length - 2) + elide;
  }
  textWidth = gc.measureText(text).width;
}

The bigger issue is a wide column with text equivalent to, say, 200 characters (this is what I saw locally). If you start with 399 characters, you would call measureText 199 times. This looks like a realistic scenario for many users: you add a column with one paragraph but want to display only about one sentence.
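The 199-call figure can be reproduced by simulating the existing eliding loop with a counting mock (a sketch with a hypothetical 1px-per-character `measureText`; once the width drops under 2× the box width, each iteration strips two characters and re-appends the one-character ellipsis, shrinking the text by one character per call):

```typescript
// Sketch: count measureText calls made by the current eliding loop for a
// string of `length` characters in a box that fits `boxWidth` characters.
function countElideMeasurements(length: number, boxWidth: number): number {
  const elide = '\u2026'; // "…"
  let calls = 0;
  const measure = (s: string) => { calls++; return s.length; }; // mock: 1px/char
  let text = 'x'.repeat(length);
  let textWidth = measure(text);
  calls = 0; // count only the calls made inside the loop
  while (textWidth > boxWidth && text.length > 1) {
    if (text.length > 4 && textWidth >= 2 * boxWidth) {
      // If text width is substantially bigger, take half the string
      text = text.substring(0, text.length / 2 + 1) + elide;
    } else {
      // Otherwise incrementally remove the last character
      text = text.substring(0, text.length - 2) + elide;
    }
    textWidth = measure(text);
  }
  return calls;
}

console.log(countElideMeasurements(399, 200)); // 199
```

Since 399 is just under 2 × 200, the halving branch never fires and the loop degrades to one call per removed character.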


Example in detail:

import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { MainAreaWidget } from '@jupyterlab/apputils';
import { DataGrid, DataModel, BasicKeyHandler, BasicMouseHandler, BasicSelectionModel } from '@lumino/datagrid';

class LargeDataModel extends DataModel {
  rowCount(region: DataModel.RowRegion): number {
    return region === 'body' ? 100 : 1;
  }

  columnCount(region: DataModel.ColumnRegion): number {
    return region === 'body' ? 1 : 1;
  }

  data(region: DataModel.CellRegion, row: number, column: number): any {
    if (region !== 'body') {
      return `${row}, ${column}`;
    }
    return 'x'.repeat(250);
  }
}

const extension: JupyterFrontEndPlugin<void> = {
  id: 'datagrid',
  autoStart: true,
  activate: (
    app: JupyterFrontEnd
  ) => {
    const model = new LargeDataModel();
    const content = new DataGrid();
  
    content.dataModel = model;
    // (handlers are not required for reproduction)
    content.keyHandler = new BasicKeyHandler();
    content.mouseHandler = new BasicMouseHandler();
    content.selectionModel = new BasicSelectionModel({
      dataModel: model
    });
    content.resizeColumn('body', 0, 800);
    const widget = new MainAreaWidget({content});
    widget.id = 'datagrid-example';
    widget.title.label = 'Slow Datagrid Example';
    app.shell.add(widget, 'main');
  },
};

export default extension;

Screenshot from 2022-10-27 20-38-37

Loading freezes the UI for 1.5 seconds on Chrome.

measureText can be sped up via a shim which does some caching or other internal accounting to reduce CPU time. Just to reiterate, the issue raised above is separate from wrapText, which is another topic. The performance bottleneck you are experiencing above is not caused by the wrapText algorithm. We should fix this issue first before moving to wrapText; wrapText is disabled by default for now.

Yes I agree that the issue I saw is not in wrapText (I never said it was).

As for caching, it might help a bit with scrolling (at the cost of doubling the RAM usage, if not much worse), but as soon as the user resizes the column we are back to square one.

Just to spell it out: I was referring to eliding in the last few comments, and eliding is enabled by default and inefficient in the worst-case scenario (as described above).

Okay, understood 😊 I think I have some ideas about how to fix this. Shouldn't be anything difficult. Will report back!

@krassowski @fcollonval I've done more profiling to see what the performance is like without eliding or wrapping, and unfortunately this is still quite laggy even with zero calls to measureText. The paint call seems to take a long time. I am wondering why we're calling paint so much - we should be able to blit the content.

lumino_perf

image

This is for resize, but it's even worse for scrolling.

Digging deeper, I think this may have to do with our rendering optimizations. When resizing column/row headers or scrolling, we trigger a full on-screen buffer paint - this is extremely slow. We should be writing to the off-screen buffer and copying any changes from that buffer to the on-screen one.
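The buffering idea can be illustrated with a toy model (arrays standing in for pixel rows; none of these names exist in Lumino): keep the off-screen buffer fully painted and copy only the dirty region to the on-screen buffer, so scroll/resize cost is proportional to the damaged area rather than to a full repaint. On a real canvas the copy would be a single `drawImage` blit from the off-screen canvas.

```typescript
// Conceptual sketch (not the Lumino renderer): blit only the dirty rows from
// an already-painted off-screen buffer instead of repainting every cell.
function blitDirtyRows(
  offscreen: string[],
  onscreen: string[],
  dirtyStart: number,
  dirtyEnd: number // exclusive
): number {
  let copied = 0;
  for (let row = dirtyStart; row < dirtyEnd; row++) {
    onscreen[row] = offscreen[row]; // cheap copy; no re-measure, no re-draw
    copied++;
  }
  return copied; // work done scales with the dirty region only
}

// Usage: rows 1-2 were damaged by a scroll; only those are copied.
const off = ['a', 'b', 'c', 'd'];
const on = ['a', 'x', 'x', 'd'];
blitDirtyRows(off, on, 1, 3);
console.log(on.join('')); // "abcd"
```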

lumini_perf_2