yWorks/svg2pdf.js

How to ignore invalid dataurl

tangenttechno opened this issue · 6 comments

I am using Apache PDFBox to generate SVGs from PDFs, but the generated SVGs can contain invalid PNG data URLs. When I try to export to PDF using doc.svg(), I get the error 'addImage does not support files of type 'UNKNOWN', please ensure that a plugin for 'UNKNOWN' support is added'. How can I ignore this error when a data URL is not valid?

yGuy commented

I don't think there is a way. It's hard enough already to render valid SVGs properly, and validating and fixing invalid SVGs is not a goal of this library. So I propose a preprocessing step that removes or fixes the broken images, whichever is appropriate, before passing the SVG to this library. Maybe you can fix Apache PDFBox instead?
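A minimal sketch of such a preprocessing step, assuming the broken images may simply be dropped. The helper names `isWellFormedImageDataUrl` and `removeBrokenImages` and the validation pattern are hypothetical illustrations; they are not part of svg2pdf.js or jsPDF.

```javascript
// Illustrative pattern (an assumption, not the regex svg2pdf.js uses internally):
// "data:image/<subtype>;base64," followed by a non-empty base64 payload.
const BASE64_IMAGE_DATA_URL = /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}$/i;

// Hypothetical helper: true when href looks like a well-formed base64 image data URL.
function isWellFormedImageDataUrl(href) {
  return typeof href === 'string' && BASE64_IMAGE_DATA_URL.test(href);
}

// Hypothetical helper: removes <image> elements whose data URL is malformed.
// hrefs that are not data URLs (regular file URLs) are left untouched.
// Returns the number of removed elements.
function removeBrokenImages(svgElement) {
  let removed = 0;
  for (const image of Array.from(svgElement.querySelectorAll('image'))) {
    const href = image.getAttribute('href') || image.getAttribute('xlink:href') || '';
    if (href.startsWith('data:') && !isWellFormedImageDataUrl(href)) {
      image.remove();
      removed++;
    }
  }
  return removed;
}
```

Calling `removeBrokenImages(element)` right before `doc.svg(element, ...)` would skip the offending images instead of failing. Note that this check is strict: a data URL that merely contains embedded whitespace is also treated as broken, so normalize such URLs first if those images should be kept.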

sample
@yGuy Thank you for the quick reply. One thing I noticed is that the library is trying to fetch an invalid data URL. Please see the attached SVG for the original; below is the URL the library is trying to fetch:

data:image/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACEAAAAhCAYAAABX5MJvAAAA1UlEQVR4Xu2WQQqD%0AMBBFsyh4A8Vd915Aj2oPUc9U7Dk6/bMohGFak1GZQPPgI2rmTRA1CaFSqZwAUbgg%0AVz7Ke6eChiNyQ54IReFzvj7KmsOAvEPuovG38LhOOnYB4YA8lGa/wuMH6TIBUWuY%0AQDyRVjqzgWRR5DlZpDMLCCZFaskk3cmgeFaElszSnQyKV0VoySrdSaCwQV6K0BL2%0ANLLHJijqFdme9LLHJlTCk2DI+51gqJCvw/8/wZD3H5OhEtYOhrxX0Q/kvZ+IIc+d%0AlQZ57TErlb/hDbwZp8G79kS6AAAAAElFTkSuQmCC;base64,iVBORw0KGgoAAAANSUhEUgAAACEAAAAhCAYAAABX5MJvAAAA1UlEQVR4Xu2WQQqDMBBFsyh4A8Vd915Aj2oPUc9U7Dk6/bMohGFak1GZQPPgI2rmTRA1CaFSqZwAUbggVz7Ke6eChiNyQ54IReFzvj7KmsOAvEPuovG38LhOOnYB4YA8lGa/wuMH6TIBUWuYQDyRVjqzgWRR5DlZpDMLCCZFaskk3cmgeFaElszSnQyKV0VoySrdSaCwQV6K0BL2NLLHJijqFdme9LLHJlTCk2DI+51gqJCvw/8/wZD3H5OhEtYOhrxX0Q/kvZ+IIc+dlQZ57TErlb/hDbwZp8G79kS6AAAAAElFTkSuQmCC

All the invalid URLs have two 'data:image' prefixes, which is not the case in the original. Is this related to the library? All I am doing is downloading the SVG and exporting it to PDF.

```html
<script src="node_modules/jspdf/dist/jspdf.umd.min.js"></script>
<script src="node_modules/svg2pdf.js/dist/svg2pdf.umd.min.js"></script>
<script>
  // Function to download SVG from URL and save as PDF
  function downloadSVGAsPDF(url, x, y, width, height) {
    fetch(url)
      .then((response) => response.text())
      .then((svgContent) => {
        const doc = new jspdf.jsPDF();

        // Create a temporary div element to render the SVG
        const div = document.createElement('div');
        div.innerHTML = svgContent;
        const element = div.querySelector('svg');

        doc.svg(element, {
          x,
          y,
          width,
          height,
        }).then(() => {
          // Save the created PDF
          doc.save('myPDF.pdf');
        });
      })
      .catch((error) => {
        console.error('Error fetching SVG:', error);
      });
  }

  // Call the function with the URL of the SVG file and coordinates
  const svgUrl = 'sample.svg';
  const x = 0;
  const y = 0;
  const width = 200;
  const height = 200;

  downloadSVGAsPDF(svgUrl, x, y, width, height);
</script>
```

The "data:image" is added here:

const dataUri = `data:image/${format};base64,${btoa(data)}`

And I suppose this method fails when trying to parse the broken data URL:

static async fetchImageData(imageUrl: string): Promise<{ data: string; format: string }> {
  let data, format
  const match = imageUrl.match(dataUriRegex)
  if (match) {
    const mimeType = match[2]
    const mimeTypeParts = mimeType.split('/')
    if (mimeTypeParts[0] !== 'image') {
      throw new Error(`Unsupported image URL: ${imageUrl}`)
    }
    format = mimeTypeParts[1]
    data = match[5]
    if (match[4] === 'base64') {
      data = atob(data)
    } else {
      data = decodeURIComponent(data)
    }
  } else {
    data = await ImageNode.fetchImage(imageUrl)
    format = imageUrl.substring(imageUrl.lastIndexOf('.') + 1)
  }
  return {
    data,
    format
  }
}

However, I agree with @yGuy. You should pass valid URLs to svg2pdf. Svg2pdf cannot detect or fix broken URLs.
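The fallback branch of the method quoted above would also explain the doubled 'data:image' prefix, assuming the regex does not match the URL (dataUriRegex itself is not quoted in this thread, so that part is an assumption). A data URL contains no '.', so lastIndexOf('.') returns -1 and the whole URL ends up as the "format". The sketch below mirrors only that one line; `formatFromUrl` is a hypothetical name and the input is a shortened, made-up data URL.

```javascript
// Mirrors only the fallback line of fetchImageData:
// everything after the last '.' is taken as the image format.
function formatFromUrl(imageUrl) {
  return imageUrl.substring(imageUrl.lastIndexOf('.') + 1);
}

// For a regular file URL this yields the file extension.
console.log(formatFromUrl('picture.png')); // "png"

// A data URL has no '.', so lastIndexOf returns -1 and substring(0)
// returns the entire URL. The embedded newline stands in for the
// whitespace that keeps the regex from matching (assumed).
const brokenUrl = 'data:image/png;base64,iVBORw0KGgo\nAAAA';
const format = formatFromUrl(brokenUrl);
console.log(format === brokenUrl); // true

// Plugging that "format" into the template from the library
// reproduces the doubled prefix seen in the error.
const dataUri = `data:image/${format};base64,PAYLOAD`;
console.log(dataUri.startsWith('data:image/data:image/png;base64,')); // true
```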

@HackbrettXXX Thank you for the reply. If we inspect the SVG file I attached, we cannot find the PNG data URL given above, and I am not sure where it came from. If you run the simple sample code I provided, you will get the same error. Can you try?

yGuy commented

I checked the SVG file you sent, and the data URL is likely only problematic because of the newlines in the attribute value. This results in whitespace being inserted by the XML parser, so the regex no longer matches.
It seems browsers are pretty lenient here, and base64 strings should just be stripped of any whitespace before being interpreted.

We could, and maybe should, add this functionality to svg2pdf. But of course the workaround is to preprocess the SVG: parse the SVG DOM, select all image hrefs, and strip the whitespace from the data URLs. That should make the library happy.
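A rough sketch of that workaround, assuming the SVG has already been parsed into a DOM element as in the sample code earlier in this thread. `stripDataUrlWhitespace` and `normalizeImageHrefs` are hypothetical helper names, not svg2pdf.js APIs.

```javascript
// Hypothetical helper: removes all whitespace from a data URL.
// Other URLs are returned unchanged.
function stripDataUrlWhitespace(href) {
  return href.trimStart().startsWith('data:') ? href.replace(/\s+/g, '') : href;
}

// Hypothetical helper: rewrites the href / xlink:href attribute of every
// <image> element below svgElement in place.
function normalizeImageHrefs(svgElement) {
  for (const image of svgElement.querySelectorAll('image')) {
    for (const name of ['href', 'xlink:href']) {
      const value = image.getAttribute(name);
      if (value !== null) {
        image.setAttribute(name, stripDataUrlWhitespace(value));
      }
    }
  }
}
```

In the sample code, `normalizeImageHrefs(element)` would go right before the `doc.svg(element, ...)` call. Stripping every whitespace character is meant for base64 payloads; a data URL that carries unencoded text with literal spaces would be altered by it.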

@yGuy Thank you for the reply. Yes, your solution worked. I minified the SVG and now it's rendering fine. Thank you.