rlv-dan/Snap2HTML

Can the output be converted back to a simple list of files

danieltroger opened this issue · 5 comments

Please. I need HTML2snap. Someone gave me an HTML file listing 400k files and I need to compare it to some file lists I have. Doing it manually would take an eternity.
I don't want any fancy fucking html, I just want a list of the files.

The html document is 200MB and I finally managed to write the dirs object to a file as json with a blob, but it's still in some fucked array format. It's impossible to work with. jq can't parse the JSON and I don't want to reverse engineer snap2html.

Is there anything that can convert this shit to a simple list in this format:

/path/to/file
/path/to/next/file
/path/to/another/file

Your tool seriously lacks that option, or I'm missing it.

Also: I saw the option to print the current view to a file, but I want the whole tree printed to a file without having to click on 20000 folders

Also, store your dates in ISO 8601

Thanks for detailing the data format. I can now export dirs to a json via a blob and write a parser in node. The format's still kinda retarded though

Data format:
				Each index in "dirs" array is an array representing a directory:
					First item: "directory path*always 0*directory modified date"
						Note that forward slashes are used instead of (Windows style) backslashes
					Then, for each file in the directory: "filename*size of file*file modified date"
					Second to last item tells the total size of directory content
					Last item references IDs to all subdirectories of this dir (if any).
						ID is the item index in dirs array.
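
To make that layout concrete, here is a rough sketch of reading one entry of `dirs`. `parseDir` is a made-up helper name (it is not part of Snap2HTML), and the sample entry is invented for illustration. The date strings and the subdirectory field are passed through untouched, since the description above doesn't say how they are encoded.

```javascript
// Hypothetical helper, not part of Snap2HTML: splits one "dirs" entry
// into its parts according to the format described above.
function parseDir(entry) {
  // First item: "directory path*always 0*directory modified date"
  const [path, , modified] = entry[0].split("*");

  // Second to last item: total size of the directory content.
  const totalSize = Number(entry[entry.length - 2]);

  // Last item: references to subdirectories; kept raw because its exact
  // encoding is not spelled out in the format description.
  const subdirs = entry[entry.length - 1];

  // Everything in between: "filename*size of file*file modified date"
  const files = entry.slice(1, -2).map((item) => {
    const [name, size, fileModified] = item.split("*");
    return { name, size: Number(size), modified: fileModified };
  });

  return { path, modified, totalSize, subdirs, files };
}

// Invented sample entry, only to show the shape of the result.
const sample = ["C:/data*0*1500000000", "a.txt*12*1500000001", 12, ""];
const parsed = parseDir(sample);
console.log(parsed.path, parsed.files.map((f) => f.name));
```

Keeping the last two items separate from the file items is the important part; treating them as files is what makes a naive loop over the array produce garbage.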

Ok guys, I wrote a parser and finally got my file list. Here are the steps:

  1. Open the Snap2HTML HTML file in a browser and run this in the dev console:
const blob = new Blob([JSON.stringify(dirs)], {type : 'application/json'});
const link = document.body.appendChild(document.createElement("a"));
link.href = URL.createObjectURL(blob);
link.innerText = "right click me pls";

  2. RIGHT CLICK the link that was added to the document and select "save target as" or similar.
  3. Save the file as 'dirs.json'
  4. Install nodejs if you haven't already
  5. Put the code below into parser.js in the same directory as dirs.json
const fs = require('fs');

// dirs.json is the "dirs" array saved from the browser console.
const dirs = JSON.parse(fs.readFileSync('dirs.json', 'utf8'));

dirs.forEach(dir => {
  // Drop the last two items (total size and subdirectory IDs); they are not files.
  const entries = dir.slice(0, -2);
  if (entries.length === 0) return;

  // First item is the directory itself: "directory path*0*modified date".
  const dirName = entries[0].split('*')[0];
  console.log(dirName);

  // Remaining items are files: "filename*size*modified date".
  entries.slice(1).forEach(file => {
    const name = file.split('*')[0];
    console.log(dirName + '/' + name);
  });
});
  6. cd into the directory and run node parser.js > filelist.txt
  7. Open another issue and tell @rlv-dan to add this as a feature so the next person doesn't have to waste 35 minutes on making Snap2HTML not suck / on finding this issue which he's gonna close just like the last person's cry for help
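
For the comparison this was all for, something like the following might do once filelist.txt exists. It is only a sketch: `diffLists` is a made-up name, and it assumes both lists have one path per line in exactly the same form (same slashes, same case, same root prefix), which may well not hold for your lists.

```javascript
// Hypothetical helper: reports which paths appear in only one of two
// newline-separated file lists. Blank lines are ignored.
function diffLists(exportedText, referenceText) {
  const toSet = (text) =>
    new Set(text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean));

  const exported = toSet(exportedText);
  const reference = toSet(referenceText);

  return {
    onlyInExport: [...exported].filter((p) => !reference.has(p)),
    onlyInReference: [...reference].filter((p) => !exported.has(p)),
  };
}

// Inline demo data; with real files you would pass in
// fs.readFileSync('filelist.txt', 'utf8') and your own list instead.
const result = diffLists("/a/one.txt\n/a/two.txt\n", "/a/two.txt\n/a/three.txt\n");
console.log(result);
```

Using sets keeps this quick even at 400k entries, since each lookup is constant time instead of a scan over the other list.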

Don't get me wrong, Snap2HTML might be great. But most of us just want a simple list of files

I don't like your tone. Make something better yourself if you don't like it.