webtorrent/webtorrent

file.deselect supposedly not working

Closed this issue · 23 comments

Heard from someone on IRC that after calling file.deselect on a file, it still gets fully downloaded.

I also didn't see an option to download only the torrent metadata, or a way to decide which files/parts should (not) be downloaded.

@feross The problem is in the conditional in torrent.deselect(), which needs to be rewritten. Alternatively, you can possibly rewrite the code that handles the default selection & storage of the entire piece range in the torrent when client.add() is called.

If you put a debug line inside the conditional and then attempt to download only particular files in a torrent via deselecting all of the files which you do not want, you will never see it -- because the conditional is never met!

The issue appears to be that the conditional is trying to match the piece range of the file you wish to deselect, against the stored piece range. However, it will never find a match, as the only stored range is the piece range of the torrent in its entirety.
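To make the mismatch concrete, here is a minimal sketch of an exact-match deselect. It is a simplified model of the behavior described above, not the library's actual source; the function name `deselectExact` and the piece numbers are made up for illustration.

```javascript
// Simplified model (assumption): selections are stored as { from, to } piece
// ranges, and deselect only removes a range that matches exactly.
function deselectExact (selections, from, to) {
  for (let i = 0; i < selections.length; i++) {
    const s = selections[i]
    if (s.from === from && s.to === to) {
      selections.splice(i, 1)
      return true
    }
  }
  return false
}

// After adding a torrent, the only stored range covers the whole torrent.
const selections = [{ from: 0, to: 99 }]

// Deselecting one file's range (pieces 10-20) finds no exact match.
const removedFile = deselectExact(selections, 10, 20)
const remainingAfterFile = selections.length

// Deselecting the whole-torrent range does match.
const removedAll = deselectExact(selections, 0, 99)
```

In this model the per-file call is a no-op, which is consistent with the workaround of clearing the whole-torrent selection first.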

A way to demonstrate how this is broken (and a hacky fix to get the effective functionality of deselect() in the meantime) is to deselect the entire piece range, then select the files that you want. However, the piece count is not exposed in the library, so you can't call deselect(), because you don't know the end of the piece range! So, in lieu of modifying deselect(), setting torrent._selections = []; will do the trick.

You also might want to change the readme section for torrent.deselect(). Right now it only mentions that it can "deprioritize a range". Perhaps say something along the lines of "Use a non-zero value for priority -- zero means that the pieces will not be downloaded."

I did notice that unfortunately, you will get "false" copies of the files on either side of the piece range you selected, due to the nature of files not beginning or ending exactly where pieces do. This should probably be opened as a separate bug unless you can easily add in code for cleanup of those / prevent the creation of those false-files during this bugfix.

Credit goes to AlliedEnvy for figuring this out.

Yeah, the deselect() API is janky.

You have to deselect the whole torrent first, then select individual files.

Here's how we made it work in WebTorrent Desktop:

  // Remove default selection (whole torrent)
  torrent.deselect(0, torrent.pieces.length - 1, false)

  // Add selections (individual files)
  for (let i = 0; i < selections.length; i++) {
    const file = torrent.files[i]
    if (selections[i]) {
      file.select()
    } else {
      console.log('deselecting file ' + i + ' of torrent ' + torrent.name)
      file.deselect()
    }
  }

@feross we should probably improve this API before WebTorrent 1.0, because afterward we don't want to change it.

Is there a reason to let people deselect individual blocks within a file? What if the API only allowed selecting which files in a torrent to download?

Here's an API proposal

Remove the select and deselect methods from File and Torrent.

Let users specify which files they want when adding a torrent:

client.add('magnet://...', { fileSelections: [true, true, false, true] })

When you do that, the files that aren't selected are never created on disk.

Let users specify which files they want after adding a torrent:

torrent.setFileSelections([true, true, false, true])

Doing that doesn't delete any files. Partially downloaded files just stay there, unless the user deletes them separately. This lets you pause and resume individual files in a torrent.

This alternate API would require the library to do a bit more work to keep track of pieces overlapping a file boundary. It's simpler, though. You always declare what you want to download for the whole torrent, instead of calling select() and deselect() on individual files and piece ranges. This API also lets you download a torrent without creating all the files on disk.
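As a rough sketch of the bookkeeping this proposal implies, the snippet below turns a `fileSelections` array into merged piece ranges. The helper `selectionsToPieceRanges` and the `startPiece`/`endPiece` fields are assumptions for illustration; nothing here is an existing WebTorrent API.

```javascript
// Hypothetical helper: convert a boolean-per-file array into piece ranges.
// Assumes files are listed in torrent order with inclusive piece bounds.
function selectionsToPieceRanges (files, fileSelections) {
  const ranges = []
  files.forEach((file, i) => {
    if (!fileSelections[i]) return
    const last = ranges[ranges.length - 1]
    if (last && file.startPiece <= last.to + 1) {
      // Adjacent or overlapping with the previous range: extend it.
      last.to = Math.max(last.to, file.endPiece)
    } else {
      ranges.push({ from: file.startPiece, to: file.endPiece })
    }
  })
  return ranges
}

// Made-up layout: neighbouring files share boundary pieces 3 and 7.
const files = [
  { startPiece: 0, endPiece: 3 },
  { startPiece: 3, endPiece: 7 },
  { startPiece: 7, endPiece: 9 },
  { startPiece: 10, endPiece: 12 }
]
const ranges = selectionsToPieceRanges(files, [true, true, false, true])
```

Note that piece 7 stays in the result even though the third file is not wanted, because the second file ends inside it. That is the boundary tracking the proposal mentions.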

@feross

Or maybe:

torrent.setFileSelections([12,23]);

So if a torrent has a large number of files, you don't have to write [false, false, false, false, false, false, ... , true];

torrent.setFileDeselections([4,6,9]);

I like the idea that @kocoten1992 has, which is how the transmission rpc is also. You define an array of wanted/unwanted file indices. Having this option when adding torrents would be amazing!
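The index-based variant and the boolean-array variant carry the same information, so one can be derived from the other. A small sketch, with hypothetical helper names that are not part of any existing API:

```javascript
// Hypothetical helpers: expand index lists into one boolean per file.
function wantedToSelections (fileCount, wanted) {
  const wantedSet = new Set(wanted)
  return Array.from({ length: fileCount }, (_, i) => wantedSet.has(i))
}

function unwantedToSelections (fileCount, unwanted) {
  const unwantedSet = new Set(unwanted)
  return Array.from({ length: fileCount }, (_, i) => !unwantedSet.has(i))
}

const fromWanted = wantedToSelections(5, [1, 3])
const fromUnwanted = unwantedToSelections(5, [4])
```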

Some news?

I started implementing @dcposch's proposal, in the fashion described by @kocoten1992, with a list of wanted files.
It's over there - it needs a bit more testing but seems to work fine on my side.

Is there a reason to let people deselect individual blocks within a file? What if the API only allowed selecting which files in a torrent to download?

@dcposch - Yes there could be. Given that torrent runs in a browser with limited storage, it might be useful to distribute storage of a single large file, eg a hi-def video, across a swarm. Therefore nodes in a swarm only download and maintain a smaller set of blocks within a file rather than the whole, providing them as needed.

I can confirm that in v0.108, torrent.deselect(0, torrent.pieces.length - 1, false) doesn't work, and I remember this line working some time ago. Is there any alternative to this? Any workaround?

To support this idea with more info, along the lines of @Wingman4l7: the problem is the selection. By the time we get the callback from .add, it is already too late to execute deselect. This is the code:

_updateSelections () {
  if (!this.ready || this.destroyed) return

  process.nextTick(() => {
    this._gcSelections()
  })
  this._updateInterest()
  this._update()
}

By the time the wt.add(..., function(){}) callback runs, the state is already "done", so it's too late to mark files with deselect.
If I do it like this, it also doesn't work:

    let torrent = client.add(latestTorrentId);
    //
    // Criteria to select files, first we deselect all of them
    //
    torrent.files.forEach(function (file, index, arr) 
    {
        file.deselect();
    });
    // Will help to stop the download as I'm still waiting for the finished implementation
    torrent.deselect(0, torrent.pieces.length - 1, false);

The following code doesn't help either:

var files = torrent.files.filter(function (file) {
      // extname expects a string, so pass the file name rather than the File object
      var ext = path.extname(file.name)
      return ext === '.article'
    })

This might be a pretty easy fix @feross, and it would help us keep the content in an all-in-one package. Any progress on this? Or any tip/trick to work around it?

After a lot of testing, I can confirm this seems to work:

    let content_torrent = content_wt.add(content_magnet_link, function (this_torrent) {
        // Criteria to select files: first we deselect all of them
        this_torrent.files.forEach(function (file) {
            file.deselect();
        });
        /*
        // Will help to stop the download as I'm still waiting for the finished implementation
        this_torrent.deselect(0, this_torrent.pieces.length - 1, false);
        */
    });

    content_torrent.on('ready', function () {
        // DEBUG: log transfer speeds every 500 ms
        function on_progress() {
            console.log(`d: ${pretty_bytes(content_torrent.downloadSpeed)}/s - u: ${pretty_bytes(content_torrent.uploadSpeed)}/s - u: `);
        }

        on_progress();
        setInterval(on_progress, 500);
    });

Fixed by this:

const file = torrent.files[fileIndex];

// Deselect all files on initial download
torrent.files.forEach(file => file.deselect());
torrent.deselect(0, torrent.pieces.length - 1, false);

// Select file with provided index
if (file) torrent.select(file._startPiece, file._endPiece, false);

Thanks @PavelShar I just tried that with a full example https://codepen.io/Jolg42/pen/qBOeYej?editors=1010

const client = new WebTorrent()

const torrentId = 'https://webtorrent.io/torrents/wired-cd.torrent'

client.add(torrentId, function (torrent) {
  torrent.on('done', function(){
    console.log('torrent finished downloading')
  })

  torrent.on('download', function (bytes) {
    console.log('total downloaded: ' + torrent.downloaded)
    console.log('progress: ' + torrent.progress)
  })

  console.log('Torrent name:', torrent.name)
  console.log('Files:')
  torrent.files.forEach(file => {
    console.log('- ' + file.name)
  })
  
  // Deselect all files on initial download
  torrent.files.forEach(file => file.deselect());
  torrent.deselect(0, torrent.pieces.length - 1, false);

  // Torrents can contain many files. Let's use the first .mp3 file
  const file = torrent.files.find(function (file) {
    return file.name.endsWith('.mp3')
  })
  if (!file) return

  // Select only the file we want
  console.log(`We will only download and play ${file.name}`)
  file.select()

  // Display the file by adding it to the DOM. Supports video, audio, image, etc. files
  file.appendTo('body', {autoplay: true, muted: true})
})

While combing through the source, I found a block which could enable selecting part of the torrent.
But the option this.so, which is supposed to make it work, is unused.
There is no way to pass it in the options, as in new Torrent(torrent, client, opts); this.so cannot be set through opts.

// https://github.com/webtorrent/webtorrent/blob/7aee819796c540df0b247fec1853098f9a591d4c/lib/torrent.js#L482
if (this.so) {
  // this block is never executed
  const selectOnlyFiles = parseRange(this.so)

  this.files.forEach((v, i) => {
    if (selectOnlyFiles.includes(i)) this.files[i].select(true)
  })
}

I tried many things and found a workaround of sorts.
It requires parsing the torrent file, magnet link or URL beforehand using the parse-torrent library,
adding a property so = '0,1,2,3' to the parsed torrent, and then passing that to client.add,
where so contains the indices of the files to be downloaded. It can also be a range like 2...3 or 1...5, 7, 8...10 etc.
Set it to -1 to select no files.

const meta = parseTorrent(magnet);
meta.so = '-1'; // to deselect all files

client.add(meta, onTorrent);

Note

This works fine. But with the file.deselect() loop, immediate-chunk-store creates some files in the filesystem before all files have been deselected.
Setting so = '-1', on the other hand, does not create a file unless it is selected.

Regarding #164 (comment)

Setting so = -1 doesn't seem to prevent the creation of files on disk for me. This is with or without the deselect stuff in #164 (comment)

My code looks something like

client.add({
  infoHash,
  so: '-1'
}, opts, onTorrent);

@RangerMauve Unfortunately you aren't going to be able to completely prevent the files themselves from being created. The real problem is that Webtorrent actually has no concept of selecting a file. Instead, you are only allowed to select the pieces of the torrent which you wish to download. Since a piece can overlap multiple files (a piece may contain the end of one file and the beginning of the next), Webtorrent will end up writing the bytes to their corresponding files, which will likely result in creating files that are directly adjacent to the selected files.
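To illustrate the overlap, here is a small sketch that lists which files receive bytes when one file's pieces are selected. The helper name, file names, sizes and piece length are all made up; this only models the layout rule described above and does not use WebTorrent internals.

```javascript
// Hypothetical helper: files are laid end to end and cut into fixed-size
// pieces. Returns the names of files touched by the pieces of files[index].
function filesTouchedBySelecting (files, pieceLength, index) {
  let offset = 0
  const ranges = files.map(f => {
    const start = offset
    offset += f.length
    return { name: f.name, start, end: offset - 1 }
  })

  const target = ranges[index]
  const firstPiece = Math.floor(target.start / pieceLength)
  const lastPiece = Math.floor(target.end / pieceLength)

  // Byte span covered by the whole pieces, which may extend past the file.
  const byteStart = firstPiece * pieceLength
  const byteEnd = (lastPiece + 1) * pieceLength - 1

  return ranges
    .filter(r => r.start <= byteEnd && r.end >= byteStart)
    .map(r => r.name)
}

const files = [
  { name: 'a.txt', length: 300 },
  { name: 'b.mp3', length: 1000 },
  { name: 'c.txt', length: 500 },
  { name: 'd.txt', length: 5000 }
]
const touched = filesTouchedBySelecting(files, 256, 1)
```

With these numbers, selecting only b.mp3 also writes the tail of a.txt and the head of c.txt, while d.txt is untouched.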

So the result is that you may get some extra files created that you didn't want, but the size of them should be fairly negligible as they shouldn't really be more than a few kb in size (256kb in the worst case if I'm correct, but I could be wrong). Your best bet is to maintain the selected files yourself (you'll have to anyway to figure out which are selected), and just throw the other files away when you're done with them. If you plan on seeding the torrent, you'll want to keep them around as they have parts of the pieces.

I also want to note that I had found issues with using so = -1 due to how it was searching for selected torrents. I recommend using a non-numerical value (something like so = '-')

Are there any developments on this? I desperately need this to work and all of the solutions provided above do not work. The reason deselecting is so useful is because some torrents might include bloat or additional packages. For example, I'm trying to host a server and you can shave down the file size significantly from 12GB -> 2GB by removing/deselecting certain things because the stuff you can remove is additional content. As a result, this would be very useful to have as it would help me dramatically decrease DL times.

This is what I have; I added a print after the file selection (removed here). It prints only the files I want, but the other files, the ones I ignored, are still downloaded. In fact, the supposedly deselected files are downloaded in full.

    const torrent = TorrentClient.add(TorrentURL, {
        path: Path,
    }, function (torrent) {
        // Make sure is server - for ignoring
        if (Server == false) return

        // Deselect everything
        torrent.files.forEach(file => file.deselect())
        torrent.deselect(0, torrent.pieces.length - 1, <any>false)

        //
        for (const file of torrent.files) {
            // Check path
            const ParentPath = path.basename(path.dirname(file.path))

            // Removing files
            const EnglishFileCheck = ParentPath == "english" && KeepEnglish.includes(file.name) == false
            const AllFileCheck = ParentPath == "all" && KeepAll.includes(file.name) == false
            const FolderCheck = RemoveFolders.includes(ParentPath)
            if (AllFileCheck || EnglishFileCheck || FolderCheck) {
                file.deselect()
                continue
            }

            // File is wanted, add to selections
            file.select()
        }
    })

So it's been over 8 years since this issue was reported and still no proper solution has been implemented?! 🤣
Does anyone know what file.deselect() actually does?

So it's been over 8 years since this issue was reported and still no proper solution has been implemented?! 🤣

Does anyone know what file.deselect() actually does?

Unfortunately. I've found that even if you deselect, it still downloads 10% of the deselected files.

Well, actually that doesn't sound so bad... even other clients such as qBittorrent do that.

@feross I'm currently trying to fix this issue, by introducing a discrete interval list data type to store the selections. this could help us fix the issue with deselect.
I'm following Alex's approach, except I don't plan on using any external libraries for this (though in exchange it might not perform as well. but that's for the code review).

I do have a question:
Let's say we have an existing selection, that looks like this:

{ from: 100, to: 150, priority: 10 }

Now we want to insert a new selection, that looks like this:

{ from: 125, to: 175, priority: 5 }

We want to keep our selections non-overlapping, so we will have to split one off. But which approach do you consider "correct" for webtorrent?
A)

[
    { from: 100, to: 124, priority: 10 },
    { from: 125, to: 175, priority: 5 } // newest insertion is saved as it is, old selection is modified accordingly
]

B)

[
    { from: 100, to: 150, priority: 10 },
    { from: 151, to: 175, priority: 5 } // newest insertion is modified, since part of it overlapped with existing selection
]

To me option A) looks like the expected one, since the latest insert overrides the existing data, but maybe it's not the best fit for webtorrent(?). Idk, please let me know what you think.
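For comparison, here is a minimal sketch of both insertion strategies over a plain array of non-overlapping ranges. The function names are hypothetical and this is not the implementation under discussion, only an illustration of the two outcomes.

```javascript
// Option A: the newest insertion wins; existing ranges are trimmed or split.
function insertNewestWins (selections, next) {
  const result = []
  for (const s of selections) {
    if (s.to < next.from || s.from > next.to) {
      result.push({ ...s }) // no overlap, keep as is
      continue
    }
    if (s.from < next.from) result.push({ ...s, to: next.from - 1 })
    if (s.to > next.to) result.push({ ...s, from: next.to + 1 })
  }
  result.push({ ...next })
  return result.sort((a, b) => a.from - b.from)
}

// Option B: existing ranges win; only the uncovered parts of next are added.
function insertExistingWins (selections, next) {
  const sorted = selections.map(s => ({ ...s })).sort((a, b) => a.from - b.from)
  const result = [...sorted]
  let cursor = next.from
  for (const s of sorted) {
    if (s.to < cursor) continue
    if (s.from > next.to) break
    if (s.from > cursor) result.push({ ...next, from: cursor, to: s.from - 1 })
    cursor = Math.max(cursor, s.to + 1)
  }
  if (cursor <= next.to) result.push({ ...next, from: cursor, to: next.to })
  return result.sort((a, b) => a.from - b.from)
}

const existing = [{ from: 100, to: 150, priority: 10 }]
const incoming = { from: 125, to: 175, priority: 5 }
const optionA = insertNewestWins(existing, incoming)
const optionB = insertExistingWins(existing, incoming)
```

Both functions return a new array and leave the input untouched, which keeps the comparison side-effect free.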

Just to give my two cents on your comment above @detarkende, I would say A and B are both valid depending on the scenario.

Given a torrent which has the piece layout:

Piece 1 | <---file 1---> <---file 2---> |
Piece 2 | <--------- file 3 ----------> |

Calling file1.select() would select piece 1.
Calling file2.select() would select piece 1.

Calling file1.deselect() would currently lead to file2 being deselected which is not desired behaviour as both files depend fully on the same piece.

Say we had the further piece layout:

Piece 1 | <--------- file 1 ----------> |
Piece 2 | <---file 1---> <---file 2---> |
Piece 3 | <--------- file 2 ----------> |

Calling file1.select() would select piece 1 and 2.
Calling file2.select() would select piece 2 and 3.

Both file1 and file2 depend on piece 2, but deselecting either would currently cause a piece required by another file to be deselected, this again would not be desired behaviour.
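One way to picture the desired behaviour is to derive the wanted pieces from the set of selected files, so that a shared piece is dropped only when no selected file needs it. The sketch below assumes a hypothetical `piecesToDownload` helper and uses the second layout above; it ignores priorities.

```javascript
// Hypothetical helper: the wanted pieces are the union of the piece ranges
// of all currently selected files.
function piecesToDownload (files, selectedNames) {
  const wanted = new Set()
  for (const f of files) {
    if (!selectedNames.has(f.name)) continue
    for (let p = f.startPiece; p <= f.endPiece; p++) wanted.add(p)
  }
  return [...wanted].sort((a, b) => a - b)
}

// Second layout above: file 1 spans pieces 1-2, file 2 spans pieces 2-3.
const files = [
  { name: 'file1', startPiece: 1, endPiece: 2 },
  { name: 'file2', startPiece: 2, endPiece: 3 }
]

const both = piecesToDownload(files, new Set(['file1', 'file2']))
const afterFile1Deselect = piecesToDownload(files, new Set(['file2']))
const afterFile2Deselect = piecesToDownload(files, new Set(['file1']))
```

Deselecting either file keeps piece 2, because the other file still depends on it.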

I believe Alex's changes are an attempt to allow overlapping ranges and priorities, so that multiple selections can include the same piece, and the algorithm used will flatten this to the linear ranges based on ranges and priorities required, e.g.

In the second example above, calling file1.select() and torrent.select(2, 3, 1) would look like this, as the higher priority takes precedence:

[{ from:1, to: 1, priority: 0 }, { from: 2, to: 3, priority: 1 }]

But calling file1.select() and file2.select() could look like either:

[{ from:1, to: 1, priority: 0 }, { from: 2, to: 3, priority: 0 }]
// or
[{ from:1, to: 2, priority: 0 }, { from: 3, to: 3, priority: 0 }] 
// Note: It's probably easier to continue from the existing selection as to not have to update + insert.

And when deselecting content, all individual ranges are evaluated, so calling file1.deselect() would update the selections to look like:

// With priorities
[{ from:1, to: 1, priority: 0 }, { from: 2, to: 3, priority: 1 }]
// to
[{ from: 2, to: 3, priority: 1 }]

// Without priorities
[{ from:1, to: 2, priority: 0 }, { from: 3, to: 3, priority: 0 }] 
// to
[{ from: 2, to: 3, priority: 0 }] 

and calling file2.deselect() would look like:

// With priorities
[{ from:1, to: 2, priority: 0 }, { from: 3, to: 3, priority: 1 }]
// to
[{ from:1, to: 2, priority: 0 }, { from: 3, to: 3, priority: 1 }] // Same

// Without priorities
[{ from:1, to: 2, priority: 0 }, { from: 3, to: 3, priority: 0 }] 
// to
[{ from:1, to: 2, priority: 0 }] 

Note: The lack of changes in // With priorities is due to file.deselect() only deselecting with a priority of 0, not 1, which typically means it's being used by a stream that we don't want to interrupt.

If that all makes sense?