dat-ecosystem-archive/docs

Can't see shared data from hypercored

at88mph opened this issue · 9 comments

I'm using dat 13.9.0 with hypercored on Debian 9.2 and OS X.

If I run dat create in my /data directory, followed by dat share, I can see the data on my local network. However, if I run hypercored instead of dat share, the same dat clone ... command that works with dat share no longer works.

I have questions about how it's supposed to work:

  • Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?
  • What relevance does the Swarm port have? Am I supposed to connect to it somehow?
  • For dats to work, do we need bi-directional access? Meaning, in order to have a share on a public machine, I can only clone to another public machine?

Thanks!
Dustin

hypercored functions mostly as a backup or server mirroring tool. It can't create dats or read existing dats, so you always have to create a dat first and then copy it to hypercored. The flow would be:

  • dat share /data -> prints a key, let's call it dat://my-dat-key
  • add dat://my-dat-key to hypercored's ./feeds file (while dat share is still running)
  • hypercored will duplicate data to the ./archiver directory
  • stop running dat share once uploading has stopped (not very easy to tell for this use case)
  • hypercored will make dat://my-dat-key available over the network, without needing to run dat share again
  • to update your "backup", run dat share again in /data until files are backed up by hypercored
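
As a rough illustration of the second step above, here is a minimal Python sketch that adds a key to the feeds file only if it is not already listed. The helper name add_feed is hypothetical, and the sketch assumes the ./feeds file holds one key per line, which the steps above don't spell out.

```python
from pathlib import Path


def add_feed(feeds_path, key):
    """Append `key` to the feeds file unless it is already listed.

    Returns True if the key was added, False if it was already present.
    Assumption: the file holds one key per line.
    """
    path = Path(feeds_path)
    text = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + key + "\n")
    return True
```

Calling add_feed("./feeds", "dat://my-dat-key") twice leaves a single entry in the file, so it should be safe to re-run while dat share is still running.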

Hopefully I am understanding your question correctly! Using Dat and hypercored on the same machine is not great right now. We'd like to build this into the main CLI, so it is great to see this use case. We mostly use hypercored on servers where we want full backups and dats that are always available, whereas dat share only makes a dat available while the command is running.

On your other questions:

  • Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?

You should use the same ID. The archiver key is a hypercore feed that can only be used by other hypercored instances or related tools.

  • What relevance does the Swarm port have? Am I supposed to connect to it somehow?

The ports shouldn't matter. Dat should be able to connect to the peers, since they advertise whatever port they are available on.

  • For dats to work, do we need bi-directional access? Meaning, in order to have a share on a public machine, I can only clone to another public machine?

Not necessarily. We have some tools to get around firewalls using hole-punching, but Dat works best if at least one of the machines is public. Bi-directional hole-punching is more hit-or-miss, depending on the network.

You can run dat doctor to see if you can connect to our public peer and then use the key printed to see if your two computers can connect to each other.

That clarifies it, thank you for the detailed reply @joehand.

I have a Dat share (not hypercored, just dat share) on a VM on OpenStack with a Public IP, which runs happily. My desktop cannot see it with dat doctor or dat clone, but other Public IP machines can, which leads me to believe that it needs to communicate back and forth like that and both need to be able to see each other.

My full use case will look like this:

  • Create Docker image for a Dat share. Since the Docker container processes run in the foreground, a simple dat share should suffice. For clarification, is the dat create only necessary when you want the dat.json created for its metadata?
  • Run a Docker container for each share.
  • Enable replication amongst the shares.
  • Have the ability to download the closest proximity file as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect what file I'm looking for and direct me to the closest one?

Many thanks for the reply.

My desktop cannot see it with dat doctor or dat clone, but other Public IP machines can, which leads me to believe that it needs to communicate back and forth like that and both need to be able to see each other.

Can you connect to the public peer in the first part of the test (from desktop)? It should look like this:

❯ dat doctor
[info] Testing connection to public peer
[info] UTP ONLY - success!
[info] TCP ONLY - success!

If you are running inside Docker on your desktop, that is likely the issue. We haven't been able to figure out how to get around Docker's NAT without switching to host networking mode.

For clarification, is the dat create only necessary when you want the dat.json created for its metadata?

Yep, exactly.

Have the ability to download the closest proximity file as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect what file I'm looking for and direct me to the closest one?

There is no network prioritization yet. But it should download more data from whatever is the fastest peer, just by the nature of how the requests work.

Thanks, @joehand. I've gone down the dat doctor route, and I'm convinced that it's our lousy network here. Thank you for explaining how hypercored works, too.

I've been running Docker with --net host without issue on hosts outside of our network. Running with --net host isn't a big issue for us.

Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and share the same path, will the download be distributed amongst them?

Thank you again! You've been extremely helpful.

I've gone down the dat doctor route, and I'm convinced that it's our lousy network here.

Not ideal, but glad you figured it out. Feel free to run p2p-test; it may give us some more data on your network:

npm install -g p2p-test
p2p-test [optional-short-message-description-the-network]

Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and share the same path, will the download be distributed amongst them?

Not quite sure I understand this question pre-coffee, but I'll give it a shot.

Dat networks are only created around a specific key, so peers never connect to other peers unless they are sharing or downloading the same dat. If dats share a path, the downloads will be completely unaware of each other. They will write to the same metadata in the .dat folder, which records what has been downloaded, but the writes won't be coordinated. That may cause some weird problems, though it should work eventually. We plan to add something that locks a given path for writes while Dat is running.

Prioritizing the fastest peer for a single key is definitely a good feature, so feel free to open an issue in datproject/dat. Prioritization across keys will need something more custom.

About fastest peers: if you have two sources online, the downloader will connect to both and begin downloading from them at the same time. Since each individual block is quite small, most of the blocks end up being downloaded from the faster source.
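
To make that concrete, here is a small Python simulation of the idea. It is only a toy model under stated assumptions, not Dat's actual request scheduler: every source fetches one block at a time and asks for the next missing block as soon as it finishes. The function name, the source names, and the timings are all made up for illustration.

```python
import heapq


def simulate_download(total_blocks, ms_per_block):
    """Return how many blocks each source ends up serving.

    `ms_per_block` maps a source name to the whole number of milliseconds
    that source needs to deliver one block (a hypothetical figure).
    """
    served = {name: 0 for name in ms_per_block}
    in_flight = []  # heap of (finish_time_ms, source) for requested blocks
    next_block = 0
    # Start by requesting one block from every source.
    for name, cost in ms_per_block.items():
        if next_block < total_blocks:
            heapq.heappush(in_flight, (cost, name))
            next_block += 1
    # Whenever a source finishes a block, hand it the next missing one.
    while in_flight:
        now, name = heapq.heappop(in_flight)
        served[name] += 1
        if next_block < total_blocks:
            heapq.heappush(in_flight, (now + ms_per_block[name], name))
            next_block += 1
    return served


if __name__ == "__main__":
    print(simulate_download(100, {"west": 10, "east": 40}))
```

In this model a source that is four times faster ends up serving about four times as many blocks, without any explicit prioritization logic.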

Thanks @Karissa, that's exactly what my use case is. We will have a West Coast site and an East Coast site in Canada, with data replicated across them. If a user requests a file, how does Dat know to pull from those multiple sources?

Thank you!

If a user requests a file, how does Dat know to pull from those multiple sources?

Dat automatically connects to all the sources sharing a specific key (similar to BitTorrent). So if both your West Coast and East Coast sites are sharing dat://xyz and you run dat clone dat://xyz, it'll connect to both sources.
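
A toy Python model of that behavior, purely as a sketch: peers are grouped by the key they announce, and a downloader only ever sees peers under the same key. The Swarm class, its methods, and the peer names are hypothetical and do not reflect Dat's real discovery API.

```python
from collections import defaultdict


class Swarm:
    """Toy discovery table: peers are grouped by the dat key they announce."""

    def __init__(self):
        self._peers_by_key = defaultdict(set)

    def announce(self, key, peer):
        """Record that `peer` is sharing or downloading `key`."""
        self._peers_by_key[key].add(peer)

    def sources_for(self, key, me):
        """Return every other peer announced under `key`, sorted by name."""
        return sorted(self._peers_by_key.get(key, set()) - {me})


if __name__ == "__main__":
    swarm = Swarm()
    swarm.announce("dat://xyz", "west-coast")
    swarm.announce("dat://xyz", "east-coast")
    swarm.announce("dat://other", "unrelated-site")
    swarm.announce("dat://xyz", "desktop")
    print(swarm.sources_for("dat://xyz", "desktop"))
```

The desktop sees both coasts because they announced the same key, while the peer under dat://other never shows up.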

@joehand Excellent! Thank you both for helping. I thought one had to run multiple dats, but in reality one is just a clone of another, which gives me access to both sources. Thank you!

Closing this issue.