Can't see shared data from hypercored
at88mph opened this issue · 9 comments
I'm using dat 13.9.0 with hypercored on Debian 9.2 and OS X.
If I run dat create in my /data directory, followed by dat share, I can see the data on my local network. However, if instead of dat share I run hypercored, the same dat clone ... command that works with dat share no longer works.
I have questions about how it's supposed to work:
- Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?
- What relevance does the Swarm port have? Am I supposed to connect to it somehow?
- For dats to work, do we need bi-directional access? Meaning, in order to have a share on a public machine, I can only clone to another public machine?
Thanks!
Dustin
hypercored functions mostly as a backup or server mirroring tool; it can't create dats or read existing dats. You always have to create a dat first, then copy it to hypercored. So the flow would be:
- dat share /data -> prints a key, let's call it dat://my-dat-key
- add dat://my-dat-key to hypercored's ./feeds file (while dat share is still running)
- hypercored will duplicate data to the ./archiver directory
- stop running dat share when uploading is stopped (not very easy for this use case)
- hypercored will make dat://my-dat-key available over the network, without needing to run dat share again
- to update your "backup", run dat share again in /data until files are backed up by hypercored
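The step that adds the key to the feeds file can be scripted. Here is a minimal Python sketch, assuming the feeds file holds one key per line (an assumption about the file format, not something confirmed in this thread). The helper name add_feed is hypothetical, and dat://my-dat-key is the placeholder key from the steps above.

```python
from pathlib import Path


def add_feed(feeds_path, key):
    """Append `key` to the feeds file unless it is already listed.

    Assumes one key per line (an assumption about the file format).
    Returns True if the key was added, False if it was already present.
    """
    path = Path(feeds_path)
    text = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + key + "\n")
    return True
```

Calling add_feed("./feeds", "dat://my-dat-key") twice adds the key only once, so the script is safe to re-run.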
Hopefully I am understanding your question correctly! Using Dat and hypercored on the same machine is not great right now; we'd like to build this into the main CLI, so it is great to see this use case. We mostly use hypercored on servers where we want full backups and dats to be always available, whereas dat share only makes a dat available while the command is running.
On your other questions:
- Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?
You should use the same ID. The archiver key is a hypercore feed that can only be used by other hypercored instances or related tools.
- What relevance does the Swarm port have? Am I supposed to connect to it somehow?
The ports shouldn't matter. Dat should be able to connect to the peers, since they advertise whatever port they are available on.
- For dats to work, do we need bi-directional access? Meaning, in order to have a share on a public machine, I can only clone to another public machine?
Not necessarily. We have some tools to get around firewalls, using hole-punching. But Dat works best if at least one of the machines is public. Bi-directional hole punching is more hit or miss depending on the network.
You can run dat doctor to see if you can connect to our public peer and then use the key printed to see if your two computers can connect to each other.
That clarifies it, thank you for the detailed reply @joehand.
I have a Dat share (not hypercored, just dat share) on a VM on OpenStack with a Public IP, which runs happily. My desktop cannot see it with dat doctor or dat clone, but other Public IP machines can, which leads me to believe that it needs to communicate back and forth like that and both need to be able to see each other.
My full use case will look like this:
- Create Docker image for a Dat share. Since the Docker container processes run in the foreground, a simple dat share should suffice. For clarification, is the dat create only necessary when you want the dat.json created for its metadata?
- Run a Docker container for each share.
- Enable replication amongst the shares.
- Have the ability to download the closest proximity file as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect what file I'm looking for and direct me to the closest one?
Many thanks for the reply.
My desktop cannot see it with dat doctor or dat clone, but other Public IP machines can, which leads me to believe that it needs to communicate back and forth like that and both need to be able to see each other.
Can you connect to the public peer in the first part of the test (from desktop)? It should look like this:
❯ dat doctor
[info] Testing connection to public peer
[info] UTP ONLY - success!
[info] TCP ONLY - success!
If you are running inside Docker on your desktop, that is likely the issue. We haven't been able to figure out how to get around Docker's NAT without switching to host networking mode.
For clarification, is the dat create only necessary when you want the dat.json created for its metadata?
Yep, exactly.
Have the ability to download the closest proximity file as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect what file I'm looking for and direct me to the closest one?
There is no network prioritization yet. But it should download more data from whatever is the fastest peer, just by the nature of how the requests work.
Thanks, @joehand. I've gone down the dat doctor route, and I'm convinced that it's our lousy network here. Thank you for explaining how hypercored works, too.
I've been running Docker with --net host without issue on hosts outside of our network. Running with --net host isn't a big issue for us.
Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and share the same path, will the download be distributed amongst them?
Thank you again! You've been extremely helpful.
I've gone down the dat doctor route, and I'm convinced that it's our lousy network here.
Not ideal, but glad you figured it out. Feel free to run p2p-test; it may give us some more data on your network:
npm install -g p2p-test
p2p-test [optional-short-message-description-the-network]
Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and share the same path, will the download be distributed amongst them?
Not quite sure I understand this question pre-coffee, but I'll give it a shot.
Dat networks are only created around a specific key. So peers never connect to other peers unless they are sharing/downloading the same dat. If dats share a path, then the downloads will be completely unaware of each other (it may cause some weird problems too, but it should work eventually - we plan to add something that locks a given path for writes while Dat is running). They will write to the same metadata in the .dat folder, which records what is downloaded, but the writes won't be coordinated.
Prioritizing the fastest peer for a single key is definitely a good feature, feel free to open an issue in datproject/dat. But prioritization across keys will need to be something more custom.
About fastest peers, if you have two sources online, the downloader will connect to the two sources and begin downloading from them at the same time. Since each individual block downloaded is quite small, the downloader will see that the blocks have downloaded mostly from the faster source.
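To illustrate why small blocks end up coming mostly from the faster source, here is a toy Python simulation. It is not Dat's actual request scheduler: it assumes each peer serves one block request at a time and is handed the next needed block as soon as it finishes. The function name, peer names, and timings are all hypothetical.

```python
import heapq


def simulate_download(num_blocks, ms_per_block):
    """Simulate fetching `num_blocks` small blocks from several peers.

    `ms_per_block` maps a peer name to the (integer) milliseconds it needs
    to deliver one block. Each peer handles one request at a time and gets
    the next needed block as soon as it is free.
    Returns {peer: number_of_blocks_served}.
    """
    served = {peer: 0 for peer in ms_per_block}
    # Heap of (time the peer becomes free, peer name); all peers start idle.
    free_at = [(0, peer) for peer in ms_per_block]
    heapq.heapify(free_at)
    for _ in range(num_blocks):
        t, peer = heapq.heappop(free_at)  # earliest available peer
        served[peer] += 1
        heapq.heappush(free_at, (t + ms_per_block[peer], peer))
    return served
```

With a peer that is four times faster, simulate_download(100, {"fast": 10, "slow": 40}) returns {"fast": 80, "slow": 20}: no explicit prioritization is needed for the faster peer to carry most of the load.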
Thanks @Karissa, that's exactly what my use case is. We will have a West Coast site and an East Coast site in Canada, with data replicated across them. If a user requests a file, how does Dat know to pull from those multiple sources?
Thank you!
If a user requests a file, how does Dat know to pull from those multiple sources?
Dat automatically connects to all the sources sharing a specific key (similar to BitTorrent). So if both your West Coast and East Coast sites are sharing dat://xyz and you dat clone dat://xyz, it'll connect to both sources.
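The key-scoped behaviour can be pictured with a toy Python registry. This is only a conceptual sketch, not Dat's real discovery code: the ToySwarm class, its methods, and the site names are hypothetical.

```python
from collections import defaultdict


class ToySwarm:
    """Toy registry of which peers share which key (conceptual only)."""

    def __init__(self):
        self._sources = defaultdict(set)

    def share(self, key, peer):
        """Record that `peer` is sharing `key`."""
        self._sources[key].add(peer)

    def sources_for(self, key):
        """Return every peer sharing `key`, sorted by name."""
        return sorted(self._sources.get(key, set()))


swarm = ToySwarm()
swarm.share("dat://xyz", "west-coast")
swarm.share("dat://xyz", "east-coast")
swarm.share("dat://abc", "other-site")
```

Here swarm.sources_for("dat://xyz") returns ["east-coast", "west-coast"], while the peer sharing dat://abc is never seen by a client asking for dat://xyz.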