Simple file transfer application
This application transfers files from one machine to another using multiple threads.
It was written in ~7 hours of pure working time (coffee breaks excluded).
Features:
- Unencoded TCP is used to transfer data
- The number of threads is variable (maximum is 255) and is defined by the client
- The server will send any file it can read
Requirements
- UNIX-like OS with POSIX threads support
- Python 3 (developed under Python 3.8)
- GNU make
- C compiler able to link against glibc
Build
make
Run
Client
$ ./client.py --help
usage: client.py [-h] address threads file ofile
File download client
positional arguments:
address server address, in host:port format
threads number of threads to use to receive file
file file to download
ofile output file
optional arguments:
-h, --help show this help message and exit
- file is the path to the file to download on the server. It is passed as a string
- ofile is the output file destination on the local machine
The client must have access to /tmp. It creates a directory and temporary files in it.
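A minimal sketch of how such a working directory under /tmp could be created from Python. The helper name part_directory, the "transfer-" prefix and the cleanup on exit are assumptions made for illustration; the actual client may name and manage its directory differently.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def part_directory(base="/tmp"):
    """Create a uniquely named directory under `base` and remove it afterwards.

    Hypothetical helper: the prefix and the cleanup policy are assumptions.
    """
    path = tempfile.mkdtemp(prefix="transfer-", dir=base)
    try:
        yield path
    finally:
        # Remove the directory together with any temporary files left in it.
        shutil.rmtree(path, ignore_errors=True)
```

tempfile.mkdtemp fails with an OSError if the base directory is not writable, which is the situation the access requirement above guards against.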
Server
$ ./server.py --help
usage: server.py [-h] port
File download server
positional arguments:
port server port
optional arguments:
-h, --help show this help message and exit
When the server is stopped (e.g. by SIGINT), all its child processes are also stopped, and their sockets are properly closed.
Description of the implementation & notes
The implementation is straightforward but not quite polished. It works as described below. Errors are handled, but there are probably cases that are not covered.
1. server listens on its port for incoming HTTP connections
   - This is implemented by means of Python's http.server
2. client sends an HTTP POST request whose path is file and whose content is JSON with the number of threads
   - POST is used so that the response is not cached, although GET is semantically better. I could have set proper headers, but did not manage that in time, and now it would require implementation changes
3. server checks that the requested file exists and is readable, and calculates its MD5 hash
4. server starts an (asynchronous) subprocess sender, whose purpose is to send the file in multiple threads. The TCP port it will listen on is determined by server in advance
5. server sends the hash and the port of sender to client
6. client starts a (synchronous) subprocess receiver, whose purpose is to receive the file in multiple threads. The TCP port it should connect to is obtained from the server's response
7. sender opens a listening TCP socket on the port provided to it as a command line parameter, and waits for receivers. When a new receiver connects, a thread is created
8. receiver creates a set of threads, each provided with an independent socket. They all connect to sender; after a successful connection, each sends a single byte containing its thread ID (from 0 to threads - 1)
   - This is why no more than 255 threads are allowed. 255 seems a reasonable limitation. Of course, the protocol can be changed to use more threads, but since this would require dealing with endianness, the single-byte implementation is used
9. sender's thread obtains the thread ID (these may come in "wrong" order, of course), and calculates the offset from which it should start to read file. The file is split into (logical) blocks by server in advance by a simple expression: block_size = size_of_file // threads + 1
   - A production solution would split the file into blocks whose size corresponds to the size of a block on the file system, or in memory
10. sender reads the file into a buffer, and then sends the buffer over the TCP connection
   - The buffer is of fixed size. In production, its size can be made variable
11. receiver receives the data into a buffer. It then writes the buffer into a "part" file, which is located in a subdirectory of /tmp determined by client in advance
   - When all data is transmitted, sender exits quietly. receiver does the same
12. client calls cat to concatenate all "part" files into the ofile
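The POST request sent by client could look roughly like the sketch below. The JSON key "threads", the Content-Type header and the function names are assumptions; the description above only says that the path is file and the body is JSON with the number of threads.

```python
import http.client
import json


def build_request(file_path, threads):
    """Return (method, path, body, headers) for the download request.

    The JSON key "threads" is an assumed name, not taken from the real client.
    """
    body = json.dumps({"threads": threads}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return "POST", file_path, body, headers


def request_download(host, port, file_path, threads, timeout=10):
    """Send the request and return (status, raw response body)."""
    method, path, body, headers = build_request(file_path, threads)
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

The response body would then carry the MD5 hash and the port of sender; its exact format is not specified here, so the sketch returns it unparsed.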
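For the MD5 step, one common way to hash a file without loading it into memory at once is shown below. This is a sketch with an assumed function name and chunk size, not necessarily how server does it.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in fixed-size chunks so that large files do not
    have to fit in memory. The chunk size is an arbitrary assumption.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The client can run the same function on ofile after the transfer and compare the result with the hash received from the server.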
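The single-byte thread ID handshake can be illustrated in Python as follows. The real sender and receiver are separate programs, so the function names here are hypothetical and the code only shows the wire format: one unsigned byte, which needs no byte-order handling.

```python
import socket


def send_thread_id(sock, thread_id):
    """Send the thread ID as exactly one unsigned byte."""
    if not 0 <= thread_id <= 255:
        raise ValueError("thread ID must fit into a single byte")
    sock.sendall(bytes([thread_id]))


def recv_thread_id(sock):
    """Read exactly one byte and return it as an integer thread ID."""
    data = sock.recv(1)
    if len(data) != 1:
        raise ConnectionError("peer closed the connection before sending a thread ID")
    return data[0]
```

A wider ID would need a fixed byte order on both sides (for example via struct.pack("!H", ...)), which is the endianness concern mentioned above.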
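The block size expression translates into per-thread offsets as in the sketch below. The function name block_ranges and the clamping of the final blocks are assumptions; only the formula block_size = size_of_file // threads + 1 comes from the description above.

```python
def block_ranges(size_of_file, threads):
    """Return a list of (offset, length) pairs, indexed by thread ID."""
    if not 1 <= threads <= 255:
        raise ValueError("threads must be between 1 and 255")
    block_size = size_of_file // threads + 1
    ranges = []
    for thread_id in range(threads):
        # Clamp so that trailing blocks never run past the end of the file.
        offset = min(thread_id * block_size, size_of_file)
        length = min(block_size, size_of_file - offset)
        ranges.append((offset, length))
    return ranges
```

For a 10-byte file and 3 threads, block_size is 4 and the ranges are (0, 4), (4, 4) and (8, 2). Because of the + 1, the last thread may get a shorter block, or an empty one when the file is very small.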
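The final concatenation is done with cat, but the same step can be sketched in pure Python. The "part<ID>" file naming is a hypothetical convention chosen for this example; the real part file names are not documented here.

```python
import os
import shutil


def concatenate_parts(part_dir, threads, ofile):
    """Append part files to `ofile` in thread ID order."""
    with open(ofile, "wb") as out:
        for thread_id in range(threads):
            # Hypothetical naming scheme: part0, part1, ...
            part_path = os.path.join(part_dir, f"part{thread_id}")
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, out)
```

The parts must be appended in thread ID order, since each thread ID corresponds to a fixed offset in the original file.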