We're going to build a tiny HTTP client, and then use that to build a tiny multithreaded web scraper.
Our HTTP client is going to use the socket interface directly, bypassing Rust's standard library. This
You'll need Rust installed, which you can do by following the instructions at https://www.rust-lang.org/en-US/install.html.
This will install the Rust compiler and standard library, as well as the Cargo build and dependency management tool.
There are a couple of ways to get documentation for Rust packages, which are called crates. There's a publicly hosted site at https://docs.rs with docs for all the crates (packages or libraries) on crates.io. You can get a crate's docs by going to https://docs.rs/cratename, which will redirect you to the latest version of its docs.
Alternatively, Cargo has a subcommand for building docs for your project and all its dependencies: cargo doc --open. This generates the docs and then opens them in a browser. It's really handy to have all the docs for the libraries you're using in one place.
Cargo can set up a project for you, including version control:
$ cargo new rust-ngw
Created binary (application) `rust-ngw` project
$ cd rust-ngw
$ ls -a
./ ../ Cargo.toml .git/ .gitignore src/
$ cat src/main.rs
fn main() {
    println!("Hello, world!");
}
Note that it's created a skeleton project, including initialising a git repository. (If you prefer another version control system, check cargo new --help for docs on the --vcs flag.)
Now, if we run cargo run, it will build our binary and run it:
$ cargo run
Compiling rust-ngw v0.1.0 (file:///tmp/asss)
Finished dev [unoptimized + debuginfo] target(s) in 0.87 secs
Running `target/debug/rust-ngw`
Hello, world!
Since we'll be using the nix library, let's add it to our project right away! Open up Cargo.toml, and add this under [dependencies]:
nix = "0.10"
If you've used something like npm or bundler, Cargo plays a similar role: it manages dependencies, figures out which versions to install, and so on. We just told Cargo that we want nix at version 0.10.x.
If you run cargo run again, you'll see Cargo download and compile nix and its transitive dependencies.
The default main doesn't return anything. Because we'll be doing all kinds of IO and network stuff that can fail, we want to signal this in our main function's signature by having it return a Result, which is Rust's way of representing things that might fail. So let's change fn main() to
fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Hello, world!");
    Ok(())
}
The Ok(()) is how we say that everything completed fine and there's no useful result. (If you've used a typed functional language, () is sometimes called "unit".)
We'll get a chance to look at Result and Error in more detail later. For now, let's just move on to...
To start off, we're going to learn a tiny bit about socket programming. A socket is an endpoint for communication. There are a bunch of different types, but the ones we're interested in here are TCP sockets.
First, we'll need to create a socket! This means we have to tell the OS what kind of socket we want. There are a couple of axes:
- the address family, which for us will be either AF_INET (IPv4) or AF_INET6 (IPv6).
- the type of socket, which for TCP sockets is SOCK_STREAM.
The C function for creating a socket is socket() in the sys/socket.h header. The modules in the nix bindings mirror the layout of the C headers, so we can import it with
extern crate nix;
use nix::sys::socket::socket;
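extern crate nix; tells Rust to bring the nix library in. (In the 2018 and later editions of Rust this line is optional, because Cargo dependencies are available automatically, but it's harmless to keep.)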
You can take a look at the docs for socket to see how to call it. To create a TCP/IPv4 socket, we'll pass AddressFamily::Inet as the address family and SockType::Stream as the socket type, plus a SockFlag value for the flags. This means we need to use (ie import) those types as well, so change the use statement to
use nix::sys::socket::{socket, AddressFamily, SockFlag, SockType};
Now we can create the socket!
let sock = socket(AddressFamily::Inet, SockType::Stream, SockFlag::empty(), None)?;
To break this down, we're asking the OS to create a socket for us using the IPv4 address family, and with type stream, ie, a TCP socket. The SockFlag::empty() and None arguments are the flags and the protocol, and we don't need to set either here. (Leaving the protocol as None lets the OS pick the default protocol for a stream socket, which is TCP.)
Great! Now, at this point, all we've done is create the socket. It's not currently connected to anything.
To connect to a server, we call the aptly named connect function, passing it the socket and an address to connect to:
let port = 80; // a u16
let ip_addr = IpAddr::new_v4(1, 1, 1, 1);
let sockaddr = SockAddr::new_inet(InetAddr::new(ip_addr, port));
connect(sock, &sockaddr)?;
You'll need to add a few more imports for IpAddr, InetAddr, SockAddr, and connect. At this point it might be worth changing the use line to
use nix::sys::socket::*;
to import everything from the socket module.
Building up the address is kind of tedious. That's what we get for bypassing the standard library!
Exercise. Open a TCP connection to 1.1.1.1 port 80.
You might be wondering what the deal is with the ?s. They're a shorthand to say "hey, we know this call might fail. If it does, just return the error immediately, and stop running this function."
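For a concrete picture, here's a sketch comparing ? with the match it roughly stands for. It parses a port number from text instead of calling into a socket, so it runs anywhere; parse_port and parse_port_by_hand are made-up names. The match version is a simplification of what the compiler actually generates, but the effect is the same: on Err, convert the error with From and return early. That conversion is why a ParseIntError can come back out as a Box<dyn Error>.

```rust
use std::error::Error;

/// Parse a port number. If parsing fails, `?` returns early,
/// converting the ParseIntError into a Box<dyn Error> via From.
fn parse_port(s: &str) -> Result<u16, Box<dyn Error>> {
    let port: u16 = s.trim().parse()?;
    Ok(port)
}

/// Roughly what the `?` above does, written out by hand (simplified).
fn parse_port_by_hand(s: &str) -> Result<u16, Box<dyn Error>> {
    let port = match s.trim().parse::<u16>() {
        Ok(p) => p,
        // Convert the error type and stop running this function.
        Err(e) => return Err(From::from(e)),
    };
    Ok(port)
}
```

Either version returns Ok(80) for "80", and an error for "70000" (too big for a u16) or "not a port".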
Exercise. See what happens if you connect to an IP / port combo that isn't listening for connections. Eg. try 127.0.0.3 port 123.
To receive bytes from the socket, we need somewhere to put them. We'll use a 1024-byte buffer:
let mut buf = [0u8; 1024];
This defines an array with 1024 entries, all of which are 0u8, ie 0 as an unsigned 8-bit number, aka a byte.
Now we're ready to receive using recv. If you aren't using the glob import, you'll need to add it (and MsgFlags) to the use statement up top. This function takes a flags argument, but we don't actually want to pass any flags, so we'll pass in empty flags:
let len = recv(sock, &mut buf, MsgFlags::empty())?;
recv returns the number of bytes it received and wrote into buf. So if we want just the newly received bytes, we can take a len-long slice of buf:
let new_bytes = &buf[..len];
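Here's a sketch of the same buffer-and-slice pattern that runs without a socket. It uses the standard library's Read trait on a byte slice as a stand-in for recv, since both fill a buffer and report how many bytes they wrote; read_chunk is a hypothetical name.

```rust
use std::io::Read;

/// Read once from `source` into a 1024-byte buffer and return only the
/// bytes that were actually filled in. `source` stands in for the socket.
fn read_chunk(source: &mut impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = [0u8; 1024];        // 1024 bytes, all zero
    let len = source.read(&mut buf)?; // like recv: how many bytes were written
    let new_bytes = &buf[..len];      // only the first `len` bytes are new data
    Ok(new_bytes.to_vec())
}
```

Reading from the five bytes of "hello" gives back exactly those five bytes, not 1024; a second read from the now-empty source gives back nothing.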
To put this together, we're going to watch Star Wars Episode IV in our terminal! If we connect to towel.blinkenlights.nl at TCP port 23, we can have some fun.
But! We don't know how to turn a host name into an IP address yet. You'll have to look it up with dig or nslookup and hardcode the IP address. (Or just take my word that it's 94.142.241.111!)
Exercise. Write a loop that repeatedly calls recv and then writes those bytes to stdout.
Hint. An infinite loop in Rust looks like this:
loop {
    // do stuff for a long long long long long time
}
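Hint. Here's how to write bytes out to stdout:
io::stdout().write_all(bytes)?;
(You'll need use std::io up top to bring the IO module in, and use std::io::Write to bring in the write_all method. write_all keeps writing until every byte is out, whereas plain write may write only part of the buffer.)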
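One way the exercise could come together is sketched below against the standard Read and Write traits rather than recv directly, so it can be tried without a network: source stands in for the socket and out for stdout. Unlike the infinite loop in the hint, this version stops when a read returns 0, which is how recv (and Read::read) report that the other side closed the connection. The name pump is made up for illustration.

```rust
use std::io::{self, Read, Write};

/// Copy everything from `source` to `out`, one buffer-full at a time,
/// and return the total number of bytes copied.
fn pump(source: &mut impl Read, out: &mut impl Write) -> io::Result<u64> {
    let mut buf = [0u8; 1024];
    let mut total = 0u64;
    loop {
        let len = source.read(&mut buf)?;
        if len == 0 {
            break; // 0 bytes means end of stream: the peer hung up
        }
        out.write_all(&buf[..len])?; // write only the bytes we just received
        total += len as u64;
    }
    Ok(total)
}
```

With the real socket you'd read with recv instead and pass &mut io::stdout() as out.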
Great, now we can connect out and receive bytes. But we need to be able to make requests over the socket, so we'll have to be able to send bytes as well. The function that does that is send.
To help us see this in effect, we'll use netcat. For a quick demo, open up two terminals. In one of them, run
nc -lp 12345
This tells netcat to listen on port 12345. (Some netcat variants, such as the BSD one shipped with macOS, want nc -l 12345 instead.) In the other one, run nc localhost 12345, which tells netcat to connect to port 12345 on your machine. You should now be able to type stuff in either terminal and have it show up in the other one. :-)
For our purposes, we'll just have the listening netcat. The role of the connecting netcat will be played by our program.
Exercise. Change your program to connect to 127.0.0.1 port 12345, and write some bytes! They should show up in your listening netcat.
Hint. You can turn a string into bytes by writing mystr.as_bytes(), so pass something like "never graduate!".as_bytes() to send.
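One thing to watch out for: like recv, send returns a count, and it can be smaller than the buffer you passed in. Here's a sketch of a hypothetical send_all helper that keeps sending until everything is out. It takes the "send some bytes" step as a closure (with nix that might be something like |b| send(sock, b, MsgFlags::empty())), so the logic can be tested without a socket.

```rust
/// Hypothetical helper: call `send_some` repeatedly until all of `data`
/// has been sent. Returns how many bytes went out in total.
fn send_all<E>(
    data: &[u8],
    mut send_some: impl FnMut(&[u8]) -> Result<usize, E>,
) -> Result<usize, E> {
    let mut rest = data;
    while !rest.is_empty() {
        let sent = send_some(rest)?; // may be fewer than rest.len()
        if sent == 0 {
            break; // no progress; stop rather than spin forever
        }
        rest = &rest[sent..]; // skip past the bytes that were sent
    }
    Ok(data.len() - rest.len())
}
```

If the returned count is less than data.len(), the sender stopped making progress, and the caller can decide how to treat that.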
We've got a handy little set of tools now. We can
- connect to a TCP address
- receive bytes
- send bytes
Let's extract this and package it up into a type with some methods. We'll define a struct! Here's a basic struct definition:
struct MyStruct {
    a_number: u32,
    a_string: String,
}
Now we can define some methods using an impl block:
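impl MyStruct {
    // Print a_string, a_number times. &self means we only read the struct.
    fn print_it(&self) {
        for _ in 0..self.a_number {
            println!("{}", self.a_string);
        }
    }
    fn number(&self) -> u32 {
        self.a_number
    }
    // &mut self means this method is allowed to modify the struct.
    fn set_number(&mut self, n: u32) {
        self.a_number = n;
    }
}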
Exercise. Define a TcpSocket struct with one field, fd, of type RawFd (from std::os::unix::io). This is the type that socket returns: a raw file descriptor.
Exercise. Define a connect function in an impl TcpSocket block that takes an IP and port, and returns a TcpSocket connected to them. It has no self parameter, so you'll call it as TcpSocket::connect(...). Since this might fail, we'll have it return a Result type like main does. Here's a signature:
fn connect(ip: [u8; 4], port: u16) -> Result<TcpSocket, Box<dyn std::error::Error>> {
    // STUFF
}
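If you'd rather not type the octets by hand, here's a sketch of a hypothetical parse_ipv4 helper that turns dotted-quad text such as "94.142.241.111" into the [u8; 4] this signature expects. It uses std::net::Ipv4Addr purely as a parser: no DNS lookup happens and no socket is opened, so it doesn't undo the "bypass the standard library" spirit of the networking code.

```rust
use std::error::Error;
use std::net::Ipv4Addr;

/// Hypothetical helper: parse "a.b.c.d" into [a, b, c, d].
/// Fails on malformed text or octets above 255.
fn parse_ipv4(text: &str) -> Result<[u8; 4], Box<dyn Error>> {
    let addr: Ipv4Addr = text.parse()?;
    Ok(addr.octets())
}
```

You could then write something like TcpSocket::connect(parse_ipv4("94.142.241.111")?, 23).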
Exercise. Add a recv method that takes a buffer and receives bytes into it, returning the number of bytes received. Here's a signature:
fn recv(&self, buf: &mut [u8]) -> Result<usize, Box<dyn std::error::Error>> {
    // STUFF
}
The buf is &mut [u8], which means a mutable "slice" of bytes. The mut means that we can write bytes in there, which is needed to be able to put the bytes we receive there. :-)
Exercise. Add a send method that takes a buffer and sends bytes from it, returning the number of bytes sent. I bet you can figure out a signature for this!
Hint. For sending bytes, we don't need to change the buffer, so you can leave off the mut!
Remember further up where we wrote use std::io::Write to be able to write to stdout? Write is the interface for... writing. There's an analogous one for reading called Read. Both Read and Write are traits, which is Rust-speak for an interface.
To make our fledgling socket wrapper interoperate with Rust libraries, we can implement Read and Write for it ourselves!
To implement a trait, you write a special impl block and then define the required methods.
impl SomeTrait for MyStruct {
    // Required method definitions here!
}
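To make the impl SomeTrait for MyStruct shape concrete, here's a sketch using a standard-library trait with a single required method, std::fmt::Display. Endpoint is a hypothetical type made up for this example.

```rust
use std::fmt;

/// Hypothetical type for illustration: an IPv4 address plus a port.
struct Endpoint {
    ip: [u8; 4],
    port: u16,
}

// Display's one required method is fmt. Implementing it lets Endpoint
// be used with println!("{}", ...), format!, and .to_string().
impl fmt::Display for Endpoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let [a, b, c, d] = self.ip;
        write!(f, "{}.{}.{}.{}:{}", a, b, c, d, self.port)
    }
}
```

Once the impl exists, Endpoint { ip: [1, 1, 1, 1], port: 80 }.to_string() gives "1.1.1.1:80".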
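For Read and Write, these are
fn read(&mut self, buf: &mut [u8]) -> Result<usize, io::Error>
and
fn write(&mut self, buf: &[u8]) -> Result<usize, io::Error>
respectively. Write also has a second required method,
fn flush(&mut self) -> Result<(), io::Error>
but since our socket doesn't buffer anything itself, it can simply return Ok(()).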
These signatures are really close to our own send and recv, so we can actually just implement them using the methods we already defined.
Exercise. Implement Read and Write for TcpSocket.
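Hint. The big difference is the error type. To get an io::Error, call io::Error::new(io::ErrorKind::Other, error.to_string()). (The .to_string() matters: io::Error::new wants an error that is Send + Sync, which a plain Box<dyn std::error::Error> isn't, but a String works.)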
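Here's a sketch of that delegation pattern on a hypothetical in-memory FakeSocket rather than the real TcpSocket, so it runs without a network. Its recv and send mimic the tutorial's signatures, except that they take &mut self because the fake has to modify its own buffers. The trait impls just forward to them and convert the error.

```rust
use std::error::Error;
use std::io::{self, Read, Write};

/// Hypothetical stand-in for TcpSocket: `incoming` is what recv hands
/// out, and `outgoing` collects whatever send is given.
struct FakeSocket {
    incoming: Vec<u8>,
    outgoing: Vec<u8>,
}

impl FakeSocket {
    fn recv(&mut self, buf: &mut [u8]) -> Result<usize, Box<dyn Error>> {
        let n = buf.len().min(self.incoming.len());
        buf[..n].copy_from_slice(&self.incoming[..n]);
        self.incoming.drain(..n); // drop the bytes we just handed out
        Ok(n) // 0 once everything has been received, like a closed peer
    }

    fn send(&mut self, buf: &[u8]) -> Result<usize, Box<dyn Error>> {
        self.outgoing.extend_from_slice(buf);
        Ok(buf.len())
    }
}

// io::Error::new needs a Send + Sync error, so pass the message as a String.
fn to_io_error(e: Box<dyn Error>) -> io::Error {
    io::Error::new(io::ErrorKind::Other, e.to_string())
}

impl Read for FakeSocket {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.recv(buf).map_err(to_io_error)
    }
}

impl Write for FakeSocket {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.send(buf).map_err(to_io_error)
    }

    // Nothing is buffered on our side, so there's nothing to flush.
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Once the traits are implemented, every provided method comes for free: the type can use read_to_string and write_all without any extra code.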
Exercise. Look up io::copy, then go back to the Star Wars example and use it to replace the loop.
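As a sketch of what io::copy does for you: it takes any reader and any writer, loops until the reader reports end of stream, and returns how many bytes it copied. Here a byte slice and a Vec<u8> stand in for the socket and stdout; forward is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Stream everything from `source` into `sink` using io::copy.
/// With a real socket this would look like
/// io::copy(&mut sock, &mut io::stdout()).
fn forward(source: &mut impl Read, sink: &mut impl Write) -> io::Result<u64> {
    let copied = io::copy(source, sink)?; // loops until a read returns 0
    sink.flush()?; // push out anything the writer may be holding
    Ok(copied)
}
```

Forwarding the nine bytes of "Star Wars" into an empty Vec returns 9 and leaves exactly those bytes in the Vec.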
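All right, let's make an HTTP request! We're going to make some requests to httpbin.org, a super handy service for messing around with HTTP. Here's where we'll start:
let request = concat!(
    "GET /get HTTP/1.1\r\n",
    "Host: httpbin.org\r\n",
    "Connection: close\r\n",
    "\r\n");
Each header line ends in \r\n, and the empty line at the end marks the end of the headers. The Connection: close header asks the server to hang up once it has responded; without it, HTTP/1.1 keeps the connection open and io::copy would sit waiting for more data.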
Exercise. Use your skillz to find an IP address for httpbin.org, and then use your TcpSocket to connect and send this HTTP request. Then io::copy the result to stdout.
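If you'd like to vary the host or path, here's a sketch of a hypothetical build_get_request helper that produces request text with format!, plus a hypothetical status_code helper for peeking at the first line of the response once it comes back. The request it builds includes a Connection: close header so the server closes the connection after it responds.

```rust
/// Hypothetical helper: build a minimal HTTP/1.1 GET request.
/// Header lines end in \r\n; a final empty line ends the headers.
fn build_get_request(host: &str, path: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    )
}

/// Hypothetical helper: pull the status code out of the first line of a
/// response, e.g. "HTTP/1.1 200 OK" gives Some(200).
fn status_code(response: &str) -> Option<u16> {
    let status_line = response.lines().next()?; // lines() strips the \r\n
    status_line.split_whitespace().nth(1)?.parse().ok()
}
```

To send it, pass build_get_request("httpbin.org", "/get").as_bytes() to your socket's send or write_all.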