Networks and the Internet
The transport layer
The transport layer is responsible for end-to-end communication. The IP address just identifies the interface to which the data is sent. What happens when it gets there? There could be a large number of processes using the link. The IP header doesn't contain sufficient information to deliver messages to specific users within a system, so two additional protocols have been implemented to handle the details of communications between "end users." In practice, these end users are processes. They connect to the network via ports, or communication end points, within individual machines.
TCP
The Transmission Control Protocol, or TCP, is a so-called reliable protocol: it ensures that data gets to its destination, and if it doesn't, it sends another copy. If it can't get through after a large number of tries (14 tries and nearly 10 minutes), it gives up, but it doesn't pretend the data got through. To perform this service, TCP is also connection oriented: before you can send data with TCP, you must establish a connection, which is conceptually similar to opening a file.
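To make the "open a connection first" idea concrete, here is a minimal sketch that sets up a TCP connection over the loopback interface and sends a few bytes through it. The function name tcp_loopback_demo, the loopback address and the two-second timeouts are assumptions chosen for illustration; they are not part of the text above.

```python
import socket


def tcp_loopback_demo(message: bytes = b"hello") -> bytes:
    """Open a TCP connection to ourselves, send a message, return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the system to pick any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        port = listener.getsockname()[1]

        # The connection must exist before any data can be sent.
        with socket.create_connection(("127.0.0.1", port), timeout=2.0) as client:
            server_side, _addr = listener.accept()
            with server_side:
                server_side.settimeout(2.0)
                client.sendall(message)
                return server_side.recv(1024)
```

Unlike a file, the connection has two ends: the listener accepts it on one side while the client opens it on the other.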
To implement this protocol, TCP packets include a TCP header after the IP header, as shown in Figure 16-3. This figure ignores the possible options that follow the IP header. The offset of the TCP header, shown here as 20, is really specified by the value of the IP header length field in the first byte of the packet. This is only a 4-bit field, so it is counted in words of 32 bits: for a 20-byte header, it has the value 5.
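The header length arithmetic can be sketched in a few lines. The function name ip_header_length is hypothetical, and the sketch assumes an IPv4 packet whose first byte holds the version in the high 4 bits and the header length in the low 4 bits.

```python
def ip_header_length(packet: bytes) -> int:
    """Return the IPv4 header length in bytes, taken from the first byte."""
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    version = first >> 4        # high 4 bits: IP version
    length_words = first & 0x0F  # low 4 bits: header length in 32-bit words
    if version != 4:
        raise ValueError("not an IPv4 packet")
    if length_words < 5:
        raise ValueError("header length field is smaller than the minimum of 5")
    return length_words * 4     # 4 bytes per 32-bit word
```

A first byte of 0x45 gives a 20 byte header, so the TCP header starts at offset 20. A value of 0x46 would mean one word of IP options and a TCP header at offset 24.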
A number of fields are of interest when debugging network connections:
- The sequence number is the byte offset, within the data stream sent to the other side, of the first data byte in this packet.
- The acknowledgment number is the byte offset of the last byte that has been received from the other side.
- The window size is the number of bytes that can be sent before an acknowledgment is required.
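When debugging, it can help to pull these three fields out of a captured packet. The following is a minimal sketch that assumes the standard 20 byte TCP header layout and that the caller has already stripped the IP header. The names TcpFields and parse_tcp_fields are hypothetical.

```python
import struct
from typing import NamedTuple


class TcpFields(NamedTuple):
    src_port: int
    dst_port: int
    sequence: int
    acknowledgment: int
    window: int


def parse_tcp_fields(segment: bytes) -> TcpFields:
    """Extract the ports, sequence number, acknowledgment number and window size."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: two 16-bit ports, two 32-bit numbers,
    # 16 bits of data offset and flags, then the 16-bit window size.
    src, dst, seq, ack, _offset_flags, window = struct.unpack("!HHIIHH", segment[:16])
    return TcpFields(src, dst, seq, ack, window)
```

The remaining four bytes of the fixed header, the checksum and the urgent pointer, are ignored here.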
These three values are used to ensure efficient and reliable transmission of data. For each connection, TCP maintains a copy of the highest acknowledgment number received from the other side and a copy of all data that the other side has not acknowledged receiving. It does not send more than window size bytes of data beyond this value. If it does not receive an acknowledgment of transmitted data within a predetermined time, usually one second, it sends all the unacknowledged data again and again at increasingly large intervals. If it can't transmit the data after about ten minutes, it gives up and closes the connection.
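The two rules in this paragraph can be sketched as follows. This is an illustration, not the algorithm of any particular TCP implementation: the doubling of the interval, the 64 second cap and the 600 second limit are assumptions, the function names are hypothetical, and the wraparound of the 32-bit sequence numbers is ignored.

```python
def bytes_allowed(highest_ack: int, next_to_send: int, window: int) -> int:
    """How many more bytes may be sent without exceeding the window."""
    unacknowledged = next_to_send - highest_ack  # data sent but not yet acknowledged
    return max(0, window - unacknowledged)


def retransmit_schedule(first_timeout: float = 1.0,
                        factor: float = 2.0,
                        cap: float = 64.0,
                        give_up_after: float = 600.0) -> list:
    """Elapsed times in seconds at which unacknowledged data is sent again.

    Assumed policy: each interval is `factor` times the previous one,
    limited to `cap`, until `give_up_after` seconds have passed.
    """
    times = []
    elapsed = 0.0
    interval = first_timeout
    while elapsed + interval <= give_up_after:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * factor, cap)
    return times
```

With these assumed defaults, the sketch yields 14 retransmissions, the last one 575 seconds after the original transmission, which is in the same range as the figures quoted above.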
UDP
The User Datagram Protocol, or UDP, is different: it's an unreliable protocol. It sends data out and never cares whether it gets to its destination or not. So why do we use it if it's unreliable? It's faster, and thus cheaper. Consider it a junk mail delivery agent: who cares if you get this week's AOL junk CD-ROM or not? There will be another one in next week's mail. Since it doesn't need to reply, UDP is connectionless: you can just send a message off with UDP without worrying about establishing a connection first. For example, the rwhod daemon broadcasts summary information about a system on the LAN every few minutes. In the unlikely event that a message gets lost, it's not serious: another one will come soon.
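A minimal sketch of the connectionless style: the sender just fires off a datagram, with no connection set up beforehand and no reply expected. The function names, the loopback address and the timeout are assumptions for illustration only. This is not how rwhod itself is written.

```python
import socket


def send_datagram(message: bytes, host: str, port: int) -> None:
    """Send one UDP datagram. Nothing tells us whether it arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


def udp_loopback_demo(message: bytes = b"status: up") -> bytes:
    """Send a datagram to a receiver on the same machine and return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))   # let the system choose a free port
        receiver.settimeout(2.0)          # a lost datagram would otherwise block forever
        port = receiver.getsockname()[1]
        send_datagram(message, "127.0.0.1", port)
        data, _addr = receiver.recvfrom(1024)
        return data
```

The timeout on the receiving side matters: since UDP never retransmits, a receiver that waits without a timeout for a lost datagram waits indefinitely.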
Port assignment and Internet services
A port is simply a 16-bit number that is assigned to a specific process and represents the source or destination end point of a specific connection. A process can either request to be connected to a specific port, or the system can assign one that is not in use.
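Both ways of obtaining a port can be sketched with the sockets interface. The function name bind_tcp is hypothetical, and the sketch assumes the common convention that binding to port 0 asks the system to choose an unused port.

```python
import socket


def bind_tcp(port: int = 0) -> socket.socket:
    """Bind a TCP socket to the given port, or to a system-chosen one if port is 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    return sock


if __name__ == "__main__":
    sock = bind_tcp()  # no specific port requested
    print("system assigned port", sock.getsockname()[1])
    sock.close()
```

Requesting a specific port fails if another process is already using it, and on UNIX systems ports below 1024 can normally be bound only by privileged processes.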
RFC 1700 defines a number of well-known ports that are used to request specific services from a machine. On a UNIX machine, these services are provided by daemons that listen on the corresponding port number: in other words, when a message comes in on this port number, the IP software passes it to the daemon, which processes it. These ports are defined in the file /etc/services. Here's an excerpt:
# Network services, Internet style
#
# WELL KNOWN PORT NUMBERS
#
ftp      21/tcp                   #File Transfer [Control]
ssh      22/tcp                   #Secure Shell Login
ssh      22/udp                   #Secure Shell Login
telnet   23/tcp
smtp     25/tcp    mail           #Simple Mail Transfer
smtp     25/udp    mail           #Simple Mail Transfer
domain   53/tcp                   #Domain Name Server
domain   53/udp                   #Domain Name Server
...
http     80/tcp    www www-http   #World Wide Web HTTP
http     80/udp    www www-http   #World Wide Web HTTP
This file has a relatively simple format: the first column is a service name, and the second column contains the port number and the name of the protocol (either tcp or udp), separated by a slash. Optionally, alternative names for the service may follow. Anything after a # character is a comment. In this example, smtp may also be called mail, and http may also be called www or www-http.
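The format is simple enough to parse by hand. The following sketch splits one line of the file into its parts. The names ServiceEntry and parse_services_line are hypothetical, and the sketch assumes that everything after a # is a comment, as in the excerpt.

```python
from typing import NamedTuple, Optional


class ServiceEntry(NamedTuple):
    name: str
    port: int
    protocol: str
    aliases: tuple


def parse_services_line(line: str) -> Optional[ServiceEntry]:
    """Parse one line in /etc/services format. Return None for blank or comment lines."""
    text = line.split("#", 1)[0].strip()  # drop the trailing comment, if any
    if not text:
        return None
    fields = text.split()
    if len(fields) < 2 or "/" not in fields[1]:
        return None  # not a service entry
    port_text, protocol = fields[1].split("/", 1)
    return ServiceEntry(fields[0], int(port_text), protocol, tuple(fields[2:]))
```

For the line "smtp 25/tcp mail #Simple Mail Transfer", this returns the name smtp, port 25, protocol tcp and the single alias mail.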
When the system starts up, it starts specific daemons. For example, if you're running mail, you may start up sendmail as a daemon. Any mail requests coming in on port 25 (smtp) will then be routed to sendmail for processing.