Network Protocols

 

Communications on a network are governed by a set of rules and procedures called protocols.  Protocols exist in computers for the same reason they exist, formally or otherwise, in human interactions.  They spell out how the communication proceeds: whose turn is it, how does it become someone else’s turn, how do you ask someone to repeat the last part of the communication if there’s been an error, and so on.  The difference between computers and humans is, of course, that computers are stupid.  Their protocols must be very rigid and simply defined if they are to work.  Here’s an example of a protocol.

 

Do you remember “Lord of the Flies” by William Golding?  A bunch of English schoolboys are marooned on an island and have to build a society in order to survive.  When they have meetings, they need to keep the discussion organized so everyone gets his say and is listened to.  They invent a protocol to ensure this desirable result.

 

They have found a large shell from a marine animal called a conch.  In a meeting, whoever holds the conch gets to speak and everyone else has to listen.  Since there is only one conch, there can only be one person speaking at a time.  It’s an elegant solution to an eternal human problem.  Robert’s Rules of Order are another (much more complicated) protocol to achieve the same ends.  On the island, when the boys’ protocol breaks down it’s a sign that their society is breaking down too.

 

There is a protocol for managing a ring network that relies on a device very similar to the conch shell.  Remember that all the computers are arranged in a ring, with each computer having a direct link only to the computers on its right and left.  In this protocol, each machine passes messages in one direction only, let’s say to the right.  If it receives a message, it looks at it to see if its own address is attached to the message.  If so, it makes and keeps a copy of the message.  In any case, it then passes the message on to the right.  When a machine receives a message from its left and that message is one that it sent, it knows that its message has made it around the ring and it can assume that the intended receiver has made a copy of it.

 

To coordinate the process, the network has a unique bit pattern called a token.  The token is passed around the ring, from one computer to the one on its right.  If a computer does not have any messages to transmit, it just passes the token on.  If it does have a message to transmit, it must wait until the token is passed to it.  It then keeps the token and passes its message on instead, with the address of the intended recipient and its own address (like the return address on a package).  When it receives its own message back, it passes the token on instead and the next computer waiting to send a message can keep the token and send its message in the same way.  Like the schoolboy council, only one computer can be transmitting a message at any moment.
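
To make the token idea concrete, here is a small simulation sketch in Python.  It is a toy under stated assumptions, not real token ring code: the function name run_token_ring, the station names, and the rule that a station sends one message per turn with the token are all invented for illustration.

```python
from collections import deque

def run_token_ring(names, outbox):
    """Simulate token passing on a ring (illustrative toy model).

    names  -- station names in ring order; each passes to the next, wrapping
    outbox -- dict mapping a sender name to a list of (destination, text)
    Returns (sent_order, received): the transmissions in the order they
    happened, and the copies each station kept for itself.
    """
    pending = {n: deque(outbox.get(n, [])) for n in names}
    received = {n: [] for n in names}
    sent_order = []
    holder = 0        # index of the station that currently has the token
    idle_passes = 0   # consecutive stations that had nothing to send
    while idle_passes < len(names):
        sender = names[holder]
        if pending[sender]:
            dest, text = pending[sender].popleft()
            # The message travels one hop at a time until it is back at
            # the sender; only the addressed station keeps a copy.
            pos = (holder + 1) % len(names)
            while pos != holder:
                if names[pos] == dest:
                    received[dest].append((sender, text))
                pos = (pos + 1) % len(names)
            sent_order.append((sender, dest, text))
            idle_passes = 0
        else:
            idle_passes += 1
        # Pass the token to the neighbor on the right.
        holder = (holder + 1) % len(names)
    return sent_order, received
```

Calling run_token_ring(["A", "B", "C", "D"], {"A": [("C", "hi")], "C": [("A", "ok")]}) lets A transmit first and C second, because the token reaches them in that order.  No two stations ever transmit at the same time.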

 

CSMA/CD

 

Probably the most common protocol used in bus networks is Carrier Sense, Multiple Access with Collision Detection (CSMA/CD).  It’s the protocol used in an Ethernet.  Each computer in the network takes responsibility for its own communications by monitoring traffic on the bus and waiting until the bus is free.  It then tries to send its message with the address of the intended recipient attached.  Placing the message on the bus actually broadcasts it to all machines on the network.  Each machine looks at the message and checks the attached address to see if it matches the machine’s own address.  If it does, the message was intended for this machine and it receives it.  If the address does not match, the machine is supposed to ignore the message.

 

The only remaining problem occurs if two or more machines are waiting to transmit, all detect that the bus is free, and all try to transmit their messages at exactly the same time.  This is a collision.  If it occurs, every machine involved is supposed to detect the collision and back off, canceling its message.  Each then waits a random amount of time and tries again.  With a little luck, each machine will wait a different random amount of time and the collision will not happen again.  If it does, oh well, wait and try again.  Eventually, all the messages will be transmitted.
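
The wait-and-retry idea can be sketched as a small simulation.  This is a simplified model, not how a real Ethernet adapter works: time is chopped into slots, every message fits in one slot, and the names simulate_csma_cd and max_wait are made up for this example.  (Real Ethernet also lengthens the random wait after repeated collisions, which this sketch leaves out.)

```python
import random

def simulate_csma_cd(stations, max_wait=4, seed=0):
    """Toy slotted model of collision and random back-off.

    Every station wants to send one message starting in slot 0.
    max_wait must be at least 1, or colliding stations would pick the
    same retry slot forever.
    Returns (delivered, collisions): a list of (slot, station) pairs and
    the number of collisions that occurred.
    """
    rng = random.Random(seed)
    next_try = {s: 0 for s in stations}  # slot of each station's next attempt
    delivered = []
    collisions = 0
    slot = 0
    while next_try:
        # Stations whose wait has expired see a free bus and transmit.
        ready = [s for s, t in next_try.items() if t <= slot]
        if len(ready) == 1:
            delivered.append((slot, ready[0]))
            del next_try[ready[0]]
        elif len(ready) > 1:
            # Collision: everyone involved cancels and picks a random wait.
            collisions += 1
            for s in ready:
                next_try[s] = slot + 1 + rng.randint(0, max_wait)
        slot += 1
    return delivered, collisions
```

With three stations all starting in slot 0, the first slot is always a collision.  After that the random waits usually spread the stations out, and each message gets through in a slot of its own.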

 

 

Layered Communications Model

 

There are many tasks involved in safely transferring data (files, documents, Web pages, audio, video, whatever…) from one site to another.  Requests for data need to be directed to the devices that hold the needed items, the collections of data need to be broken into small enough chunks to be transmitted, these pieces need to be labeled with their destination address, they may need to be encrypted and eventually decrypted, the pieces need to be physically transmitted and reassembled, and on and on.  Each task may be done either by hardware or software (or some combination).  While it is certainly possible for a single computer company to develop complete, integrated collections of hardware and software to do the entire job from start to finish at both the sending and receiving ends, it would probably not be a good idea.

 

In fact, these tasks are usually done by hardware and software made by different manufacturers, installed and updated over a long period of time and in different locations.  In the case of an open network, the components that carry out the transfer are not even owned by the same entity.  So how does that work?  Do all the manufacturers get together and plan out their product lines in coordination so they all work nicely together?  Riiiiight, I really see THAT happening.  Even though it is in everybody’s interest to have computer communications proceed smoothly, companies will still jockey for market share by creating some proprietary way to do a task desired by their potential customers.  The natural result of this pure competition seems to be the eventual domination of a market by one very powerful company with a few much smaller companies vying for distant second place based on niche applications.  Sound familiar?

 

In the case of computer communications, the reason everything works together so well is that there is a standard called the OSI (Open Systems Interconnection) model that divides the process of transmitting data into fourteen steps, seven on each end of the communication, and specifies what is to be done by the hardware and software in each step.  These fourteen steps are usually visualized as two stacks of seven blocks or layers, like this:

 

Notice that each of the layers has a matching layer on the other side of the communication.  Each layer passes communications through it in two directions, acting as the intermediary between the layer above it and the layer below it.  It receives outgoing messages from the layer above itself and prepares them for the layer below (or the physical connection to the other site, in the case of the “physical” layer).  It receives incoming messages from the layer below itself (or from the physical connection to the other site, in the case of the “physical” layer) and prepares them for the layer above itself.  We will discuss the specific tasks done by each of these layers in class.
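
One way to picture the two stacks is to let each layer wrap the outgoing message in a label of its own, and let the matching layer on the other side strip that label off again.  The sketch below uses the seven standard OSI layer names.  The bracketed tags are invented stand-ins for real headers, and the function names send_down and receive_up are hypothetical.

```python
# The seven OSI layers, listed from the top of the stack down.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]

def send_down(message):
    """Pass a message down the sending stack; each layer adds its tag."""
    data = message
    for layer in LAYERS:
        data = f"[{layer}]{data}"
    return data

def receive_up(frame):
    """Pass a frame up the receiving stack; each layer removes its tag."""
    data = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not data.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        data = data[len(header):]
    return data
```

Here receive_up(send_down("hello")) gives back "hello".  Each layer only ever touches its own tag, which is the whole point: a layer can be swapped out without the others noticing, as long as it keeps handling its tag the same way.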

 

In any case, the point of layered specifications like this is that any company that makes communications technology can decide to make any one or more parts of a system, as long as their parts match contiguous sets of layers.  Since the interfaces between the layers are part of the specification, the company will know how to interface their parts with other equipment implementing the other layers, no matter who made the other equipment.

 

People and organizations wishing to communicate can purchase parts of their system from any vendor as long as they again pay attention to which layers are being covered by which pieces of equipment.

 

Network service organizations can provide a communications backbone by implementing some number of the lower layers and providing access to them like a utility company provides access to power or water.  Their clients can interface with the “communications utility” simply by knowing which layers are part of the utility and which they must provide themselves.

 

Layered Internet Model

 

The Internet is also implemented following a layered model.  In this case, there are four layers of software called (from the top down) the Application, Transport, Network, and Link layers. An interesting question (deferred for now) is how these layers are related to the seven layers of the OSI model.

 

The Application layer consists of software packages that need Internet services in order to function.  One example is the collection of programs that use the File Transfer Protocol to move files across the Internet.  Another common package running in the application layer is the email routines that use the Simple Mail Transfer Protocol, SMTP, to transfer email.  The programs in this layer send and receive messages by passing them to the transport layer along with the IP address of the messages’ destinations.

 

The Transport layer divides outgoing messages into small units, adds sequence numbers so the parts can be reassembled, and attaches the destination address to each unit.  The resulting numbered, addressed chunks are called packets, and are passed along to the Network layer.
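
The chopping and numbering can be sketched in a few lines.  The Packet record, the packetize function, and the 8-byte chunk size are all assumptions made for illustration; real transport protocols use their own header layouts and much larger units.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    dest: str       # destination address, copied onto every packet
    seq: int        # position of this chunk in the original message
    total: int      # how many chunks the message was split into
    payload: bytes

def packetize(message: bytes, dest: str, size: int = 8):
    """Split a message into numbered, addressed packets."""
    # An empty message still produces one (empty) packet.
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    return [Packet(dest, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]
```

For example, packetize(b"hello, layered world", "192.0.2.7") yields three packets numbered 0, 1, and 2, each carrying the same destination address.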

 

The Network layer has to see that the packets eventually reach their intended destination even though that destination may be in a domain to which the sending network has no direct link.  This means that the network layer must understand the topology of the Internet enough to attach a temporary intermediate address to the packets and pass them along to an intermediate network, which does the same thing until the packets reach the domain containing their destination.

 

Actually, of course, the Network layers don’t communicate directly with each other.  They pass their packets, now labeled with the intermediate addresses, to the Link layer.  Each network’s Link layer is responsible for navigating that network (and this means dealing with its own topology and protocols) and for passing the packets on to the next physical network via its link layer.

 

When the link layer of a network receives a packet, it passes the packet up to the network layer, which checks the packet’s destination address.  If the destination is on this network, the packet is passed up to the Transport layer for processing.  If not, the network layer labels the packet with a new intermediate address and passes it back down to the link layer to be forwarded to the next network.
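
The deliver-or-forward decision can be sketched like this.  It is a toy model under stated assumptions: networks are identified by made-up names rather than real addresses, each network has a small hand-written table of next hops, and the names handle_packet and trace are hypothetical.

```python
def handle_packet(local_network, routes, packet):
    """Decide what one network layer does with an incoming packet.

    routes maps a destination network name to the neighboring network
    that gets the packet next; the key "default" is the fallback.
    """
    dest_net = packet["dest_network"]
    if dest_net == local_network:
        return ("deliver", None)        # pass up to the transport layer
    next_hop = routes.get(dest_net, routes.get("default"))
    if next_hop is None:
        return ("drop", None)           # no idea where to send it
    return ("forward", next_hop)        # back down to the link layer

def trace(start, tables, packet, max_hops=16):
    """Follow a packet from network to network; return the path taken."""
    path = [start]
    here = start
    for _ in range(max_hops):
        action, next_hop = handle_packet(here, tables[here], packet)
        if action == "deliver":
            return path
        if action == "drop":
            return None
        here = next_hop
        path.append(here)
    return None  # gave up: probably a routing loop
```

With tables for three networks, say "campus", "regional", and "office", where the two end networks send everything to "regional" by default, a packet from "campus" to "office" follows the path campus, regional, office.  No single network needs to know the whole route.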

 

When a packet does reach the network containing its destination, it is passed to the transport layer to be reassembled (with all of its companion packets) into the original message.  Once the original message has been reconstructed, it’s passed to the application layer and to the software that is waiting for it.
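
Reassembly is mostly a matter of sorting by sequence number and checking that nothing is missing.  The sketch below assumes each packet arrives as a (sequence number, payload) pair and that the receiver knows how many packets to expect; the function name reassemble is made up for this example.

```python
def reassemble(packets, total):
    """Rebuild a message from (seq, payload) pairs that may arrive out
    of order or more than once.

    Returns (message, []) when every piece is present, or
    (None, missing_sequence_numbers) when some pieces have not arrived.
    """
    by_seq = {}
    for seq, payload in packets:
        by_seq.setdefault(seq, payload)  # ignore duplicate copies
    missing = [n for n in range(total) if n not in by_seq]
    if missing:
        return None, missing
    return b"".join(by_seq[n] for n in range(total)), []
```

If the pieces arrive as numbers 2, 0, 1, the message still comes out in the right order.  If piece 1 never shows up, the function reports [1] as missing instead of handing a damaged message to the application.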

 

But how does the transport layer know which application package should receive the message?  The answer involves the use of numbers called (for obscure historical reasons) port numbers to indicate the target software package.  These numbers must be provided by the sending application, which makes them part of the destination address.  While port numbers are actually arbitrary and sending applications can try to use any port they want (especially in a storm), there are some commonly used port numbers.  An HTTP server usually listens on port 80 and an FTP server usually listens on port 21.
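
In code, the port number is nothing more than a lookup key.  The sketch below is a toy stand-in for what the transport layer does; the class name PortTable and its methods are invented for illustration, and the handlers are just functions standing in for real server programs.

```python
class PortTable:
    """Toy model of handing messages to applications by port number."""

    def __init__(self):
        self._handlers = {}

    def listen(self, port, handler):
        """Register an application as the receiver for one port."""
        if port in self._handlers:
            raise ValueError(f"port {port} is already in use")
        self._handlers[port] = handler

    def deliver(self, port, message):
        """Hand a reassembled message to whoever listens on the port.

        Returns the handler's reply, or None if nobody is listening.
        """
        handler = self._handlers.get(port)
        if handler is None:
            return None
        return handler(message)
```

After table.listen(80, web_server), a call to table.deliver(80, "GET /index.html") reaches the web server, while a message for a port nobody registered simply goes nowhere.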

 

TCP/IP

 

Transmission Control Protocol (TCP) and Internet Protocol (IP) are two of the protocols in a suite of protocols used on the Internet to implement this four-layer hierarchy.  TCP defines one version of the Transport layer, and IP is the Internet standard for the Network layer.  There are actually two different transport layer definitions in the suite, TCP and the User Datagram Protocol (UDP).  TCP does more to control the communication than does UDP, but is somewhat more costly (in terms of time and network traffic) to run than is UDP.  Before TCP sends any data, it first exchanges messages with the destination to set up a connection, so the recipient knows that data is coming.  In TCP, the transport layers at both ends of the communication use a series of coordinating signals and retransmissions to make sure that communications arrive intact.  UDP is called a “connectionless” protocol because it does not alert the recipient, and it is called an “unreliable” protocol because it does not verify that a message arrived.  That’s not to say that it’s useless; it is more efficient than TCP.  However, TCP is the more common transport layer definition in use.
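
The difference shows up directly in how a program uses the two protocols.  The sketch below uses Python’s standard socket module to send a few bytes to itself over the loopback address 127.0.0.1, once with UDP and once with TCP.  The function names udp_roundtrip and tcp_roundtrip are made up for this example, and letting the operating system pick a free port (port 0) is a convenience for the demonstration, not something a real server would do.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram: no connection is set up first."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose
        receiver.settimeout(2.0)
        # The sender just fires the datagram at the address and moves on.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data

def tcp_roundtrip(payload: bytes) -> bytes:
    """Send the same bytes over TCP: connect first, then send."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        # create_connection sets up the connection before any data moves.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal "no more data"
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

Notice that the UDP sender never learns whether anyone received the datagram, while the TCP sender cannot send a single byte until the connection has been set up.  On the loopback address both versions normally deliver the message, so the difference here is in the bookkeeping rather than in the outcome.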