The Trouble with Usenet (NNTP)

This document is Copyright (C) 1995-1997 Mike Read.

During 1995, the number of dial-in users on the Internet doubled. It more than doubled again in 1996. This rapid expansion is putting a great strain on the Internet Service Providers (ISPs). Typically, this strain shows up as slower downloads from busy sites or over busy links, rejected logins because of user limits, and so on.

In Usenet, the strain takes the form of lost posts. They are particularly visible in newsgroups that carry high traffic and posts in several parts. There are frequent posts in these same groups complaining about "missing" parts or posts, countered by statements that the poster is able to view all parts after uploading. Where are these parts going?

When you configure your newsreader, you identify the NNTP server you wish to access. The information you enter was given to you by your dial-up ISP, or by a friend, or somehow snooped off the net (more on that later). This server is then queried for the newsgroups it carries and the articles it has received. When you make a post, or a follow-up to a post, you are posting back to the same server you download from.

In addition to being contacted by clients (such as ourselves), the server is also contacted by one or more OTHER servers, for the exchange of posts. Each server asks for posts made since the last time the two exchanged posts. A unique message-id is assigned during the original posting and is intended never to be duplicated anywhere on the planet. As one server offers a post to the other, the receiving server checks its database to see whether it has already received that post. If so, it refuses that post and goes on to the next.
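The duplicate check described above can be sketched in a few lines. This is only an illustration of the idea, not the NNTP wire protocol; the class NewsServer and the functions offer and exchange are hypothetical names invented for this example.

```python
# Hypothetical sketch of the message-id check made during a server-to-server
# exchange. Names are illustrative, not taken from any real news server.
class NewsServer:
    def __init__(self, name):
        self.name = name
        self.articles = {}  # message-id -> article body

    def offer(self, message_id, body):
        """Accept the post if its message-id is new; refuse it otherwise."""
        if message_id in self.articles:
            return False  # already received: refuse and move on
        self.articles[message_id] = body
        return True


def exchange(sender, receiver):
    """Offer every post held by sender; return how many were accepted."""
    accepted = 0
    for message_id, body in sender.articles.items():
        if receiver.offer(message_id, body):
            accepted += 1
    return accepted
```

Running the exchange a second time between the same two servers transfers nothing, because every message-id is already in the receiver's database.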

The administrator at each server site configures a number of parameters that govern its operation. These include the groups that are carried, what to do with posts to other groups, and certain "automated" requests. They also include the criteria for "expiring" posts; that is, when to remove the post from the list of available posts. It is necessary to remove older posts (presumably after all interested readers have had the opportunity to download them), because even file servers have a limited amount of disk space available. Typical expiry rules vary from 3 to 10 days, depending on the server. Occasionally, an administrator (or savvy hacker) will go through and delete articles containing copyrighted or "offensive" material prior to their expiration.
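An age-based expiry rule of this kind can be sketched as follows. The function name expire_posts and its arguments are assumptions made for illustration; real servers have their own configuration formats and may expire different groups on different schedules.

```python
from datetime import datetime, timedelta


def expire_posts(arrival_times, now, max_age_days):
    """Split message-ids into (kept, expired) by age.

    arrival_times maps message-id -> datetime the post arrived.
    A post is expired once it is older than max_age_days.
    """
    cutoff = now - timedelta(days=max_age_days)
    kept, expired = [], []
    for message_id, arrived in arrival_times.items():
        if arrived < cutoff:
            expired.append(message_id)
        else:
            kept.append(message_id)
    return kept, expired
```

With a 3-day rule, a post that arrived 5 days ago is removed while one that arrived yesterday is still available.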

What we have been leading up to here is that the rapid increase in the volume of traffic has exceeded many of these servers' capacities. Posters posting copyrighted CD-ROMs to the WAREZ newsgroups, hobbyists posting home movies to the multimedia groups, and numerous other individuals have generated such a tremendous volume of traffic that the servers are not able to process all of it.

The sheer number of bytes can exceed the throughput of the Usenet feed(s) into a site. The protocol allows for temporary back-ups and outages of service between servers, but if the volume is such that the communications path is unable to transfer the data, eventually the receiving server will have to dump some posts, or the originating server will begin expiring them before they are transferred.

Similarly, if posts are coming in faster than they are expiring, the disk space becomes exhausted. At this point, either older posts must be expired, or the newer posts delayed and re-tried, or dropped. Ideally, the server would install more disk space, but that takes time, money, and slots. Meanwhile, something is obviously going to be lost.
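The arithmetic behind this can be made concrete with a small simulation. It assumes a simple policy: posts are kept for a fixed number of days, and anything that does not fit on arrival is dropped. The function name spool_after and all of the figures used with it are hypothetical.

```python
def spool_after(days, capacity_mb, inflow_mb_per_day, retention_days):
    """Simulate a news spool day by day; return (stored_mb, dropped_mb).

    Each day the oldest day's posts are expired so that at most
    retention_days of traffic is held, then as much of the new day's
    traffic as fits is accepted and the rest is dropped.
    """
    history = []  # MB accepted on each day still on disk, oldest first
    dropped = 0
    for _ in range(days):
        # Expire old posts, leaving room for today within the retention window.
        history = history[-(retention_days - 1):] if retention_days > 1 else []
        free = capacity_mb - sum(history)
        accepted = min(inflow_mb_per_day, free)
        dropped += inflow_mb_per_day - accepted
        history.append(accepted)
    return sum(history), dropped
```

With a 1000MB disk and a 5-day expiry, 100MB a day settles at 500MB stored and nothing lost. At 300MB a day the same disk would need 1500MB to honor the expiry rule, so it fills and posts are dropped.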

While lost posts are most obvious in high-traffic multi-part groups such as the "alt.binaries" hierarchies, it stands to reason that ALL NEWSGROUPS suffer from this lost post syndrome; there is one newsfeed and one set of disk space to accommodate all.

When a post is lost, it is lost to all servers "downstream" from the server that lost it. Several of the news servers have multiple newsfeeds to attempt to compensate for this problem. But they still suffer from the disk and I/O traffic bottlenecks.

One type of hacker attack that has been taking place with greater frequency is "flooding". This is the posting of a large file in many small parts. Usually, it is done by a user trying to make a group that he objects to unusable by others. Theoretically, this attack will contribute to overloading the downstream servers and causing the loss of one or more posts of something the attacker objects to. Occasionally, flooding is also done by a "newbie" who has incorrectly configured his newsreader.

Some server administrators have attempted to configure their servers to ignore these flooding attempts. This can also contribute to missed posts, as some posts are incorrectly identified as "attacks".

Some posters indulge in cross-posting (posting in multiple newsgroups) in the mistaken belief that parts not found in one group may be found in another. In fact, only one copy of the post is kept on the server; each group has a pointer to the post. If the post is rejected or lost, it cannot be found in ANY of the groups. The poster has succeeded in requiring more disk space to represent the same amount of information.

Cross-posting does have one legitimate use. Some of the newsfeeds in the path may not carry the group the poster wishes to post to. By cross-posting to a group that is commonly carried, the poster improves the chances that the post will arrive at the desired destination.

Although only a single copy of the post is stored at the news server, a separate record of it must be kept for each newsgroup. Similarly, the readers of the news commonly subscribe to both groups. They then download the listing of the post in both groups. If the post is of interest to them, they are also likely to download duplicate copies of this post.
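The storage arrangement for cross-posts (one stored copy, one pointer per group) can be sketched as a pair of tables. The class Spool and its methods are hypothetical names for illustration and do not describe any particular server's on-disk layout.

```python
class Spool:
    """Hypothetical model: one stored copy per post, one pointer per group."""

    def __init__(self):
        self.bodies = {}  # message-id -> article text, stored once
        self.groups = {}  # group name -> list of message-ids (pointers)

    def store(self, message_id, body, newsgroups):
        self.bodies[message_id] = body
        for group in newsgroups:
            self.groups.setdefault(group, []).append(message_id)

    def lose(self, message_id):
        """Drop the single stored copy, as when a post is lost or removed."""
        self.bodies.pop(message_id, None)

    def readable(self, group):
        """Message-ids in this group whose stored copy still exists."""
        return [m for m in self.groups.get(group, []) if m in self.bodies]
```

Because both groups point at the same copy, losing that copy makes the post unavailable in every group it was cross-posted to, while the per-group records still had to be kept.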


The Trouble with Today's Newsreaders

While newsreaders exist for virtually every computer and operating system imaginable, the primary players in today's home-computer, dial-in based setup are the Apple Macintosh running System 7, and the IBM PC (and clones) running Windows 3.1, Windows 95, or Windows NT. Public domain source code written in C is available for those wishing to port it to other systems.

All newsreaders currently available suffer from one primary flaw: they assume that they will be connected to a single NNTP server. A few will allow connection to more than one server, but only one server at a time. Once you identify a newsreader client that you like, you discover that you cannot run two copies of the program at the same time against two (or more) different servers. It is possible, however, to run two different readers at the same time against two different servers, or against the same server.

This makes it difficult to hunt down the missing posts from other servers. Once located, the parts must typically be exported a few at a time from each of several servers, and then manually assembled without the benefit of the newsreader. This procedure is error prone, labor intensive, time consuming, and may be beyond the abilities of the majority of users.

Many of the publicly accessible servers are slow. Since the reader can only access one server at a time, the slow speed increases the time required to obtain the posts.


The AllNews Usenet (NNTP) NewsReader

The AllNews Usenet NewsReader is the ideal solution to managing multiple news servers and locating missing posts. It is the first and only usenet news reader that truly integrates the posts of several different servers into a single user interface, and distributes the downloading activities across the various servers.

AllNews begins with the assumption that you have a high-speed internet connection, and a high-end computer: high speed, lots of extended memory, and lots of available fast disk. Development was done on a 486DX2/66 with 16MB extended memory running Windows 3.1, using a 28.8 modem. 100MB or more free disk space is recommended for operation.

AllNews allows you to access SEVERAL servers simultaneously. It collects the list of posts available from each of the servers, and cross-checks it with the others. The user is presented with a composite list of posts, along with the indications of which servers currently carry the post. The newsreader downloads the selected posts from whatever server has it available, and integrates the post locally with any other posts the user has selected. A search function will search a list of auxiliary servers for the presence of missing parts.
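The cross-checking step can be pictured as a merge keyed on message-id. The sketch below is not taken from the AllNews source; the functions composite_list and pick_server, and the server names used with them, are assumptions made to illustrate the idea.

```python
def composite_list(server_listings):
    """Merge per-server listings into one list keyed by message-id.

    server_listings maps server name -> {message-id: subject}.
    Returns {message-id: (subject, sorted list of servers carrying it)}.
    """
    merged = {}
    for server, listing in server_listings.items():
        for message_id, subject in listing.items():
            entry = merged.setdefault(message_id, (subject, []))
            entry[1].append(server)
    return {mid: (subj, sorted(servers))
            for mid, (subj, servers) in merged.items()}


def pick_server(composite, message_id):
    """Choose a server that carries the post, or None if no server does."""
    if message_id not in composite:
        return None
    return composite[message_id][1][0]
```

A part that is missing from one server but present on another still appears once in the composite list, and the download is directed to whichever server has it. A real reader would likely weigh server speed or load rather than taking the first name alphabetically.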

Decoding of posts is supported in many popular formats, including UUEncoding, MIME, and Base64. A "full auto" mode attempts to detect all of the decoding parameters from the contents of the file. This mode is successful for the majority of correctly posted articles, and those posted by major newsreaders. A "manual" mode allows the experienced user to override certain parameters and compensate for incorrect posts. AllNews also decodes multiple attachment posts, a feature notably lacking in popular readers.
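How AllNews performs its detection is not described here. As a rough illustration of detecting the encoding from the contents alone, the sketch below looks for a uuencode "begin" line and otherwise tries Base64. The function name detect_and_decode is hypothetical, and a real decoder would also need to handle MIME headers, multi-part posts, and damaged lines.

```python
import base64
import binascii
import re

# A uuencoded body starts with a line such as: begin 644 picture.gif
BEGIN_RE = re.compile(r"^begin [0-7]{3,4} (\S.*)$")


def detect_and_decode(lines):
    """Guess the encoding of an article body and decode it.

    lines is a list of body lines without trailing newlines.
    Returns (encoding, filename, data); data is None if nothing matched.
    """
    for i, line in enumerate(lines):
        match = BEGIN_RE.match(line)
        if match:
            data = bytearray()
            for body_line in lines[i + 1:]:
                if body_line == "end":
                    break
                data += binascii.a2b_uu(body_line)
            return "uuencode", match.group(1), bytes(data)
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode("".join(lines), validate=True)
        return "base64", None, data
    except binascii.Error:
        return "unknown", None, None
```

A manual mode, as described above, would amount to letting the user choose the branch instead of relying on the guess.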

Unlike other newsreaders, AllNews retains the downloaded article until the user has reviewed the results of decoding the article. The article body is retained until the user specifically deletes it. This allows the user to re-attempt the decode using other settings for any of the several manual overrides provided by AllNews. This helps in successfully decoding incorrectly posted binaries.

When an article is deleted from the list of available articles in most readers, it requires a major effort to regain access to that article. AllNews retains a record of that article until a configurable amount of time has passed (e.g. 10 days). The user simply has to select the "Show Deleted" feature to see the article he has deleted. He may then fetch that article.
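One way to picture this retention of deleted articles is a table of deletion times that is purged after the configured period. The class ArticleList and its methods are hypothetical and are not taken from AllNews.

```python
from datetime import datetime, timedelta


class ArticleList:
    """Hypothetical article list that remembers deletions for a while."""

    def __init__(self, retention_days=10):
        self.retention = timedelta(days=retention_days)
        self.visible = set()  # message-ids shown in the normal list
        self.deleted = {}     # message-id -> time it was deleted

    def add(self, message_id):
        self.visible.add(message_id)

    def delete(self, message_id, now):
        self.visible.discard(message_id)
        self.deleted[message_id] = now

    def show_deleted(self, now):
        """Return deleted message-ids still inside the retention period."""
        self.deleted = {m: t for m, t in self.deleted.items()
                        if now - t <= self.retention}
        return sorted(self.deleted)

    def undelete(self, message_id):
        """Move a remembered article back to the visible list."""
        if message_id in self.deleted:
            del self.deleted[message_id]
            self.visible.add(message_id)
```

An article deleted today can be brought back next week, but after the retention period has passed its record is gone.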


The Message

AllNews is released as freeware to put the usenet community on notice: "We are fed up with missing posts and we're not gonna take it any more!" Newsreader writers must TRULY support multiple servers. News servers must TRULY support the global message ids. Information providers must redistribute their load to prevent information loss.

The model of a single usenet server per usenet reader no longer works. The idea of a news server carrying all posts of its supported groups means that every day each server must pass its traffic on to the next server. At each leg of the journey, this exposes posts to limits on raw throughput, transport delay, disk saturation, and groups that the admin declines to carry. Many of these links are already operating at or near capacity. Any failure or unusual surge of activity can push a link past its limit, causing lost posts, even with multiple redundant feeds.

By analogy, usenet today is like a downtown roadway, 3 lanes wide with lots of traffic lights. You can get from one place to another, with a lot of stop-and-go driving, but it is time consuming and inconvenient. We have to work our way through a lot of traffic and neighborhoods that are not interesting to us. We need to implement a high-speed bypass to take us around the congestion with exits near our favorite playgrounds and stores.

To handle throughput, many ISPs "slave" several usenet servers to a "master" server. The entire usenet hierarchy is duplicated (or missed) on all servers. When a user attempts to connect to the "master", he is "randomly" passed off to one of the slaves via local DNS resolution. This distributes the user load more evenly among the servers, but increases the amount of storage and net bandwidth consumed.

Rather than trying to support all groups on each individual server, the server could be dedicated to carrying a specific hierarchy (e.g., "alt.binaries"), or subset of newsgroups. The information provider might use several servers as hierarchy hubs. The newsreader is then configured to access certain groups via certain servers. This eliminates the duplicate storage and slaving requirement. It distributes the storage and bandwidth requirements across several servers.
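On the newsreader side, such a configuration amounts to a table mapping hierarchies to servers. A minimal sketch, assuming that the longest matching hierarchy prefix wins; the function server_for_group and the host names are invented for illustration.

```python
def server_for_group(group, routes, default):
    """Pick the server for a newsgroup by its longest matching hierarchy.

    routes maps a hierarchy prefix such as "alt.binaries" to a server host.
    Groups matching no prefix are read from the default server.
    """
    parts = group.split(".")
    # Try the full name first, then successively shorter prefixes.
    for n in range(len(parts), 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in routes:
            return routes[prefix]
    return default
```

With "alt.binaries" routed to one hub and "comp" to another, a group such as alt.binaries.pictures is read from the binaries hub, while any group outside the listed hierarchies falls back to the ISP's general server.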

The usenet news server provided with the dial-up access can be backed up with one or more commercial usenet servers, and perhaps other servers that the user has knowledge of. This allows the user better access to the posts, and higher modem throughput because of multiple server sessions, resulting in shorter download and connect times.