Libwww API
Henrik Frystyk Nielsen
World-Wide Web Consortium, MIT/LCS,
@(#) $Id: Position.html,v 1.2 1998/05/14 02:10:08 frystyk Exp $
W3C Reference Library
Abstract
This position paper discusses the API of the W3C Reference Library, a.k.a.
"libwww". It introduces some of the basic concepts like streams, call-out
functions, and plug-in modules. Libwww is freely available from the
World-Wide Web Consortium's Web site together with documentation and example
applications.
Introduction
Most Web applications, regardless of functionality, share some commonalities
such as protocol modules, transport interfaces, and various other low-level
Internet-related features. While many application programmers get around
this by "reinventing the wheel" every time a new application is written,
there is an obvious need for a basic Web API. Libwww was designed to
provide such an API. In this paper we discuss some of the experience
gained in doing so and what can be improved.
Libwww has been part of the World-Wide Web almost from the beginning. However,
as the design criteria for Web applications in general have changed dramatically,
the basic design of libwww has undergone several major revisions. The design
ideas presented in this document are based on the most recent version, 4.1,
which is to be released in June 1996. The current libwww API was designed
with the following goals for a generic Web API in mind:
-
Light-weight:
-
The API should be a platform-independent, medium-level API with support for
an open-ended feature set rather than directly providing a fully fledged
feature set by itself.
-
Application independent:
-
The API should impose no restrictions on the type of application using it.
It should be usable by all types of applications such as servers, clients,
robots, and proxies. To borrow a term from the X world: it should
provide a set of mechanisms for accessing the Web without imposing a special
policy on how to do it.
-
Layered:
-
The API should allow for easy integration with other APIs both below and
above libwww itself in complexity and level of abstraction.
The libwww Core
The libwww API is a small, light-weight API based on a central registry called
the core. The core provides a framework for applications to register
an open-ended set of modules that can provide the functionality and profile
desired by that application. The core is itself divided into three layers,
each described by its own object:
-
Request Object
-
The Request object represents a request issued by the application.
All requests are associated with a URL representing the resource on which
an operation is to be performed. In most cases, a libwww request results
in some kind of network activity handled by a protocol module. All
protocol modules are application plug-ins, so there is no limit to
the type of request libwww can perform. Each Request object has an
input and an output stream associated with it that can accept data from
and return data to the application respectively.
-
Net Object
-
The Net object represents a connection, for example to the Internet
or the local file system. Depending on the URL and the method specified by
the request, there can be multiple Net objects per request. As connections
are directly associated with system resources, the number of active
Net objects is limited by the core to a maximum number specified by the
application. Also, in handling Net objects, a considerable effort
is made to maximize use of persistent connections in that multiple requests
to the same remote host are serialized where possible.
-
Channel Object
-
Each open socket and file descriptor is associated with a Channel
object. The Channel object is associated with an input stream
and an output stream that are capable of reading data from and writing data
to a transport respectively. The channel streams are connected to the
Request object streams either directly or via stream chains as described
later.
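The relationship between requests, connections, and the application-defined connection limit can be sketched in C as follows. This is only an illustration under stated assumptions: the names `DemoRequest`, `DemoNet`, `DemoCore`, `demo_net_new`, and `demo_net_delete` are hypothetical and are not the libwww declarations, and the limit of two connections is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the libwww declarations. */
#define DEMO_MAX_NETS 2          /* maximum chosen by the application */

typedef struct DemoRequest {
    const char *url;             /* resource the operation applies to */
} DemoRequest;

typedef struct DemoNet {
    DemoRequest *request;        /* request served by this connection */
    int channel_fd;              /* descriptor owned by the channel, -1 if none */
    int active;                  /* non-zero while the slot is in use */
} DemoNet;

typedef struct DemoCore {
    DemoNet nets[DEMO_MAX_NETS];
    int active_count;
} DemoCore;

/* Hand out a connection slot, or NULL when the limit has been reached. */
DemoNet *demo_net_new(DemoCore *core, DemoRequest *req) {
    int i;
    assert(core != NULL && req != NULL);
    for (i = 0; i < DEMO_MAX_NETS; i++) {
        if (!core->nets[i].active) {
            core->nets[i].request = req;
            core->nets[i].channel_fd = -1;
            core->nets[i].active = 1;
            core->active_count++;
            return &core->nets[i];
        }
    }
    return NULL;                 /* caller must wait for a free slot */
}

/* Release a slot so that a pending request can use it. */
void demo_net_delete(DemoCore *core, DemoNet *net) {
    assert(core != NULL && net != NULL && net->active);
    net->active = 0;
    net->request = NULL;
    core->active_count--;
}
```

In this sketch a request that cannot obtain a slot simply gets `NULL` back; how pending requests are queued is left open.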
By itself, the core is not capable of performing any Web-related
tasks; they are all provided through plug-ins and call-out
functions registered by the application. This model enables libwww to be
application neutral in that the application feature set or profile is provided
by the application and not by libwww. The concepts of plug-ins and call-out
functions are described in later sections.
Stream Objects
All data flow between the application layer and the transport layer is handled
using streams. Streams are objects that accept sequences of characters. Streams
do not require an output, but in most cases, they send data along after having
performed a certain operation on the data. Examples are inserting MIME
headers or stripping out an HTTP response line. When the output is itself
a stream, stream objects can be cascaded into stream chains. As mentioned,
Channel objects and Request objects both have two stream chains
associated with them. The connection between the request streams and the
channel streams is made using stream chains, which can be set up at run-time
using Converters.
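A stream chain of the kind described above might look like the following C sketch. It is not the libwww stream interface: `DemoStream`, `sink_put`, `strip_put`, and `demo_stream_chain` are hypothetical names, and the fixed 256-byte buffer is an assumption made to keep the example short. The filter drops the first line of its input (standing in for an HTTP response line) and passes the rest to its target, keeping its own state because the line may be split across blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream type for illustration; not the libwww declaration. */
typedef struct DemoStream DemoStream;
struct DemoStream {
    int (*put_block)(DemoStream *me, const char *buf, size_t len);
    DemoStream *target;          /* next stream in the chain, NULL for a sink */
    void *state;                 /* private state kept between calls */
};

/* Sink: collects everything it receives into a fixed buffer. */
typedef struct { char data[256]; size_t used; } DemoBuffer;

static int sink_put(DemoStream *me, const char *buf, size_t len) {
    DemoBuffer *b = (DemoBuffer *) me->state;
    if (b->used + len >= sizeof b->data) return -1;   /* keep room for '\0' */
    memcpy(b->data + b->used, buf, len);
    b->used += len;
    b->data[b->used] = '\0';
    return 0;
}

/* Filter: drops input up to and including the first newline, then forwards. */
typedef struct { int line_done; } DemoStripState;

static int strip_put(DemoStream *me, const char *buf, size_t len) {
    DemoStripState *s = (DemoStripState *) me->state;
    size_t i = 0;
    assert(me->target != NULL);
    if (!s->line_done) {
        while (i < len && buf[i] != '\n') i++;
        if (i == len) return 0;  /* first line continues in the next block */
        s->line_done = 1;
        i++;                     /* skip the newline itself */
    }
    if (i == len) return 0;      /* nothing left to forward */
    return me->target->put_block(me->target, buf + i, len - i);
}

/* Build the chain strip -> sink, feed two blocks, and copy out the result. */
int demo_stream_chain(char *out, size_t outlen) {
    const char *part1 = "HTTP/1.0 20";
    const char *part2 = "0 OK\nContent-Type: text/html\n";
    DemoBuffer buffer;
    DemoStripState strip_state;
    DemoStream sink, strip;

    buffer.used = 0;
    buffer.data[0] = '\0';
    strip_state.line_done = 0;
    sink.put_block = sink_put;   sink.target = NULL;   sink.state = &buffer;
    strip.put_block = strip_put; strip.target = &sink; strip.state = &strip_state;

    if (strip.put_block(&strip, part1, strlen(part1)) != 0) return -1;
    if (strip.put_block(&strip, part2, strlen(part2)) != 0) return -1;
    if (buffer.used + 1 > outlen) return -1;
    memcpy(out, buffer.data, buffer.used + 1);
    return 0;
}
```

Because every stream exposes the same `put_block` entry point, further filters could be inserted between `strip` and `sink` without changing either of them.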
The Converter stream class is sub-classed from the generic stream
class. Converters are filters which can change the current representation
(or media type) of a data object. Examples of conversions are changing one
image format into another, or "converting" an HTML document into a
presentation of the document to the user in a widget. As Converters are in
fact streams, there can be multiple Converters inserted into a single stream
chain from the Request object to the Channel object, for example.
Input and output streams are responsible for reading data from and writing
data to a transport, for example a BSD socket interface. By using a
stream-based interface to the transport layer, it is very easy to add special
transport mechanisms, for example a multiplexed transport protocol. It also
gives a consistent interface for sending objects as well as reading objects,
which is a prerequisite for building interactive Web applications.
Plug-in Modules
Plug-ins are modules that can be registered by the application at
run-time. Plug-ins are an open-ended method for adding new functionality
to the application. Characteristic of the evolution of libwww is that the
set of features handled through plug-ins is constantly
increasing. In version 4.1 of the Library, the categories of plug-ins
include:
-
Client and server side protocol modules
-
Low-level protocol transport modules
-
User dependent modules
-
Data format handlers
One of the main advantages of using plug-ins is that the feature set
can change dynamically as required by the application. This allows the
traditional boundaries between application types such as "clients" and "servers"
to be broken down. In fact, there is little difference between registering
a server profile, registering a client profile (or feature set), and having
an application change profile from a client to a server application at
run-time.
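As a rough illustration of run-time registration, the following C sketch keeps a small table of protocol modules keyed by URL scheme. All names (`DemoProtocol`, `demo_protocol_add`, `demo_protocol_find`, and the two stand-in handlers) are hypothetical and do not reflect the libwww registration calls; the table size and the status codes returned by the handlers are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plug-in registry for illustration; not the libwww API. */
#define DEMO_MAX_PROTOCOLS 8

/* A protocol module takes a URL and returns a status code. */
typedef int (*DemoProtocolHandler)(const char *url);

typedef struct {
    const char *scheme;          /* for example "http" */
    DemoProtocolHandler handler;
} DemoProtocol;

static DemoProtocol demo_protocols[DEMO_MAX_PROTOCOLS];
static int demo_protocol_count = 0;

/* Register a module, replacing any module already bound to the scheme.
   Returns 0 on success and -1 if the table is full. */
int demo_protocol_add(const char *scheme, DemoProtocolHandler handler) {
    int i;
    assert(scheme != NULL && handler != NULL);
    for (i = 0; i < demo_protocol_count; i++) {
        if (strcmp(demo_protocols[i].scheme, scheme) == 0) {
            demo_protocols[i].handler = handler;
            return 0;
        }
    }
    if (demo_protocol_count == DEMO_MAX_PROTOCOLS) return -1;
    demo_protocols[demo_protocol_count].scheme = scheme;
    demo_protocols[demo_protocol_count].handler = handler;
    demo_protocol_count++;
    return 0;
}

/* Look up the module for a URL by its scheme (the text before the first ':'). */
DemoProtocolHandler demo_protocol_find(const char *url) {
    const char *colon = strchr(url, ':');
    size_t n;
    int i;
    if (colon == NULL) return NULL;
    n = (size_t) (colon - url);
    for (i = 0; i < demo_protocol_count; i++) {
        if (strlen(demo_protocols[i].scheme) == n &&
            strncmp(demo_protocols[i].scheme, url, n) == 0)
            return demo_protocols[i].handler;
    }
    return NULL;                 /* no module registered for this scheme */
}

/* Stand-in modules; real ones would perform network activity. */
int demo_http_client(const char *url) { (void) url; return 200; }
int demo_http_stub(const char *url)   { (void) url; return 501; }
```

Calling `demo_protocol_add` again for the same scheme swaps the module in place, which is the sense in which an application's profile can change while it is running.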
Request Call-out Functions
Request call-out functions are another open-ended method for applications
to add functionality to the core. An application registers a new feature
simply by using a generic callback registration process. There are two main
points where call-out functions are activated:
-
Before a request is handed to the protocol module
-
After the protocol module has terminated
At each of these points the list of registered call-out functions
is traversed and each of the call-out functions is called. The Library
comes with a set of standard call-out functions that cover some often-used
features like:
-
Cache validation
-
Rule file matching
-
Proxying requests
-
Logging
-
History List
-
...
Request call-out functions can be registered as being local to a specific
request, or as being global to all requests. This mechanism allows existing
applications to be extended with little or no modification. New features
can be inserted by sub-modules by registering independent call-out
routines to be handled by libwww. The latest example of how this mechanism
can be used is the implementation of a PICS
module, which is incorporated into any libwww client application by
registering itself as call-out functions. Other functions like signature
handling can be handled the same way.
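The two activation points and the local/global distinction can be sketched in C along the following lines. Everything here is hypothetical: `DemoCalloutList`, `demo_callout_add`, `demo_request_load`, and the stand-in call-outs are not libwww functions, and the choice to run global call-outs before request-local ones is an assumption of this sketch rather than something the text specifies. Each function appends a letter to a trace so the call order is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-out mechanism for illustration; not the libwww API. */
#define DEMO_MAX_CALLOUTS 8

typedef struct DemoRequest DemoRequest;
typedef void (*DemoCallout)(DemoRequest *req);

typedef struct {
    DemoCallout fn[DEMO_MAX_CALLOUTS];
    int count;
} DemoCalloutList;

struct DemoRequest {
    const char *url;
    int status;
    char trace[32];              /* records the order in which things ran */
    int trace_len;
    DemoCalloutList before;      /* call-outs local to this request */
    DemoCalloutList after;
};

/* Call-outs that apply to all requests. */
DemoCalloutList demo_global_before;
DemoCalloutList demo_global_after;

int demo_callout_add(DemoCalloutList *list, DemoCallout fn) {
    assert(list != NULL && fn != NULL);
    if (list->count == DEMO_MAX_CALLOUTS) return -1;
    list->fn[list->count++] = fn;
    return 0;
}

static void demo_mark(DemoRequest *req, char c) {
    if (req->trace_len < (int) sizeof req->trace - 1) {
        req->trace[req->trace_len++] = c;
        req->trace[req->trace_len] = '\0';
    }
}

static void demo_run_list(DemoCalloutList *list, DemoRequest *req) {
    int i;
    for (i = 0; i < list->count; i++) list->fn[i](req);
}

/* Stand-in for a protocol module. */
static void demo_protocol(DemoRequest *req) {
    req->status = 200;
    demo_mark(req, 'P');
}

/* Stand-in call-outs, e.g. rule file matching before and logging after. */
void demo_rules_before(DemoRequest *req) { demo_mark(req, 'B'); }
void demo_log_after(DemoRequest *req)    { demo_mark(req, 'A'); }

/* Assumption of this sketch: global call-outs run first, then local ones. */
int demo_request_load(DemoRequest *req) {
    demo_run_list(&demo_global_before, req);
    demo_run_list(&req->before, req);
    demo_protocol(req);
    demo_run_list(&demo_global_after, req);
    demo_run_list(&req->after, req);
    return req->status;
}
```

A module such as a content filter would only need to call `demo_callout_add` on the appropriate list; the application code that issues requests stays unchanged.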
Threads and Pseudo Threads
Threads are in many situations a great advantage, but in general they can only
be regarded as reliable if they are native to a specific platform or
an integral part of the programming language used. Unfortunately, as this
is not the case in ANSI C, libwww has a model for handling pseudo-threads
based on interleaved I/O. This requires that the I/O descriptor can be handled
non-preemptively, which is the case for BSD sockets and WinSock socket
descriptors. Like real threads, pseudo-threads impose certain programming
techniques on the application programmer. For example,
pseudo-threads are single-stack, single-process entities, and all
state-dependent variables must be stored in a "thread" object. The result of
this is that all streams and protocol modules must keep local state of where
they are. Real threads do not require non-preemptive I/O and hence, much of
the state information can be kept as part of the thread environment.
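The requirement that all state live in a "thread" object can be illustrated with a small state machine in C. This is a simulation under stated assumptions, not the libwww event loop: `DemoPseudoThread`, `demo_step`, and `demo_run` are hypothetical names, no real sockets are used, and a "read" is simulated as delivering at most four bytes per step. A real implementation would wait for descriptors to become ready, for example with `select()`, instead of polling round-robin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-thread for illustration; not the libwww event loop. */
typedef enum { DEMO_CONNECT, DEMO_SEND, DEMO_READ, DEMO_DONE } DemoState;

/* All state lives in this object, not on the C stack, so that a step can
   return to the scheduler and be resumed later. */
typedef struct {
    DemoState state;
    int bytes_left;              /* simulated response bytes still to read */
    int steps;                   /* number of times this thread was run */
} DemoPseudoThread;

#define DEMO_FINISHED    0
#define DEMO_WOULD_BLOCK 1

/* Run one step of the state machine; a step never blocks. */
int demo_step(DemoPseudoThread *t) {
    assert(t != NULL);
    t->steps++;
    switch (t->state) {
    case DEMO_CONNECT:
        t->state = DEMO_SEND;
        return DEMO_WOULD_BLOCK;
    case DEMO_SEND:
        t->state = DEMO_READ;
        return DEMO_WOULD_BLOCK;
    case DEMO_READ:
        /* Simulate a non-blocking read that delivers at most 4 bytes. */
        t->bytes_left -= (t->bytes_left < 4) ? t->bytes_left : 4;
        if (t->bytes_left > 0) return DEMO_WOULD_BLOCK;
        t->state = DEMO_DONE;
        return DEMO_FINISHED;
    case DEMO_DONE:
    default:
        return DEMO_FINISHED;
    }
}

/* Interleave all unfinished threads round-robin on a single stack.
   Returns the number of passes over the thread table. */
int demo_run(DemoPseudoThread *threads, int n) {
    int passes = 0, pending = n, i;
    while (pending > 0) {
        pending = 0;
        passes++;
        for (i = 0; i < n; i++) {
            if (threads[i].state == DEMO_DONE) continue;
            if (demo_step(&threads[i]) == DEMO_WOULD_BLOCK) pending++;
        }
    }
    return passes;
}
```

Because `demo_step` returns whenever it would otherwise have to wait, a slow request does not hold up the others; the cost is that every module must record where it was, here in the `state` field.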
Implementation Experience
Most of the libwww API has been designed through a large number of iterations
based on trial and error. The library has been around for a considerable
amount of time and represents a significant knowledge base for designing
Web APIs. One drawback of this history is that, as libwww is based on
ANSI C, it cannot take advantage of many of the features
that are now available in modern programming languages. In order to
demonstrate that libwww provides a consistent API that can be used by
multiple types of applications, a small set of example applications was
developed representing the most typical Web applications.
-
Command Line Tool
-
An application which shows how to use libwww for building simple batch
mode tools for accessing the Web. The tool supports HTTP, FTP, Gopher, NNTP,
Telnet, and WAIS. The HTTP support is consistent with the HTTP/1.0
specification, including the methods PUT, POST, and DELETE.
-
Mini Robot
-
A simple application which shows how to use libwww for building robots. The
robot has no constraint model but uses pseudo-threads and interleaved I/O,
which allows for a large number of outstanding requests. The robot supports
HTTP, NNTP, FTP and Gopher using either the GET or the HEAD method.
-
Mini Server
-
A small application showing how to implement a server or a proxy using libwww.
The Mini Server also uses pseudo-threads and interleaved I/O, which makes
it highly portable and very fast. The server supports only GET.
In addition to this set we have much experience from real applications like
the Arena browser and other Web GUI based clients. The Library is
freely available from the World-Wide Web Consortium
software distribution together with all the example applications as well
as Arena.
Lessons Learned
The lessons learned from developing the libwww API as it looks today can be
summarized as follows:
-
APIs must be layered
-
No single API can provide the flexibility required to support different types
of applications. Medium-level APIs can provide cross-application functionality,
and high-level APIs can provide support for specialized applications.
-
APIs must support a dynamic, open-ended set of features
-
No API, regardless of its complexity, should impose a limit on adding
functionality. The experience from developing libwww shows that no feature
can in fact be considered essential enough that it should not be dynamically
replaceable. The core registry mechanism in libwww is a step in that direction
but still imposes a set of assumptions on what is considered "essential".
-
APIs must be thread safe
-
As native kernel threads become increasingly common and programming languages
start supporting threads, more and more applications will take advantage
of threads and the flexibility they provide. This means that APIs need not
only be thread aware but must actively support threads. Currently, libwww
is thread aware via its pseudo-thread model, but it does not have full
thread support.
-
Formalized APIs are required
-
In practice, most APIs depend on their immediate environment such as the
features provided by a specific programming language. Examples of features
that do have a major impact on API design are garbage collection, class
inheritance and character sets. Better tools for describing APIs in a
language-independent, formalized fashion are required in order to supply
truly language-independent, interoperable APIs.
Henrik Frystyk Nielsen,
frystyk@w3.org