Downloading web resources with http.stream - basics
From MorphOS Library
Grzegorz Kraszewski
Introduction
The http.stream class is one of the Reggae stream classes, in other words, data sources. In a chain of Reggae objects, an http.stream instance will always be the first object, having only one port, an output port. An http.stream object may also be used standalone, not connected to anything, just to retrieve any data resource reachable via the HTTP protocol, specifically with its GET request. From this point of view, http.stream is just an embeddable HTTP/1.1 client with a simple yet powerful API. A brief list of its features is given below:
- Socket API encapsulation. http.stream completely isolates the application (and its programmer) from bsdsocket.library and the TCP/IP stack. Only very basic knowledge of TCP/IP is needed to use http.stream successfully.
- Unlike bsdsocket.library base instances, http.stream objects may be shared between processes (the only exception being that an object must be disposed of by the process which created it).
- The class has a built-in HTTP response header parser.
- The class also has an easy-to-use HTTP request header builder, so custom fields may be added to the header.
- HTTP proxies are supported.
- The class supports chunked transfer and media streaming over HTTP.
- Optional user agent spoofing is possible.
- When connecting, HTTP redirections may be followed automatically.
- The class is able to handle streams longer than 4 GB.
- Easy protocol debugging via MediaLogger.
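Since the class supports chunked transfer, the application should not have to decode chunks itself. To show what that spares the programmer, here is a minimal, generic decoder for a complete chunked body held in memory. It is a sketch written for this illustration only, not http.stream's implementation; decode_chunked() is a hypothetical name. Chunk extensions are skipped, and trailer fields after the last chunk are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of http.stream.
   Decodes a complete chunked body 'in' (length 'in_len') into 'out'
   (capacity 'out_cap'). Returns the number of decoded bytes, or -1 on
   malformed input or when 'out' is too small. Chunk extensions (";...")
   are skipped; trailer fields and the final empty line are not checked. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        /* The chunk size is a hexadecimal number. */
        size_t size = 0;
        int digits = 0;
        while (pos < in_len) {
            char c = in[pos];
            size_t v;
            if (c >= '0' && c <= '9') v = (size_t)(c - '0');
            else if (c >= 'a' && c <= 'f') v = (size_t)(c - 'a' + 10);
            else if (c >= 'A' && c <= 'F') v = (size_t)(c - 'A' + 10);
            else break;
            if (size > (SIZE_MAX - 15) / 16)
                return -1;              /* size would overflow */
            size = size * 16 + v;
            digits++;
            pos++;
        }
        if (digits == 0)
            return -1;

        /* Skip an optional chunk extension, then require CRLF. */
        while (pos < in_len && in[pos] != '\r')
            pos++;
        if (pos + 1 >= in_len || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)                  /* zero-sized chunk ends the body */
            return (long)written;

        if (size > in_len - pos || size > out_cap - written)
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* Chunk data is followed by CRLF. */
        if (pos + 1 >= in_len || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

For example, the body "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" decodes to the nine bytes "Wikipedia".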
The class has some disadvantages, however. Some of them may be removed in future versions:
- No support for POST requests.
- No support for persistent connections.
- Making the connection, sending the request and receiving the response header are all done in the constructor, so they are synchronous to the application. Any network delay in the constructor blocks the application until a timeout or other error occurs. This can be worked around by performing all network operations in a subprocess.
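Because the class issues only GET requests and does not support persistent connections, the request it has to send is short. The sketch below formats such a minimal HTTP/1.1 GET request in portable C, purely to illustrate the protocol. build_get_request() is a hypothetical helper, not part of http.stream, and the exact fields the class really sends (User-Agent, custom fields, proxy handling) are not shown. The "Connection: close" field follows the HTTP/1.1 rule that a client without persistent connection support must send it; whether http.stream sends exactly this header is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of http.stream.
   Formats a minimal HTTP/1.1 GET request for 'path' on 'host' into 'buf'.
   Returns the request length, or -1 if it does not fit in 'size' bytes. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    /* Every HTTP/1.1 request must carry a Host field. "Connection: close"
       tells the server the connection will not be reused. The header
       ends with an empty line (CRLF CRLF). */
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;                      /* error or truncated */
    return n;
}
```

Calling build_get_request(buf, sizeof buf, "www.morphzone.org", "/") produces the request line "GET / HTTP/1.1" followed by the Host and Connection fields and an empty line.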
Minimal example
If we skip all error handling, the whole process of downloading data via the HTTP protocol reduces to three lines of code:
```c
#define DATA_LENGTH 7465 /* just example value */
UBYTE buffer[DATA_LENGTH]; /* place for data */
Object *http;
http = NewObject(NULL, "http.stream", MMA_StreamName, "www.morphzone.org", TAG_END); /* connects, sends request, reads response header */
DoMethod(http, MMM_Pull, 0, buffer, DATA_LENGTH); /* reads DATA_LENGTH bytes from output port 0 */
DisposeObject(http);
```
We assume here that the http.stream class has been loaded previously with OpenLibrary() (see Opening and closing individual classes). The code will download the first 7465 bytes of the MorphZone main page (HTML code), assuming no error occurs. This assumption is rather risky, because a network operation can fail for numerous reasons. In that case NewObject() returns NULL, and we then call a method on a NULL pointer and dispose of it, which may even crash the application.

For this reason http.stream offers a few ways of handling errors. They will be discussed later; for now, minimal error handling consists of checking the NewObject() result against NULL. This is used in a simple example that downloads the first 1000 bytes of a resource specified on the command line and dumps them to the console. Note that using this program on binary resources (like images) may result in rather weird output. I recommend running this example along with MediaLogger, to learn about the http.stream protocol debugging features.