Java IO
Java’s IO classes offer a surprising amount of functionality, which makes it all the more odd that they seem underappreciated in practice. Documentation on them rarely extends beyond simple file I/O. File I/O is usually what you use them for, of course, but the package offers a lot more than reading from and writing to files.
Streams are the Fundamental Abstraction of java.io.*
java.io consists of around 50 separate classes, and you can divide them up in a few ways. Almost all deal with the input or output of a stream. A stream really isn’t anything formal; it’s just a sequence of bytes that you interpret as data.
Streams may not be intuitive in some ways, though, so I’ll go over some of the concepts. Understanding them is critical to avoiding a lot of the weird situations you can find yourself in with streams.
Streams have no Implied Structure
The data contained inside a stream has no real indication of its structure; it’s up to the programmer to read it correctly. Java provides two sets of classes to deal with the two common forms of data: InputStream and OutputStream interpret data merely as bytes, while Reader and Writer assume your data is textual and interpret it as a series of characters. These four abstract classes are the core of the IO package; the other classes supplement them to allow a wider range of expression and abstraction when dealing with data.
The Length of a Stream May Not Be Known
An easy analogy for streams is that they are like arrays. The stream contains a sequence of units of data; the programmer reads them in bits and pieces and, from them, gains useful information about something.
The analogy breaks down when it comes to length. An array’s size is known and fixed; you can access any element at any time, instantly, because the entire thing is immediately available. A stream’s size, on the other hand, is not known. It’s assumed to end at some point, but it may very well not, depending on its source.
To belabor this point (and probably appear very patronizing), arrays are crayon boxes. You can pull out any given crayon without having to deal with any other crayons. Streams are like coke machines. You can access one unit of coke per request. Many different functions make this more convenient and faster, but in the end, you (or the methods you’re calling) are stuck dealing with one coke can at a time. You don’t know how many coke cans there are in the machine until the machine tells you it doesn’t have any more.
Of course, we’re talking about streams in the purest sense; in our examples, we know the length of the stream because we provide the data explicitly. The length of a file may also be known, since the data we’re “streaming” from is local. So while a stream has no concept of its own length, that doesn’t mean we can never know it.
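To make “may very well not end” concrete, here is a minimal sketch of an InputStream whose source never runs dry. EndlessCounterStream is a hypothetical class made up for this illustration; the only real API it relies on is that an InputStream subclass must implement read(), which returns a value from 0 to 255, or -1 at the end of the stream. Since this one never returns -1, no amount of reading will reveal its length.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example: a source that never runs dry. It hands out
// 0, 1, 2, ..., 255, 0, 1, ... for as long as anyone keeps asking.
class EndlessCounterStream extends InputStream {
    private int next = 0;

    // read() is the only abstract method a subclass of InputStream must implement.
    @Override
    public int read() {
        int value = next;
        next = (next + 1) % 256; // keep within the 0..255 range read() promises
        return value;            // never -1, so the stream never reports an end
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = new EndlessCounterStream();
        for (int i = 0; i < 5; i++) {
            System.out.print(stream.read() + " "); // prints "0 1 2 3 4 "
        }
        System.out.println();
    }
}
```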
Streams Do Not Need To Immediately Return; They May Block
The power of streams is in their abstraction. Streams are merely sequences of data. The format is freeform, the length is not a concern, and the source need not be local. This flexibility has its limitations, however: the format must be communicated to the user of the stream explicitly, an unknown length requires care on the part of the programmer, and non-local sources may take a long time to return data.
I’ve covered the first two, but the third point deserves a closer look. A stream depends on its source for data. The speed at which that source supplies our stream is not known, yet we are ultimately bound by it: until the data arrives, we can’t proceed. This state of waiting is known as blocking; we’re blocking any further progress while we wait.
To illustrate, I’ll return to the crayon box and coke machine analogy. Getting a crayon from the crayon box depends only on how fast you retrieve it. There’s no waiting, because between “wanting the crayon” and “having the crayon,” you’re actively working towards the latter.
If you’re waiting in line at a coke machine, there’s still the two concepts of “wanting a coke” and “having a coke.” However, between these two concepts is a large amount of doing nothing. You’re waiting for other people to do stuff; you’re not working towards the goal directly. The coke machine and its users are blocking you from getting your coke.
There are a couple of clarifications to be made when relating this to streams. First, blocking can occur with local resources too; accessing the hard drive is very expensive compared to accessing memory directly. A request for data on the hard drive will block until that data is available. Even if your request is the only one active, you’ll still have to wait for the operation to complete. Blocking simply means ‘the program cannot continue to execute until this method call returns.’
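A minimal sketch of blocking you can run locally, assuming nothing beyond the standard library: PipedInputStream and PipedOutputStream connect two threads, and read() on the input end waits until the writing thread delivers a byte. The class name BlockingDemo and the delay value are made up for illustration; the sleep stands in for a slow source such as a disk or a network.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Illustrative sketch: read() on a pipe blocks until another thread writes.
class BlockingDemo {
    // Returns {the byte read, roughly how many milliseconds read() waited}.
    static long[] readAfterDelay(long delayMillis) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis); // simulate a slow source
                out.write('T');            // the data finally arrives
                out.close();               // signal the end of the stream
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        long start = System.nanoTime();
        writer.start();
        int value = in.read(); // blocks here until the writer delivers a byte
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;
        writer.join();
        in.close();
        return new long[] { value, waitedMillis };
    }

    public static void main(String[] args) throws Exception {
        long[] result = readAfterDelay(500);
        System.out.println("Read " + (char) result[0] + " after ~" + result[1] + " ms");
    }
}
```

Nothing in the main thread runs between calling read() and getting the byte back; that gap is exactly what “blocking” means here.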
Blocking Resources Do Not Always Need to Block
Blocking isn’t really an exact term. Waiting for a webpage to load is apparent blocking; you can’t use the page until it’s fully loaded. (Of course, there’s incremental loading of pages, but just work with me here.) You can see and feel the waiting. File I/O, on the other hand, may not be perceptible: large files, maybe, but small ones won’t be. Memory access and processor cache access both involve imperceptible, but real, wait times between request and response.
These waits are blocking in the conceptual sense. Yet we talk about blocking for network and file I/O and not for memory access. There are two reasons for this. The first is convention: file and network access take long enough to block perceptibly.
The other is more arcane: file and network resources are not reserved exclusively for the requester. They may be servicing many concurrent requests, so our request might be queued. These shared resources may make us wait before they do any work on our request at all. This is not true of exclusive resources, which serve a single requester constantly.
This last point is more important than it appears: the fact that non-exclusive resources may block means they’re considered blocking resources, whether or not they block on any given request. This marks a distinction between the concept of ‘blocking’ and the act of ‘blocking.’ I may not be blocking right now, but if I can block on some requests, I am a blocking resource.
Understanding blocking will save you a lot of trouble when it comes to streams (or any sort of synchronous request). Because streams can block, any stream is a potential bottleneck. A set of classes in java.io.* exists to minimize the strain these streams may put on your performance.
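The post doesn’t name those classes at this point; the most common examples are the buffered wrappers, BufferedInputStream and BufferedReader. A minimal sketch, assuming that’s what is meant: wrapping a Reader in a BufferedReader, which fetches a large chunk from the underlying source at once and serves later reads from memory. The helper countLines is hypothetical, and the StringReader only keeps the example self-contained; the payoff is real when the underlying source is slow, such as a file or a socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferingDemo {
    // Wraps any Reader in a BufferedReader. The wrapper fills an internal
    // buffer (8192 chars here) from the underlying source in one go, so most
    // reads are answered from memory instead of going back to the source.
    static int countLines(Reader source) throws IOException {
        try (BufferedReader buffered = new BufferedReader(source, 8192)) {
            int lines = 0;
            while (buffered.readLine() != null) { // readLine() returns null at end of stream
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("one\ntwo\nthree"))); // prints 3
    }
}
```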
read() Returns One Unit of Data from a Stream
The public interfaces of these classes mirror each other. The input classes mirror the output classes and the byte-based classes mirror their character-based ones, so learning one part of it lets you transfer that knowledge to any other set of classes. Very useful. I’ll start with read() and write().
Derivatives of Reader and InputStream function in the exact same way when dealing with simple I/O. Here’s an example to show this:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Read one character from the stream
private static void doSomeCharIO() throws IOException {
    Reader reader = new StringReader("This is a happy string.");
    int firstValue = reader.read();
    System.out.println(firstValue + " " + (char) firstValue);
    // Output: "84 T"
}
This straightforward example is probably the smallest IO operation one can do. We create a StringReader, which lets us treat a string as a stream, then call reader.read() to get the first character. Notice that we don’t actually get a char back; instead, we’re given an int that we have to cast.
One reason this doesn’t return a char is consistency with InputStream. The other is that read() needs a value outside the range of valid data to signal the end of the stream: it returns -1 when there is nothing left to read (more on that later). Errors, by contrast, are reported by throwing an IOException. At any rate, the lesson here is this: when using any input stream or reader, read() returns a single unit of data (a byte or a character, depending on the class used) as an int. Once you’ve checked that it isn’t -1, casting that int to a byte or char, respectively, gets you the data you want.
To prove my point that these two classes are basically the same, here’s the above example but using an InputStream:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Read one byte from the stream
private static void doSomeByteIO() throws IOException {
    byte[] byteArray = new byte[] { 'T', 'h', 'i' };
    InputStream stream = new ByteArrayInputStream(byteArray);
    int firstValue = stream.read();
    System.out.println(firstValue + " " + (char) firstValue);
    // Output: "84 T"
}
Since we’re using a different format, we need to use different classes. However, other than that, the two examples are identical.
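Putting the two rules from this section together (read() hands back an int, and -1 means the stream is exhausted), here is a minimal sketch of the loop you’ll usually write around read(). The class and method names are hypothetical; the same pattern works for an InputStream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLoop {
    // Reads a Reader one unit at a time. read() returns the character as an int
    // in the range 0..65535, or -1 once the stream is exhausted, so we check for
    // -1 before casting: (char) -1 would silently become the character 65535.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        int value;
        while ((value = reader.read()) != -1) {
            text.append((char) value);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("This is a happy string.")));
    }
}
```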
write() is Symmetric to its Reading Counterpart
write() does what you’d expect: it writes a single byte or character. The signature of the simplest form is void write(int b), and it’s identical for both OutputStream and Writer. Since every char or byte value fits within an int, Java widens them automatically and you don’t need an explicit cast. Here’s the code for both, just to demonstrate:
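import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.Writer;

private static void doSomeCharWriting() throws IOException {
    int theValueOfT = (int) 'T'; // Should be 84.
    Writer writer = new StringWriter();
    writer.write(theValueOfT);
    System.out.println(writer); // Prints "T": StringWriter.toString() returns its buffer
}

private static void doSomeByteWriting() throws IOException {
    int theValueOfT = (int) 'T'; // Should be 84.
    OutputStream stream = new ByteArrayOutputStream();
    stream.write(theValueOfT);
    System.out.println(stream); // Also prints "T", decoded with the platform default charset
}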
As you can see, it’s straightforward. We instantiate the correct class, write our value, and then send it directly to System.out. Note, however, that neither OutputStream nor Writer guarantees that toString() will behave this way for all their subclasses; how you retrieve what was written varies from class to class.
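Because toString() isn’t a reliable way to get at written data in general, here is a sketch using the accessors these two in-memory classes do provide: ByteArrayOutputStream.toByteArray() and StringWriter.toString(). It also shows a detail of write(int) worth knowing: OutputStream keeps only the low-order 8 bits of the int, and Writer only the low-order 16 bits. The helper names writeBytes and writeChars are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

class WrittenData {
    // ByteArrayOutputStream exposes what was written via toByteArray().
    static byte[] writeBytes(int... values) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        for (int v : values) {
            stream.write(v); // only the low-order 8 bits of v are written
        }
        return stream.toByteArray();
    }

    // StringWriter exposes what was written via toString().
    static String writeChars(int... values) {
        StringWriter writer = new StringWriter();
        for (int v : values) {
            writer.write(v); // only the low-order 16 bits of v are written
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = writeBytes('T', 'h', 256 + 'i');                   // 256 + 'i' is truncated to 'i'
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints "Thi"
        System.out.println(writeChars('T', 'h', 'i'));                    // prints "Thi"
    }
}
```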
I’m getting a little self-conscious about the rate at which I’m introducing this stuff. I figure these mechanics seem fairly self-evident; my intention is to demonstrate these fundamentals fully before moving on to more complicated stuff. My hope is that this foundation will serve you well when you’re trying to understand the much more complicated stuff later, as you’ll have this to fall back on.
But after all, dealing with streams in terms of single units is not exactly commonplace. You’ll probably be dealing in terms of groups of bytes or characters. You may abstract even further and deal with streams in terms of primitives and objects. If your data involves some parsing, you may need to navigate through the stream. Java’s facilities will serve you well in all these cases.
A Stream Can Work With Arrays of Data
read() and write() can also be given arrays instead of single units. The character-based and byte-based stream classes differ only in what type of array they expect, so I’ll demonstrate only one here. This example uses a CharArrayReader just because we haven’t used one yet; there’s no motive beyond that.
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;

private static void doSomeSimpleCharReading() throws IOException {
    Reader reader = new CharArrayReader("This could be a character array!".toCharArray());
    char[] charArray = new char[4];
    reader.read(charArray);
    System.out.println(charArray); // Prints "This"
    reader.read(charArray);
    System.out.println(charArray); // Prints " cou"
}
This little tidbit fills charArray with as many characters as it can hold (in this case, 4). Notice that since we’re just dealing with data, the array has no concept of being “filled” or not; when we call read() again, we overwrite the previous contents. This can be fairly convenient when you’re dealing with data that comes in known quantities.
On the other hand, having it overwrite your array on every read may not be what you want. Furthermore, the Reader will try to fill the array with up to n characters, where n is the length of your array. You may not want it to take that many, but you don’t want to create many arrays to handle all the different sizes. Java has more advanced capabilities to deal with this situation, but for the moment, the idea is useful for showing the final version of read():
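Overwriting is only half the story: read(char[]) returns how many characters it actually stored, or -1 at the end of the stream, and near the end it can store fewer than the array holds. Here is a minimal sketch of the usual loop, with a hypothetical helper readChunks, that uses that return value so stale characters from an earlier read never leak into the output.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class ChunkedRead {
    // read(char[]) returns how many characters it stored, or -1 at the end of
    // the stream. Only the first `count` slots hold fresh data; anything after
    // them is left over from an earlier read.
    static List<String> readChunks(Reader reader, int chunkSize) throws IOException {
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[chunkSize];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            chunks.add(new String(buffer, 0, count)); // use count, not buffer.length
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new CharArrayReader("This could be a character array!".toCharArray());
        System.out.println(readChunks(reader, 5));
        // prints [This , could,  be a,  char, acter,  arra, y!]
    }
}
```

Had the last chunk been built from buffer.length instead of count, it would have come out as "y!rra", with the tail of the previous read still in the array.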
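import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;

private static void doSomeExactCharReading() throws IOException {
    Reader reader = new CharArrayReader(
        "This could be a character array!".toCharArray()
    );
    char[] charArray = new char[10];
    reader.read(charArray, 0, 4);  // Fills slots 0-3
    System.out.println(charArray); // Prints "This" followed by six invisible NUL characters
    reader.read(charArray, 5, 5);  // Fills slots 5-9; slot 4 is never written
    System.out.println(charArray); // Prints "This", one NUL character, then " coul"
}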
This version of read() expects a char[] as its first argument, like the previous form. It also expects an offset and a length: where to start writing in the array you’ve given it, and the maximum number of characters to read. Note that you’ll get an IndexOutOfBoundsException if the offset and length would overflow the array.
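A small sketch that checks those claims against the example above, assuming only the standard CharArrayReader: it records how many characters each call reports, what is left in the unwritten slot 4, and whether a request past the end of the array is rejected. The class and method names are hypothetical.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;

class OffsetRead {
    // Returns a summary: characters read by each call, the value left in
    // slot 4, and whether an overflowing request was rejected.
    static String demo() throws IOException {
        Reader reader = new CharArrayReader("This could be a character array!".toCharArray());
        char[] charArray = new char[10];

        int first = reader.read(charArray, 0, 4);  // stores "This" in slots 0..3
        int second = reader.read(charArray, 5, 5); // stores " coul" in slots 5..9
        int gap = charArray[4];                    // never written: still the NUL character, 0

        boolean rejected = false;
        try {
            reader.read(charArray, 8, 5);          // 8 + 5 > 10, so this would overflow
        } catch (IndexOutOfBoundsException e) {
            rejected = true;
        }
        return first + " " + second + " " + gap + " " + rejected;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints "4 5 0 true"
    }
}
```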
Streams Do Not Panic When Reaching the End Of A Stream
At some point, you’ll reach the end of your stream. Many languages throw an exception when this occurs and provide no other means of detecting it, so you always have to catch it. This makes for clever code (since your EOF, or end-of-file, logic is kept separate), but reaching the end of a stream is hardly exceptional. I suspect it’s merely the cleverness that justifies the exception, and that grates on me.
Luckily for us, Java reacts calmly in the face of EOF. It’s tempting to reach for a stream method called available(). It returns an estimate of the number of bytes that can be read without blocking on the next invocation of a method for this input stream. It is NOT necessarily the length of the stream; as I mentioned earlier, streams are coke machines. We don’t necessarily know how long the stream is, because the stream itself may not know. For example, you can’t open a file, make an array of the size available() returns, and live happily ever after assuming that’s the full length of the file.
The definition I heartlessly stole from the API docs is very deliberately phrased. [[ To be continued... ]]
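A concrete case where available() and the real length disagree, using only standard java.io classes: SequenceInputStream glues two streams together, and its available() reports only what the current underlying stream has. The reliable way to find the end is to keep reading until read() returns -1. The class and helper names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

class AvailableIsNotLength {
    // Reads until read() returns -1, which is how an InputStream reports its end.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int count;
        while ((count = in.read(buffer)) != -1) {
            collected.write(buffer, 0, count); // keep only the bytes actually read
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two sources glued together: 3 bytes followed by 4 bytes.
        InputStream joined = new SequenceInputStream(
                new ByteArrayInputStream(new byte[] { 'T', 'h', 'i' }),
                new ByteArrayInputStream(new byte[] { 's', ' ', 'i', 's' }));

        System.out.println(joined.available());       // prints 3: only the current source counts
        System.out.println(readFully(joined).length); // prints 7: the real length
    }
}
```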
Streams Can Optionally Provide Mark/Reset Functionality
[[ To be continued... ]]