Under The Hood What's going on behind the scenes?
Wilfred Springer
Table of Contents 1. Codec .......................................................................................................................... 1 2. CodecFactory ............................................................................................................... 2 2.1. CompoundCodecFactory ...................................................................................... 5 2.2. WholeNumberCodecFactory ................................................................................. 5 2.3. BooleanCodecFactory ......................................................................................... 6 2.4. ObjectCodecFactory ........................................................................................... 6 2.5. ListCodecFactory ................................................................................................ 7 3. Binding ....................................................................................................................... 7 4. CodecDecorator ........................................................................................................... 8 4.1. LazyLoadingCodecDecorator ................................................................................. 8 4.2. SlicingCodecDecorator ........................................................................................ 8 The previous chapter introduced a couple of simple introductory cases, showing some of the tricks the framework has up its sleeves if you do not feel the desire to customize anything. While you were reading that chapter, you might have already gone like "hmmm, that's sweet, but unfortunately it doesn't work in my corner case". If that's what you thought, then this is the chapter you need. In this chapter, I will explain what is actually going on under the hood, in order to understand how to extend the framework yourself. The focus in this chapter is on four major abstractions: the Codec interface itself, the CodecFactory interface, the Binding interface and the CodecDecorator interface. In addition to discussing the actual interface, this chapter will also discuss some of the implementations, in order to help you to understand how everything fits together.
1
Codec Let's first revisit the Codec interface. In the previous chapter, we already talked about the way you use this interface. In fact, we said that you better use the Codecs convenience class in all cases. If you do that, then there is actually very little to know about the Codec interface itself. It just magically works. However, if you want to extend the framework yourself, then this is the first interface you really need to understand, since this is probably the interface you need to implement.
Figure 1. Codec Interface public interface Codec {
1
Under The Hood
T decode(BitBuffer buffer, Resolver resolver, Builder builder) throws DecodingException; int getSize(Resolver resolver); Expression getSize(); CodecDescriptor getCodecDescriptor(); Class>[] getTypes(); Class> getType(); } The decode(BitBuffer, Resolver, Builder) is obviously the operation that will decode data from the BitBuffer into an instance of T. The Resolver allows the Codec to resolve references used in Preon annotations. The Codec interface is the interface implemented by objects that are able to decode data from a BitBuffer and to encode data into a BitBuffer1. In addition to that, the Codec needs to be able to make some sort of prediction on the number of bits occupied by the encoded data. And last but not least, the Codec needs to be able to return a CodecDescriptor, used for rendering documents with a description of the Codec. When I just said that Codecs are capable of decoding data from a BitBuffer, you could have actually read 'decoding an object from a BitBuffer'. Codecs are associated to a single type, and are expected to return only a single instances of that type from the BitBuffer. In many cases, your objects will hold references to many other objects. Does that mean that a single Codec needs to be able to recursively traverse the object graph and understand how to decode each of the individual members of the objects that it encounters? The answer to that question is both yes and no. "Yes", since whenever you invoke decode on the Codec, it needs to be able to reproduce all objects that are referenced by the object that is going to be returned. "No", since the Codec does not have to understand all of that itself. It can simply delegate to other Codecs, one for every type of attribute it encounters. If you are decoding an object, you could actually use the Codec interface directly. If you create an instance using the Codecs class, they just magically know what to do. We just learned that Codecs constructed like this, will most likely delegate their work to other Codecs, which in turn will most likely delegate their work to other Codecs, and so on and so forth. This way, the Codec both hides a chain of responsibility, but is also able to act as a link inside a chain of responsibility.
2
CodecFactory The previous section ended with stating that a single Codec most likely delegates to other Codecs, which in turn delegate to other Codecs, etc. Obviously, each Codec has to be constructed before it can be used. All of these instances are created as a result of the create() method on Codecs. But how does the Codecs class know which ones to create? As you might have already have guessed by its name, it is of course the CodecFactory. The CodecFactory a single operation, that is expected to be able to return Codec from the context passed in, or return null.
1
The latter is currently not supported yet, but it's not unlikely that it's going to be implemented in the future.
2
Under The Hood
Figure 2. CodecFactory Interface public interface CodecFactory { Codec create(AnnotatedElement metadata, Class type, ResolverContext context); } Almost every type of Codec you will ever use, will be created by a CodecFactory. If you are searching the Preon codebase for the different type of Codecs supported by it, then chances are you will stumble across CodecFactories only. And if you want to extend the framework with your own Codecs, the thing you actually need to pass to the Codecs class is a CodecFactory, and not a Codec. The CodecFactory needs to be able to create a Codec from three parameters passed in. The type of object expected, a socalled ResolverContext and metadata. If the Codec is used to decode data to be injected in a field, then the metadata provides access to the annotations defined on that field2. The third parameter (ResolverContext) is a Limbo ReferenceContext. This is the object that supports your CodecFactory in creating references to the context of the field for which it is currently trying to create a reference.
Example 1. Expression sample class Stuff { @Bound int nrOfThings; @BoundList(size="nrThings") Thing[] things; } Let's take the class above as an example, and assume that your CodecFactory needs to see if it is able to construct a Codec for the "things" field. The annotation contains an expression defining the number of "things" in the array. However, the size annotation attribute is just a String. In order to be able to turn the expresion "nrOfThings" into something usable, we need to turn that expression into something we can actually evaluate - in this case, a Limbo Expression object. And if there is a problem with this expression, we obviously want to find out early.
2
In general, the CodecFactory should not rely on the assumption that the metadata passed in is based on a field. It should just treat it as a number of hints suggesting how to decode data.
3
Under The Hood
Figure 3. Expression applied
There are basically two things that could be wrong with the expression: either the expression cannot be interpreted, or the expression contains references to variables not available in the context in which the expression will be evaluated. The ResolverContext supports detecting the latter case. This is the way it works: whenever your CodecFactory gets a chance to create a Codec based on metadata, type information and a ResolverContext passed in, it will have to use the ResolverContext to create Expression objects. The Expressions class used to create Expression instances accepts a ReferenceContext as a parameter, and ResolverContext is nothing but a special ReferenceContext. (One specific to Preon.) Normally, you would pass the ResolverContext directly, but there are cases in which you could consider replacing or wrapping the ResolverContext3. So, the first and most important purpose of the ResolverContext is early validation of your expression. The second and also important purpose of the ResolverContext is to facilitate documentation getting generated from your Codec. If your CodecFactory constructs a Codec based on an expression, then the documentation generated by that Codec probably needs to take that into account. In case of the example of Figure 3, “Expression applied”, you would expect a description similar to this: "First you read a 32-bit integer. Then you read a list of items with the size corresponding to the 32-bit integer you just read before." Any type of realistic documentation of this file format needs to include this dependency. You want the documentation to clearly state that the size of the list size is dependent on the 32 bits you just read before. That last bit "the 32 bits you just read before" is encapsulated in a Reference. And the ResolverContext allows the Expressions class to obtain these references without having to analyze the entire class again. So, the Expression "nrOfElements" will be parsed into a Reference to the nrOfElements attribute defined before. It's the responsibility of the Reference to render itself in a useful way. The Codec 3
There are actually cases in which you might want to replace that ResolverContext with another one. Typically when you want to introduce new variables, or if you are basically 'popping' or 'pushing' the stack. (If your current context changes into the context of the attributes type.)
4
Under The Hood
or CodecFactory does not even try to make sense out of it. It just relies on the ResolverContext to return a proper reference. Expression expr = Expressions.create("nrOfElements", resolverContext); // Potentially results in" // "the number of elements defined before" expr.document(....); Now, this may be one of the areas in which the flexibility is getting a litle in the way of understanding what's going on here. However, just to comfort you a little, it's just object orientation. That's all it is. The Reference itself decides how it needs to be represented. And the type of ResolverContext passed to the Codec decides how these References are getting constructed. With that, sky is the limit.
2.1
CompoundCodecFactory The CompoundCodecFactory must be one of the only implementations of CodecFactory that doesn't actually create any other Codec itself. The main purpose of the CompoundCodecFactory is to hide the complexity of choosing a particular type of CodecFactory behind an interface. The CompoundCodecFactory references a list of other CodecFactories. When asked for construction of a Codec, it will simply ask all of the factories it references and try each of them to construct a Codec. If all fails, it will simply return null, just what you would expect for a CodecFactory.
2.2
WholeNumberCodecFactory There are many CodecFactories that create Codecs themselves. (There are also a few other CodecFactories that don't create Codecs themselves, apart from the CompoundCodecFactory, but that's a subject that can wait for a while while.) This section starts with one of the most simplest: the WholeNumberCodecFactory. The WholeNumberCodecFactory creates Codecs capable of decoding - well - whole numbers. I didn't want to say integers, since that might lead you to be believe that its capable of decoding java.lang.Integer and its primitive counterpart only, which is not the case. WholeNumberCodecFactory supports decoding byte, short, int, long and all object representations of those types. The WholeNumberCodecFactory is the first example in which a a CodecFactory could use the metadata passed with annotations to decode the data in the proper way. By default, it will return a Codec for byte, short, int or long type of fields whenever: • null is passed in as metadata; • An @Bound instance is passed in as metadata; • An @BoundNumber instance is passed in as metadata. The null case is not important for now. The @Bound annotation supports the default case. The CodecFactory will take this annotation as a signal that it actually needs to create a Codec, using the defaults for the type passed in. The @BoundNumber annotation supports the case in which you do want a Codec to be created, but you don't like the defaults. Here are some examples in which you would want to use the @BoundNumber annotation, instead of the @Bound annatation:
5
Under The Hood
• You want to decode 4 bits into a byte. • You want to force little endian when decoding an 32-bit integer. • You want to decode a 32-bit unsigned integer as a long. The WholeNumberCodecFactory is a good example of the pattern that you will find implemented in almost all other CodecFactories: • Attempt to return a default Codec whenever null or @Bound is passed in. • ...unless there is some other piece of metadata passed in, telling the CodecFactory how to customize the Codec it creates.
2.3
BooleanCodecFactory The BooleanCodecFactory is by far the simplest example of a CodecFactory (and associated Codec) that you could ever come up with. If the CodecFactory is challenged with a boolean type (either the primitive type or the object type) and the presence of an @Bound annotation in the metadata passed in, then it will create a new Codec. Whenever this Codec is asked to decode a value from the BitBuffer, it will read a single bit and return true if the bit is 1, and false otherwise.
2.4
ObjectCodecFactory In theory, it could have been possible to construct Codecs by hand, by using a constructor. However, that's not actually something you want to do yourself. The Codec created would have to closely resemble the datastructure you need to decode. And since every Codec is capable of decoding one value, and there is a fair chance that the object you are trying to decode exists of many other objects, it would even be quite hard to do this yourself. The good news is: you don't have to do all of that yourself. The ObjectCodecFactory basically does it all for you. The ObjectCodecFactory works on basically all existing objects. If all other CodecFactories fail, then the ObjectCodecFactory might still be able to return a Codec, even though the Codec created might basically be a no-op. (It won't decode anything.) That's why normally every CompoundCodecFactory instance is advised to try the ObjectCodecFactory as a last attempt. The ObjectCodecFactory is probably much simpler than you would expect. Suppose that you want to create a decoder for instances of type A: 1.
Get the list of all fields declared on type A.
2.
For each field declared on type A, create a Codec for the type of value accepted by that field, and wrap both the reference to the field and the Codec for its values in a new Binding instance.
3.
Create a new Codec that - on an invocation of decode - will always create an instance of type A, and populate its data by giving every Binding the opportunity to load its data from the BitBuffer passed in. (The Binding will simply use the Codec it references to do the actual decoding of the field's value.)
You may have wondered how the ObjectCodecFactory creates a Codec corresponding to the field's type, in Step 2. The answer is simple: it uses a CodecFactory. It may be different CodecFactory than
6
Under The Hood
the ObjectCodecFactory itself, but it's highly likely that it references the ObjectCodecFactory itself somewhere under its covers. We just said that ObjectCodecFactories use Binding objects under the covers. It's important that you know about these guys, since the ObjectCodecFactory actually allows you to override the way they work, by pluging in your own BindingFactory. We are not going to discuss the typical use case yet, but for now it's important at least to know they exist. (Check out Section 3, “Binding” for a typical example on how to leverage this plugpoint.)
2.5
ListCodecFactory With the three CodecFactories listed above, we would already be able to decode nested objects, in which each of the objects fields references either a numberic value or another object. But that's not enough. (In fact, we don't even know when it would be enough; that's why Preon is an extensible framework. We leave it up to you to decide when you consider it to be enough.) One of the most important missing cases is support for lists. Many binary encoding standards have some repeating sequences inside. This is where the ListCodecFactory comes in. The ListCodecFactory allows you to decode lists of objects. At this stage, it's limited to a list of a certain size, but it is expected that there will be other implementations that have other ways to demarcate the end of the list. The ListCodecFactory creates a Codec whenever the @BoundList annotation is passed in. It will use the attributes of this annotation to figure out how and what to decode. In the simple case, that could be as little as stating the type and the size of the list. If you would create a Codec simply by passing the type and the size of the list, then the ListCodecFactory can be expected to return a Codec implementation that will - on calling decode() return a List implementation that allows you to visit all elements in the List. Don't expect one of the default implementations of List though. By default, the Codec created by the ListCodecFactory will return its own implements of List, one that decodes object on the fly, on demand. It is important to emphasize the difference between the Codec that returns the List instance, and the Codec used to decode elements of the List. These are two completely different Codecs. The Codec decoding the List is seeded with a Codec capable of reading the elements of the List. ??? shows how these elements collorate in a real world scenario.
Figure 4. Decoding a List
3
Binding A couple of sections back, I talked about Bindings, and how the Codec created by an ObjectCodecFactory uses these bindings to load and store values from and in a Field. I also said that you can customize the way these Bindings behave. This section is going to give you an example. Many binary formats have some conditional built into the specification. Here are some examples: • If the version of the file is bigger than 400, then read this data structure first, otherwise continue with the next data structure.
7
Under The Hood
• If the first bit read is 1, then read 7 bits and decode it as a byte; otherwise continue to read 7 + 8 bits and decode as a short. All of these cases could have been solved by breaking out of the ordinary framework and implement the entire encoding/decoding logic yourself. That would not only be really hard, but also result in code that will not be self-documenting in the way Preon normally is. Preon takes another approach, and allows you to declaritively specify the conditions in which the framework should try to load data in bound fields, using the @If annotation. The @If annotation takes a single argument, which is the condition stated in Limbo. The reason Preon is capable of dealing with these expression is because it will create a Binding that is capable of loading data from the BitBuffer if the condition holds. Which is why it is sometimes convenient to have the ability to create your own custom Binding implementation instead of the default one.
4
CodecDecorator Up til now, we have only seen examples of CodecFactories creating Codecs. Codecs are however not always constructed by CodecFactories. If you want the framework to deal with your own Codec, then there is another way to make it create your own Codec, which is by implementing the CodecDecorator interface. As you may have guessed, the CodecDecorator has something in common with the Decorator Pattern (see GoF). The CodecDecorator allows you to transparenly add additional behaviour to a Codec, by applying some 'decoration'. So, where the CodecFactory accepts the type and metadata indicating the specific Codec you want to construct, the CodecDecorator also accepts the Codec that should be wrapped.
4.1
LazyLoadingCodecDecorator When you ask the LazyLoadingCodecDecorator to decorate an existing Codec, you get a new Codec that - on receiving a call to decode - will not pass the call on to the Codec it decorates; instead, it will return a proxy that acts as the object that needs to be created by the decororated Codec. The actual object will only be loaded from the BitBuffer right after you call an operation on that proxy for the first time. The LazyLoadingCodecDecorator is triggered by the presence of the LazyLoading annotation on the type you want to have loaded lazily.
4.2
SlicingCodecDecorator The SlicingCodecDecorator decorates existing Codecs by returning a new Codec that - on receiving a decode call - passes the request on to the decorated Codec, but replacing the BitBuffer passed by taking a slice of the original. The main reason of having a SlicingCodecDecorator is to support type length value records. The main idea here is that Codecs reading the records data should not be required to know the amount of data
8
Under The Hood
to be expected. As you might have seen before, the Codec constructed doesn't support passing in the maximum number of bytes to be read, or anything. And having to implement that logic for all TLV record readers seemed to be a waste. Instead of having every TLV record Codec implement support for detecting the end of the record, the decision was made to externalize that behaviour and put it into a separate Codec, decorating the Codec that reads the data from the TLV record.
9