Over Generalising


Things to beware of when generalising software and interfaces into 'frameworks'

When designing systems and data formats, it can be tempting to create a one-size-fits-all solution that covers all circumstances, or an 'open architecture' framework that you can 'just plug new things into'. In doing this, beware of:

Producing a solution that is so generic that you're back where you started from,
Obscuring the benefits of the underlying technology,
Introducing a new bespoke skillset to learn and maintain.

Single, Common Formats

When you're moving data about, it's tempting to make things simpler by passing it around in some 'common format'. For example, if you have instruments that will register their capabilities, it's tempting to have a common 'instrument' format. But if the instruments have no common characteristics - such as a windspeed meter, a mobile video camera and a pollution level meter - think again.

Redundant Fields

A one-size-fits-all format tends to mean that many, even all, of the fields are 'optional'. Adding special rules for which fields must be filled in when other fields are present requires a new validation component and a new skillset.

Define formats appropriate to the data being held. Factor out common items (such as geographical location) and assemble the specific formats from them.
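A minimal Java sketch of this advice, assuming the windspeed and pollution meters from the earlier example; `Location`, `WindspeedReading` and `PollutionReading` are hypothetical names. Each record carries only the fields that make sense for it, none of them 'optional', and the shared location is factored out into its own type and embedded.

```java
import java.util.Objects;

// Hypothetical common item, factored out so that specific formats can embed it.
record Location(double latitude, double longitude) {
    Location {
        if (latitude < -90 || latitude > 90) throw new IllegalArgumentException("latitude out of range");
        if (longitude < -180 || longitude > 180) throw new IllegalArgumentException("longitude out of range");
    }
}

// Each format holds exactly the fields its data needs; every field is required.
record WindspeedReading(Location location, double metresPerSecond) {
    WindspeedReading {
        Objects.requireNonNull(location, "location");
        if (metresPerSecond < 0) throw new IllegalArgumentException("negative windspeed");
    }
}

record PollutionReading(Location location, String pollutant, double partsPerMillion) {
    PollutionReading {
        Objects.requireNonNull(location, "location");
        Objects.requireNonNull(pollutant, "pollutant");
    }
}
```

The validation rules live with each format, so there is no separate 'which fields go together' rule engine to write and maintain.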

Vague Fields

Another option is to leave the fields vague enough that they can be used for several things. For example, our video camera may have a battery, and our pollution level meter some kind of chemical fluid, so you could have a single field, "Remaining", for both.

But without a type and a defined meaning, you can't tell what is remaining without knowing what the instrument is. So your format is generic, but the implementations that use it have to know what 'type' it is from a field value, and cannot properly use it if they don't know that value. You can display it - i.e., you move the identification to the human being. But you can't use it, say, to order a resupply.

Similarly, you cannot tell which components can process it, because all of them have to accept the format. Contract and 'type' constraints are lost.
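A small Java contrast of the 'Remaining' problem, using hypothetical names (`GenericInstrument`, `BatteryLevel`, `FluidLevel`, `Resupply`) and made-up thresholds. With typed values the method signature says what can be resupplied; with the generic form the code has to switch on a type string, and any instrument it doesn't recognise falls through to 'unknown'.

```java
import java.util.Map;

// The vague version: a value whose meaning depends on another field's string value.
record GenericInstrument(String type, Map<String, Double> fields) {}

// Typed alternatives: the meaning (and unit) travels with the type.
record BatteryLevel(double percentCharged) {}
record FluidLevel(double millilitres) {}

final class Resupply {
    // Typed: only a fluid level can be passed here, so we know we are ordering reagent.
    static String orderFor(FluidLevel level, double reorderBelowMl) {
        return level.millilitres() < reorderBelowMl ? "order reagent" : "no order";
    }

    static boolean needsCharge(BatteryLevel level, double belowPercent) {
        return level.percentCharged() < belowPercent;
    }

    // Generic: we must know every instrument type in advance and guess the units.
    static String orderFor(GenericInstrument g) {
        Double remaining = g.fields().get("Remaining");
        if (remaining == null) return "unknown";
        switch (g.type()) {
            case "pollution-meter": return remaining < 50 ? "order reagent" : "no order";
            case "video-camera":    return remaining < 20 ? "charge battery" : "no order";
            default:                return "unknown"; // new instrument types silently fall through
        }
    }
}
```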

Shoehorning - losing features - Lowest common denominator

Now that you have a framework with a generic format, you should be able to "just plug in" a new instrument. And if your instrument is close enough to what you originally had in mind, then 'all you have to do' is write the code to transform its output into something the framework can handle. But if it's not, then you have to modify your whole framework to handle the new features. Or you have to leave those features out, hiding them from other components attached to your framework that might know how to use them.

False satisfaction

When a generic format has been agreed, it's tempting to think that interoperability has been agreed. It hasn't; the problem has just been moved. The generic format must still be handled according to the different types of data it contains, and that, after all, was the original interoperability problem.

Design patterns vs implementations

Design patterns are frequently implemented for reuse. Don't. It may be tempting to write an 'Observable' class with methods for registering 'Observer' interfaces and distributing 'ObservedEvent' classes. But you lose type information and safety, and there's no contractual way of distinguishing between distinct events.
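A sketch of the difference in Java; all names here are hypothetical. The 'reusable' `GenericObservable` accepts any `Object` as an event, so nothing tells a listener which events it must handle. The pattern applied directly to the problem gives `InstrumentListener`, whose methods are the contract.

```java
import java.util.ArrayList;
import java.util.List;

// The 'reusable' implementation: every event is just an Object.
interface GenericObserver { void update(Object event); }

final class GenericObservable {
    private final List<GenericObserver> observers = new ArrayList<>();
    void register(GenericObserver o) { observers.add(o); }
    void fire(Object event) { for (GenericObserver o : observers) o.update(event); }
}

// The pattern applied directly: distinct, typed events and a listener contract.
record WindspeedChanged(double metresPerSecond) {}
record CameraBatteryLow(double percentRemaining) {}

interface InstrumentListener {
    void windspeedChanged(WindspeedChanged e);
    void cameraBatteryLow(CameraBatteryLow e);
}

final class InstrumentEvents {
    private final List<InstrumentListener> listeners = new ArrayList<>();
    void addListener(InstrumentListener l) { listeners.add(l); }

    void fireWindspeed(double metresPerSecond) {
        WindspeedChanged e = new WindspeedChanged(metresPerSecond);
        for (InstrumentListener l : listeners) l.windspeedChanged(e);
    }

    void fireBatteryLow(double percentRemaining) {
        CameraBatteryLow e = new CameraBatteryLow(percentRemaining);
        for (InstrumentListener l : listeners) l.cameraBatteryLow(e);
    }
}
```

The generic version will happily deliver a `String` to an observer expecting a windspeed; the typed version won't compile if a listener forgets to handle an event.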

Single, common interfaces

Similar to above. Building abstract/generalised interfaces to sets of specific solutions cannot properly handle all the ins and outs. For example, a fully generic interface to file access has to handle various remote file systems, and all the different delay, display, authentication and connection errors that might happen.

The ultimately generic interfaces are "data producer" and "data consumer". These are typeless and useless; a maintenance and debugging nightmare.

Rewriting programming languages

When you find yourself reinventing types such as strings and numbers or language structures such as conditionals, take a break and think whether you'd be better off just coding it.
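For illustration, a hypothetical 'configurable' `Rule` in Java that holds a field name, operator and threshold as strings, next to the same check just coded. The rule is a small home-grown conditional with its own parsing and failure modes; a misspelt field name is silently false at run time, where the coded version is checked by the compiler.

```java
import java.util.Map;

// A home-grown conditional: field, operator and threshold are all strings.
record Rule(String field, String operator, String threshold) {
    boolean matches(Map<String, String> values) {
        String raw = values.get(field);
        if (raw == null) return false;            // misspelt or missing field: silently false
        double value = Double.parseDouble(raw);   // type errors only show up at run time
        double limit = Double.parseDouble(threshold);
        switch (operator) {
            case ">":  return value > limit;
            case "<":  return value < limit;
            case "==": return value == limit;
            default:   throw new IllegalArgumentException("unknown operator " + operator);
        }
    }
}

final class Alerts {
    // The same rule, 'just coded': field, type and operator are checked by the compiler.
    static boolean highWind(double metresPerSecond) {
        return metresPerSecond > 20.0;
    }
}
```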

Loose vs close coupling

When assembling a lot of components, especially when you can foresee other components being added later, you need to reduce the interdependencies. If every component has to know about every other component's data type, it can quickly become impossible to maintain.

However... the answer is rarely to make one single common format, for the reasons above. Instead think about which sets of formats are common. Think also about what the consumers might need to know anyway about the sources.

For our example, any consumers of pollution meter outputs will need to know local environment information as well (windspeed etc). Other consumers might just want to know name, owner and position. Others might not be interested at all. A different set of consumers will be interested in video feed, and those will require very different handling. Only a few will be interested in both, and they will need to 'understand' both anyway.
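One way to express these sets in Java is as small capability interfaces that each instrument implements as appropriate; the interface and record names below are hypothetical, and the dilution formula is a toy for illustration only. A directory consumer asks only for name, owner and position; a pollution consumer asks for a pollution reading and a wind reading, and says so in its signature.

```java
// Hypothetical capability 'sets' that consumers can ask for.
interface Identified { String name(); String owner(); }
interface Positioned { double latitude(); double longitude(); }
interface WindSensor { double windMetresPerSecond(); }
interface PollutionSensor { double pollutantPpm(); }
interface VideoSource { String streamUri(); }

record WindMeter(String name, String owner, double latitude, double longitude,
                 double windMetresPerSecond) implements Identified, Positioned, WindSensor {}

record PollutionMeter(String name, String owner, double latitude, double longitude,
                      double pollutantPpm) implements Identified, Positioned, PollutionSensor {}

record MobileCamera(String name, String owner, double latitude, double longitude,
                    String streamUri) implements Identified, Positioned, VideoSource {}

final class Consumers {
    // Needs only name, owner and position, so accepts any instrument that has them.
    static <T extends Identified & Positioned> String listing(T instrument) {
        return instrument.name() + " (" + instrument.owner() + ") at "
                + instrument.latitude() + "," + instrument.longitude();
    }

    // Needs pollution and local wind; toy formula: stronger wind, lower concentration.
    static double dilutedPpm(PollutionSensor pollution, WindSensor wind) {
        return pollution.pollutantPpm() / (1.0 + wind.windMetresPerSecond());
    }
}
```

A video consumer would take a `VideoSource`, and the few consumers interested in both video and pollution would simply ask for both interfaces.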

Joys of Generic

On the other hand, it may be that all you need is a generic form.

For example, you might translate all of your produced data into a table with appropriate labels, if all you want to do is put it into a spreadsheet to be sorted, plotted, transformed, etc. by a human.
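A minimal Java sketch of such a generic form, assuming the only consumer is a human with a spreadsheet; `Column` and `TableExport` are hypothetical names. It writes a header of labels and units and then one line per row, and it does not quote or escape commas inside values, which a real CSV writer would need to.

```java
import java.util.List;
import java.util.StringJoiner;

// A labelled column with its unit: the only metadata the generic form keeps.
record Column(String label, String unit) {}

final class TableExport {
    // Naive CSV: no quoting or escaping of commas/newlines inside values.
    static String toCsv(List<Column> columns, List<List<Object>> rows) {
        StringBuilder out = new StringBuilder();
        StringJoiner header = new StringJoiner(",");
        for (Column c : columns) header.add(c.label() + " (" + c.unit() + ")");
        out.append(header).append('\n');
        for (List<Object> row : rows) {
            if (row.size() != columns.size()) throw new IllegalArgumentException("ragged row");
            StringJoiner line = new StringJoiner(",");
            for (Object cell : row) line.add(String.valueOf(cell));
            out.append(line).append('\n');
        }
        return out.toString();
    }
}
```

Here the loss of type information doesn't matter, because the human reading the spreadsheet supplies the meaning.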

Examples

picontainer

Structures lose type, and objects lose contracts. A framework's framework. See PicoContainerIsBadSingletonIsGood and SeparatingConfigFromExecution

VOTable

VOTable is an XML document format that is the agreed service interchange standard for astronomical data. It is essentially a flat 2D table of values, with extra details on each column such as units.

Aside from a few quirks, it allows a wide variety of data to be manipulated generically by components, say onscreen in a graphing application.

However, as a common service interchange format, it prevents any standard contracts being defined for any service. It requires all data to be placed in a two-dimensional table. And there is no way to tell whether a VOTable is holding a source (astronomical object) list, candidate list, magnitude histogram, spectral energy distribution, etc., and so no way of specifying these in the service contracts.

Java Beans

Java classes that provide a constructor without arguments, and getXxxx() and setXxxx() methods where Xxxx is a 'property'. A required no-argument constructor and property setters mean beans can be in invalid states, which loses one of the advantages that object-oriented programming provides.
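A minimal Java contrast, with hypothetical class names: `InstrumentBean` follows the bean conventions, so a half-populated instance is perfectly legal, while `Instrument` validates everything in its constructor and so cannot exist in an invalid state.

```java
// Bean conventions: no-argument constructor plus setters, so half-built objects are legal.
class InstrumentBean {
    private String name;
    private Double latitude; // stays null until someone remembers to set it

    public InstrumentBean() {}
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Double getLatitude() { return latitude; }
    public void setLatitude(Double latitude) { this.latitude = latitude; }
}

// The same data as an immutable value, checked once at construction.
final class Instrument {
    private final String name;
    private final double latitude;

    Instrument(String name, double latitude) {
        if (name == null || name.isEmpty()) throw new IllegalArgumentException("name required");
        if (latitude < -90 || latitude > 90) throw new IllegalArgumentException("latitude out of range");
        this.name = name;
        this.latitude = latitude;
    }

    String name() { return name; }
    double latitude() { return latitude; }
}
```

Every method that receives an `Instrument` can rely on it being complete; every method that receives an `InstrumentBean` has to check.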

Use only when necessary.

Commons Logging

I don't entirely agree with all these comments (particularly the amusingly bilious one at Bile Blog), but here is 'Think Again before using commons-logging'. See Dependency Inversion Notes.

Remote File access

See LightAccessToRemoteFiles, my framework for accessing remote (and local) files. It encounters exactly the same problems as given above, even though it's only dealing with files and not even trying to include other remote tree-like structures.

BinX

BinX is used to describe arbitrary binary documents using XML, and in some cases can provide XML-like access to those documents. Review here. But aaargh BxObject - why?!

Units

(Will write more on this.) Units are not straightforward. Specifying units may not be sufficient; i.e., how do you tell whether the numbers are fluxes, or how they relate to each other? (UCDs + ontologies.) Units may be hierarchical. Do you still 'squash' the data into the 'standard' form? What do you lose if you have separate forms for each data type?

CEA

CEA vs plain web services. CEA adds callbacks and parameter metadata, but it also adds effort and tools to implement.

Effort

A programming language and a decent library gives you one of the most powerful frameworks/general solutions you can have. Programming skills are reusable and widely available. Consider the learning curves of both options.

Creating Frameworks

What is the obvious extra effort to write the framework/generalised system, compared to the expected saved effort in applying it? How much effort will be required to maintain it, compared to a 'straight' application or format?

Adding to frameworks

A programming language requires that code is written and tested to handle additions, and it requires compiling. Compare the effort required to do this, with writing and testing configuration files/etc for the framework.

Some things you can do.

Bear in mind that adding value and meaning is about reducing flexibility, not increasing it. Subclasses actually restrict rather than extend their superclasses; an apple is a more restricted thing than fruit in general. The same applies to structures and validation; an XML document that must conform to a specific schema carries more explicit and implicit information than a freeform one.

Standard keywords vs standard schemas

Rather than defining one size fits all, try defining and agreeing a set of keywords that will be used to represent specific data with types and/or structures.
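A rough Java sketch of an agreed keyword set, assuming each keyword fixes a meaning, a Java type and a unit; `Keyword` and `Tagged` are hypothetical names, not an existing standard. Documents remain free to choose their own structure, but any value tagged with a keyword must have the agreed type.

```java
import java.util.HashMap;
import java.util.Map;

// A hypothetical agreed vocabulary: each keyword fixes a meaning, a type and a unit.
enum Keyword {
    WIND_SPEED(Double.class, "m/s"),
    STATION_NAME(String.class, ""),
    POLLUTANT_PPM(Double.class, "ppm");

    final Class<?> type;
    final String unit;

    Keyword(Class<?> type, String unit) { this.type = type; this.unit = unit; }
}

// Any structure may carry keyword-tagged values, but each must match the agreed type.
final class Tagged {
    private final Map<Keyword, Object> values = new HashMap<>();

    Tagged put(Keyword k, Object value) {
        if (!k.type.isInstance(value)) {
            throw new IllegalArgumentException(k + " expects " + k.type.getSimpleName());
        }
        values.put(k, value);
        return this;
    }

    <T> T get(Keyword k, Class<T> type) {
        if (!type.equals(k.type)) throw new IllegalArgumentException(k + " is " + k.type.getSimpleName());
        return type.cast(values.get(k));
    }
}
```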

Factor out

Factor out the common stuff into libraries and include the relevant ones in the relevant specifics, rather than including everything in one major 'root'.

More Blind Mantras

Do Only What You Need To
Keep It Simple

References

This note is overgeneralised as it talks about all general solutions...

Service Interchange formats (including form vs content)
Joel on Software
Reuse or bespoke
Over Modelling
Dependency Inversion
IBM Developer Works Quality buster Single Technology Solutions
ServerSide's article Services and coupling

A commentary on 'solving' problems by abstracting back to where you started from.