A Bristol Geek

No trees were harmed in the creation of this blog; however, a significant number of electrons were slightly inconvenienced.

XML Feed Configuration woes

It has been one of those weeks (well, two in a row actually); some legacy systems went astray and some bad decisions were made by some of my colleagues. In both situations other members of my team and I rode to the rescue and corrected the problems quickly. Needless to say I am glad that it is Saturday, and even happier that Daisy (the puppy) slept until 6:30 this morning!

Whilst I was resolving one of the fortnight's problems I came across a “coding pattern” (for want of a better term) that can really cause problems. In an attempt to prevent its use I shall share it and my suggestions here.

Companies across the globe run instruction services that their customers and work sources use to send in work; most often these are XML web services (such as a SOAP or REST interface) that receive XML files and create items in workflow, case management, or shop systems. I have both written and worked with a number of these systems, and I have seen great ones and frightening ones, both in their internal code and in the way they work with the systems around them. The service I recently had to fix had one rather poor design decision: the way it stored its incoming XML files.

The service wrote each of the files to disk; every incoming instruction created one file in a single folder, named according to the incoming reference. You may ask what is wrong with that; it's an easy to use and easy to locate store of all incoming instructions.

After a year there were just short of half a million files in one directory; anyone who has ever tried to open a folder with that many files will know that Windows hates it. Trying to locate a file in that folder left the CPU running at 100%. To make it worse, the location of that folder was hard-coded relative to the service; it was not configurable in the app.config file.

Needless to say I did not write this application; I have two simple “rules” that ensure this problem does not happen.

  1. Any external storage should be set in a configuration file (be that app.config, web.config, database, ini file, etc).
  2. Storage that works day by day should be organised as such:
    1. YYYY/MM/DD/{OPTIONAL UNIQUE}
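As a rough illustration of both rules together, here is a minimal Python sketch (the service in question was a .NET application, so this is not its code). It assumes a hypothetical ini-style config with a `[storage]` section and a `root` key, and a hypothetical `.xml` file name built from the incoming reference; none of those names come from the real system.

```python
import configparser
from datetime import date
from pathlib import Path


def load_storage_root(config_text: str) -> Path:
    """Read the storage root from ini-style config text.

    The [storage] section and 'root' key are made up for this example.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return Path(parser["storage"]["root"])


def instruction_path(root: Path, received: date, reference: str) -> Path:
    """Build root/YYYY/MM/DD/<reference>.xml for an incoming instruction."""
    return (
        root
        / f"{received:%Y}"
        / f"{received:%m}"
        / f"{received:%d}"
        / f"{reference}.xml"
    )


def store_instruction(root: Path, received: date, reference: str, xml: str) -> Path:
    """Write the XML into its dated folder, creating the folders as needed."""
    target = instruction_path(root, received, reference)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(xml, encoding="utf-8")
    return target
```

Because the root comes from configuration, pointing the service at a different disk or file server is a config change rather than a rebuild.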

Making the storage location a configurable value solves all sorts of problems:

  1. “Where are the files stored?”, how often have you heard this:
    1. Network Manager: Where are the files stored?
    2. Developer: Let me check the config, oh it’s not in here.
    3. Network Manager: Well where are they then?
    4. Developer: Let me go find the source code and I will get back to you.
  2. Running out of storage space (network managers never keep that under control do they!). You can quickly change the location without recompiling.
  3. A file server goes down; again, you can quickly move to another location and keep the service online.

Using the date-formatted folder structure (such as \\fileserver01\instructions\InsurerABC\2015\10\17) means that you keep each node of storage separate. No one folder gets too big in comparison to the others; Windows doesn't go crazy when you try to open the folder and locate a file. On top of this a couple of other things get easier:

  1. Locating a file.
    1. If you know the date you received the instruction you already have the correct folder.
  2. Archiving.
    1. Rather than having to locate an age range inside a huge folder, you can just cut and paste the 2010 folder to clear out that entire year.
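The archiving point can be sketched in a few lines of Python. This is only an illustration under assumptions: `live_root` and `archive_root` are hypothetical folder names that would come from configuration, and the live folder is assumed to be laid out as YYYY/MM/DD as described above.

```python
import shutil
from pathlib import Path


def archive_year(live_root: Path, archive_root: Path, year: int) -> Path:
    """Move the whole folder for one year out of live storage.

    Refuses to run if the year folder is missing or already archived,
    so nothing is overwritten by accident.
    """
    source = live_root / f"{year:04d}"
    if not source.is_dir():
        raise FileNotFoundError(f"No folder for {year} under {live_root}")

    archive_root.mkdir(parents=True, exist_ok=True)
    destination = archive_root / source.name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")

    # One move of one folder, rather than filtering half a million files by age.
    shutil.move(str(source), str(destination))
    return destination
```

Calling `archive_year(live, archive, 2010)` is the scripted equivalent of cutting and pasting the 2010 folder.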

So to sum up a rambling article that I didn’t get around to finishing in a timely fashion; don’t hard code configuration values in your application, and break up big folder structures.
