May 3rd, 2012
Greetings FME folk,
As promised, here is part two of my article on new transformers in 2012. Apologies for it being a week later than planned; it got bumped by my live blog on the FME World Tour.
If you remember, the first part was about the AttributeExpressionRenamer, RasterToPolygonCoercer, and SliverRemover. This part concentrates on three new XML transformers: the XMLFlattener, the XMLSampleGenerator, and the XMLUpdater.
XML gets a bad reputation as being difficult to understand and complex to use. And it is bizarre that something so simple and obvious to look at can simultaneously be difficult to understand and a pain to work with.
But it’s only a pain when you don’t have the right tools, and FME2012 provides tools that work efficiently, in a way that is user-friendly to even the most casual XML user.
If you don’t believe me, read on: or go see FME in action at any of the FME World Tour events. As Don says, “Let your XML love shine”!
One challenge to using XML data with GIS (and other spatial applications) is that spatial systems are commonly geared to working with relational and “flat” data structures. However, XML documents are object-oriented and often nested to a high degree.
Therefore, much of the challenge of handling XML data is in either:
- Reading an XML document and converting it to a GIS relationship-type structure
- Converting a GIS relationship-type structure and writing it as an XML document.
The first of these challenges is where the XMLFlattener comes in. Think of an XML document with all its different hierarchies and indented tags. Now turn that picture 90° to the left in your mind. Now imagine smashing down on it with a large hammer!
That’s what the XMLFlattener does. It hammers out the XML till you get attributes as a flat, GIS-like, data structure.
As an example, let’s take the “old lady who swallowed a fly” document that you might have seen demonstrated in various FME presentations. The idea in the demonstration is to create the XML from a series of incoming features (a fly, a bird, a cat, a dog, etc). The result is a highly nested XML document along the lines of:
<lady>
  <dog>
    <cat>
      <bird>
        <fly></fly>
      </bird>
    </cat>
  </dog>
</lady>
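Conceptually, flattening is just walking the element tree and joining the tag names along each path into a single attribute key. Here is a minimal Python sketch of that idea (my own illustration, not how FME implements the XMLFlattener, and the `flatten` helper and "." separator are hypothetical):

```python
import xml.etree.ElementTree as ET

def flatten(element, prefix=""):
    """Walk an XML tree, emitting one flat key per leaf element.
    Keys are the tag names along the path, joined with '.'."""
    key = f"{prefix}{element.tag}"
    attrs = {}
    children = list(element)
    if not children:
        # Leaf element: its text becomes a flat attribute value.
        attrs[key] = (element.text or "").strip()
    for child in children:
        attrs.update(flatten(child, prefix=key + "."))
    return attrs

doc = "<lady><dog><cat><bird><fly>buzz</fly></bird></cat></dog></lady>"
print(flatten(ET.fromstring(doc)))
# {'lady.dog.cat.bird.fly': 'buzz'}
```

Note this naive sketch does nothing special for repeated elements; handling those (as list attributes) is exactly what the XMLFlattener’s cardinality options control.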
However, in our case what I’ll do is take that output document and read it back into FME. I want to read the data in a flat structure, so I’ll use the XMLFlattener:
So the workspace reads the XML data with the Textline reader, using a new (for 2012) option to read the entire file into a single feature attribute (the default is one feature per line).
Then I use the XMLFlattener to extract the information from the XML attribute. The result is an output like this (click to enlarge):
What I have there is a mix of attributes. Because there is a list included I know there must be multiples of one or another element; in fact you can see the old lady swallowed multiple flies.
There are complex attribute names, but remember, I could use the AttributeExpressionRenamer (see part 1) to automatically remove a lot of these Horse/Cow/Dog prefixes.
But, because it’s hard to handle data that is a mix of attributes and a list, I’m going to re-open the flatten options - this time in advanced mode - and edit the cardinality. What I’ve done is remove a “+” symbol (you’ll see if you try it yourself):
Now when I re-run the workspace I get everything as a list, like so (click to enlarge):
This might still look a little unwieldy but, because the data is fully exposed as a list, I can handle it much more easily. Depending on what I need to do with the data I could expose the list and explode it into individual features with the ListExploder. I could use a looping custom transformer to examine each list element individually. And, of course, I could use other list transformers such as the ListElementCounter to - for example - find out how many flies each old lady swallowed.
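To make the list idea concrete, here is a small Python sketch (again my own illustration, not FME internals) of grouping repeated child elements into lists and then counting them - the same kind of result the ListElementCounter would give you:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

doc = ET.fromstring("<lady><fly/><fly/><fly/><spider/></lady>")

# Group repeated child elements into lists, mirroring FME's list attributes.
lists = defaultdict(list)
for child in doc:
    lists[child.tag].append(child)

# Count the elements in each list, e.g. how many flies were swallowed.
counts = {tag: len(items) for tag, items in lists.items()}
print(counts)
# {'fly': 3, 'spider': 1}
```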
OK, the old lady example is obviously not something you’ll encounter as a real-world scenario, but if you have used XML data yourself, you’ll recognize that its data structure is very realistic.
XML documents do not contain an internal schema definition in the same way other formats might. Their schema is usually defined by a schema file with a .xsd extension. An XML schema lets you discover the data types to be used and validate the names of the XML paths.
So, when it reads an XML (or GML) dataset, FME can use the schema file that comes with it to understand the data structure and build the workspace schema. If you don’t have a schema file, and FME can’t otherwise interpret the contents to understand what the XML/GML is supposed to be, then you’ll get an error when reading the data.
ERROR |The XML format could not be determined by examination. Try entering a feature path into the “Elements to Match” parameter, specifying an xfMap, selecting an XRS, or using a more specific reader
But that’s reading data. What happens when you want to create (write) an XML document from scratch? You might know what the structure should be because you have a copy of the xsd schema document. But how can you turn that into an XML document? With the XMLSampleGenerator!
The XMLSampleGenerator takes an XML schema file and builds an empty XML document from it. That’s a nice thing about XML: you can have a document that is completely empty of data, but still properly structured in preparation for it. Once you have that structured dataset, you can start to populate it with information.
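The core idea - walk the schema’s element declarations and emit one empty element per declaration - can be sketched in a few lines of Python. This is a deliberately simplified illustration (it only handles inline complexType/sequence declarations, and the `sample_from_schema` helper and example schema are my own hypothetical constructs, not FME’s implementation):

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA_XML = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def sample_from_schema(decl):
    """Build an empty sample element for one xs:element declaration,
    recursing into inline complexType/sequence children only."""
    node = ET.Element(decl.get("name"))
    for child in decl.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
        node.append(sample_from_schema(child))
    return node

root_decl = ET.fromstring(SCHEMA_XML).find(f"{XS}element")
print(ET.tostring(sample_from_schema(root_decl), encoding="unicode"))
# <contact><name /><phone /></contact>
```

A real XSD has many more constructs (type references, choice, optionality, cardinality), which is why having this built into a transformer is so convenient.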
Take this example.
I want to create a metadata document to go with a dataset proper. I’ve downloaded the schema xsd file that describes how the XML is to be structured, and want to use that to create the metadata document.
So, first I open an empty workspace and put down an XMLSampleGenerator, with a Creator transformer to create a feature to trigger it. Then I put an AttributeFileWriter transformer to write out the newly created XML:
The parameters in the XMLSampleGenerator look like this:
I’ve basically pointed to the XSD file (in this case one based on ISO 19139) and specified what elements of the XSD data I wish to include in the output XML. Here I’m just selecting contact information, just to show what can be done. Other parameters let you control whether to include optional attributes and the like.
When I run the workspace, the results of the transformer are written out to a file and look like this (click to enlarge):
So now I have a basic XML document into which I can start adding information.
Of course, the previous sentence raises the question: how do I add information into an existing XML document? The answer is with the XMLUpdater.
The XMLUpdater lets you add, remove, or edit the contents of paths/elements inside an XML document. It can either read the XML document from a file, or use the contents of an attribute, which is what we’ll do in this example.
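The three operation types - update, insert, and delete - map onto familiar tree edits. A Python sketch of the same three edits on a toy document (my own illustration using the standard ElementTree module; the field names are hypothetical):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<contact><name>TBD</name><fax>555-0000</fax></contact>"
)

# Update: replace the text of an existing element.
doc.find("name").text = "Jane Smith"

# Delete: remove an element we don't need.
doc.remove(doc.find("fax"))

# Insert: add a new element (a hypothetical phone-extension field).
ext = ET.SubElement(doc, "extension")
ext.text = "123"

print(ET.tostring(doc, encoding="unicode"))
# <contact><name>Jane Smith</name><extension>123</extension></contact>
```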
Now the XML document created by the XMLSampleGenerator, is passed through the XMLUpdater before it is written out. The XMLUpdater will update the document contents. The updates can be hard-coded (as I will do here) or come from a feature in the UPDATE port.
Note: You always need a feature to enter the UPDATE port and trigger the update, even if it is not carrying update information.
I’ve been caught by that many times so I will repeat: if you don’t have a feature enter the UPDATE port, the XMLUpdater will not do anything. Again, I can just use a plain null feature from a simple Creator transformer.
The XMLUpdater parameters look like this:
You’ll need to click that image to enlarge it, because I expanded the dialog to display the full strings I entered. Notice that there are operations for replace (update), insert, and delete. I’m just making updates to get the data to look as I want it to. I’ve deleted fields I don’t need, updated fields with new values, and added a field that I think might be useful (phone extension).
The path comes from the XML document. Go back and compare the paths I have in the XMLUpdater with those in the XMLSampleGenerator output and you will see how they match up.
I could now run this workspace, but I’m going to add two extra XML transformers in there. Call it your bonus for reading this far. They are the XMLFormatter and XMLValidator:
The XMLFormatter lets me “pretty-print” the XML contents - basically making it more readable by removing excess white-space and indenting the contents properly.
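Pretty-printing is something you can also see in plain Python - since 3.9, ElementTree can re-indent a tree in place (this is just an illustration of the concept, not the XMLFormatter’s implementation):

```python
import xml.etree.ElementTree as ET

tree = ET.fromstring("<contact><name>Jane</name><phone>555</phone></contact>")

# ET.indent (Python 3.9+) rewrites text/tail whitespace in place,
# indenting each nesting level by two spaces.
ET.indent(tree)
print(ET.tostring(tree, encoding="unicode"))
```

The output gains a line break and indentation for each nested element, which makes a deeply nested document far easier to read.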
The XMLValidator lets me compare the new contents against the original schema, to ensure the contents are structured properly and have the right tags. In this case the validation will probably fail, because I’ve messed the data around too much (for example adding a new field) so I’ll just validate the syntax and not the schema.
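The distinction between syntax and schema validation is worth spelling out: a syntax check only asks whether the text parses as XML at all. A minimal sketch of a syntax-only check (my own `is_well_formed` helper; schema validation would additionally need the XSD and a validating library):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Syntax-only check: does the text parse as XML at all?
    This says nothing about whether it conforms to a schema."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b/></a>"))  # True
print(is_well_formed("<a><b></a>"))   # False (mismatched tags)
```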
When I run the workspace (and manually edit a few bits for clarity) this is the result:
Voila! I’ve created an XML document out of thin air and populated it with the required content. Like visiting the dentist, it didn’t hurt half as much as you expected it to!
Like much of FME, there are at least two ways to carry out the same task. I just received a demo that shows an alternative to the XMLSampleGenerator/XMLUpdater combo and thought I should pass it on.
The XMLTemplater transformer is not new for 2012, but what is new is that XML sample generation has been incorporated into it (click to enlarge):
You can see that if I click the Generate button then up pops a dialog very similar to that of the XMLSampleGenerator. When I select the XSD and click OK….
…I get the sample/template created and entered into this dialog. Now, rather than using the XMLUpdater, I can insert attributes from the left-hand panel into the correct location in the template.
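The templating idea itself - a skeleton document with placeholders that get filled in with attribute values - can be sketched outside FME as plain string substitution. Here is an illustration using Python’s string.Template, with hypothetical field names (FME’s XMLTemplater has its own placeholder syntax, which this does not reproduce):

```python
from string import Template

# A skeleton document with named placeholders where data will go.
template = Template(
    "<contact><name>$name</name><phone>$phone</phone></contact>"
)

# Substituting attribute values fills the template in one step.
print(template.substitute(name="Jane Smith", phone="555-0199"))
# <contact><name>Jane Smith</name><phone>555-0199</phone></contact>
```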
Why would I choose this method? I’m glad you asked. To me it’s easier to update because I can see where the updates are being made in relation to the rest of the document, and I don’t have to manually define paths. There’s no equivalent visualization or functionality in the XMLUpdater transformer.
On the other hand, the XMLUpdater works on the field name rather than a fixed position, meaning it’s more of a wildcard substitution. So if there are multiple records to be updated or deleted, it would be the preferred method.
So there you are. This post has been very XML-centric, but I won’t apologize as I know that more and more users are having to work with XML data.
If you are interested in the topic, we do have XML training coming up in May. There are sessions for both XML Reading and XML Writing, and they are being offered both in a North American timezone and a European one.