FME2010 Use Case: Dynamic Batching

April 14th, 2010

Hi FME’ers,
As before, this use case arose because of a question on the FMETalk user group:

“A client has a large number of shape files in a directory tree. All of these will be reprojected into another coordinate system.

We are looking into ways to automate the process. The main problem I can foresee is that I cannot count on the attribute structure of the shape files being uniform. However, the transformation need not do anything with the attributes except copy them to the target files.”

The ability to handle unknown attributes is a bread-and-butter case for the new dynamic functionality in FME 2010. However, what a dynamic workspace won’t do is recreate the input directory structure. In fact, the default FME behaviour is to merge all the source data into a single output dataset.

So this post describes how to integrate dynamic and batching tools within FME.

Source Data
In my example I’m using the standard FME Sample Dataset

What we have is a series of shp files buried somewhere inside C:\FMEData\Data

What we want to get is that same series of files, reprojected and written to C:\FMEData\ReprojectedData

Creating the Workspace
It’s very simple to create a workspace which reads multiple source datasets - and just as easy to set it up to read a folder containing an unknown number of datasets.

Below: Workspace Setup Dialogs

I first set the source and destination formats (in this case both ESRI Shape). Next I select the ‘Swizzler‘ icon:

…to open a dialog where I can select the source directory (C:\FMEData\Data) and check the option to descend through all subdirectories.
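For readers who think in scripting terms, the recursive folder read is roughly equivalent to the Python sketch below. It is only an illustration of the idea, not how FME implements it; the function name find_shapefiles is hypothetical.

```python
from pathlib import Path


def find_shapefiles(root):
    """Return every .shp file under root, searching all subdirectories."""
    # rglob descends recursively. The search runs each time the function is
    # called, so files added or removed since the last run are reflected.
    return sorted(p for p in Path(root).rglob("*.shp") if p.is_file())
```

Calling find_shapefiles(r"C:\FMEData\Data") would list whatever Shape files exist under that folder at the moment of the call.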

The Writer Dataset parameter isn’t important - and in fact it is optional so I need not even have set it.

Finally I make sure to use the Dynamic Schema workflow, then click OK to create the workspace. It looks like this (below):

Notice that there are no attributes on the destination Feature Type, because in a dynamic workspace they aren’t needed. The source Feature Type has an amalgamation of all attributes in all Shape datasets - but again this is not really important to us.

Finding the Source Folder
I need to know where each feature came from, in order to be able to write it back to the same relative location. This I can tell with the Format Attribute called fme_dataset. So the first thing to do in the workspace is open the source Feature Type properties dialog, click the Format Attributes tab, and put a check mark against fme_dataset (below):

Now fme_dataset is available for me to use. However, I only want the folder name, not the full folder+file name. So what I do is use the FilenamePartExtractor transformer to extract only the part of the name I need (in this case the directory) (below):

Notice how the parameters let me extract the directory from fme_dataset and put it into a new attribute called new_fme_dataset.
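In script form, the directory extraction amounts to something like the sketch below. It uses Python's ntpath module so that Windows path rules apply on any platform; the function name source_folder and the file name example.shp are made up for illustration.

```python
import ntpath  # Windows path rules, whatever OS runs the script


def source_folder(fme_dataset):
    """Return the directory part of a full dataset path."""
    return ntpath.dirname(fme_dataset)


# Hypothetical dataset path, for illustration only.
new_fme_dataset = source_folder(r"C:\FMEData\Data\xxxx\example.shp")
print(new_fme_dataset)  # C:\FMEData\Data\xxxx
```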

Setting the Output Folder
At this point new_fme_dataset will point to something like C:\FMEData\Data\xxxx, where xxxx is a folder containing a shp file. Firstly I need to replace ‘Data’ with ‘ReprojectedData’ but I also need to remove the “C:\” for reasons which will become clear shortly.

I can do this all at once with the StringReplacer transformer (below):

Notice how the paths are given double back-slashes. That’s because backslash is a special character and I have to mark it with an escape character (which also happens to be the backslash) to preserve it.

Now new_fme_dataset will point to something like FMEData\ReprojectedData\xxxx
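The same replacement, including the backslash escaping, can be sketched with a Python regular expression. The exact StringReplacer settings were shown in a screenshot, so the pattern here is an assumption based on the description above, and output_folder is a hypothetical name.

```python
import re

# In a regular expression a single backslash starts an escape sequence, so
# each literal backslash in the path is written as two backslashes.
PATTERN = r"^C:\\FMEData\\Data"
# The replacement string is also escape-processed, hence the doubling here.
REPLACEMENT = r"FMEData\\ReprojectedData"


def output_folder(new_fme_dataset):
    """Swap the source root for the target root and drop the drive prefix."""
    return re.sub(PATTERN, REPLACEMENT, new_fme_dataset)


print(output_folder(r"C:\FMEData\Data\xxxx"))  # FMEData\ReprojectedData\xxxx
```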

Setting a Fanout
Now I can go ahead and set the Dataset Fanout. For those not in the know, a Fanout splits up output data according to an attribute value. In this case I’ll use new_fme_dataset as the fanout attribute.

In the Navigator Window I find the Shape writer’s Fanout parameter and double-click it (below):

This opens up a dialog in which to set the Fanout parameters (below)

As you can see, I turn on the fanout and set the root directory to C:\ (this is why I needed to remove the ‘C:\’ part from my fanout attribute). Then I set the fanout attribute to be the newly created new_fme_dataset.
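Conceptually, a dataset fanout groups features by the value of one attribute and writes each group under the root directory. The sketch below models that idea with plain dictionaries standing in for features; it is not FME's API, and fan_out is a hypothetical name.

```python
import ntpath
from collections import defaultdict


def fan_out(features, fanout_attribute, root):
    """Group features by fanout attribute value, one output folder per value."""
    outputs = defaultdict(list)
    for feature in features:
        # The output location is the root directory plus the attribute value,
        # which is why the attribute must hold a path relative to the root.
        folder = ntpath.join(root, feature[fanout_attribute])
        outputs[folder].append(feature)
    return dict(outputs)
```

With a root of C:\ and an attribute value of FMEData\ReprojectedData\xxxx, the group ends up under C:\FMEData\ReprojectedData\xxxx.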

Transforming the Data
At this point the structure is all set up and I can make whatever transformations I like to the data. In this case I am setting a reprojection. Because Shape datasets usually come with a .prj file, I don’t need to set the Reader Coordinate System parameter, just the Writer equivalent.

Running the Workspace
Still with me? OK, now I can run the workspace. The log tells me how many features I read and wrote (below):

…and Windows Explorer shows me the replicated file structure (below):

…and the FME Viewer confirms that all features have their original attributes and have been reprojected (below):

Considerations
The ability to read from folders and subfolders is very useful where you have a set of files, any of which could be in any number of locations. What is really great is that this function is dynamic, in the sense that it searches for datasets at run-time, not just when the workspace is created. So you can add data to, or remove it from, the folders in preparation for the translation about to take place.

A fanout is just one method of creating multiple outputs (or batch processing). Others would be to use File > Batch Deploy from the Workbench menubar, or to use a combination of the FilePath Reader and the WorkspaceRunner transformer to run a workspace and pass source data parameters into it.
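To give a feel for the one-run-per-dataset approach, here is a rough Python sketch that builds one FME command line per source file, mirroring the source folder structure under the destination root. The workspace name and the published parameter names (SourceDataset_SHAPE, DestDataset_SHAPE) are assumptions and depend on how the workspace was authored; build_commands is a hypothetical helper, not part of FME.

```python
from pathlib import PureWindowsPath


def build_commands(source_files, workspace, source_root, dest_root):
    """Build one command line per source file, preserving relative folders."""
    commands = []
    for src in source_files:
        # Folder of the source file, relative to the source root.
        rel_folder = PureWindowsPath(src).parent.relative_to(source_root)
        dest = PureWindowsPath(dest_root) / rel_folder
        commands.append([
            "fme", workspace,
            "--SourceDataset_SHAPE", str(src),   # assumed parameter name
            "--DestDataset_SHAPE", str(dest),    # assumed parameter name
        ])
    return commands
```

Each command could then be run on its own, for example with subprocess.run(cmd, check=True), so that only one source dataset is read per run.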

The more source data you have, the greater the strain on the fanout, because all the data is read and cached in a single run. With the other methods you have fewer performance worries because each source dataset is read and processed separately.

Server Implementation
Although we’re not serving up data via a web service, this scenario does have potential to run on FME Server. That’s because FME Server’s architecture is scalable and so works well for high-volume data processing.

In that case you would definitely want to use the WorkspaceRunner method (or more specifically the FMEServerJobSubmitter transformer) because each dataset in the batch process could run on a separate FME engine, greatly improving performance and use of system resources.

I hope this use case is of interest to you. It can certainly serve as a good example of why dynamic functionality is important, but also of how it can be combined with other tasks such as batch processing with a fanout.

100 Things to Do with FME!
Instead of giving links to my favourite music, I thought it more appropriate to inspire folks with a list of 100 different uses for FME; from the obvious and straightforward, to the weird and wonderful!

So: Number 1 in a Series of 100 Things to Do with FME: Share your data on OpenStreetMap

Entry Filed under: Data Transformation, FME Desktop, FME Server, GIS, Miscellaneous
