June 7th, 2010
Performance tuning an FME workspace can be a process of trial and error; you run a workspace, tweak it, run it again, and see what the performance gains were. And unfortunately it isn’t always obvious what to tweak and how.
Fortunately there is one extremely simple change you can make that often gives a significant performance boost, but few people are aware of the functionality, let alone the benefits it offers.
That capability is changing writer order in a workspace. This post is all about how you can use it to improve performance, and how it explains some FME behaviour which might otherwise be truly baffling.
How FME Works
To simplify a complex subject, let’s imagine an airport. When passengers arrive they remain in the waiting area until it is their flight’s turn to board.
So everyone waits until the flight scheduler announces a flight, everyone boards, and the flight departs. Then it’s the turn of the next flight to board.
In FME the process is similar. Features arrive at the end of a workspace and are cached in memory awaiting their turn to be written. At some point all the features for a particular writer are retrieved and written to an output dataset. Then it’s the turn of the next writer to create output.
The Original Improvement
So FME worked like this until some bright, young developer reasoned thus:
If the first flight to depart was already lined up at the gate (like below), then the passengers for that flight wouldn’t have to sit in the waiting area. They could board as and when they turned up. It would mean the first flight would be faster to leave and the waiting area less full.
In FME terms, the first writer is opened as soon as features are available for it, and its features are written as they arrive, while features destined for other writers are stored in the cache. The first writer therefore finishes sooner, and fewer features need to be cached.
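To make the idea concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model only, not FME's actual implementation or API: the function name run_translation, the writer names and the (writer, payload) feature format are all invented for illustration.

```python
from collections import defaultdict


def run_translation(writer_order, features):
    """Toy model of writer ordering; NOT FME's real implementation.

    writer_order: list of writer names, topmost (first) writer first.
    features: iterable of (writer_name, payload) pairs in arrival order.
    Returns (written, cached_count).
    """
    first = writer_order[0]
    written = defaultdict(list)  # output per writer, in the order written
    cache = defaultdict(list)    # features waiting for their writer's turn
    cached_count = 0

    for writer, payload in features:
        if writer == first:
            # The first writer is already "at the gate": write straight away.
            written[writer].append(payload)
        else:
            # Everyone else sits in the waiting area.
            cache[writer].append(payload)
            cached_count += 1

    # Once all features have arrived, the remaining writers take their turn.
    for writer in writer_order[1:]:
        written[writer].extend(cache.pop(writer, []))

    return dict(written), cached_count


# Hypothetical feature stream: three "roads" features and one "log" feature.
features = [("roads", 1), ("log", "a"), ("roads", 2), ("roads", 3)]
print(run_translation(["log", "roads"], features)[1])  # 3 features cached
print(run_translation(["roads", "log"], features)[1])  # 1 feature cached
```

The output is the same either way; only the number of features held in the cache changes with the order of the list.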
Not many people know about this capability (even some Safe developers), but then there’s no need to, because it all occurs automatically. However…
The Neat Trick
If you were the flight scheduler at this airport, what sort of aircraft would you line up to be the first one, already waiting at the gate?
That’s right: you would choose the 747; that way you have 500 passengers who can board immediately, rather than just 40.
So, in FME terms, the trick here is to ensure the first writer - the uncached one - is the one handling the most data. That way the largest amount of data is being handled faster, and less data is having to be cached to memory or disk.
How To Do This
To promote the busiest writer is simplicity itself (much easier than trying to persuade an airport to prioritize your flight). Locate the list of writers in the Navigator window, and right-click a writer to reveal options to move it up and down in the list.
The writer uppermost in the list will get the coveted #1 slot.
Below: This isn’t going to be efficient. The second writer has 100,000x the number of features! Better promote it…
Below: That’s better. Now FME will have only 1 feature to cache (not 100,000).
Examples and Use Cases
You can find more information about this topic on fmepedia.
The page shows the example above, and how it (nearly) cuts in half the time taken to run the workspace.
It also mentions that “most data” can be a misleading term because it isn’t necessarily the same as the number of features (the amount of geometry and attributes is a factor to be considered).
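As a rough illustration of that last point, the sketch below ranks writers by an estimated data volume rather than by feature count alone. Everything in it is an assumption made for the example: the writer names, the figures, and the use of "average bytes per feature" as a stand-in for the amount of geometry and attributes. FME does not report such a number in this form; you would have to estimate it yourself.

```python
def estimated_volume(feature_count, avg_bytes_per_feature):
    """Crude proxy for "most data": count multiplied by average feature size."""
    return feature_count * avg_bytes_per_feature


def suggest_writer_order(stats):
    """stats maps writer name -> (feature_count, avg_bytes_per_feature).

    Returns writer names sorted by estimated volume, heaviest first.
    """
    return sorted(
        stats,
        key=lambda name: estimated_volume(*stats[name]),
        reverse=True,
    )


# Hypothetical numbers: many tiny points versus fewer, much heavier polygons.
stats = {
    "points": (100_000, 50),
    "parcels": (5_000, 4_000),
    "failed": (12, 200),
}
print(suggest_writer_order(stats))  # ['parcels', 'points', 'failed']
```

Here the "parcels" writer handles far fewer features than "points" but about four times the estimated volume, so it is the better candidate for the top slot. Going by the description above, only the topmost position avoids the cache, so the order of the remaining writers matters much less.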
I mention use cases because this issue has arisen a couple of times in recent weeks.
A common scenario in data validation is to write “failed” features to a separate writer. Ideally this writer (writing but a few failed features) should not be first in the list. That’s something I’ve seen recently and suggested be corrected.
Below: You really want to hope that the database writer here is first in the list.
A second instance was checking a customer’s workspace for performance improvements. I was able to get an approximately 15% improvement in time/memory simply by switching the writer order. Two seconds of effort and five minutes saved per translation.
And finally, I saw a case where a user had disabled a writer that was also occupying top spot in the list. In most cases that would really be a wasted opportunity (take note), but the curious thing here was that the caching of other datasets was important for writing features to a database in the correct order.
When the disabled writer was removed, the workspace failed, because the writing order now violated certain constraints.
The solution was (I believe) to reorder the writers to write the dependent features later in the translation, but it was very puzzling to all that removing a disabled writer could cause such a problem.
As multi-threading becomes a possibility, the future might be a translation that automatically scales to always have a departure gate for every aircraft. Then FME wouldn’t need to cache any features in a waiting area. And wouldn’t that be good for performance!
Hope this article was useful