Wednesday, January 30, 2008

Understanding how federated search works

As a member of the committee that was charged with setting up Bearcat Search, one of the things I've wrestled with is understanding how federated search technology actually works. I've still got a lot to learn, but the effort has already helped me think through the issues involved in implementing and troubleshooting Bearcat Search, and it has given me a foundation for teaching our students and faculty how and when to use the tool.

Here is the mental model I've pieced together of how Bearcat Search works:
  1. User goes to search box on our Bearcat web page that is hosted by Serials Solutions and types in query for a selected set of databases (for discussion's sake, say it's ten databases).
  2. Query is passed on to Serials Solutions servers, which then use custom-built connectors (more on that later) to translate the query into formats that can be understood by each of the ten different database vendors.
  3. Those translated queries are passed by Serials Solutions on to the servers for each of the ten databases.
  4. Each database returns its results to Serials Solutions (not the complete set of results, just the first 50, 100, or 125 hits, depending on the database).
  5. Serials Solutions analyzes the first batch of results, removing duplicate records (sometimes but not always), and then passes them on to the web page that the user is on.
  6. If the user pages through all of the first batch of results (say 200 out of 5,297 found), then when the user clicks to see the results for 201 and on, Serials Solutions goes back to the ten database vendors to get another set of results; steps 4 and 5 above are repeated if the user continues to page through the end of a given batch of results.
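
To make the steps above a little more concrete, here is a minimal Python sketch of my mental model. It is only an illustration, not how Serials Solutions actually implements anything: the `Connector` class, the `federated_search` function, the query templates, and the tiny in-memory "databases" are all hypothetical stand-ins I made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    """Hypothetical connector for one database vendor (an assumption for illustration)."""
    name: str
    query_template: str  # vendor-specific query syntax, e.g. "TI={q}"
    batch_size: int      # how many hits the vendor returns per request
    titles: list         # in-memory stand-in for the vendor's database

    def translate(self, query):
        # Step 2: rewrite the user's query in the vendor's own format.
        return self.query_template.format(q=query)

    def fetch(self, query, offset=0):
        # Steps 3-4: "send" the translated query and return only one batch.
        native_query = self.translate(query)  # what would go over the wire
        # The fake database just does a case-insensitive substring match.
        hits = [t for t in self.titles if query.lower() in t.lower()]
        return native_query, len(hits), hits[offset:offset + self.batch_size]


def federated_search(query, connectors, offsets=None):
    """Steps 2-5: query every connector, merge one batch each, drop duplicates."""
    offsets = offsets or {}
    seen = set()
    merged = []
    total_found = 0
    for connector in connectors:
        _, found, batch = connector.fetch(query, offsets.get(connector.name, 0))
        total_found += found
        for title in batch:
            key = title.strip().lower()  # naive duplicate check on the title
            if key not in seen:
                seen.add(key)
                merged.append((connector.name, title))
    return total_found, merged


connectors = [
    Connector("DatabaseA", "TI={q}", 2,
              ["Federated search basics", "Search engines", "Library search tools"]),
    Connector("DatabaseB", 'title:"{q}"', 1,
              ["federated search basics", "Metasearch and search"]),
]

# First batch (steps 1-5), then the next batch when the user pages on (step 6).
total, first_page = federated_search("search", connectors)
_, next_page = federated_search("search", connectors, {"DatabaseA": 2, "DatabaseB": 1})
print(total, first_page, next_page)
```

Note that the sketch only removes duplicates within a single batch, which is one plausible reason a real tool might catch duplicate records sometimes but not always.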

The concept of a connector is a good thing to understand. There's a great blog, appropriately called the Federated Search Blog, that offers a nice, short overview of what a connector is and how federated search vendors (like Serials Solutions) have to custom-build each one for each library. I also recommend reading the following four posts on that same blog that explain how federated search tools connect to the databases they are aggregating:

For anyone who has ever noticed varying response times from Bearcat Search, the post "How fast is federated search" offers some valuable insights.

Please stop me in the hallway or add a comment here to let me know if any of this is useful.
