High-Performance Data Services with Smart Caching
Reduce potential performance bottlenecks and ensure timely delivery of information

One of the main concerns among IT architects planning an enterprise data virtualization layer for their service-oriented architecture (SOA) or overall information system is the performance of the participating data services. Performance matters most in real-time or near-real-time environments, and in environments with highly distributed data sources where network latency cannot be controlled. This article examines how to reduce potential performance bottlenecks by using high-performance caching with data virtualization middleware. It covers scenarios for single-instance, clustered and distributed caching implementations.

Introduction
A data virtualization implementation normally includes a wide variety of data sources, both relational and non-relational, often distributed across several business units and sometimes located in different geographical regions. The performance of the data virtualization layer therefore depends heavily on the response latency of those sources and of the networks that connect them. As the amount of data retrieved through the data virtualization layer grows, network latency can quickly become a bottleneck for the entire solution. If the client applications are also distributed across multiple business units or geographies, response latency has to be one of the major items in the data virtualization layer's SLA.

In its simplest form, the response latency of the data virtualization layer comprises three factors: the data sources, the middleware layer and the network. When the data sources have non-uniform performance characteristics and are located on networks with varying throughput capacity, the middleware has to wait for the slowest source and the slowest network path, so the total request/response latency can be estimated as follows (see Figure 1):

$$\text{Total response latency} = \max(DSL_1, DSL_2, \ldots, DSL_N) + \max(DSS_1, DSS_2, \ldots, DSS_N) + \max(DSC_1, DSC_2, \ldots, DSC_N) + DVL$$

where

DSL - Data Source Latency
DSS - Data Source Network Subnet Latency
DSC - Client Network Subnet Latency
DVL - Data Virtualization Latency
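To make the formula concrete, here is a small Python sketch that evaluates it for three data sources. All latency figures are hypothetical values in milliseconds, chosen only for illustration, and `total_response_latency` is an illustrative name, not part of any product API.

```python
def total_response_latency(dsl, dss, dsc, dvl):
    """Evaluate the total response latency formula (all values in milliseconds).

    dsl: data source latencies, dss: data source subnet latencies,
    dsc: client subnet latencies, dvl: data virtualization (middleware) latency.
    """
    return max(dsl) + max(dss) + max(dsc) + dvl


# Hypothetical figures: the third data source is much slower than the others.
dsl = [120, 150, 260]
dss = [5, 8, 12]
dsc = [3, 4, 6]
dvl = 15

print(total_response_latency(dsl, dss, dsc, dvl))  # 260 + 12 + 6 + 15 = 293
```

In this example the slowest data source alone accounts for most of the total, which is why the slowest source is the natural first target.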

In deployments where the data virtualization middleware, all data sources and all clients are located on the same subnets, network latency for the data sources and clients will be similar and more or less constant. To reduce the response latency of the entire solution, architects should therefore concentrate on the latency of the slowest data source, since it determines how long the data virtualization middleware sits idle waiting for a response. Changing or partitioning that data source are both valid options, but a less invasive approach is to add a high-performance caching system to the data virtualization layer. The data virtualization layer offers a wide range of caching options, from simple table-level caching, through more advanced materialized-view caching, to the most complex option, dynamic result-set caching. Using these options, IT architecture teams can achieve performance gains of 10 to 50 percent in their data virtualization deployments (see Figure 2).
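The following Python sketch, using the same kind of hypothetical figures, shows the effect of serving the slowest source from a cache. It assumes a cache latency of 20 ms and, for simplicity, leaves the network terms unchanged; both are illustrative assumptions, not measurements, and the source names are made up.

```python
def total_response_latency(dsl, dss, dsc, dvl):
    # Slowest source + slowest source subnet + slowest client subnet
    # + middleware latency (milliseconds).
    return max(dsl) + max(dss) + max(dsc) + dvl


dsl = {"source_a": 120, "source_b": 150, "source_c": 260}  # hypothetical
dss, dsc, dvl = [5, 8, 12], [3, 4, 6], 15
CACHE_LATENCY_MS = 20  # assumed latency of the caching system

before = total_response_latency(dsl.values(), dss, dsc, dvl)

# Serve the slowest source from the cache instead of the production system.
slowest = max(dsl, key=dsl.get)
cached = dict(dsl, **{slowest: CACHE_LATENCY_MS})
after = total_response_latency(cached.values(), dss, dsc, dvl)

print(slowest, before, after)  # source_c 293 183
print(f"improvement: {(before - after) / before:.0%}")
```

Once the slowest source is cached, the next-slowest source (`source_b` here) sets the total, so further gains require caching that source as well or reducing the network terms.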

The effectiveness of caching in the data virtualization layer depends on several additional characteristics of the solution, including how frequently the underlying data changes and how tolerant the applications are of "stale" data, which determines how often the cache system has to refresh itself to meet the required SLA. It is easy to imagine a case in which caching, if not implemented properly, actually slows down the overall solution. This can happen when the underlying source data changes frequently and the client application requires access to real-time data. In that case, most application requests result in a cache miss and therefore trigger either a pass-through request or a cache refresh, and the extra time the request spends traveling to the cache system and back adds to the overall latency. The architecture team therefore needs to weigh these characteristics before deciding whether caching suits its environment and business requirements. For this scenario, an incremental cache update based on change data capture eliminates the need for a full data refresh, yet still provides freshly updated data to the requesting application while maintaining SLA requirements (see Figure 3).
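The point at which caching starts to hurt can be sketched with a simple expected-value model. The Python below assumes that every request pays the cache lookup time and that a miss additionally pays the full source latency (a pass-through); it ignores refresh costs, and the latency figures are hypothetical.

```python
def expected_latency_with_cache(hit_ratio, cache_ms, source_ms):
    # Every request pays the cache lookup; a miss also goes to the source.
    return cache_ms + (1.0 - hit_ratio) * source_ms


def break_even_hit_ratio(cache_ms, source_ms):
    # Caching helps only if cache_ms + (1 - h) * source_ms < source_ms,
    # which simplifies to h > cache_ms / source_ms.
    return cache_ms / source_ms


CACHE_MS, SOURCE_MS = 20.0, 260.0  # hypothetical latencies

print(round(break_even_hit_ratio(CACHE_MS, SOURCE_MS), 3))  # 0.077
for h in (0.9, 0.5, 0.02):
    latency = expected_latency_with_cache(h, CACHE_MS, SOURCE_MS)
    verdict = "faster" if latency < SOURCE_MS else "slower"
    print(f"hit ratio {h:.0%}: {latency:.1f} ms ({verdict} than no cache)")
```

Frequently changing data combined with a real-time requirement pushes the hit ratio toward the bottom of this range, which is the situation described above; incremental change-data-capture updates aim to keep entries fresh so that requests stay on the hit path.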

While performance improvement is typically the most sought-after benefit of caching in a data virtualization implementation, an often overlooked but equally important advantage is the reduced load on the production systems. With caching enabled, many, if not most, client requests are fulfilled from cached data, which reduces the number of requests that reach the production data sources. At high request volumes, this benefit complements the performance gains of the caching system.

Single Cache Instance Implementations
A single-cache-instance deployment is the most basic implementation in the data virtualization layer. Single-cache instances are typically preferred for small to medium-sized departmental projects with low to moderate client load. As mentioned earlier, depending on the implementation characteristics, a caching system can improve data virtualization layer performance and reduce stress on the production data sources. If performance improvement is the primary objective, the implementation team should consider co-locating the cache on the same subnet as the data virtualization middleware to minimize the network latency between them. This matters because the caching system typically bears the brunt of the request load, so traffic between the middleware and the caching system is expected to increase significantly. When the cached data is relatively small but accessed frequently, it may be beneficial to co-locate the caching system with the middleware on the same blade server, eliminating network latency altogether. To further improve the performance of a cache co-located on the same blade, the cache database may be configured to pin the cache tables in memory, further reducing the time needed to fetch data from the cache (see Figure 4).
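The benefit of co-location grows with the number of cache lookups per client request. This Python sketch uses hypothetical figures (four lookups per request, 2 ms per lookup, and assumed round-trip times for each placement) and treats a cache on the same blade as having no network latency, following the simplification above.

```python
def cache_time_per_request(lookups, lookup_ms, network_rtt_ms):
    # Each lookup pays the cache's own service time plus one network round trip.
    return lookups * (lookup_ms + network_rtt_ms)


LOOKUPS_PER_REQUEST = 4  # hypothetical
LOOKUP_MS = 2.0          # hypothetical; pinning tables in memory would lower this

placements = {  # assumed round-trip times between middleware and cache
    "remote subnet": 5.0,
    "same subnet": 0.5,
    "same blade": 0.0,
}
for name, rtt_ms in placements.items():
    total = cache_time_per_request(LOOKUPS_PER_REQUEST, LOOKUP_MS, rtt_ms)
    print(f"{name}: {total:.1f} ms")
```

Under these assumptions the network hop, not the lookup itself, dominates the remote placement, which is why co-location is the first lever and in-memory pinning the second.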

Depending on the topology and nature of the underlying data sources, the caching system may be configured to cache raw table data, materialized views or procedural data. Caching raw table data suits environments where a single data source performs significantly worse than the rest, causing the data virtualization middleware to idle while waiting for its response. Caching the table data from the slow source in the higher-performance caching system improves the performance of the overall solution by removing the extra latency associated with the idling middleware (see Figure 5). Materialized-view caching is most suitable when numerous clients send identical requests, which would otherwise clog the production systems with requests that elicit identical responses. In such scenarios, the data virtualization middleware executes the first client request as usual against the production systems and then, instead of discarding the returned result set, caches it, so that subsequent identical requests are fulfilled by the cache system instead of the production systems. Finally, if one or more of the underlying data sources is a web service with long or unpredictable response latency, enabling procedural caching allows the data virtualization middleware to cache the result sets returned by the web service based on the passed parameters, eliminating the web-service latency for repeated calls with the same parameters.
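Materialized-view and procedural caching both come down to keying a stored result set on the request. The Python sketch below is a hypothetical, in-process illustration, not any product's API: `ResultSetCache` stores results with a time-to-live, and the key is either the query text (materialized-view style) or the procedure name plus its parameters (procedural style). `quote_service` and `orders_view` are made-up stand-ins for a slow web service and an expensive view.

```python
import time


class ResultSetCache:
    """Hypothetical result-set cache keyed on the request, with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (load_time, result)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # hit: entry is still fresh
        result = loader()  # miss or expired: go to the source
        self._entries[key] = (now, result)
        return result


calls = {"quote_service": 0, "orders_view": 0}


def quote_service(symbol):
    # Stand-in for a slow web-service source (hypothetical).
    calls["quote_service"] += 1
    return {"symbol": symbol, "price": 42.0}


def orders_view():
    # Stand-in for an expensive federated view query (hypothetical).
    calls["orders_view"] += 1
    return [("order-1", 100), ("order-2", 250)]


fake_now = [0.0]  # controllable clock so the TTL can be demonstrated
cache = ResultSetCache(ttl_seconds=60, clock=lambda: fake_now[0])


def cached_quote(symbol):
    # Procedural caching: key on the procedure name and its parameters.
    return cache.get_or_load(("quote_service", symbol), lambda: quote_service(symbol))


VIEW_SQL = "SELECT * FROM orders_view"


def cached_orders():
    # Materialized-view caching: identical requests share one key.
    return cache.get_or_load(("view", VIEW_SQL), orders_view)


cached_quote("ABC")
cached_quote("ABC")   # served from the cache
cached_quote("XYZ")   # different parameters, so a new entry
for _ in range(5):
    cached_orders()   # only the first call reaches the source
fake_now[0] = 61.0
cached_quote("ABC")   # entry expired after 60 s, so it is reloaded

print(calls)  # {'quote_service': 3, 'orders_view': 1}
```

The TTL is one simple way to bound staleness; the change-data-capture approach discussed earlier would replace it with explicit invalidation.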

Cluster Cache Implementation
For more complex deployments, such as environments with heavy client request loads, a single middleware instance and a single cache instance might be insufficient to handle all requests within the allotted SLA. In such cases, the most common approach is to cluster the data virtualization middleware into multiple nodes. Although middleware clustering adds capacity to handle additional client requests, it also increases the load on the production data sources, because every client request, even one identical to an earlier request, is executed against the production data sources. Enabling a caching system in a clustered environment can therefore significantly improve the solution's performance and offload stress from the production systems (see Figure 6).
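The effect of adding a cache to a cluster can be sketched by counting the queries that reach production. In this hypothetical Python model, three middleware nodes receive round-robin traffic made of two distinct queries repeated 50 times; the nodes either have no cache or share a single cache, which is an assumed topology, not a description of any particular product.

```python
class ProductionSource:
    def __init__(self):
        self.queries = 0

    def execute(self, sql):
        self.queries += 1
        return f"result of {sql}"


class MiddlewareNode:
    def __init__(self, production, shared_cache=None):
        self.production = production
        self.cache = shared_cache  # None means caching is disabled

    def handle(self, sql):
        if self.cache is None:
            return self.production.execute(sql)  # every request hits production
        if sql not in self.cache:
            self.cache[sql] = self.production.execute(sql)
        return self.cache[sql]


def production_queries(shared_cache, node_count=3):
    production = ProductionSource()
    nodes = [MiddlewareNode(production, shared_cache) for _ in range(node_count)]
    requests = ["SELECT * FROM orders", "SELECT * FROM customers"] * 50
    for i, sql in enumerate(requests):
        nodes[i % node_count].handle(sql)  # round-robin load balancing
    return production.queries


print(production_queries(shared_cache=None))  # 100
print(production_queries(shared_cache={}))    # 2
```

Without a cache, adding nodes only lets requests reach production faster; with a shared cache, each distinct query is executed once (the sketch omits invalidation).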

Distributed Cache Implementation
Finally, in environments where one or more clients are located remotely, a distributed caching system helps reduce the network latency associated with frequent requests over long-distance networks. Such a system typically has a central cache repository and multiple remote edge caches that service requests from the remote clients. The edge caches usually do not need to replicate the central cache one-to-one: because each edge cache monitors its remote clients' requests, it can replicate only the portion of the central cache that is relevant to those requests. After the initial replication, the edge caches register change data capture requests with the central cache and are notified automatically whenever the central cache data changes, eliminating the need for a complete edge cache re-sync (see Figure 7).
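The edge-cache behavior described above can be sketched as a publish/subscribe arrangement. In this hypothetical Python model, `EdgeCache` replicates a central entry only when a remote client first asks for it, and `CentralCache` pushes each change to subscribed edges, which apply it only to entries they hold. Real change data capture would work from the source's change log; a direct `update` call stands in for it here, and the key names are made up.

```python
class CentralCache:
    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, edge):
        self.subscribers.append(edge)

    def update(self, key, value):
        # Stand-in for a captured change: store it, then notify every edge.
        self.data[key] = value
        for edge in self.subscribers:
            edge.on_change(key, value)


class EdgeCache:
    def __init__(self, central):
        self.central = central
        self.local = {}  # only the entries requested by this edge's clients
        central.subscribe(self)

    def get(self, key):
        if key not in self.local:
            # First request for this key: replicate just this entry.
            self.local[key] = self.central.data[key]
        return self.local[key]

    def on_change(self, key, value):
        # Apply the change only if this edge holds the entry; no full re-sync.
        if key in self.local:
            self.local[key] = value


central = CentralCache()
central.update("emea:orders", "v1")
central.update("apac:orders", "v1")

edge = EdgeCache(central)
edge.get("emea:orders")              # replicated on first use
central.update("emea:orders", "v2")  # pushed to the edge
central.update("apac:orders", "v2")  # ignored: not held by this edge

print(edge.local)  # {'emea:orders': 'v2'}
```

Filtering notifications against the keys an edge holds keeps each edge's storage proportional to what its remote clients actually use.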

Conclusion
As global enterprises and government agencies implement data virtualization to federate data across disparate systems and geographic locations, IT teams are considering the data virtualization layer's performance in relation to the overall information system. By using advanced data virtualization middleware with high-performance caching, IT architects can reduce potential performance bottlenecks and thus ensure timely delivery of information.

About Avtandil Garakanidze
Avtandil Garakanidze currently serves as the Vice President of Product Management and Strategy at Composite Software Inc, the leader in data virtualization solutions. Prior to Composite, he held executive and senior product and engineering management positions with high-tech companies including Symantec/VERITAS, Siebel Systems, Yahoo! and Starfish Software. Garakanidze earned an MBA from MIT’s Sloan School of Management and an MS from the Georgian Technical University.
