Read Digital Edition


ADS BY GOOGLE
Top Three Links You Must Click On


Four Paths to Java Parallelism
Finding their 'sweet spots'

Parallel programming in Java is becoming easier with tools such as the fork/join framework, Pervasive DataRush, Terracotta, and Hadoop. This article gives a high-level description of each approach, pointing you in the right direction to begin writing parallel applications of your own.

Boiling the Ocean of Data
Companies today are swimming in data. The increasing ease with which data can be collected, combined with the decreasing cost of storing and managing it, means huge amounts of information are now accessible to anyone with the inclination. And many businesses do have the desire. Internet search engines index billions of pages of content on the web. Banks analyze credit card transactions looking for fraud. While these may be industry-specific examples, there are others that apply broadly: business intelligence, predictive analytics, and regulatory compliance to name a few.

How can programmers handle the volumes of data now available? Even efficient algorithms take a long time to run on such enormous data sets. Huge amounts of processing need to be harnessed. The good news is that we already know the answer. Maybe parallel processing isn't new, but with multicore processors it has become increasingly accessible to everyone. The need and ability to perform complex data processing is no longer just in the realm of scientific computing, but increasingly has become a concern of the corporate world.

There's bad news too. While parallel processing is much more feasible, it isn't necessarily any easier. Writing complex and correct multithreaded programs is hard. There are many details of concurrency - thread management, data consistency, and synchronization to name a few - that must be addressed. Programming mistakes can lead to subtle and difficult-to-find errors. Because there is no "one-size-fits-all" solution to parallelism, you might find yourself focusing more on developing a concurrency framework and less on solving your original problem.

Making parallelism easier has been a topic of interest in the Java world lately. Java starts out with basic concurrency constructs, such as threading and synchronization, built into the language. It also provides a consistent memory model regardless of underlying hardware. Most programmers are now aware of (and perhaps vaguely intimidated by) the utilities of java.util.concurrent added in Java 5. At least these tools provide a step up from working with bare threads, implementing many abstractions you would likely need to create yourself. However, it remains very low-level with its power coming at the expense of requiring significant knowledge (and preferably concurrent programming experience) (Sun, 2006).

Fortunately, a number of tools and frameworks that attempt to simplify parallelism in Java are available, or soon will be. We'll look at four of these approaches with the intent of finding the "sweet spot" of each - where they provide the most benefit with least cost - financial, conceptual, or otherwise.

About Matt Walker
Matt Walker is an engineer at Pervasive Software, seeking a deeper understanding of concurrent programming techniques to improve the Pervasive DataRush framework for dataflow programming. He holds an MS in computer science from UT and received his BS in electrical and computer engineering from Rice University.

About Kevin Irwin
Kevin Irwin is a senior engineer at Pervasive Software, working on performance and concurrency within the Pervasive DataRush dataflow engine. With 15 years of industry experience, he previously worked at companies including Sun and IBM, developing high-performance, scalable enterprise software. Kevin holds an MCS and a BA in mathematics and computer science from Rice University.

In order to post a comment you need to be registered and logged in.

Register | Sign-in

Reader Feedback: Page 1 of 1

While it’s true that familiarity with concurrent programming principles is needed to make full use of all of Terracotta’s developer-facing features, the extensive library of Terracotta Integration Modules (TIMs) for use with third-party technologies allows many people to make use of Terracotta *without* needing to know anything about concurrent programming.

This can be seen to great effect in the high-scale reference web application we built to show how Terracotta is used in a real-world scenario. When you look at the code to examinator, you’ll find very little concurrency-aware code. All of the concurrency is handled inside the various TIMs used by the application (e.g., Spring Webflow, Spring MVC, Spring Security, …).

You can see a full list of available TIMs here:

http://terracotta.org/web/display/orgsite/Integration+Guides

Like Terracotta itself, all of these TIMs are open source and free for use in production.


  Subscribe to our RSS feeds now and receive the next article instantly!
In It? Reprint It! Contact advertising(at)sys-con.com to order your reprints!
Subscribe to the World's Most Powerful Newsletters

ADS BY GOOGLE
We talk a lot about social media on Marketing Trenches. And for good reason – Social media seems to...
Intel has put out its promised beta SDK for Windows (C and C++) and Moblin (C) developers working on...
InformationWeek stumbled on a Microsoft patent application dating back to 2006 deceptively titled “M...
Berlin-based ThinPrint AG, the printer virtualization house, thinks it’s got a cloud solution for th...
Behaving like it’s got a future, Sun Monday put out what it calls a significant new version of Virtu...
IBM has acquired Guardium, a seven-year-old subsidiary of Israel’s Log-On Software transplanted to M...
But on the web, access to services is implicit in the fact that the business is offering the service...
Oracle has offered to cordon off MySQL inside a combined Oracle-Sun to get the European Commission t...
The second set of charges filed last week against Indian outsourcer Satyam Computer Services founder...
Gartner told Reuters that it overestimated how many PCs Acer shipped in the last seven quarters by a...
Office Web Apps, Microsoft’s answer to Google Apps, are supposed to be out sometime in June along wi...
Gartner thinks the server business has stopped sliding into the abyss. Third-quarter sales weren’t a...
Gartner is buying ~$40 million-a-year AMR Research Inc for close to $64 million in cash. AMD special...
Singed by user reaction to its plans to up the price of its support contracts, SAP Tuesday postponed...
Apparently Google Gears ain’t gonna stick around that long. Google Apps will eventually get their of...
Oracle seems to have divided the open source ranks over the MySQL delay it’s having closing its acqu...
We hear – well, you know how people talk – that Oracle has been quietly meeting with the European Co...
The Korean government is going to sink around $172 million into cloud computing next year under a st...
In response to Opera’s complaints Microsoft has reportedly modified the proposed ballot screen that’...
CA is looking for talent in EMEA: associate account managers, directors of solution sales, senior so...