Computing Canada

September 2012 - Free Online IT Magazine - Read in this digital publication: Two approaches to analyzing unstructured data; Apple versus Samsung: Who really wins?; the newest Android smart phones; and

Computing Canada is the crossroads of business and technology, covering enterprise technology, networking, telecom, careers, managed services, cloud computing, and other technologies and services that enable business.

Issue link: http://epubs.itworldcanada.com/i/81483


FEATURE

THE SHIFTING SANDS OF UNSTRUCTURED DATA

The technology for finding meaning in unstructured data can take two very different forms

By Brian Bloom

LISTEN UP: IDC's Steve Conway discusses how the technologies of Hadoop and high-performance computing converged in IBM's Watson, but also why there are some analytics tasks only supercomputers can perform.

Take almost any single piece of the unstructured data that will be created today and, on close inspection, it won't look very important: an email, a text message, a stock price, or even a sensor transmitting an "off" signal. But once these tiny bricks of data are put together, the resulting structure can tell us something quite important.

In truth, a big multinational shoe company probably doesn't care what you, personally, think of its new sneaker. But it will surely care if thousands of others feel the same way. Similarly, if the power fails in one house, the electricity company won't lose much sleep over it. But when the entire grid gets knocked out, you can expect something to be done about it.

The mess of information we're immersed in represents a great technological challenge: finding a way to give it all meaning. But since it also offers an irresistible power to business, virtual omniscience, the money to make industrial-level data sifting a reality has arrived in sufficient quantity to get it off the drawing board and into the commercial world.

But in giving meaning to a mass of unstructured data, a distinction has to be made when we start with our initial question. Our answer could be built out of a million tiny components. Or it could come in the form of one giant, irreducible entity. Each requires a very different kind of hardware. And here is where we enter the worlds of "massively parallel" and "embarrassingly parallel."
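To make the "million tiny components" case concrete, here is a minimal Python sketch of an embarrassingly parallel job: each message is counted on its own, with no communication between workers, and the partial results are merged at the end. The sample messages and the function names (`count_words`, `merge_counts`, `word_totals`) are hypothetical, chosen only for illustration; this is not Hadoop's API, just the same divide-and-merge idea at toy scale.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: short customer comments (an assumption for illustration).
MESSAGES = [
    "love the new sneaker",
    "the new sneaker hurts my feet",
    "sneaker looks great",
    "power is out again",
]

def count_words(message):
    """Map step: count the words in one message, independently of all others."""
    return Counter(message.lower().split())

def merge_counts(partials):
    """Reduce step: add the per-message counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_totals(messages, workers=4):
    """Count each message in a worker pool, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, messages))
    return merge_counts(partials)

if __name__ == "__main__":
    totals = word_totals(MESSAGES)
    print(totals["sneaker"])  # prints 3
```

Because no worker needs anything from any other, adding more workers (or more machines) is straightforward. The "one giant, irreducible entity" case is the opposite: the pieces depend on one another and cannot simply be counted apart and summed.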
THE STRENGTH OF THE ELEPHANT

We can't talk about unstructured data without making some mention of Apache's Hadoop project, which is now the preferred beast of burden for this kind of big data. It's the needle-in-the-haystack problem solved to the extreme: Hadoop can sift through endless bales of hay, neatly organizing every needle it finds. Massively scalable and cheap to run on commodity hardware, Hadoop represents something we've wanted for years but have only recently been able to use.

John Kreisa is the vice-president of Hortonworks
