Practical Graph Analytics with Apache Giraph

We live in an age of so-called Big Data. We hear terms like data scientist, and there is much talk about analytics and about mining large amounts of corporate data for tidbits of business value. There are even apocryphal stories about diapers and beer selling well together in the same store aisle. The common theme is the problem of having large amounts of data and somehow converting that data into actionable information.

Enter graph theory, the branch of mathematics concerned with pairwise relationships between objects. Graph theory can be taught abstractly, and probably often is, but it is also very practical. Imagine mapping all the link relationships in a website. One page might turn out to take part in more relationships than any other, which suggests that the page is an important one. Likewise, one can examine relationships between people in a group, and the person with the largest number of connections could be seen as having the widest influence. Certainly, you’d want that well-connected person if your goal were to spread a piece of news or gossip quickly.

About the Authors

Claudio Martella is passionate about graphs. He is a member of the Large-Scale Distributed Systems group at the VU University Amsterdam. His topics of interest are large-scale distributed systems, graph processing, and complex networks. He has been a contributor to Apache Giraph since its incubation; he is a committer and a member of the project’s Podling Project Management Committee (PPMC).

Dionysios Logothetis is a software engineer at Facebook. He is interested in building systems and tools for large-scale data management with a focus on graph mining. He has experience developing analysis systems built around Giraph and Hadoop. Dionysios holds a PhD in Computer Science from the University of California, San Diego, and also a degree in electrical and computer engineering from the National Technical University of Athens.

Roman Shaposhnik is a vice president and one of the lead developers of Apache Bigtop, a 100% open source and community-driven Big Data management distribution built on top of Apache Hadoop. He has been working on making Hadoop ecosystem components more accessible and easier to use, and he has contributed to a wide array of Apache projects, from Avro to ZooKeeper. In addition to his day job building Data Fabric APIs at Pivotal Inc., Roman currently serves as a vice president of Apache Incubator, helping exciting and new open source projects join the Apache family.