Tag Archives: Big Data

The Future of HR Big Data

As HR applications continue to evolve, HR needs to consider the new data sources and types coming its way. It is no longer sufficient to simply track basic employee information. With the evolution of HR, companies now have wide-ranging options including video interviews, dynamic and social learning and development solutions, social monitoring for talent identification and internal collaboration, external survey benchmarks (including but not limited to salary, skills, performance, behavior, and personality), application logs, and predictive models for understanding cultural fit and preparedness for new jobs.

This complexity creates the need for HR Big Data. To deal with the variety and velocity of social data, video, documents, and transactional logs, HR departments need to work with other departments that may already have had to handle these new data sources. Social monitoring is typically associated with marketing, video is often seen as a corporate communications or public relations tool, document management is seeing a renaissance with the development of social and cloud-based software solutions, and transactional and network logs are core IT tools.

Buffett’s Billion Dollar Bracket Bet

Lightning photo courtesy of cnx.org.

The NCAA tournament has become a cultural phenomenon where everyone suddenly becomes a college basketball expert, whether or not they have ever watched a game. This expertise has risen to a fever pitch this year as Quicken Loans has offered 1 billion dollars to anyone who provides a perfect bracket. But why?

This contest, underwritten by Warren Buffett’s Berkshire Hathaway, is often described as a one-in-9-quintillion chance of winning, or 1 in 2^63, based on there being 63 games played by 64 teams in this single-elimination tournament. However, this model is obviously wrong since we know that some of these teams are better than others. Even the most casual NCAA bracket filler knows that a 1-seed (presumably one of the top 4 teams in the country) always beats a 16-seed (which is typically one of the worst 4 teams in the tournament). Similarly, a 2-seed almost always defeats a 15-seed, with rare exceptions. After this point, the expectations start to get a little trickier and March Madness descends into full effect.
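The arithmetic behind the 2^63 figure, and the reason it overstates the difficulty, can be sketched in a few lines of Python. This is a minimal illustration rather than a real forecasting model: it assumes every game is independent, and the per-game pick accuracies in the second scenario (0.99 for 1-seed games, 0.94 for 2-seed games) are hypothetical values chosen only to show the effect.

```python
GAMES = 63  # a 64-team single-elimination bracket has 63 games


def coin_flip_brackets(games: int = GAMES) -> int:
    """Number of possible brackets, all equally likely if every game is a toss-up."""
    return 2 ** games


def perfect_bracket_probability(pick_accuracies) -> float:
    """Chance of picking every game correctly, assuming the games are independent."""
    probability = 1.0
    for accuracy in pick_accuracies:
        probability *= accuracy
    return probability


# Scenario 1: every game is a 50/50 guess.
naive = perfect_bracket_probability([0.5] * GAMES)

# Scenario 2 (hypothetical accuracies): the four 1-vs-16 games and the four
# 2-vs-15 games are near-certain picks; the other 55 games stay at 50/50.
informed = perfect_bracket_probability([0.99] * 4 + [0.94] * 4 + [0.5] * 55)

print(f"Possible brackets: {coin_flip_brackets():,}")
print(f"Coin-flip odds:    1 in {1 / naive:,.0f}")
print(f"Informed odds:     1 in {1 / informed:,.0f}")
```

Even this small amount of seeding knowledge shrinks the odds by a couple of orders of magnitude, which is why the coin-flip figure is better read as an upper bound on the difficulty.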

Do HR Organizations Really Need Big Data?

Big Data photo courtesy of hrringleader.

Over the past couple of years, the hottest trend in enterprise technology has been the evolution of HR applications from basic benefits, payroll, and workforce management to a variety of applications covering a much wider set of needs – from finding potential employees and predicting the need for specific skills to optimizing employee capabilities to the appropriate offboarding and succession of core talent. To get a strategic advantage, HR departments are being asked to use “Big Data” to “Moneyball” and quantify their approaches. But is this really the right way to go?

To determine this, consider what Big Data really is. Big Data is a set of technologies designed to store, process, and analyze data that does NOT fit into traditional spreadsheets, databases, and other basic structured data sources. Depending on whether you need your data to be faster, bigger, or more varied, you may be looking for specialized high velocity messaging solutions, cloud-based and multi-tenant storage, high performance analytic engines, Social Network Analytics, sentiment and natural language processing, or video analytics. This is the world of Big Data. A practical definition of Big Data would be 5+ terabytes of data, including some aspect of machine data, interactions, video, or high velocity streaming data.

What Happens in Vegas is TDWI and the Future of BI


Las Vegas Sign photo courtesy Esquenta.com.br.

In late February, I had the pleasure of attending TDWI Las Vegas as an observer. A key differentiator between TDWI (The Data Warehousing Institute) and many of the other tradeshows I attend in the Big Data and Analytics space is that TDWI focuses on deep subject matter expertise taught by luminaries such as:


What Big Data Can Learn from the NBA


94Fifty smart sensor basketball photo courtesy 94Fifty.

On Thursday, February 13, Grantland’s Zach Lowe wrote an article on the latest technological development in professional basketball: measuring biometric information in game settings. Four D-League (the NBA’s developmental league) teams will start using one-ounce sensors fitted on player jerseys to measure metrics such as heart rate, speed, and position. These sensors are currently available from one of three companies: STAT Sport, Zephyr, and Catapult. These sensors are not new to professional basketball, as nearly two dozen NBA teams already use these devices. However, NBA teams currently only use these sensors in practice settings, rather than in game-time situations.

Professional basketball could learn a couple of interesting Social Big Data lessons from this experiment, and every Big Data expert should be interested in the results.

First, consider one of the quotes from the Grantland article:

“As the research-and-development arm of the NBA, the NBA D-League is the perfect place to unveil innovative performance analytic devices in-game,” said NBA D-League president Dan Reed.

This concept of an R&D product where you collect more data in an experimental setting is one that many technology companies could start to use. For instance, does your core cash cow product have a corresponding R&D product that can be tinkered with without affecting your revenue? This is a good role for your freemium or single user product. (Heck, Facebook does this for their core platform, although DataHive does not recommend the level of iteration that Facebook provides unless you have a monopoly or duopoly of your core market.) This new use of heart rate and other physical information will provide insights on team tactics and performance if used correctly, thus leading to not just Big Data, but interactive and social Big Data where each player’s metrics depend on those of the others.

Second, and more interestingly from a tactical perspective, this measurement will allow basketball teams to more closely align physical effort with results. It is easy to simply believe that hustle and effort lead to better results, but these metrics may actually show that a lack of hustle could mean one of several things. It could be a health issue or laziness, or it could be good strategy in saving energy for key moments. Hustle and physical movement should not be measured in isolation, but in the context of results. If a “clutch” player ends up moving less than an average player or saves exertion for peak moments, the economics of movement may actually show that excessive “hustle” is detrimental to performance. These sensors may also show that specific team tactics lead to greater efficiency, just as our analysis of shot taking shows how important it currently is to take shots from within 4 feet of the rim or on the sides of the three-point line.

From a business perspective, most of us do not put out the physical effort of a professional athlete on a prolonged basis at work. But do we waste time and energy by going in the wrong direction? Are we getting stressed because our managers are not giving us the right information? The key challenge is understanding how to use this information productively rather than punitively. It can be easy to fall into the trap of simply stating that more time at work equates to greater productivity, but it may actually be that after a certain point, the error rate or lack of clear thinking outweighs the incremental productivity that would be expected. Follow the real business metrics rather than pure resource utilization.

However, as this occurs, one of the biggest challenges will be translating sports analytics into business analytics. Keeping score is very easy in a rule-based sports environment, but harder in a business environment where KPIs can be difficult to define. Based on personal experience and interviews with multiple basketball analysts, DataHive has found that the academics and number crunchers conducting this analysis are largely unaware of the value that these findings could provide outside the sports world. Business analysts can quickly see how the heat and activity maps associated with basketball could translate into greater retail, field, and manufacturing success, but the sports analysts currently doing this work do not see how it could be applied to other fields. In our role of supporting Social Big Data for Human Insight, DataHive serves as a Sports Data Whisperer that wrangles the findings and techniques used in the sports world and brings them to the business world.

DataHive’s principals have long believed that the structured world of sports serves as a natural testing ground for the predictive, geolocated, and biometric data that is being introduced to the corporate world. Video feed metadata and sensor-based data are the Next Big Things in Big Data and it is only a matter of time before the corporate world follows suit. Regardless of their personal interest in sports, Big Data professionals should keep track of the surveillance and sensor data being used in the basketball world to see how this controlled setting provides potential insight for future enterprise technology efforts.

The DataHive on Apache Hive

Regular Apis Flores Nest Closeup image courtesy of The Beehive, Oxford | Maths in the City.

As DataHive Consulting, we have been remiss in not mentioning anything about Hive up until now, especially since we think Hive is the easiest way to start using Hadoop for those just starting to make the jump from structured to unstructured data. For those just starting to look into Big Data, Apache Hive is data warehouse software built on top of Hadoop that supports the management, querying, and analysis of distributed datasets. It includes ETL (extract, transform, load) tools, MapReduce-based queries, metadata storage, and indexing. But most importantly, it can all be managed through HiveQL, a query language similar to SQL. Although it lacks full ACID functionality at this point, Hive is a quick way to use Hadoop for those who have SQL and/or MapReduce framework experience.
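To give a feel for how familiar HiveQL looks to SQL users, here is a small sketch of a typical aggregation query. It does not connect to Hive: it runs the query against Python's built-in sqlite3 module as a local stand-in, and the page_views table and its rows are hypothetical, invented only for illustration. The SELECT statement itself uses only basic constructs (SUM, GROUP BY, ORDER BY) that HiveQL also supports, though the table setup would differ in Hive (for example, its column types).

```python
import sqlite3

# Hypothetical sample data, invented for illustration only.
rows = [("engineering", 120), ("engineering", 95), ("sales", 80)]

# sqlite3 is a local stand-in here; this is NOT a Hive connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (department TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)

# A basic aggregation of the kind a SQL user would write in HiveQL as well.
QUERY = """
SELECT department, SUM(views) AS total_views
FROM page_views
GROUP BY department
ORDER BY department
"""

result = conn.execute(QUERY).fetchall()
conn.close()

for department, total_views in result:
    print(department, total_views)
```

In Hive, a query like this would be compiled into jobs that run across the Hadoop cluster, which is the part that SQL experience alone does not give you for free.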

Here are a couple of our favorite starting points for learning more about Hive:

Where are you picking up your Hive tips? Please feel free to share in the comments!

Google’s Nest Acquisition is About Comfort

Nest Celsius Heating image courtesy of Nest Press Room.

Yesterday, Google purchased Nest for a whopping 3.2 billion dollars. Although Nest is very cool with its smart and well-designed thermostats and smoke detectors, its $25 million in revenue isn’t the reason this deal was made. Instead, this acquisition represents two things: intellectual property and comfort.