Monthly Archives: May 2014

You Can’t Spell Informatica without IT

Informatica has long been known as a data management and data integration company. Throughout its history, Informatica has been synonymous with being as close to enterprise data as a company can get, and it has been a market leader in ETL, data quality, and master data management. But in the mid-2000s, Informatica seemed to have hit a rut: it was the master of its chosen markets, but didn’t know where to go next. In addition, new competitors began to emerge, powered by the cloud and by the resurgence of enterprise mobility that started with the iPhone and continued with the rise of Android and Samsung. Informatica had two choices: innovate or slowly fade away.

At Informatica World 2014, Informatica showed that it is truly devoted to data innovation, introducing the concept of the Intelligent Data Platform (IDP) to take the key data-driven aspects of social, mobile, and cloud into account. We discussed some of the ramifications of the IDP in our prior post, Informatica Wants to Build the Most Relevant Version of the Truth.

Within this platform, Informatica launched two innovative offerings with the potential to change IT departments, the kind that venture-backed Silicon Valley startups would be proud to bring to market: Project Springbok and Project Secure@Source.

Project Springbok is a self-service data harmonization product that greatly simplifies data quality efforts by providing an Excel-like interface for accessing and enriching your datasets. Piet Loubser, Informatica’s VP of Platform Product Marketing, described this as “metadata meets machine learning,” which immediately grabbed our metadata-loving hearts at DataHive. As end users access data, Springbok will automatically determine the relative size of the dataset and the quality of the data compared to typical enterprise data. Based on these parameters, Springbok will suggest automated fixes to improve data quality. For instance, if a column in the dataset is supposed to be a “Yes/No” field but contains “Yes”, “Ye”, and “Y”, these inconsistencies will hamper analysis and basic fact gathering. Springbok automatically profiles the data, shows business users all of these variants, and then recommends or infers ways to fix the field. Springbok will also take multiple datasets and suggest columns that should be joined to provide the most relevant version of the truth to the end user.
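To make the “Yes/Ye/Y” example concrete, here is a minimal Python sketch of profiling a column and suggesting normalizations. It is not Springbok’s algorithm, which Informatica has not published; the prefix-matching rule and the function names (`profile_column`, `suggest_normalization`, `apply_suggestions`) are hypothetical stand-ins for whatever machine learning the product actually uses.

```python
from collections import Counter

def profile_column(values):
    """Count how often each distinct value appears in a column."""
    return Counter(values)

def suggest_normalization(values, canonical=("Yes", "No")):
    """Suggest a canonical value for each variant.

    Hypothetical rule: a variant maps to a canonical value when it is a
    case-insensitive prefix of exactly one canonical value.
    """
    suggestions = {}
    for value in profile_column(values):
        v = value.strip().lower()
        matches = [c for c in canonical if v and c.lower().startswith(v)]
        if len(matches) == 1 and value != matches[0]:
            suggestions[value] = matches[0]
    return suggestions

def apply_suggestions(values, suggestions):
    """Replace each variant with its suggested canonical value."""
    return [suggestions.get(v, v) for v in values]

column = ["Yes", "Ye", "Y", "No", "N", "Yes", "no"]
fixes = suggest_normalization(column)
print(fixes)                             # {'Ye': 'Yes', 'Y': 'Yes', 'N': 'No', 'no': 'No'}
print(apply_suggestions(column, fixes))  # ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No']
```

A real tool would weigh frequencies and let the business user accept or reject each suggestion; the point here is only that profiling surfaces every variant before anyone fixes it.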

In and of itself, this self-service, machine-learning-aided data wrangling is helpful but not unique in the market. (Paxata and Trifacta come to mind as early leaders in this area.) However, Project Springbok has some additional tricks up its sleeve. For starters, each data quality step that an end user conducts is automatically recorded and can be reused either by another user or by an admin who wants to clean up the data at its original source. By giving end users a simple point-and-click method for cleaning data while recording workflows that can be integrated into enterprise data management, Springbok has the potential to solve two core problems in data quality: the sheer man-hours needed to fix dirty data, and the lack of a way for the end users closest to the data to make immediate fixes that data managers can then reuse at scale.
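The idea that each cleanup step is recorded and reusable can be sketched as a serializable “recipe” of steps that another user, or an admin working on the original source, can replay. This is a hypothetical illustration that assumes a step is just a value mapping on one column; `Step`, `Recipe`, and the JSON format are invented for this example and do not reflect Springbok’s internal format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Step:
    """One recorded cleanup action (hypothetical): remap values in one column."""
    column: str
    mapping: dict

@dataclass
class Recipe:
    """An ordered, shareable list of recorded cleanup steps."""
    steps: list = field(default_factory=list)

    def record(self, column, mapping):
        self.steps.append(Step(column, dict(mapping)))

    def apply(self, rows):
        """Replay every recorded step, in order, on a list of dict rows."""
        cleaned = []
        for row in rows:
            row = dict(row)  # leave the caller's data untouched
            for step in self.steps:
                if step.column in row:
                    row[step.column] = step.mapping.get(row[step.column], row[step.column])
            cleaned.append(row)
        return cleaned

    def to_json(self):
        return json.dumps([asdict(s) for s in self.steps])

    @classmethod
    def from_json(cls, text):
        return cls([Step(**s) for s in json.loads(text)])

# An end user records a fix once...
recipe = Recipe()
recipe.record("active", {"Ye": "Yes", "Y": "Yes", "N": "No"})
saved = recipe.to_json()

# ...and another user or an admin replays it elsewhere.
shared = Recipe.from_json(saved)
rows = [{"id": 1, "active": "Y"}, {"id": 2, "active": "No"}]
print(shared.apply(rows))  # [{'id': 1, 'active': 'Yes'}, {'id': 2, 'active': 'No'}]
```

Keeping the recipe as plain data, rather than as code, is what would let a data manager review it and apply it upstream at the source.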

This capability reminded me of the work that the sabermetric world did in the 1980s and 1990s to share and analyze baseball data. By expanding access to data from the chosen few in professional baseball to a wide range of professionals, academics, and students online, we started to create communities dedicated to both cleaning and understanding baseball data. This was an important precursor to the Moneyball era; without access to clean data, Moneyball would not have happened. The spread of self-service data management workflows that can be quickly created and brought back to a community, department, or company is similarly a precursor to a new era of data-driven business, where a majority of employees can actually access and manage some aspect of their data in something other than the ubiquitous Excel without creating new data inconsistencies or errors.

Although Moneyball and the rise of the analytic enterprise have been heralded over the past several years, their business relevance may have been premature, because enterprise data has never reached the point where end users truly clean up the vast majority of business data in a coordinated and networked manner. This is a key opportunity for Informatica, as it validates both data harmonization and user-based data preparation.

Springbok also takes the social aspect of data seriously. Although “social” in the enterprise is often reduced to social networks such as Facebook, the fundamental importance of “social” lies in creating trusted collaboration and interest-based groups. Springbok plans to track the users who access data so that employees can see who else may have previously transformed or modified it. This will allow end users to independently create social networks based on their shared interest in specific data and potentially unlock new patterns of data usage and data interest within the organization. By combining fundamental data cleansing and linkage, social linkages, and the repeatability of enterprise data workflows in a single product that works in the context of Informatica’s vision of an Intelligent Data Platform, Springbok has an exciting opportunity to improve the data management market. This improvement will not come from the speeds and feeds and raw processing power that typically define progress in the data management world, but from fundamentally improving the usability and visibility of data.
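A rough sketch of the social idea, assuming Springbok keeps something like an access log of who read or transformed which dataset: from such a log you can list who previously transformed a dataset and rank colleagues by how many datasets they share with you. The log format and the names `transformed_by` and `peers` are hypothetical.

```python
from collections import defaultdict

# Hypothetical access log entries: (user, dataset, action)
access_log = [
    ("alice", "crm_accounts", "transform"),
    ("bob", "crm_accounts", "read"),
    ("carol", "service_tickets", "transform"),
    ("alice", "service_tickets", "read"),
    ("dave", "crm_accounts", "transform"),
    ("bob", "service_tickets", "read"),
]

def transformed_by(dataset, log):
    """Users who previously transformed the given dataset."""
    return sorted({user for user, ds, action in log
                   if ds == dataset and action == "transform"})

def peers(user, log):
    """Other users ranked by how many datasets they share with `user`."""
    users_by_dataset = defaultdict(set)
    for u, ds, _action in log:
        users_by_dataset[ds].add(u)
    shared = defaultdict(int)
    for users in users_by_dataset.values():
        if user in users:
            for other in users - {user}:
                shared[other] += 1
    # Most shared datasets first; ties broken alphabetically
    return dict(sorted(shared.items(), key=lambda kv: (-kv[1], kv[0])))

print(transformed_by("crm_accounts", access_log))  # ['alice', 'dave']
print(peers("alice", access_log))                  # {'bob': 2, 'carol': 1, 'dave': 1}
```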

Project Springbok is currently available in beta, with general availability targeted for the fourth quarter of 2014. Putting on the fantasy baseball data-crunching hat that I’ve worn for almost 20 years, I think Springbok will be most useful for cleaning up the frequently used operational data that typical end users see. This lines up with sales and service data (including internal IT help desk, customer service, and field service data) created and used by the people who keep the lights on in an organization. Springbok will make it relatively easy for Excel-savvy salespeople to clean up and rationalize existing data while creating tools and workflows that help the rest of the company. Every company has “last mile” data quality problems that can’t be solved with automation or by the few data management gurus on staff. Springbok is a promising tool both for engaging end users with data and for introducing employees to each other through the social use of data.

A second announcement that truly caught my eye, because it falls far outside Informatica’s typical interests, was Project Secure@Source, a data security product. Informatica CEO Sohaib Abbasi made clear in his keynote that “In the new world of pervasive computing – with Cloud services, mobile devices and sensors everywhere – there is no perimeter.”

This is fundamentally true. When data no longer lives on-premises and can be accessed simultaneously by thousands of mobile devices over cellular, Wi-Fi, and landline networks, what is the point of trying to secure a perimeter? In today’s mobile and cloud-based computing world, traditional security tools such as passwords and encryption are increasingly meaningless in the face of the raw computing power that can be thrown at them. Mobile device security is advancing rapidly, but enterprise adoption of these tools is still relatively low despite the success of SAP, VMware AirWatch, Good Technology, MobileIron, BlackBerry, and others. The new security paradigm must be shaped around the use of and access to data itself. This is the eureka moment that Informatica has had with Secure@Source.

Secure@Source allows companies to classify specific data and prioritize access to it, using the data lineage and data management tools that Informatica PowerCenter has long been known for. Secure@Source is designed to identify specific data that is at risk based on its source, location, proliferation, or specific compliance concerns. It also defines data usage policies at the source of the data, which allows security policies to be abstracted from the delivery network, application, or endpoint device. Once sensitive data has been identified and associated with relevant security policies, the data can be tagged and tracked as it is accessed by other applications, devices, or reports. By getting to the crux of the problem, securing the data itself, Informatica is going to shake up traditional views of data security. Rather than focusing on how to mask, encrypt, securely deliver, securely access, virtualize, and authenticate data, companies will now be able to place policies directly on the data itself.

This concept seems simple: secure the data at its source. But the Goliaths that rule IT security currently do not have a competitive product to match it.
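Here is a minimal sketch of what “placing policies directly on the data” could look like, assuming sensitive columns are detected by simple patterns, tagged at the source, and checked against tag-level policies at read time. The patterns, tags, roles, and functions are all hypothetical and are not Secure@Source’s API; the point is that the check depends only on the data’s tags and the reader’s role, not on the network, application, or device, and that every read leaves an audit record.

```python
import re

# Hypothetical classifiers: tag -> pattern every value in a column must match
CLASSIFIERS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Hypothetical policies keyed by tag, not by network, app, or device:
# tag -> roles allowed to see clear values
POLICIES = {
    "ssn": {"hr"},
    "email": {"hr", "sales"},
}

def classify_column(values):
    """Return the tags whose pattern matches every non-empty value."""
    non_empty = [v for v in values if v]
    return {tag for tag, rx in CLASSIFIERS.items()
            if non_empty and all(rx.match(v) for v in non_empty)}

def tag_dataset(table):
    """Tag each column of a dict-of-lists table at its source."""
    return {col: classify_column(vals) for col, vals in table.items()}

def read_column(table, tags, column, role, audit):
    """Enforce tag policies at read time and record the access."""
    allowed = all(role in POLICIES[t] for t in tags[column])
    audit.append((role, column, "clear" if allowed else "masked"))
    if allowed:
        return list(table[column])
    return ["***"] * len(table[column])

table = {
    "name": ["Ann", "Raj"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "contact": ["ann@example.com", "raj@example.com"],
}
tags = tag_dataset(table)
audit = []
print(read_column(table, tags, "ssn", "sales", audit))      # ['***', '***']
print(read_column(table, tags, "contact", "sales", audit))  # ['ann@example.com', 'raj@example.com']
print(audit)  # [('sales', 'ssn', 'masked'), ('sales', 'contact', 'clear')]
```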

I feel like I’m seeing the emergence of a new security category: direct data security. This category feels very similar to the mobile device management industry that emerged in the mid-2000s as the BlackBerry started to fall out of vogue. Just as mobile device management was largely ignored by the endpoint security vendors of the time until it was too late for them to gain significant market share, direct data security will be a field that largely escapes the security megavendors while nimble, data-fluent vendors grab the majority of the market. It will be interesting to see how long it takes for Informatica to start selling into the security market and for this product to start moving the needle on Informatica’s revenues.

There is also potential for this concept to change the IT world in other ways. If you can track and manage data directly from the source to the end user, you can also measure utilization and traffic. This could lead to a more accurate and granular way of planning wide area network (WAN) deployments based on a better understanding of demand. Rather than simply estimating traffic based on certain types of content or applications, enterprises could directly track the data itself. Informatica has not announced any plans for network capacity monitoring or assessment, but this could be an interesting next step to take advantage of the data tracking and monitoring capabilities that Informatica has long had across the entirety of enterprise data.
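As a back-of-the-envelope sketch of that idea, assume each tracked access event records the source site, the consuming site, the dataset, and the bytes moved. Summing events per link gives observed demand, which can then be converted into a bandwidth estimate. The event format, site names, one-hour window, and `peak_factor` are hypothetical, and Informatica has announced nothing of the kind.

```python
from collections import defaultdict

# Hypothetical tracked events: (source_site, consumer_site, dataset, bytes_moved)
events = [
    ("dc-east", "branch-chicago", "crm_accounts", 120_000_000),
    ("dc-east", "branch-chicago", "service_tickets", 40_000_000),
    ("dc-east", "branch-denver", "crm_accounts", 15_000_000),
    ("dc-west", "branch-denver", "inventory", 60_000_000),
]

def demand_by_link(events):
    """Total bytes moved over each (source, consumer) link."""
    totals = defaultdict(int)
    for src, dst, _dataset, size in events:
        totals[(src, dst)] += size
    return dict(totals)

def required_mbps(total_bytes, window_seconds, peak_factor=1.0):
    """Average throughput in Mbit/s over the window, scaled by an assumed peak factor."""
    return total_bytes * 8 / window_seconds / 1_000_000 * peak_factor

# Assume the events above were observed over one hour
for link, total in demand_by_link(events).items():
    print(link, round(required_mbps(total, 3600, peak_factor=2.0), 3))
```

Because the events carry the dataset name, the same log could also show which data drives each link’s load, which is the granularity that application-level traffic estimates lack.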

Secure@Source is expected to be available in beta in the second half of 2014, with general availability targeted for 2015. As a former enterprise IT guy, I believe IT departments should jump on this beta, as it will add immediate value to security war rooms and NOCs (network operations centers).

As IT has shifted from a hardware-based department to a data-driven department, management and security tools have not kept pace. With Informatica’s announcements last week at Informatica World, IT now has a chance to augment top-down data management efforts and perimeter-based security efforts with efforts that focus on people and data, which are the true foundations of any company. It can be easy for a multi-billion-dollar company to rest on its laurels or to get distracted by side projects that do not solve core enterprise technology problems (as some in the data management and business intelligence vendor spaces have done). With these announcements, Informatica is stepping into new markets that both reflect its heritage and show a willingness to take risks and step outside its traditional comfort zone. These announcements fundamentally demonstrate the ongoing opportunities that exist in enterprise data, which is now the true foundation of IT in a social, mobile, and cloud-based world.