The world of data management is undergoing a massive shift, as complex data sets are now part of everybody’s job, whether they realize it or not. But currently, supporting these complex data sets is tedious and time-consuming.
Because of the technical expertise required to access these sets, IT workers find themselves serving as gatekeepers, even though they lack the capacity as a department to address the growing number of requests in a timely manner and still do their core jobs. As the turnaround time for preparing data sets grows, end users eventually route around IT, both to speed up the work and to bring in apps and data that IT may not support. The world of ERP is quickly being replaced by one in which third-party data can augment business decisions quickly and easily. In a Bring Your Own Device, social, and SaaS-driven world, employees will find and use their own data if they can’t find it in their environments. Data is being pressed by the key drivers of social, mobile, machine-to-machine, and cloud, just as every other aspect of technology is. But this leaves IT without the oversight to guarantee data quality or security, or even to manage data for the business as a whole. That can be disastrous from both data integrity and security standpoints.
Rather than fighting for nonexistent time and people resources and risking data integrity and value, the future of data management needs to allow for the support of many data sets, many formats, and contextual integration while providing enough flexibility and user-friendliness for end users to access appropriate data on demand.
In this context, Informatica announced upcoming improvements to their information management product in the form of an “Intelligent Data Platform.” To demonstrate the concepts of the platform, Informatica is developing several new applications and solutions relating to common modern data management pain points: a “Managed Data Lake” to manage and refine data, “Data Harmonization” to predict and provide contextual suggestions for how end users should put their data sets together, and “Secure@Source” to enhance data security. Each of these points speaks to a next-generation demand to support data above and beyond the traditional ETL, data integration, and data management tools that have historically defined the enterprise use of data.
The first of these is the idea of a “managed data lake,” which provides a way to manage and refine all sorts of incoming data (whether structured, semi-structured, or unstructured), while providing an interface for end users to be able to select appropriate data sets without needing to navigate through an IT department bottleneck. Currently, data access relies heavily on IT data architects who take weeks or months to manually search for, access, and process data into a usable, quality-checked format. Individual users often create their own business rules, so when their data comes back, it’s frequently inconsistent with the data in the centralized storage. And once these users have taken the data and popped it into their software of choice, there’s no way to trace where else it goes, or to know how many copies of the data are out there, or to even know that standard data-handling procedures will be followed. The concept of “one version of the truth” isn’t even remotely possible in this kind of environment.
So Informatica has conceptualized better oversight and management through what they’re calling a “Managed Data Lake” on their Intelligent Data Platform. Data is put into the lake from all sorts of sources, structured and unstructured. It gets refined, brought into centralized storage, and given scores for appropriate usage, and then that data is made available to other apps in a sort of data boutique that reduces the need for manual coding of data APIs and SQL-based queries. By providing consistent end-to-end governance, many of the initial governance tasks associated with creating a consistent data layer can be automated, while refinement tasks can now be placed directly in end users’ hands. As a result, companies will be able to develop more relevant data-driven results based on both consistent data rules and contextual data refinement. By shifting basic data refinement and linkage into the managed data lake, IT can focus on data reliability, context, and security rather than basic data integration and provisioning. IT shifts from a pure support role to a more strategic advisory model that recommends what data you should use and what the right tools are, rather than simply providing help desk support for whatever tools are on hand. All end users need to do is go to the boutique, search for relevant data sets, preview the data, and add it to their cart for further work.
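To make the “data boutique” idea concrete, here is a minimal sketch of that self-service flow: governed data sets are published with usage scores, and end users search, preview, and fill a cart without writing SQL. All names here (`DataCatalog`, `quality score`, the sample data sets) are hypothetical illustrations, not Informatica APIs.

```python
class DataCatalog:
    """Toy catalog: refined data sets published with usage-appropriateness scores."""

    def __init__(self):
        self._datasets = {}  # name -> {"rows": [...], "tags": set, "score": float}

    def publish(self, name, rows, tags, score):
        # Refinement and governance happen upstream; the boutique only exposes
        # data sets that already carry a consistent score and tag set.
        self._datasets[name] = {"rows": rows, "tags": set(tags), "score": score}

    def search(self, tag, min_score=0.0):
        # End users search by tag instead of hand-coding APIs or SQL queries.
        return sorted(
            name for name, d in self._datasets.items()
            if tag in d["tags"] and d["score"] >= min_score
        )

    def preview(self, name, n=2):
        # Let the user inspect a few rows before committing to the data set.
        return self._datasets[name]["rows"][:n]


catalog = DataCatalog()
catalog.publish("crm_contacts", [{"name": "Ada"}, {"name": "Grace"}],
                tags={"customer"}, score=0.9)
catalog.publish("web_leads", [{"name": "Alan"}],
                tags={"customer", "marketing"}, score=0.6)

# "Shopping": search, preview, add to cart.
cart = catalog.search("customer", min_score=0.5)
```

The point of the sketch is the division of labor: scoring and governance are centralized, while selection and refinement sit with the end user.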
I admit, I’m not a huge fan of the “managed data lake” name. The variety and breadth of data out there for the enterprise doesn’t fit well into the concept of a lake. Companies are trying to combine enterprise applications, Software and Infrastructure as a Service, third-party applications, real-time sensor and environmental data, social networking, and trusted curation of metadata into a combination of data linkages that are aligned to a variety of job roles. As a result, companies have less control than ever over the management or confinement of data. So, this is a bit semantic, but the idea of a refined data universe seems more accurate to me, even if it may be less marketing-friendly.
But the concept that Informatica is introducing is quite important. One of the key challenges associated with using more data and becoming “data-driven enterprises” is that data needs to be tied together quickly and efficiently. If new data sources can be linked together and cleansed on-demand for employees, companies can spend more time making appropriate decisions and less time preparing data that may or may not even be applicable to a business task. Now that simpler tools such as IFTTT have familiarized early adopter-type consumers with the idea of linking data from disparate sources and getting them into separate tools automatically, more end-users can understand using these kinds of tools in the workplace, where they might not have been aware of the possibilities before.
While you’re doing your “data shopping” in the managed data lake, the second application built on Informatica’s Intelligent Data Platform vision comes into play. “Data Harmonization” will infer likely “next steps” for the data you’ve got in your cart. Informatica will track how early adopters are defining key data demands, then automate finding common columns across those different data sets. For example, are the data sets in your cart commonly used with specific linkages? Do both data sets have common columns, such as name and address? Based on this knowledge, the data harmonization aspect can search for matching cross-references, duplicates, and variations to provide the combined big picture more easily.
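The column-matching idea can be sketched in a few lines: given the data sets in a user’s cart, normalize column names and surface the overlaps as suggested linkages. The alias table, normalization rule, and sample data sets are all assumptions for illustration, not Informatica’s actual matching logic.

```python
# Known column-name variants mapped to a canonical name (illustrative only).
ALIASES = {"addr": "address", "cust_name": "name", "full_name": "name"}

def normalize(column):
    # Fold a column name onto its canonical form so variants still match.
    return ALIASES.get(column.lower(), column.lower())

def suggest_linkages(cart):
    """cart: {dataset_name: [column, ...]} -> list of (ds_a, ds_b, shared columns)."""
    suggestions = []
    names = sorted(cart)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = {normalize(c) for c in cart[a]} & {normalize(c) for c in cart[b]}
            if shared:
                suggestions.append((a, b, sorted(shared)))
    return suggestions

cart = {
    "crm": ["cust_name", "addr", "account_id"],
    "billing": ["name", "address", "invoice_id"],
}
# suggest_linkages(cart) -> [("billing", "crm", ["address", "name"])]
```

A production harmonizer would add fuzzy matching, value profiling, and learned linkage patterns on top of this, but the shared-column intersection is the core move.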
One of the most obvious use cases for this data harmonization is in mergers and acquisitions activity, which is an immense governance, risk management, and compliance issue for organizations. With the velocity that startups are purchased or peers are rolled up, the need for data synchronization among disparate parties only becomes more important with time. With a “data harmonization” approach, Informatica could play an increasingly important role in the governance and legal aspects associated with bringing businesses together as a strategic partner to existing compliance solutions.
The third part of Informatica’s Intelligent Data Platform vision has to do with security. Throughout the whole data management process, security is key, yet data security gets shoved onto IT as the sole responsible party. No matter how many times IT sends out notices about best security practices or forces password changes, end users are not seen as being responsible for data security. Considering how many different ways we acquire and interact with data now, the traditional “perimeter” approach to information security is broken. People want to access data on their computers at work and at home, on remote workstations, and on their mobile phones and tablets – and even to put it into apps that IT doesn’t know about or hasn’t had time to vet and approve, just to get the job done faster.
And data storage in the cloud only makes it more difficult to ensure that the physical server is secure. It’s no longer enough to protect just the networks and endpoints the data traverses; taking advantage of (meta)data to track where data goes and how it’s used is the next, necessary step. In a virtual, mobile and cloud world, the concept of a “data perimeter” becomes increasingly fluid and meaningless. Informatica’s idea to solve this problem is a data security heat map as part of their data tracking. There are three parts: a “data sensitivity” index will identify sensitive data in the set (or in the lake?); a “data proliferation” index will identify machines the data passes through on its way to the user; a “data usage” index will track who uses what, how much, and what types of special data access privileges that user has.
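As a rough illustration of the three indices, the sketch below computes each one from a toy access log. The scoring rules, field classifications, and log format are my own assumptions for demonstration, not how Secure@Source actually works.

```python
from collections import Counter

# Hypothetical classification of sensitive fields (illustrative only).
SENSITIVE_FIELDS = {"ssn", "salary", "credit_card"}

def sensitivity_index(columns):
    # Fraction of a data set's columns that are classified as sensitive.
    cols = [c.lower() for c in columns]
    return sum(c in SENSITIVE_FIELDS for c in cols) / len(cols)

def proliferation_index(access_log):
    # How many distinct machines the data has passed through.
    return len({event["host"] for event in access_log})

def usage_index(access_log):
    # Who uses the data, how often, and who holds special privileges.
    counts = Counter(event["user"] for event in access_log)
    privileged = {e["user"] for e in access_log if e.get("privileged")}
    return {"accesses": dict(counts), "privileged_users": sorted(privileged)}

log = [
    {"user": "alice", "host": "laptop-1", "privileged": True},
    {"user": "bob", "host": "server-9"},
    {"user": "alice", "host": "laptop-1"},
]
```

Feeding indices like these into a heat map is what turns raw access logs into the at-a-glance risk view the paragraph above describes.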
With the Secure@Source application, the vision is that Informatica will monitor activity in real time to identify both typical usage patterns and suspicious ones, and take action to protect the data on detecting a suspicious usage pattern. So if I share some data with my colleague Hyoun, and he shares it with others in his department, and one of them shares that data with a client, Secure@Source illuminates the path the data takes. In and of itself, this is a powerful tool for forensic data analysis and for tracking data back to its source.
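That sharing path is essentially a graph-traversal problem. Here is a minimal sketch, assuming an acyclic share log where each share is recorded as a (giver, receiver) edge; the names and breadth-first search are my own illustration, not Informatica’s implementation.

```python
from collections import deque

# Each recorded share is an edge: who handed the data to whom.
shares = [("me", "hyoun"), ("hyoun", "colleague"), ("colleague", "client")]

def trace_path(shares, source, holder):
    """Breadth-first search over share edges: how did data travel from source to holder?
    Assumes the share log is acyclic."""
    graph = {}
    for giver, receiver in shares:
        graph.setdefault(giver, []).append(receiver)
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if path[-1] == holder:
            return path
        for nxt in graph.get(path[-1], []):
            queue.append(path + [nxt])
    return None  # holder never received the data

# trace_path(shares, "me", "client") -> ["me", "hyoun", "colleague", "client"]
```

The same traversal run backwards is what makes the forensic use case work: given any copy of the data, you can walk the edges back to its source.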
But there is a bigger picture to this data visualization as well. By providing a clear view of data in motion, Informatica also illuminates social data relationships throughout the company and externally. What data is being used most often? How does data propagate throughout the organization? What types of data are being accessed most often, especially from external sources? Which data sets and sources are the key influencers for making business decisions? There is a whole strategic world of how companies use data that could also be uncovered through this vision of data usage that could become a true strategic advantage. If an organization can understand how it uses data and move towards a goal of a specific model of data sharing and access, this could vastly improve the utility of data that is already accessible.
As an example, if I pull in information from external data sources, whose information do I pull in most often? Who influences how my company runs without everyone else realizing it? This aspect of data-driven influence is not a new one in the consumer world: it was particularly true on del.icio.us, a social bookmarking website I used for many years. I was certainly heavily influenced by who I followed on del.icio.us, who I trusted to follow, who seemed to be writing relevant things that made sense. If I’m influenced, my contributions to my company are also affected, and this is a business opportunity – for a partnership, for a customer, whatever. And who influences my influencers? From a data perspective, this could mean that perhaps Dun & Bradstreet data is overused in making decisions, whereas semantic analysis of key customer transactions is being underused because employees lack training in using unstructured and semi-structured data. By seeing who is using the data and where the data came from, Informatica has an opportunity to transform both data security and data-driven strategy.
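The simplest version of this influence measurement is just a tally: for each recorded decision, count which sources fed it. The decision log and source names below are made up for illustration; a real analysis would weight by decision impact and recency.

```python
from collections import Counter

# Hypothetical log of business decisions and the data sources behind each.
decisions = [
    {"decision": "credit approval", "sources": ["dnb", "crm"]},
    {"decision": "pricing", "sources": ["dnb"]},
    {"decision": "supplier risk", "sources": ["dnb"]},
    {"decision": "churn outreach", "sources": ["crm", "support_tickets"]},
]

# Which sources influence the most decisions?
influence = Counter(src for d in decisions for src in d["sources"])
# influence.most_common(1) -> [("dnb", 3)]
```

Even this crude tally would surface the pattern described above: one external source (here the hypothetical “dnb”) quietly dominating decision-making while others go underused.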
So, the vision that Informatica has is an ambitious one. And, frankly, it may be too much for some data managers to handle. Although the worlds of data ingestion, data deduplication, data integration and transformation, data quality, and traditional data security are still very valuable, they are increasingly seen as areas that are being commoditized. Companies looking for data management solutions need to think not only about basic extract, transform, and load (ETL) strategies but also about how their organization is planning to use data. Does your organization want to simply store “Big Data,” or does it actually want to bring together tens or hundreds of data sets chosen both by IT and line-of-business users, in structured, semi-structured, and unstructured formats? All of this data needs to be rationalized at some level with consistent corporate governance so that even if different end users need to format and shape the data differently, there is still a consistent data layer at the end of the day.
Although predictive analytics are a very hot topic, what about making data management predictive? Why not prepare your infrastructure to support your entire end user base by learning lessons from the usage patterns of the early data usage adopters? Networked and predictive insights aren’t just for front-end analytics; they can also support the building of a data management strategy that can optimize access and context for your users.
Finally, how important is data security to your organization? Is it important enough that you are willing to flip the concept of data security inside out? Instead of dealing with a data perimeter, it is time to think of each bit, byte, and packet as its own discrete unit that needs to be tracked and managed from end to end. The ramifications of this level of visibility are still very new, which is why DataHive is bringing up not only the security aspects of this, but also the social and trusted usage aspects of tracking data movement.
Informatica’s vision in 2014 is an important one for companies to both future-proof their data management investments for the next several years and to support the future of the data-driven enterprise. Informatica has moved away from the traditional world of enterprise application support and towards the reality of a world where every employee can potentially bring new data and applications into the company. Rather than take a top-down and constrictive approach, Informatica is embracing the openness. This is a challenging new world for enterprise data managers to accept, but ultimately one that they will need to tackle head-on. Informatica’s vision provides a straightforward challenge: are companies ready to accept their employees’ data usage patterns now or are they going to wait until these patterns create too much risk to ignore?