Traditional approaches to data management pose many challenges to financial institutions, with the segregation of 'hot' and 'cold' data leading to inefficiencies.
Over time, the concepts of 'hot' and 'cold' data have become a significant part of data management strategies. 'Hot' data refers to data that is frequently accessed and needs to be readily available for quick retrieval. This type of data is often stored in-memory to ensure fast access speeds, which is crucial for real-time processing and analytics.
On the other hand, 'cold' data is data that is accessed less frequently and does not require immediate availability. It is typically stored on slower, more cost-effective storage mediums like hard disk drives. Cold data is often archived or used for long-term storage due to its infrequent access patterns.
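In practice, the hot/cold distinction often reduces to a policy rule on access recency. As a minimal illustrative sketch (the 30-day threshold is an assumed example, not any particular institution's policy):

```python
from datetime import datetime, timedelta

# Hypothetical threshold: data untouched for 30 days is treated as 'cold'.
HOT_THRESHOLD = timedelta(days=30)

def classify_tier(last_accessed: datetime, now: datetime) -> str:
    """Classify a dataset as 'hot' (in-memory) or 'cold' (archival storage)."""
    return "hot" if now - last_accessed <= HOT_THRESHOLD else "cold"

now = datetime(2024, 1, 31)
print(classify_tier(datetime(2024, 1, 25), now))  # hot
print(classify_tier(datetime(2023, 6, 1), now))   # cold
```

Real tiering policies also weigh data size, regulatory retention rules, and query patterns, but the core mechanism is a classification like this one, applied continuously as access patterns change.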
In many cases, 'hot' and 'cold' data are managed in separate environments, each optimized for their specific access patterns and storage needs. This can lead to different data models and management systems for each type of data. For instance, a data warehouse or a high-performance database might be used for 'hot' data, while a data lake or archival storage system could be utilized for 'cold' data.
Data citizens may not be aware of the full extent of the data that is available to them. This lack of awareness can be due to inadequate documentation, complex data landscapes, or siloed departments where information is not shared efficiently. Consequently, even if the data exists and could potentially be valuable, it remains unused.
When structured data is not incorporated into the frequently accessed ('hot') data model, it becomes less accessible to analysts who rely on these models for their day-to-day analysis. The process of integrating this 'cold' data into a 'hot' data environment can be hindered by technical complexities, such as differing data formats, compatibility issues, or the need for significant transformation or cleansing of the data.
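Even a simple mismatch between a cold extract and the hot model's schema forces a transformation step. The sketch below uses hypothetical column names and formats to show the kind of normalization involved:

```python
import csv
import io

# Hypothetical cold-storage extract: date format and column names differ
# from the 'hot' model, so each record must be normalized before loading.
cold_extract = io.StringIO(
    "trade_dt,notional_amt\n"
    "2023/06/01,1000000\n"
    "2023/06/02,2500000\n"
)

def normalize(row: dict) -> dict:
    """Map a cold-storage row onto the hot model's assumed schema."""
    return {
        "trade_date": row["trade_dt"].replace("/", "-"),  # ISO-style dates
        "notional": float(row["notional_amt"]),           # string -> numeric
    }

hot_rows = [normalize(r) for r in csv.DictReader(cold_extract)]
print(hot_rows[0])  # {'trade_date': '2023-06-01', 'notional': 1000000.0}
```

Multiply this by hundreds of source systems, each with its own formats and quality issues, and the integration cost described above becomes clear.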
There can be organizational barriers, such as policies or bureaucratic processes, that restrict access to certain data sets. In some cases, the data might be held in different departments or systems with limited cross-functional access, making it challenging for analysts to obtain the data they need.
The technical infrastructure in some organizations may not be equipped to handle the seamless transition of data from 'cold' to 'hot' states. This issue is compounded when dealing with large volumes of data or when the data is stored in legacy systems that are not easily integrated with modern analytics tools.
Due to these factors, a considerable amount of structured data remains underutilized or completely unused, effectively 'frozen' in the organization's data repository. Addressing these challenges requires not only technical solutions, such as better data integration tools and more flexible data architectures, but also organizational changes that promote data visibility, accessibility, and cross-departmental collaboration.
Financial institutions have historically employed two primary approaches to manage and analyze their vast data repositories: high-performance analytics environments and custom analytics built on core banking systems.
The high-performance analytics environments, often relying on in-memory computing, have been set atop golden sources, such as traditional on-premise data lakes built using Hadoop/HDFS. These environments have progressively moved towards cloud storage, reflecting a broader industry trend. The key advantage of these systems has been their ability to handle large data volumes and complex computations, which is critical in risk analytics.
In parallel, there has been a trend towards developing custom analytics solutions using object-oriented languages and frameworks such as Java or .NET, integrated directly with core banking systems, including Murex, FIS, Finastra, and others. These custom solutions are tailored to the unique requirements of financial institutions, offering a high degree of specificity in risk analysis.
Both these approaches, however, have encountered significant challenges. One of the primary issues has been the complexity and cost of maintaining these systems: developing and maintaining calculated measures requires advanced engineering skills, leading to high operational costs. Additionally, these systems often experience peaks in hardware resource consumption, particularly in in-memory environments, posing a challenge to infrastructure capacity and efficiency.
The migration to cloud computing has introduced new dimensions to these challenges. While cloud environments offer benefits like scalability, flexibility, and lower upfront costs, they also bring forth a variable pricing model based on resource consumption. This model can lead to unexpectedly high costs for systems that were not originally designed with cloud efficiency in mind. As a result, systems that were cost-effective in an on-premise setup may become financially burdensome in a cloud-based infrastructure model, particularly if they lack optimization for cloud-native operations.
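The effect of consumption pricing on a peaky workload can be made concrete with some back-of-the-envelope arithmetic. All figures below are hypothetical, chosen only to illustrate how a fixed on-premise cost can compare favourably against on-demand pricing when load spikes are large:

```python
# Hypothetical figures: a fixed on-premise budget vs a cloud bill driven
# by a workload that spikes hard during end-of-day risk runs.
ON_PREM_MONTHLY = 40_000.0          # assumed fixed cost of owned capacity

CLOUD_RATE_PER_NODE_HOUR = 4.0      # assumed on-demand rate
BASELINE_NODES, PEAK_NODES = 20, 200
PEAK_HOURS_PER_MONTH = 60           # e.g. nightly batch windows
TOTAL_HOURS = 730                   # hours in an average month

cloud_monthly = CLOUD_RATE_PER_NODE_HOUR * (
    BASELINE_NODES * (TOTAL_HOURS - PEAK_HOURS_PER_MONTH)
    + PEAK_NODES * PEAK_HOURS_PER_MONTH
)
print(f"cloud: ${cloud_monthly:,.0f} vs on-prem: ${ON_PREM_MONTHLY:,.0f}")
# cloud: $101,600 vs on-prem: $40,000
```

Under these assumptions the peak hours alone account for nearly half the cloud bill, which is why systems designed around sustained in-memory peaks, rather than cloud-native elasticity, can become far more expensive after migration.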
The transition to cloud computing, therefore, necessitates a strategic and thoughtful approach. Financial institutions are required to not only re-architect their solutions for cloud efficiency but also to embrace new cloud-native technologies.
This is what Opensee has been addressing since our inception in 2018: offering a new alternative to the hot vs cold data split, and to costly high-performance databases and hard-to-maintain custom-built analytics.
In an upcoming article, we will delve into the organizational implications of this model, and how the application of data mesh can allow the delivery of such a service as a fully integrated offering.
About the author: Emmanuel Richard is a Data and Analytics expert with over 25 years of experience in the technology industry. His extensive background includes leadership roles at industry giants and startups across the US and Europe.