Getting Ready for Modernization: Traditional or Modern Warehousing

February 8, 2019

Business Analytics (BA) and data warehousing have become the cliched words for everything related to data and its management. It consists of the various skills and technologies applied by various enterprises to derive insights on how to improve their business, and make it more approachable to customers. With the tremendous amount of data that's coming in everyday, it is no wonder why you need some sort of reliable system to make it useful and safe.

There are mainly four types of Business Analytics:

Descriptive Analytics - Here, BA helps you summarize the raw data and convert it into a form that's meaningful. By analyzing the data, you can even know a pattern or stream of incidents that's been occurring over a period of time.

Diagnostic Analytics - As the name suggests, here the analytics expert can go deeper into the issue and see if they can locate the source of the problem. You can look at historical data to see why something has happened. In a business environment, both Descriptive and Diagnostic Analytics are mandatory, so they go hand-in-hand.

Predictive Analytics - Looking at the current events, you can forecast trends. For example, looking at a patient's current health status, you can predict prospective health risks.

Prescriptive Analytics - In this kind of analytics, the situation is analyzed in a step-by-step manner to know what conclusion to reach. Out of a number of solutions, the best and probably the most result-oriented one may be chosen.

What is a Data Warehouse?

The procured data (data is procured from relational databases, transactional systems and similar sources) is stored in a central repository known as data warehouse. Much of the data that's pulled out is mainly from online transaction processing data or OLTP. It is from this digital data warehouse that analysis is done, and decisions are made.

For a long time, BI or Business Intelligence and Data Warehousing were synonymous and one would be considered useless without the other. You had access to massive amounts of historical data, you can organize, analyze and summarize the data within this repository in whichever format you need. But as the years went by, and as the amount of data increased, the dependency of Business Intelligence on data warehouse had a tremendous downside. It needed heavy investment (time, effort and capital expense) and even then, only allowed for only certain kinds of analysis. You needed major development efforts if you wanted to process new types of data.

Data Warehouse Architecture

Data Warehouse Architecture

A data warehouse has a three-tier architecture.

The Bottom Tier - The data warehouse database server is at the bottom tier. This is the relational database system. Analysts use back end tools to perform functions like Extract, Clean, Load and Refresh. This is also known in short as ELT or Extract, Load and Transform.

This is how data is fed into the bottom tier. You can also call this the data source layer. The various kinds of data sources being:

1. Operations (HR data, sales data, marketing data, systems data, product data, etc.)

2. Web server logs

3. Internet market research data

4. Third party data (survey data, census data, demographics data, etc.)

The Middle Tier - Here you have the OLAP Server to map the operations on multidimensional data to relational data. There are two types of OLAP servers - ROLAP (Relational OnLine Analytical Processing) and MOLAP (Multidimensional OnLine Analytical Processing).

- ROLAP is for analysis on multidimensional data stored in relational database. The mapping can be done by any reporting tool used by SQL (Structured Query Language).

- OLAP is used for direct implementation of multidimensional data and operations. Data aggregation can be as per geography, share analysis or sales by time.

The Top Tier - This is the front-end client layer. It holds query tools, reporting tools and data mining tools. It has the tools and API to get the data from the warehouse.

How Does a Data Warehouse Work?

A data warehouse is a federated repository of data, so it works by collecting all the data from various sources and then organizing it into a schema, where the layout and type are specified. Different kinds of data are stored in its allotted tables, as per the schema directions. Query tools use schema to know what kind of data is needed, how to access and analyze them.

Here's how the work happens in a data warehouse:

- Extract and load data

- Cleaning the data, transforming it

- Create backups and archive data

- Managing the queries, moving the data to appropriate data sources

The process of data extraction is uniform, irrespective of the source from where the data originated. The data is organized periodically, on a weekly, monthly or annual basis. After organizing, it is updated periodically.

In some cases, certain warehouses have an additional step known as data mart. This happens when the warehouse itself manages the duties of aggregating data. The duty of the data mart is to respond to user queries and retrieve and combine the desired data from the warehouse.

Data Warehouse Benefits

There are several benefits to data warehousing, here are five of them:

1. Enabling better decision making

Companies no longer have to hedge bets or worry about taking decisions based on partial, inaccurate or limited data. The decisions made on strategy and operations are backed by solid facts and statistics. The increased BI from the warehouse can be used for business processes like financial management, sales, inventory management, market segmentation, etc.

2. Data procured from many sources

Understandably, the amount of data increases when more avenues are opened before you, and this ensures timely access at all times. The time spent on data retrieval is actually very negligent when compared to past processes.

3. Speed of data retrieval

It is possible to store large amounts of data and query it quickly. Data warehouses are designed specifically for analysis and retrieval, not maintenance of individual records.

4. Consistent quality data

The data is collected from innumerable resources, and they would all be converted into a unified format. This consolidation of data is known as ETL or scheduled data integration. This consolidates the data from multiple sources and converts it into a useful format. The IT department would find it quite useful because they know what to look for and where (making it easy to access from one interface to another). This way the team can spend more time analyzing data, rather than worrying about gathering it. This would also help you use the results that would be consistent with your business, and ensures company-wide accuracy.

5. High ROI

Of course, the ultimate aim of every business investment is high ROI. Data warehousing aids in tremendous cost savings, especially when it comes to analyzing data.

Importance of a Modern Data Warehouse

The primary requirement for modern data warehousing is analysis. It helps build the repository for all your data in its many forms - structured, unstructured, streaming, etc. This facilitates transformative solutions like BI and reporting, real time analytics and advanced analytics.

Enterprises can make use of the security, flexibility and performance of fully managed Azure service like Azure Databricks and Azure SQL Data Warehouse to make this process smooth and quick.

For example, if you prefer to go with AWS, you can choose Amazon Redshift to help modernize your on-premise data warehouse. It is fast, cost effective, scalable and helps you to automate several administrative tasks like set up, scale, manage and maintenance of the warehouse. Using high-level machine learning and parallel query execution, you can expect extremely fast high performance with Redshift.

Advantages of Modern Warehouse

Modern data warehouses have a number of advantages. Small to medium size businesses can use modern data warehouse to adapt to sudden and competitive changes in data influx. Also, you can use current data; the data being available to any number of users at any time. Apart from that, you have these most noted benefits too:

- It helps in minimizing the inefficiencies caused as a result of communication gaps. There are state of art tools that integrate the information collected by different sources and streamlining them.

- You can instruct the IT department to understand your specific needs and find a probable solution; they can tap into any kind of unstructured data and gain insights from it.

- Easily scalable, making it possible to have diverse kinds of data - both structured and unstructured.

- An important aspect of the solution is privacy and security. Robust cloud platforms like Microsoft Azure and AWS ensures high tag on security, and aim to provide threat detection, encryption of data in motion and at rest, data-level security, auditing and enable IP restrictions.

The following table gives you an insight into the simple data warehousing solutions in Microsoft Azure.

CategoryProductsDescriptionCloud DatabasesTabular databasePerfect for small to medium size businesses; has a data store that's built into Power BI and Azure Analysis Services.Azure SQL DatabaseThis is Microsoft's cloud version for SQL server database. It's perfect for transaction processing; you can run data warehouses to about 4TB.Azure SQL DWThis is about 50 times quicker than Azure SQL database, and perfect for ad hoc queries. It is Microsoft's parallel database for data warehousing.Data Migration ServicesAzure Database Migration ServicesPerfect solution for migrating on-premises SQL Server to Azure SQL Database; a PaaS solution.Own DatabasesMicrosoft Analysis ServicesAt present, an on-premises OLAP cube database. Will be shifting to Azure Virtual Machines soon.SQL Server Stretch DatabasePerfect for keeping warm and cold SQL Server 2016 data in Azure storage.Microsoft SQL ServerThis runs on Azure VM, and perfect for storing more than 4 TB data. Acts as on-premises relational database.Data Lake ServicesAzure Data Lake StorageA great solution for files systems running on HDFS or Hadoop Distributed File System; it is a petabyte-scale multi-storage system.Azure Data Lake AnalyticsActs as on-demand service for querying Azure Data Lake Store.Azure HD InsightHadoop distribution running on Azure.

Now, comes the question: When should enterprises prefer modern data warehouse over traditional?

A modern warehouse mostly resides in the cloud, and it brings forth any amount of data, scales it up and down, helps you gain insights through real-time analytics, Business Intelligence (BI) and reporting, predictive analytics, advanced analytics, operational reports and analytical dashboards.

Enterprises can enjoy business agility, so if you are looking for a fast way to analyze all that data, rather than waiting for a central IT to provision all those data warehouse, you can enjoy a robust solution that shows remarkable results. It is a great solution if you don't want to go through the pain of upgrading your on-premise solutions every 2-3 years.

The amount of unstructured data that comes in everyday is horrendous. Cloud data warehousing acts as a bridge for combining structured data from legacy on-premises data warehouses and as a huge resource for newer big data sources.

The inherent flexibility of cloud warehousing and its cost effectiveness makes it amass a greater market share, and thus definitely a nice and suitable option for storing loads of data. This is a major help when you have a number of concurrent users, especially during the peak hours. Sometimes, simple queries could take to 5-30 minutes to show response, and when there are customers spread across the globe, you cannot afford to play around. You cannot afford to wait for days and weeks to analyse data critical to decision making. IT teams need real-time insights to base their decisions and this can be easily done in a cloud environment.

Cloud based data warehousing makes it possible for you to adapt to changing market scenarios and trends, explore new paths aiding in increased ROI due to better data insights, and of course scalability. The best part is that you need to pay only for the services that you use. During the off-peak hours, you can scale down, and not pay for the resources that you don't use.

Cloud thus requires very less upfront investment and you don't have to upgrade on the resources, server rooms, hire more people, train or maintain them. The uptime and reliability are other major features of cloud based technologies that we need to highlight here. You don't have to worry about those because it is the cloud provider's responsibility to ensure those. Once you hire the right cloud data warehouse provider, your job is practically done.

Here are some cloud data warehousing providers you can consider using:

Amazon Redshift

Released in late 2012, Redshift is the brainchild of Amazon Web Services and offered high levels of security and data migration capabilities when compared to earlier products. Redshift made it practical to migrate huge amounts of data with substantial cost savings.

Azure SQL Data Warehouse

Azure SQL warehouse is a fully managed cloud data warehouse for companies of any size. It promises lightning fast query performance and highly sophisticated data security apart from seamless integration with Azure Data Lake Storage, Azure Data Factory, Azure Active Directory, Azure Databricks and Microsoft Power BI. An impressive ecosystem of partners for ingestion service, elasticity in computation, visualisation tool providers and data preparation makes SQL Warehouse an obvious choice.

Azure Databricks

It is possible to run ad hoc queries directly with Databricks, a collaborative Apache Spark�based analytics service. Data engineers, data scientists, and business analysts collaborate on shared projects keeping an interactive workspace. It is flexible, scalable and can be integrated with other Azure services seamlessly.

Other cloud data warehouse services like Snowflake, Teradata Cloud Data Warehouse, Oracle Exadata Express Cloud Service, IBM dashDB and SAP Business Warehouse are also in vogue.

Challenges of Traditional Warehouse and Why You Should Consider a Transition

There are instances where the traditional warehouses may not work for your company. This is because the traditional process is ETL or extract, transform and load for wrangling data from multiple sources, and it is a cumbersome process as the data keeps collecting at the gate. Modern data warehousing follows the format of ELT or Extract, Load and Transform.

Certain DBMSs like Microsoft SQL Server, support the ELT process, but most of the time, it doesn't, or is dependent on the characteristics possessed by the specific DBMS you need.

If your requirement is to support NoSQL/NewSQL analytical DBMS or big data initiatives using Hadoop, then the Extract, Load and Transform format is the best option. If you are still confused as to whether to go for traditional or modern process, consider the following scenarios, and then base your decision:

- What is your business use case for the data warehouse?

- What design approach of the data warehouse architecture do you prefer?

- The storage technologies that you need

- Are you looking to create new revenue opportunities through real-time processing of customer data?

Traditional data integration tools and even new tools like Sqoop aren't really quite enough for progressive enterprises. These would soon become outdated and turn too brittle for huge data velocities and volumes. It's not possible to wait weeks and months to respond to new demands from customers.

While planning for data warehouse modernization, remember: it can happen in any of the three areas or all of them - platform, architecture, and management processes. If you are looking at a complete tactical and technical migration, it is a process with many steps, especially if you are looking to migrate all the components.

There are Cases when Traditional Warehouse is Better

Traditional warehouse is actually a good choice for operational reporting.

As explained earlier, in a typical IT environment, traditional data warehouses ingest, model, and store massive amounts of data through the ETL process. This process is good when the objective is to move large amounts of data in batches and on a daily basis. This way, the data would only be a few hours old, and sometimes a day or two older.

Answer the following questions if you are planning a transition from traditional data warehousing to modern:

1. Schema migration - Migration entails moving table structures and specifications. Do you need to make structural changes for this, or even partitioning and indexing?

2. Data migration - Migrating large amounts of data is no small chore. You have to consider the time involved in the process, especially if you are restructuring the schema. Are you planning to migrate in stream, or do you need to pre-process and then migrate?

3. ETL migration - Migrating ETL process is not easy at all. You may even begin to think that data migration was easier compared to this. Sometimes you need to change the code base, rework the data transformation so it syncs with restructuring, reduce latency, etc. You might even have to look for tools to convert ETL code.

4. Data pipeline changes - If all the above questions are answered in your favor, then you will have to consider rebuilding data buildings. It is a better option than migrating existing ETL. Especially, if the aim is reusability, maintainability, performance and agility.

5. Metadata migration - Metadata is an important aspect in data warehousing and proper troubleshooting, knowing data lineage and tracing is a must. Will the metadata move readily to the new cloud platform? Can you export and import the metadata? Can you reverse engineer it? Will you have to rebuild it from the beginning? What about the mappings, dataflow, workflow and transform logic?

6. Moving users and applications - Once the above-mentioned aspects are covered and all the questions cleared, you are moving to the process of migrating users and applications. How can you do it without interrupting the business operations? What BI and analytics tools do you have to connect? What security and authorization access should you allow?

Is Co-existence Possible?

If you want to continue with the on-premise data warehousing, and move only a portion of it to the cloud, you can enjoy the following benefits. This way you can leverage cloud-based technologies to complement existing EDW (enterprise data warehouse) platforms:

- Cost savings & cost flexibility

- Using PaaS to lower operational overhead

- Reducing capital costs

- No more costly license renewals

- Scalability

- Adding advanced capabilities

- Keeping historical data longer within an on-premise solution

- Elasticity

Enterprise Data Workloads (EDW) have grown in complexity with regard to analytics and operational workloads over the years. And the enterprises that have been using EDW platforms have had problems with this because of the mix of workloads that did not fit together naturally. This led to problems regarding scheduling and performance.

This is why moving to an EDW platform in the cloud is the answer for several enterprises. Make sure the design, functionality and migration aspects are covered, so the integration from on-premise remains seamless and with absolutely no downtime.

Conclusion

Transition from traditional data warehousing to cloud is definitely a positive move for your enterprise if your business gains a lot from it. Sometimes, one can complement the other, and in that case, that's what you have to do. Choosing the best migration strategy can done by asking yourself the above-mentioned questions.

The most popularly used method is lift and shift approach, and in certain cases, you can simply move data and process it to the cloud. But that doesn't work all the time because you to have consider compatibility and adaptation of data structures. The migration process itself is a game changing technology project for which you need the help of experts. Our experienced professionals at Cabot Technologies can guide you through it.

Contact Us Today!

Free Whitepaper: 30 Questions You Must Ask Before Investing in Cloud Computing