Dimensional modelling Vs Corporate Information Factory

In this column, we’ll clarify the similarities and differences between the two dominant approaches to enterprise warehousing. The first approach been Kimball Dimensional Modelling and the second been Corporate Information Factory.

#Data Design

Read More
Comment

Data Warehouse Dining Experience

Data warehouses should have an area that focuses exclusively on data staging and extract, transform, and load (ETL) activities. A separate layer of the warehouse environment should be optimized for presentation of the data to the business constituencies and application developers.

This division is underscored if you consider the similarities between a data warehouse and restaurant.

The Kitchen
The kitchen of a fine restaurant is a world unto itself. It’s where the magic happens. Talented chefs take raw materials and transform them into appetizing, delicious multi-course meals for the restaurant’s diners. But long before a commercial kitchen is put into productive use, a significant amount of planning goes into designing the layout and components of the workspace.

The restaurant’s kitchen is organized with several design goals in mind. First, the layout must be highly efficient. Restaurant managers are very concerned about kitchen throughput. When the restaurant is packed and everyone is hungry, you don’t have time for wasted movement.

Delivering consistent quality from the restaurant’s kitchen is the second goal. The establishment is doomed if the plates coming out of the kitchen repeatedly fail to meet expectations. A restaurant’s reputation is built on legions of hard work; that effort is for naught if the result is inconsistent. In order to achieve reliable consistency, chefs create their special sauces once in the kitchen, rather than sending ingredients out to the table where variations will inevitably occur.

The kitchen’s output, the meals delivered to their customers, must also be of high integrity. You wouldn’t want someone to get food poisoning from dining at your restaurant. Consequently, kitchens are designed with integrity in mind. Salad prep doesn’t happen on the same surfaces where raw chicken is handled.

Just as quality, consistency, and integrity are major considerations when designing the kitchen layout, they are also ongoing concerns for everyday management of the restaurant. Chefs strive to obtain the best raw material possible. Procured products must meet quality standards. For example, if the produce purveyor tries to unload brown, wilted lettuce or bruised tomatoes, the materials are rejected, as they don’t meet minimum standards. Most fine restaurants modify their menus based on the availability of quality inputs.

The restaurant kitchen is staffed with skilled professionals wielding the tools of their trade. Cooks manipulate razor sharp knives with incredible confidence and ease. They operate powerful equipment and work around extremely hot surfaces without incident.

Given the dangerous surroundings, the kitchen is off-limits to patrons. It simply isn’t safe. Professional cooks handling sharp knives shouldn’t be distracted by diners’ inquiries. You also wouldn’t want patrons entering the kitchen to dip their fingers into a sauce to see whether they want to order an entree or not. To prevent these intrusions, most restaurants have a closed door that separates the kitchen from the area where diners are served.

Even restaurants that boast an open kitchen format typically have a barrier, such as a partial wall of glass, separating the two environments. Diners are invited to watch, but can’t wander into the kitchen themselves. But while part of kitchen may be visible, there are always out-of-view back rooms where the less visually desirable preparation work is performed.

The data warehouse’s staging area is very similar to the restaurant’s kitchen. The staging area is where source data is magically transformed into meaningful, presentable information. The staging area must be laid out and architected long before any data is extracted from the source. Like the kitchen, the staging area is designed to ensure throughput. It must transform raw source data into the target model efficiently, minimizing unnecessary movement if possible.

Obviously, the data warehouse staging area is also highly concerned about data quality, integrity, and consistency. Incoming data is checked for reasonable quality as it enters the staging area. Conditions are continually monitored to ensure staging outputs are of high integrity. Business rules to consistently derive value-add metrics and attributes are applied once by skilled professionals in the staging area, rather than relying on each patron to develop them independently. Yes, that puts extra burden on the data staging team, but it’s done is the spirit of delivering a better, more consistent product to the data warehouse patrons.

Finally, the data warehouse’s staging area should be off-limits to the business users and reporting/delivery application developers. Just as you don’t want restaurant patrons wandering into the kitchen and potentially consuming semi-cooked food, you don’t want busy data staging professionals distracted by unpredictable inquiries from data warehouse users. The consequences might be highly unpleasant if users dip their fingers into interim staging pots while data preparation is still in process. As with the restaurant kitchen, activities occur in the staging area that just shouldn’t be visible to the data warehouse patrons. Once the data is ready and quality checked for user consumption, it’s brought through the doorway into the warehouse’s presentation area. Who knows, if you do a great job, perhaps you’ll become a data warehouse celebrity chef a la Emeril Lagasse or Wolfgang Puck.

The Dining Room
Let’s turn our attention to the restaurant’s dining room. What are the key factors that differentiate restaurants? According to the popular Zagat Surveys, restaurants around the world are rated on four distinct qualities:

  • Food (quality, taste, and presentation)

  • Decor (appealing, comfortable surroundings for the restaurant patrons)

  • Service (prompt food delivery, attentive support staff, and food received as ordered)

  • Cost.

Most Zagat Survey readers focus initially on the food score when they’re evaluating dining options. First and foremost, does the restaurant serve good food? That’s the restaurant’s primary deliverable. However, the decor, service, and cost factors also affect the patrons’ overall dining experience and are considerations when evaluating whether to eat at a restaurant or not.

Of course, the primary deliverable from the data warehouse kitchen is the data in the presentation area. What data is available? Like the restaurant, the data warehouse provides “menus” to describe what’s available via metadata, published reports, and parameterized analytic applications.

Is the data of high quality? Data warehouse patrons expect consistency and quality. The presentation area’s data must be properly prepared and safe to consume.

In terms of decor, the presentation area should be organized for the comfort of its patrons. It must be designed based on the preferences expressed by the data warehouse diners, not the staging staff. Service is also critical in the data warehouse. Data must be delivered, as ordered, promptly in a form that is appealing to the business user or reporting/delivery application developer. Finally, cost is a factor for the data warehouse. The data warehouse kitchen staff may be dreaming up elaborate, albeit expensive meals, but if there’s no market at that price point, the restaurant won’t survive.

If restaurant diners are pleased with their dining experience, then everything is rosy for the restaurant manager. The dining room is always busy; there’s even a waiting list on some nights. The restaurant manager’s performance metrics are all promising: high numbers of diners, table turnovers, and nightly revenue and profit, while staff turnover is low. Things look so good that the restaurant’s owner is considering an expansion site to handle the traffic. On the other hand, if the restaurant’s diners aren’t happy, then things go south in a hurry. With a limited number of patrons, the restaurant isn’t making enough money to cover its expenses (and the staff isn’t making any tips). In a relatively short time period, the restaurant shuts down.

Restaurant managers often proactively check on their diners’ satisfaction with the food and dining experience. If a patron is unhappy, they take immediate action to rectify the situation. Similarly, data warehouse managers should proactively monitor data warehouse satisfaction. You can’t afford to wait to hear complaints. Often, people will abandon a restaurant without even voicing their concerns. Over time, you’ll notice that diner counts have dropped, but may not even know why.

Inevitably, the prior patrons of the data warehouse will locate another “restaurant” that better suits their needs and preferences, wasting the millions of dollars invested to design, build, and staff the data warehouse. Of course, you can prevent this not-so-happy ending by being an excellent, proactive restaurant manager. Make sure the kitchen is properly organized and utilized to deliver as needed on the presentation area’s food, decor, service, and cost.

 

http://www.kimballgroup.com/2004/01/data-warehouse-dining-experience/

Comment

What we mean by “data lake” and DW “augmentation”

When we say “data lake,” we’re referring to a centralized repository, typically in Hadoop, for large volumes of raw data of any type from multiple sources. It’s an environment where data can be transformed, cleaned and manipulated by data scientists and business users. A “managed” data lake is one that uses a data lake management platform to manage ingestion, apply metadata and enable data governance so that you know what’s in the lake and can use the data with confidence.

“Augmentation” means enhancing what you already have, not starting from scratch. With a data warehouse augmentation, you keep your data warehouse and existing BI tools, but add a complementary, integrated data lake. For example, you could use a complementary data lake to prepare datasets and then feed them back into a traditional data warehouse for business intelligence analysis, or to other visualization tools for data science, data discovery, analytics, predictive modeling and reporting.

Why DW augmentation? 

We find that companies typically consider a DW augmentation project for two scenarios: 


Blue sky – You want to be able to do new things, beyond the capabilities of the data warehouse. This could include supporting specific business use cases for more advanced big data analytics or data science to find new insights or generate revenue; for example, with new products and services or through improved, more personalized customer experience. 


Cut costs – You want to continue doing what you’re already doing with your data warehouse, but do it cheaper using commodity hardware.

Reference Architecture

What could a data warehouse augmentation look like in your environment? Let’s review some architecture diagrams.

The differences between a traditional data warehouse architecture and data lakes are significant. An DW is fed data from a broad variety of enterprise applications. Naturally, each application’s data has its own schema, requiring the data to be transformed to conform to the DW’s own predetermined schema. Designed to collect only data that is controlled for quality and conforming to an enterprise data model, the DW is capable of answering only a limited number of questions. Further, storing and processing all data in the data warehouse is cost prohibitive.

Typically an organization will augment a data warehouse with a data lake in order to enjoy a reduction in storage costs. The data lake is fed information in its native form and little or no processing is performed for adapting the structure to an enterprise schema. The data can be stored on commodity hardware, rather than expensive proprietary hardware. Required data can be pulled from the lake to leverage in the data warehouse. While this model provides significant cost savings it does not take advantage of the strategic business improvements that a data lake can provide. 

Reference Diagram #3: Data Warehouse Augmentation and Offload Diagram

The biggest advantage of data lakes is flexibility. By allowing the data to remain in its native format, a far greater stream of data is available for analysis. When an organization enlists the data lake to offload expensive data processing, in addition to storage, the entire business can benefit from more timely access to more data.

 


 

Comment

Tableau

Tableau - Magic Quadrant for Business Intelligence and Analytics

Gartner Magic Quadrant for Business Intelligence and Analytics

Gartner Magic Quadrant for Business Intelligence and Analytics

Tableau's highly intuitive, visual-based data discovery, dashboarding, and data mashup capabilities have transformed business users' expectations about what they can discover in data and share without extensive skills or training with a BI platform.

Strengths

  • Tableau offers an intuitive, visual-based interactive data exploration experience that customers rate highly, and that competitors, large and small, try to imitate.
  • Core differentiator — making a range of types of analysis (from simple to complex) accessible and easy for the ordinary business user, whom Tableau effectively transforms into a "data superhero." Enabling business users to perform more complex types of analysis without extensive skills or IT assistance 
  •  A high percentage of users also said they chose its platform because of its low cost to implement; Tableau customers reported one of the lowest implementation costs per user of any vendor in the survey. 

Cautions

  • Although Tableau's average user count continues to grow and was above the market average in this year's customer survey, its products are often used to complement an existing BI platform standard; only 42% of its customers considered it as their BI standard.
  • New shoots of innovation relating to the automation of advanced analytics are emerging from vendors like IBM (Watson Analytics), which could threaten the dominance of the Tableau-based data discovery paradigm. Although these efforts have had limited success to date, the stakes are high for the incumbent vendors that are losing new license purchases to data discovery vendors such as Tableau.
  • Gartner's inquiries reveal that customers often view Tableau as inflexible in contract negotiations and complain that its 25% annual maintenance fee is higher than those of most other vendors.
1 Comment

HANA (SAP)

Relational databases typically use row-based data storage. However, column-based storage may be more suitable for some business applications.Traditional databases store data simply in rows. The SAP HANA in-memory database stores data in both rows and columns.

Read More
Comment

Data Governance

Data governance is an integrated discipline for the identification of decision rights and accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of data and information.

Read More
Comment

Columnar Database

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. This article elaborates on row based and column based architecture with an example.

Read More
Comment

1000's of table

Came across this article on managing relational data from a very different perspective. Typically relational design in OLTP grows to 100’s of tables. When we have 100’s of table, we require each business change to have schema changes defined upfront, are more difficult to use with agile methods and a challenge to use in a continuously integrated environment without significant additional engineering.

SQL Performance  Design

Read More
Comment