Gartner Magic Quadrant for Business Intelligence Platforms

Modern analytics and BI platforms are now mainstream purchases, for which the key differentiators are augmented analytics and support for Mode 1 reporting in a single platform.

Magic Quadrant

Figure 1. Magic Quadrant for Analytics and Business Intelligence Platforms

Source: Gartner (February 2019)

Below are the vendor strengths and cautions for Tableau (Leader) and Yellowfin (Niche Player).

Tableau


Tableau offers an intuitive, interactive, visual-based exploration experience that enables business users to access, prepare, analyze and present findings in their data without technical skills or coding. Tableau Online is the cloud-based SaaS offering.

In 2018, Tableau introduced a new, lower-priced Viewer role and now leads with named-user, subscription-based pricing. Tableau’s reference customers report that they use it primarily for decentralized analytics (70%) and agile, centralized BI provisioning (51%).

Tableau is a Leader, thanks to the popularity of its product, high customer satisfaction scores and strong roadmap.

Strengths


  • Easy visual exploration and data manipulation: Tableau enables users to rapidly ingest data from a broad range of data sources, blend them, and visualize results using best practices in visual perception. Data can be manipulated while visualizing — such as when creating groups, bins and new hierarchies — all with a high degree of ease of use.

  • Customers as fans: Customers have a fanlike attitude toward Tableau, as evidenced by the record 17,000 users that attended its 2018 annual user conference. Reference customers placed Tableau in the top third of Magic Quadrant vendors for customer experience, and gave it high scores for achievement of business benefits. Tableau sets the industry standard for user enablement with Meetup groups, roadshows, online tutorials and availability of skills in the market.

  • Momentum: Tableau grew its total revenue to just over $800 million through 3Q18 — double-digit growth compared with 2017.

Cautions


  • Product gaps: Support for querying multiple fact tables and complex schemas in a single data source is absent from Tableau’s product, which is used primarily for Mode 2 use cases. It does not support scheduled, bursted reports in a variety of output formats, or the promotion of content through development, testing and production processes. Support for bursted reports with output to PDF is on the short-term roadmap.

  • Support decline: The responses of surveyed reference customers, together with other Gartner research, indicate that the quality of Tableau’s product support declined in 2018. In this regard, reference customers’ responses put it in the bottom third of vendors in this Magic Quadrant, due partly to more difficult upgrades. Hyper was a major engine replacement to boost performance — one that has not gone as smoothly as previous releases. Further, 12% of Tableau’s reference customers say poor performance remains a problem, and 13% say the product cannot handle the required data volumes (both percentages are above the average).

  • Sales experience, contracting and cost: Tableau did well to introduce a new, lower-priced viewer license to compete better against Microsoft in particular, but this license is only available with a subscription license. Consequently, perpetual customers have to move to a new named-user and subscription model to be able to buy this new license. These conversions can be a point of friction, which may explain why Tableau’s reference customers place it in the bottom third of vendors in this Magic Quadrant for sales experience. Gartner Peer Insights reviewers place it in the bottom third for price and contract flexibility. Over one-third (35%) of Tableau’s reference customers identified cost as a limitation with regard to wider deployment (the second-highest figure among vendors in this Magic Quadrant).

Yellowfin


Yellowfin began primarily as a vendor of a web-based modern analytics and BI platform, but quickly expanded to include data preparation and augmented analytics.

Yellowfin 8, released in October 2018, includes a new automatic insight generation product called Signals. In addition, a newly released Stories feature is one of the best examples of how to combine visual exploration with storytelling and infographics. Reference customers frequently use Yellowfin for agile, centralized BI provisioning (43%) and decentralized analytics (34%).

Yellowfin remains a Niche Player as it specializes in lower-priced analytics and BI, particularly in Asia/Pacific and for OEM use cases. Below-average scores for customer experience and operations contribute to its relatively low position on the Ability to Execute axis.

Strengths


  • Easy to use for Mode 1 and Mode 2 (Bimodal): Yellowfin’s single platform includes one of the broadest ranges of capabilities, spanning data preparation, reporting with scheduled distributions, visual exploration and augmented analytics. Reports, dashboards and administration are all accessed via a browser-based interface, with no desktop components. Data is usually queried live from a relational data source, as caching into the columnar, in-memory engine is optional.

  • Innovation: Yellowfin was one of the first to bring augmented analytics capabilities to market in 2017, and it expanded them in 2018. Signals brings personalized alerting based on ML algorithms — a clear differentiator as only a few vendors support such a capability. While many vendors integrate NLG capabilities from third parties, Yellowfin provides them natively and in a range of languages. In recognition of the reality that many customers have multiple analytics and BI tools, Stories supports embedding of content from Tableau and Qlik Sense, with an interface inspired by social platforms. Beyond product innovation, Yellowfin has been active in the Data for Good movement and strives to increase workforce diversity in the fields of science, technology, engineering and mathematics.

  • Sales strategy: Yellowfin has one of the best sales strategies, with a strong partner network and clear, attractive pricing and packaging. Over one-fifth (21%) of Yellowfin’s reference customers identified low price as a key reason why they selected its product — a higher percentage than for any other vendor in this Magic Quadrant. Yellowfin sells on a subscription basis (named-user or number of processor cores). There are starter packages for small and midsize deployments, and flexible pricing models for OEMs.

Cautions


  • Limited scalability and product gaps: Performance has been a perennial problem for Yellowfin, even though it has added caching and columnar storage as an option. Fourteen percent of its reference customers identified poor performance as a problem, with 9% considering it a barrier to wider deployment (a percentage in the top third for this complaint among vendors in this Magic Quadrant). The product does not readily support complex queries from multiple fact tables in the business views.

  • Weak momentum: Yellowfin, a privately owned company and one of the smaller vendors in this Magic Quadrant, has grown more slowly than key competitors. This may be partly due to Yellowfin’s lack of venture capital funding. Yellowfin rarely appears on vendor shortlists and is best known in Asia/Pacific. Its employee head count as of December 2018 was 183, a 6% year-over-year increase; this represents modest growth, compared with competitors with more momentum.

  • Operations and customer experience: Reference customers’ scores put Yellowfin in the bottom third of Magic Quadrant vendors for both customer experience and operations. Product quality has been a continual problem, and 9% of Yellowfin’s reference customers identified this as a limitation in relation to wider deployment.


Architecture Pattern

Patterns and pattern languages are ways to describe best practices and good designs, and to capture experience so that others can reuse it.

A short slogan for a pattern: “a solution to a problem in a context”

  • Context refers to a recurring set of situations in which the pattern applies

  • Problem refers to a set of forces -- goals and constraints -- that occur in this context

  • Solution refers to a design form or design rule that someone can apply to resolve these forces

Elements of a Pattern

The elements below are taken from Pattern-Oriented Software Architecture: A System of Patterns.

Name


A meaningful and memorable way to refer to the pattern, typically a single word or short phrase.

Problem


A description of the problem indicating the intent in applying the pattern - the intended goals and objectives to be reached within the context and forces described below (perhaps with some indication of their priorities).

Context


The preconditions under which the pattern is applicable - a description of the initial state before the pattern is applied. This tells us the pattern's applicability.

Forces


A description of the relevant forces and constraints, and how they interact or conflict with each other and with the intended goals and objectives. Forces reveal the intricacies of a problem and define the kinds of trade-offs that must be considered. For example:

  • Security, robustness, reliability, fault-tolerance

  • Manageability & Scalability (incremental growth on-demand)

  • Efficiency, performance, throughput, bandwidth requirements, space utilization

  • etc., ...

Solution


A description, using text and/or graphics, of how to achieve the intended goals and objectives. The description of the pattern's solution may indicate guidelines to keep in mind (as well as pitfalls to avoid) when attempting a concrete implementation of the solution. Sometimes possible variants or specializations of the solution are also described.

Examples


One or more sample applications of the pattern which illustrate each of the other elements: a specific problem, context, and set of forces; how the pattern is applied; and the resulting context.

Resulting Context

The state or configuration of the system after the pattern has been applied, including the consequences (both good and bad) of applying the pattern, and other problems and patterns that may arise from the new context. It describes the post conditions and side-effects of the pattern.

Rationale


Rationale provides insight into the internal workings of the pattern. The solution component of a pattern may describe the outwardly visible structure and behavior of the pattern, but the rationale is what provides insight into the deep structures and key mechanisms that are going on beneath the surface of the system.

Related Patterns

The relationships between this pattern and others.

Known Uses

Known applications of the pattern within existing systems, verifying that the pattern does indeed describe a proven solution to a recurring problem. Known Uses can also serve as Examples.

Patterns for architecture are still very much in their infancy. Defining patterns is how enterprise best practices can be brought into individual Agile teams, and that is where their potential lies.



Knowledge Graph Platform for Wealth

An RDBMS has limitations when it comes to modelling and retrieving highly connected data. The client relationship in wealth management shown below is an example. Modelling such complex relationships in a table structure is very challenging, and an RDBMS like Oracle has few native capabilities that help with storing and navigating relationships.

Client Relationship


The knowledge graph ecosystem is designed for handling and navigating such relationships. The data is stored using the W3C standard RDF (Resource Description Framework) and queried using SPARQL, a query language for navigating highly connected datasets.
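To make the idea concrete, here is a minimal sketch in plain Python of relationship data held as subject-predicate-object triples, the same shape RDF uses. It is not an RDF store and does not run SPARQL; the client, trust and account identifiers and the predicate names are all hypothetical. The traversal does by hand roughly what a SPARQL property-path query would express declaratively.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple, the same shape RDF uses.
# All identifiers and predicates below are hypothetical illustration data.
TRIPLES = [
    ("client:alice", "spouseOf", "client:bob"),
    ("client:alice", "grantorOf", "trust:alice_family_trust"),
    ("trust:alice_family_trust", "beneficiary", "client:carol"),
    ("client:carol", "ownerOf", "account:1001"),
    ("client:bob", "ownerOf", "account:1002"),
]


def build_index(triples):
    """Index triples by subject so outgoing edges can be followed quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Map every node reachable from `start` to the predicates that lead to it."""
    paths = {start: []}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for predicate, obj in index.get(node, []):
            if obj not in paths:
                paths[obj] = paths[node] + [predicate]
                frontier.append(obj)
    return paths


index = build_index(TRIPLES)

# Every account connected to Alice, directly or through family and trusts.
accounts = {
    node: path
    for node, path in reachable(index, "client:alice").items()
    if node.startswith("account:")
}
```

In a table structure the same question needs one self-join per hop, and the number of hops is not known in advance, which is the difficulty described above.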


Wealth Management Use Case for using Knowledge Graph

Case #1 - Wealth estate planning accounts hold data that spans decades. An account opened 10 years ago may contain outdated financial information about the client. Data from third-party vendors can be used to verify the accounts. The knowledge graph ecosystem (below) can be used to mine the accounts using the third-party vendor data. The resulting information has potential for additional revenue generation.

Case #2 - Another use case is to use natural language processing (NLP) to extract information from trust legal contracts and ensure compliance using the knowledge graph.

Amazon has their version of graph database in the cloud (


Business Rules Management System (BRMS)

BRMS are expert systems designed to manage your rules and business logic. Instead of keeping your rules—the guidelines by which you make critical business decisions—in the form of application code in software systems, you have abstracted that logic and put it into a system specifically designed for managing rules with NO CODE.

With a BRMS, you are building, modeling and testing your rules in this separate system from start to finish. This allows you to concentrate on the differentiating features and user experiences of your applications without worrying about the logic in the code.

The bank where I currently work uses Corticon from Progress as its business rule engine. We develop business rules and deploy the endpoints as RESTful APIs.

By adopting a BRMS, business and IT teams can collaborate on business rule changes and shorten development time. Business people can be brought into the mix, as they can now take on the responsibility of directly updating the business logic. This in turn requires fewer IT resources, because your business analysts will not have to wait for development teams to update your current rules.

BRMS tools like Corticon can support millions of transactions, all of which can be configured using a spreadsheet-like modeling tool that just about any competent business analyst could master.

The value proposition of BRMS is two-fold. First, your business can write down all of its business rules in one place, and business experts can review them and make sure they make sense. Second, because BRMS is a self-contained system, separate from the main software application, people in the organization do not have to be programmers to write and update business rules, or to apply them to the functioning of the main application.
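The separation described above can be sketched in a few lines of Python. This is only an illustration of rules held as data outside the application logic; it is not how Corticon models or executes rules, and the rule names, fields and thresholds are hypothetical.

```python
# Rules live in data, not in application code. In a BRMS this table would be
# maintained by business analysts; the fields and thresholds are hypothetical.
RULES = [
    {"name": "high_value", "min_balance": 1_000_000, "max_risk": 10, "decision": "private_banking"},
    {"name": "standard", "min_balance": 50_000, "max_risk": 7, "decision": "advisor_led"},
    {"name": "default", "min_balance": 0, "max_risk": 10, "decision": "self_service"},
]


def decide(client, rules=RULES):
    """Return the decision of the first rule whose conditions the client meets."""
    for rule in rules:
        meets_balance = client["balance"] >= rule["min_balance"]
        meets_risk = client["risk_score"] <= rule["max_risk"]
        if meets_balance and meets_risk:
            return rule["decision"]
    return None  # no rule matched
```

Changing a threshold or adding a row to the table changes the outcome without touching `decide`, which is the property that lets rule changes bypass the application release cycle.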


Collateralized Loan Obligation

A CLO, like the broader collateralized debt obligation (CDO), is a structured financial product that pools together cash flow-generating assets and repackages this asset pool into discrete tranches that can be sold to investors. The tranches in a CDO vary substantially in their risk profile. The senior tranches are relatively safer because they have first priority on the collateral in the event of default. As a result, the senior tranches of a CDO generally have a higher credit rating and offer lower coupon rates than the junior tranches, which offer higher coupon rates to compensate for their higher default risk.


The total assets always exceed the liabilities issued, sometimes by 30% to 50% or more. This allows some of the collateral to default without the integrity of the deal structure being affected.
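The arithmetic behind this cushion can be sketched as follows. This is a deliberately simplified model, assuming losses are absorbed first by the excess collateral and then by tranches from junior to senior; real deals follow the waterfall in their indenture. The tranche names and balances are hypothetical.

```python
def overcollateralization(assets, liabilities):
    """Fraction by which the asset pool exceeds the liabilities issued."""
    return (assets - liabilities) / liabilities


def allocate_losses(loss, excess_collateral, tranches):
    """Absorb a collateral loss with the excess first, then junior to senior.

    `tranches` is ordered senior first; each entry is (name, balance).
    Returns the remaining balance of every tranche, keyed by name.
    """
    remaining_loss = max(0.0, loss - excess_collateral)
    balances = {}
    for name, balance in reversed(tranches):  # most junior tranche first
        hit = min(balance, remaining_loss)
        balances[name] = balance - hit
        remaining_loss -= hit
    return balances


# Hypothetical deal: 130 of assets backing 100 of liabilities.
TRANCHES = [("senior", 70.0), ("mezzanine", 20.0), ("junior", 10.0)]
ASSETS = 130.0
LIABILITIES = sum(balance for _, balance in TRANCHES)
EXCESS = ASSETS - LIABILITIES
```

With these numbers, a loss of 25 is covered entirely by the excess collateral, while a loss of 45 wipes out the junior tranche and reduces the mezzanine tranche before the senior tranche is touched.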

A collateralized loan obligation (CLO) is structured with the asset pool primarily consisting of secondary bank loans, which serve as the debt obligations of the CLO.

A collateralized debt obligation (CDO) is structured with the asset pool primarily consisting of asset-backed securities (ABS) or residential mortgage-backed securities (RMBS), which serve as the debt obligation of the CDO. If the asset pool primarily consists of CDOs, it is called a ‘CDO Squared’ or a ‘CDO of CDOs’.


CDO/CLO deals are issued with a set of rules (compliance rules, payment rules, business rules, rating notching rules, etc.), guidelines, roles and dates (determination periods/dates, payment dates, CDO/CLO lifecycle dates, etc.). The deal’s indenture is considered the ‘bible’ of the deal and has the final say on all activities within the deal (for example, how and when to start paying down the liabilities).

Trade Flow

To make changes to the deal’s collateral inventory, the collateral manager proposes trades to the deal to see what the effect on compliance will be. This is called hypothetical or what-if trading. A hypothetical trade can be a single trade (buy or sell one security) or a combination of trades (buy or sell multiple securities). The collateral manager informs the trustee which proposed scenarios to actually trade. When the trustee receives the trading scenario, they re-run compliance to make sure the scenario still fits within the deal. Once compliance is satisfied, the trade is settled.
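The what-if step can be sketched as applying proposed trades to a copy of the portfolio and re-running compliance on the result. The two tests and their limits below are hypothetical stand-ins; the real tests and thresholds come from the deal’s indenture.

```python
from collections import Counter

# Hypothetical compliance limits; real limits come from the deal's indenture.
MAX_INDUSTRY_SHARE = 0.40
MAX_CCC_SHARE = 0.10


def apply_trades(portfolio, trades):
    """Return a new portfolio with the proposed trades applied; the input is untouched."""
    result = dict(portfolio)
    for action, security_id, position in trades:
        if action == "buy":
            result[security_id] = position
        elif action == "sell":
            result.pop(security_id, None)
    return result


def compliance_failures(portfolio):
    """List the names of the compliance tests the portfolio fails."""
    total = sum(p["par"] for p in portfolio.values())
    by_industry = Counter()
    ccc_par = 0.0
    for p in portfolio.values():
        by_industry[p["industry"]] += p["par"]
        if p["rating"] == "CCC":
            ccc_par += p["par"]
    failures = []
    if total and max(by_industry.values()) / total > MAX_INDUSTRY_SHARE:
        failures.append("industry_concentration")
    if total and ccc_par / total > MAX_CCC_SHARE:
        failures.append("ccc_bucket")
    return failures


# Hypothetical current collateral and one proposed scenario.
PORTFOLIO = {
    "LOAN-A": {"par": 40.0, "industry": "healthcare", "rating": "B"},
    "LOAN-B": {"par": 30.0, "industry": "retail", "rating": "BB"},
    "LOAN-C": {"par": 30.0, "industry": "energy", "rating": "B"},
}
SCENARIO = [
    ("sell", "LOAN-B", None),
    ("buy", "LOAN-D", {"par": 30.0, "industry": "healthcare", "rating": "CCC"}),
]

what_if = apply_trades(PORTFOLIO, SCENARIO)
failures = compliance_failures(what_if)
```

Because `apply_trades` works on a copy, the collateral manager can evaluate many scenarios against the same starting portfolio, and the trustee can re-run the same check before settlement.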




What is blockchain?

Blockchain is a way of organizing data so that transactions can be verified and recorded through the consensus of all parties involved.  The system is founded on the concept of an authoritative ledger that records events. 

While current data systems hold that ledger in a single, centralized location, a blockchain requires each individual participant – or node – to hold a copy of the record.  Any potential changes to the record must be compared against each and every node before being approved, which strengthens security and reduces the likelihood of unauthorized changes.

For example, let’s say that Joe is being tried in court for stealing Pauline’s purse.  Typically, a court reporter would be employed to type out everything that Joe says into a court record.  If Joe admits to the crime, this transcript of the proceedings will function as irrefutable proof that he confessed.   This document is kept in the court’s offices, under lock and key.

But Joe’s friend Bob wants to help Joe out by eliminating the record of his crime.  Bob steals a copy of the key, opens the court’s office, erases Joe’s confession, and leaves. 

Since there’s no other official record of Joe’s admission, the court cannot prove that Joe ever admitted to stealing the purse.

This is how the current generation of data repositories work.  Bob wasn’t authorized by the court to have a key to the centralized location where the data lived, yet he gained access anyway and used it improperly.  Joe’s crime has been erased from history, as far as the legal system is concerned.

Now, let’s say Joe is up to his old tricks again, and goes out on the street to snatch Jessica’s purse.  But this time, there are ten bystanders with smartphones, and each bystander independently records the deed.  Now there are ten records of what Joe has done, and no easy way for Bob to change every single bystander’s video in exactly the same manner to make it look like Joe is innocent.

The only way the record of this event could be changed is if all the bystanders come together and decide, as a group, to alter each copy of the video in precisely the same way.  That is unlikely to happen.

If all the bystanders agree as a collective system that the data should continue to exist in its current format, the event becomes locked in time as a fact that has happened. It is an entry in the ledger: a fixed point that cannot be changed.

Once the chapter has been closed on this event, it is known as a “block.”  The block is redistributed to each node, which re-validates the change and agrees to edit its version of the ledger to include this new event.

Every subsequent potential change to the data must be validated against this block before it can be added to the “chain” of events.  This is accomplished through sophisticated matching algorithms that verify the party’s right to access or alter the data.
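One common way to link each new entry to the blocks before it is to store the previous block’s hash inside the next block. The sketch below shows only that hash link, using SHA-256 from Python’s standard library; it does not model consensus between nodes or access rights, and the block layout is an assumption made for illustration.

```python
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder link for the first block


def block_hash(index, previous_hash, data):
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "previous_hash": previous_hash, "data": data},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a block that points at the current tip of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_PREVIOUS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "previous_hash": previous_hash,
        "data": data,
        "hash": block_hash(index, previous_hash, data),
    })


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i else GENESIS_PREVIOUS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["previous_hash"], block["data"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
append_block(ledger, "Joe took Jessica's purse")
append_block(ledger, "Joe pays Bob $20")
```

If Bob edits the first entry in his copy, `is_valid` fails for that copy, and it no longer matches the copies held by the other nodes.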

When it comes to financial matters, or a virtual currency like Bitcoin, it’s easy to see why this is important.  Every participant in a transaction must agree that a payment took place at a certain time for a certain amount, or the transaction is void. 

Every transaction depends on the ones that went before it. If Joe empties his account and has a $0 balance, he cannot then write a $20 check to Bob from that same account an hour later. Neither Bob, Bob’s bank, nor Joe’s bank will agree that the transaction is valid once the check bounces.




Wirehouse vs RIA vs Broker Dealer


A wirehouse is a large integrated broker with a national, as opposed to regional, business. Wirehouses once collected trade orders from branch offices in distant cities using dedicated telegraph lines, hence the term wirehouse.

The term wirehouse now typically refers to full-service brokerages that offer investment advice, trading services and research all under one roof.

Examples of wirehouses include Morgan Stanley Smith Barney, Bank of America Merrill Lynch, Wells Fargo Advisors, UBS Wealth Management and Charles Schwab.

RIA (Registered Investment Adviser) and Broker-Dealers

All financial advisors fall into one of two broad categories: Registered Investment Advisers (RIAs) and broker-dealers. RIAs are fiduciaries, while broker-dealers aren’t.

Although it sounds like an individual job title, a Registered Investment Adviser (RIA) refers to a firm that is registered with the Securities and Exchange Commission (SEC) or a state’s securities agency. An individual who works for an RIA is an Investment Adviser Representative (IAR).

Broker-dealers are held to what’s referred to as a suitability standard when offering financial and investment advice, rather than a fiduciary standard. This means that their advice must be “suitable” for the client’s needs at that particular time. The suitability standard is less stringent than the fiduciary standard in terms of the advisor’s obligation to make recommendations that are in the client’s best interest.

In addition to the fiduciary obligation, the other main difference between an RIA and a broker-dealer is in the way they are compensated. RIAs either charge their clients a percentage of assets under management or a fixed or hourly fee. Broker-dealers, in contrast, receive most of their compensation through commissions based on the investment products they recommend and sell.


WealthTech - Industry Trends

While the largest RIA custodians - SEI, Charles Schwab, TD Ameritrade, Interactive Brokers and Fidelity - dwarf the smaller custodians in size, smaller custodians are pursuing new market opportunities with specialized offerings such as alternative investments, or by targeting specific markets such as start-up RIAs.



Microservices Have Macro Effect

Microservices are not a type of technology, but rather an approach to IT architecture. By using a suite of tools like APIs, containers and cloud, microservices break applications into simple, discrete services.

APIs are at the heart of technology-based partnerships, which is why microservices are so critical to any business looking to build partnerships at scale.



Hyperledger Project - Fabric, Sawtooth Lake. What's all this?

The Hyperledger Project is managed by the Linux Foundation and NOT by IBM – which is key because the Project is meant to incubate more than just IBM’s offering. IBM initially contributed what was then called ‘Open Blockchain’ and is now called ‘Fabric’, and arguably that is the biggest / highest profile project.



Dimensional modelling Vs Corporate Information Factory

In this column, we’ll clarify the similarities and differences between the two dominant approaches to enterprise warehousing. The first approach is Kimball dimensional modelling and the second is the Corporate Information Factory.


Data Warehouse Dining Experience

Data warehouses should have an area that focuses exclusively on data staging and extract, transform, and load (ETL) activities. A separate layer of the warehouse environment should be optimized for presentation of the data to the business constituencies and application developers.

This division is underscored if you consider the similarities between a data warehouse and a restaurant.

The Kitchen
The kitchen of a fine restaurant is a world unto itself. It’s where the magic happens. Talented chefs take raw materials and transform them into appetizing, delicious multi-course meals for the restaurant’s diners. But long before a commercial kitchen is put into productive use, a significant amount of planning goes into designing the layout and components of the workspace.

The restaurant’s kitchen is organized with several design goals in mind. First, the layout must be highly efficient. Restaurant managers are very concerned about kitchen throughput. When the restaurant is packed and everyone is hungry, you don’t have time for wasted movement.

Delivering consistent quality from the restaurant’s kitchen is the second goal. The establishment is doomed if the plates coming out of the kitchen repeatedly fail to meet expectations. A restaurant’s reputation is built on legions of hard work; that effort is for naught if the result is inconsistent. In order to achieve reliable consistency, chefs create their special sauces once in the kitchen, rather than sending ingredients out to the table where variations will inevitably occur.

The kitchen’s output, the meals delivered to their customers, must also be of high integrity. You wouldn’t want someone to get food poisoning from dining at your restaurant. Consequently, kitchens are designed with integrity in mind. Salad prep doesn’t happen on the same surfaces where raw chicken is handled.

Just as quality, consistency, and integrity are major considerations when designing the kitchen layout, they are also ongoing concerns for everyday management of the restaurant. Chefs strive to obtain the best raw material possible. Procured products must meet quality standards. For example, if the produce purveyor tries to unload brown, wilted lettuce or bruised tomatoes, the materials are rejected, as they don’t meet minimum standards. Most fine restaurants modify their menus based on the availability of quality inputs.

The restaurant kitchen is staffed with skilled professionals wielding the tools of their trade. Cooks manipulate razor sharp knives with incredible confidence and ease. They operate powerful equipment and work around extremely hot surfaces without incident.

Given the dangerous surroundings, the kitchen is off-limits to patrons. It simply isn’t safe. Professional cooks handling sharp knives shouldn’t be distracted by diners’ inquiries. You also wouldn’t want patrons entering the kitchen to dip their fingers into a sauce to see whether they want to order an entree or not. To prevent these intrusions, most restaurants have a closed door that separates the kitchen from the area where diners are served.

Even restaurants that boast an open kitchen format typically have a barrier, such as a partial wall of glass, separating the two environments. Diners are invited to watch, but can’t wander into the kitchen themselves. But while part of the kitchen may be visible, there are always out-of-view back rooms where the less visually desirable preparation work is performed.

The data warehouse’s staging area is very similar to the restaurant’s kitchen. The staging area is where source data is magically transformed into meaningful, presentable information. The staging area must be laid out and architected long before any data is extracted from the source. Like the kitchen, the staging area is designed to ensure throughput. It must transform raw source data into the target model efficiently, minimizing unnecessary movement if possible.

Obviously, the data warehouse staging area is also highly concerned about data quality, integrity, and consistency. Incoming data is checked for reasonable quality as it enters the staging area. Conditions are continually monitored to ensure staging outputs are of high integrity. Business rules to consistently derive value-add metrics and attributes are applied once by skilled professionals in the staging area, rather than relying on each patron to develop them independently. Yes, that puts extra burden on the data staging team, but it’s done in the spirit of delivering a better, more consistent product to the data warehouse patrons.

Finally, the data warehouse’s staging area should be off-limits to the business users and reporting/delivery application developers. Just as you don’t want restaurant patrons wandering into the kitchen and potentially consuming semi-cooked food, you don’t want busy data staging professionals distracted by unpredictable inquiries from data warehouse users. The consequences might be highly unpleasant if users dip their fingers into interim staging pots while data preparation is still in process. As with the restaurant kitchen, activities occur in the staging area that just shouldn’t be visible to the data warehouse patrons. Once the data is ready and quality checked for user consumption, it’s brought through the doorway into the warehouse’s presentation area. Who knows, if you do a great job, perhaps you’ll become a data warehouse celebrity chef a la Emeril Lagasse or Wolfgang Puck.
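In code, the kitchen’s responsibilities might look like the sketch below: rows are quality-checked as they enter staging, a derived metric is calculated once, and only finished rows reach the presentation area. The field names and the checks are hypothetical, and a real staging layer would be built with ETL tooling rather than plain Python lists.

```python
def passes_quality_check(row):
    """Reject rows that are missing keys or carry impossible values."""
    return (
        row.get("order_id") is not None
        and row.get("quantity") is not None
        and row.get("unit_price") is not None
        and row["quantity"] > 0
        and row["unit_price"] >= 0
    )


def derive_metrics(row):
    """Apply the business rule for revenue once, in staging."""
    return {
        "order_id": row["order_id"],
        "quantity": row["quantity"],
        "revenue": round(row["quantity"] * row["unit_price"], 2),
    }


def stage(source_rows):
    """Split source rows into presentation-ready rows and rejects."""
    presentation, rejected = [], []
    for row in source_rows:
        if passes_quality_check(row):
            presentation.append(derive_metrics(row))
        else:
            rejected.append(row)
    return presentation, rejected


# Hypothetical source extract: one good row, two that fail the checks.
SOURCE_ROWS = [
    {"order_id": 1, "quantity": 2, "unit_price": 9.99},
    {"order_id": 2, "quantity": 0, "unit_price": 5.0},
    {"order_id": None, "quantity": 1, "unit_price": 3.0},
]

presentation_rows, rejected_rows = stage(SOURCE_ROWS)
```

Patrons only ever see `presentation_rows`; the rejects and the intermediate logic stay behind the kitchen door.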

The Dining Room
Let’s turn our attention to the restaurant’s dining room. What are the key factors that differentiate restaurants? According to the popular Zagat Surveys, restaurants around the world are rated on four distinct qualities:

  • Food (quality, taste, and presentation)

  • Decor (appealing, comfortable surroundings for the restaurant patrons)

  • Service (prompt food delivery, attentive support staff, and food received as ordered)

  • Cost.

Most Zagat Survey readers focus initially on the food score when they’re evaluating dining options. First and foremost, does the restaurant serve good food? That’s the restaurant’s primary deliverable. However, the decor, service, and cost factors also affect the patrons’ overall dining experience and are considerations when evaluating whether to eat at a restaurant or not.

Of course, the primary deliverable from the data warehouse kitchen is the data in the presentation area. What data is available? Like the restaurant, the data warehouse provides “menus” to describe what’s available via metadata, published reports, and parameterized analytic applications.

Is the data of high quality? Data warehouse patrons expect consistency and quality. The presentation area’s data must be properly prepared and safe to consume.

In terms of decor, the presentation area should be organized for the comfort of its patrons. It must be designed based on the preferences expressed by the data warehouse diners, not the staging staff. Service is also critical in the data warehouse. Data must be delivered, as ordered, promptly in a form that is appealing to the business user or reporting/delivery application developer. Finally, cost is a factor for the data warehouse. The data warehouse kitchen staff may be dreaming up elaborate, albeit expensive meals, but if there’s no market at that price point, the restaurant won’t survive.

If restaurant diners are pleased with their dining experience, then everything is rosy for the restaurant manager. The dining room is always busy; there’s even a waiting list on some nights. The restaurant manager’s performance metrics are all promising: high numbers of diners, table turnovers, and nightly revenue and profit, while staff turnover is low. Things look so good that the restaurant’s owner is considering an expansion site to handle the traffic. On the other hand, if the restaurant’s diners aren’t happy, then things go south in a hurry. With a limited number of patrons, the restaurant isn’t making enough money to cover its expenses (and the staff isn’t making any tips). In a relatively short time period, the restaurant shuts down.

Restaurant managers often proactively check on their diners’ satisfaction with the food and dining experience. If a patron is unhappy, they take immediate action to rectify the situation. Similarly, data warehouse managers should proactively monitor data warehouse satisfaction. You can’t afford to wait to hear complaints. Often, people will abandon a restaurant without even voicing their concerns. Over time, you’ll notice that diner counts have dropped, but may not even know why.

Inevitably, the prior patrons of the data warehouse will locate another “restaurant” that better suits their needs and preferences, wasting the millions of dollars invested to design, build, and staff the data warehouse. Of course, you can prevent this not-so-happy ending by being an excellent, proactive restaurant manager. Make sure the kitchen is properly organized and utilized to deliver as needed on the presentation area’s food, decor, service, and cost.


Spark SQL, Hold the Hadoop

THE APACHE SPARK processing engine is often paired with Hadoop, helping users to accelerate analysis of data stored in the Hadoop Distributed File System. But Spark can also be used as a standalone big data platform. That’s the case at online marketing and advertising services provider Sellpoints Inc., and it likely wouldn’t be possible without the technology’s Spark SQL module.

Sellpoints initially used a combination of Hadoop and Spark running in the cloud to process data on the Web activities of consumers for analysis by its business intelligence (BI) and data science teams. But in early 2015, the Emeryville, Calif., company converted to a Spark system from Databricks, also cloud-based, to streamline its architecture and reduce technical support issues.

Benny Blum, vice president of product and data at Sellpoints, said the analysts there use a mix of Spark SQL and the Scala programming language to set up extract, transform and load (ETL) processes for turning the raw data into usable information.

The BI team in particular leans heavily on Spark SQL since it doesn’t require the same level of technical skills as Scala does—some BI analysts do all of their ETL programming with the SQL technology, according to Blum.

“Spark SQL is really an enabler for someone who’s less technical to work with Spark,” he explained. “If we didn’t have it, a platform like Databricks wouldn’t be as viable for our organization, because we’d have a lot more reliance on the data science and engineering teams to do all of the work.”

Sellpoints collects hundreds of millions of data points from Web logs on a daily basis, amounting to a couple of terabytes per month. The raw data is streamed into an Amazon Simple Storage Service data store. It is then run through the extract, transform and load routines in Spark to convert it into more understandable metrics-based formats and to translate it for output to Tableau’s business intelligence software. The software is used to build reports and data visualizations for the company’s corporate clients.

Spark SQL isn’t a perfect match for standard SQL at this point. “There are certain commands that I expect to be there that aren’t there or may be there but under a different name,” Blum said. Despite such kinks, the technology is familiar enough to get the job done, he noted, adding, “If you know SQL, you can work with it.” 


IT Supplier Management

Spurred by technology advances and a lower cost delivery model, outsourcing of IT functions across various industries continues unabated.  Many IT resources find themselves assuming dual responsibilities for technical subject matter expertise and supplier management - with supplier management as an evolving new role.

Suppliers also find themselves taking on more end-to-end responsibilities in Managed Services Models, which include both service delivery and program management. In many cases, suppliers are placed in positions of control for end-to-end processes. As a natural consequence, many suppliers have more autonomy and are self-incentivized to drive process improvements.

Companies have increasingly come to realize that the success of an outsourcing program depends largely on their supplier management. But an inherent challenge exists. A close relationship with suppliers is generally beneficial for day-to-day operations, but too close a relationship affects objectivity and can breed ineffective supplier management. Fulfillment of contract obligations or Key Performance Indicators (KPIs) may then be at risk.

There is certainly a growing awareness that good governance can make or break an outsourcing deal. Understanding and adequately enforcing key aspects of supplier governance will significantly reduce risks. Effective governance can stop value leakage and improve outcomes. Supplier management has become increasingly important although many organizations continue to under-invest in this critical activity.

Involving the Procurement group in this role can be a solution to this challenge. The CIO can be well served by a strong, capable and independent Procurement partner who plays the “bad guy” role in supplier management. In particular, Procurement can help to manage supplier risk across the enterprise, since it works across the organization with other categories such as HR, marketing, legal, operations and manufacturing.

The Procurement group can rise to the challenge by transforming itself in three key areas.

Recruit, develop and retain the right people

  • Recruit business-oriented people with strong soft skills who can operate across multiple disciplines and develop strong relationships with IT stakeholders.

  • Implement a program to train existing staff in the skills needed to operate in the new collaborative environment. Training needs to include the supplier management playbook, analytical skills, business case development and soft skills, e.g. presentation, facilitation, conflict management and emotional intelligence.

  • Develop a good career path through a supplier management track and interesting strategic opportunities to retain and motivate Procurement staff.

Develop and implement strong processes, including risk management

  • Develop a strong governance model with defined roles and responsibilities between the firm’s management structure and suppliers. Escalation paths should be clearly defined with specific contacts documented.

  • Drive the agendas for Quarterly Business Reviews (QBRs) as opposed to relying on supplier-led status updates. Ensure that performance issues are raised and addressed. Too often I have had to sit through a QBR where the supplier reported only good news and every status was green, while I knew there had been performance issues with that particular supplier.

  • Segment suppliers into Tier 1 and Tier 2 to provide appropriate focus on the critical suppliers. In most cases, because management of Tier 1 suppliers requires significantly more effort, Tier 1 should be limited to 5-10 suppliers.

  • Implement a periodic supplier auditing process. Annual site visits to Tier 1 suppliers should be included to ensure compliance.

  • Ensure there is sufficient contract language to protect the firm’s data and systems. Also, a robust business continuity plan should be in place.

  • Evaluate and manage supplier risk at the Statement of Work (SOW), order or contract amendment level, i.e. evaluate supplier risk for each SOW and not just at the supplier level. Otherwise you can unknowingly give high-risk work to a low-risk supplier, inadvertently making it a high-risk supplier without the appropriate evaluation.

  • Manage financial risk. This is an area where Procurement has demonstrated success: my procurement teams have taken over the management of invoices for selected suppliers (telecom, data, software, hardware, etc.) on behalf of the IT department. The procurement teams have been able to apply their knowledge of the contracts to make sure invoices are accurate. Historically we have discovered inaccurate and duplicate invoices; in my experience some suppliers had error rates in the 40-50% range. For this service, instead of asking a senior service manager (manager/director level) to review 500-1000 pages of invoices against 10 different contracts per month, we relied on a handful of analysts to review and validate invoices against inventories and performance levels.

  • Develop a supplier management playbook and train the Procurement team to monitor for consistently high quality and service delivery against the playbook.

  • Assess current Procurement services and determine the service realignment necessary to meet the needs of the CIO and IT. For example, expand capacity without increasing head count by reducing “bad busy” work and increasing “good busy” work.

Leverage Procurement technology

Procurement also needs enabling technology. For example:

  • Procure-to-Pay tools, e.g. Ariba or Coupa, can be used to efficiently match purchase orders with invoices.

  • Risk management tools, e.g. Hiperos or Bravo, can manage risk at the transaction level.

As IT evolves to become a major client of the Procurement area, Procurement must be ready to transform itself into a supportive IT partner. Procurement will need to realign its services to free up bandwidth and deliver consistent, high-quality service levels so that a trusted relationship is achieved. Procurement will also need to develop stronger capabilities in supplier management to support the expanding needs of IT. This will require strong sponsorship from the CIO and CPO to ensure a comprehensive implementation. The CIO and the CPO play pivotal roles, collaborating closely to cement the relationship between their respective IT and Procurement areas.

Credit : This article first appeared in CIO Review magazine.


Programming Language Rankings: January 2017

Credit : RedMonk

1 JavaScript
2 Java
3 Python
5 C#
5 C++
7 Ruby
9 C
10 Objective-C
11 Scala
11 Shell
11 Swift
14 R
15 Go
15 Perl
17 TypeScript
18 PowerShell
19 Haskell
20 Clojure
20 CoffeeScript
20 Lua
20 Matlab


Agile development and Scrum

Somebody asked me, "What's the difference between Agile and Scrum?" Let's first understand the basics of Agile and Scrum.

What is Agile development?

  • Agile software development is a methodology that is followed to overcome issues associated with the traditional waterfall development.

  • In Agile development, you follow an iterative approach where the entire software project is completed in iterative phases. Each iteration delivers an incremental working version of the application.

  • Users continually evaluate each working version / iteration of the product, and provide feedback to the development team which the developers incorporate into subsequent versions of the product.

  • This approach provides the opportunity to account for changing business realities and also minimize large scale project/product-failure risk that can sometimes happen when using the Waterfall approach to product development.

  • Agile still uses some of the key steps associated with Waterfall development in each iteration, i.e. analyze, build, and test.

What is the business context for Agile Development?

The Agile methodology is typically used in scenarios where:

  • The requirements or details of outcomes are not very clear at the outset.

  • Business needs are changing rapidly and/or are continually evolving.

  • Testing the feasibility of an available technology to solve a problem is critical.

  • Funds to develop the product may be made available incrementally, based on the proven feasibility of the product.

  • Bringing a version of the product to market as soon as possible is more critical than having all the bells and whistles.

How does Scrum fit into Agile Development?

While the Agile methodology can be applied to product development not only in the software industry but in other industries as well, Scrum is specific to software development.

Scrum is not a methodology. It simply provides structure, discipline and a framework for Agile development. The whole project is made up of a series of Sprints or Sprint Cycles (1 to n), where each Sprint is of the same duration. If ‘time’ is denoted by T, then T1 = T2 = T3 = … = Tn. Sprints usually last between 2 and 4 weeks. Sprints shorter than 2 weeks are not ideal and are used less frequently. At the end of each Sprint, a functional / working piece of software is produced that the users can actually test.

The diagram below illustrates the use of Scrum in Agile development.


Key characteristics of Scrum

  • Scrum does not have a detailed series of steps or guidelines instructing how you go about software development.

  • Scrum does not have team leaders or sub-teams; it is just one cross-functional team where everyone is involved with the project from start to finish. The team includes Users, Stakeholders, Developers, the Scrum Master and the Product Owner.

  • There are 3 roles in Scrum – the Scrum Master, the Product Owner, the Team – note the conspicuous absence of a Project Manager.

  • Self-organizing teams are vital to Scrum i.e. people who are creative and disciplined at the same time and who do not need to be ‘managed’.

  • Scrum is a series of sprints. Each sprint can last anywhere from 2 to 4 weeks (recommended). A sprint is a complete mini software development cycle (analyze, build, and test phases). At the end of each sprint, the customer gets a working version of the product with new/additional functionality as compared to the previous sprint. The customer / users test the product and provide feedback to the team.

  • The developers and other technical staff in the team are usually highly experienced and understand both technology and business.

Key Steps in Scrum

The key steps in a Sprint a.k.a. the single basic unit/cycle of development in Scrum are:

STEP 1: Sprint Planning Session

  • Product Owner tells the Team what s/he wants in the Sprint. S/he picks these items from the Product Backlog.

  • The Product Backlog is a list of high level requirements.

  • Team decides what they can commit to.

  • Committed items become the Sprint Backlog – this is frozen and cannot change during the Sprint.

STEP 2: Sprint

  • The period when the team works on building the features identified in the Sprint Backlog for that sprint.

  • Daily Scrum sessions are held to review issues / problems / roadblocks / progress / commitments.

  • The Sprint Burn Down chart is updated each day to show progress / completion of items. The team reviews this daily to understand where and how effort needs to be expended.

  • The Sprint must end on time whether all items/tasks are completed or not.
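The burn-down tracking described in STEP 2 can be sketched in a few lines of Python. This is only an illustration: the function names, the 40 committed points and the 10-day sprint are assumptions made up for the example, not something Scrum prescribes.

```python
def burn_down(total_points, completed_per_day):
    """Return the work remaining at the end of each day of the sprint."""
    remaining = []
    left = total_points
    for done in completed_per_day:
        left = max(left - done, 0)
        remaining.append(left)
    return remaining


def ideal_line(total_points, sprint_days):
    """Straight-line 'ideal' burn-down, used only as a comparison baseline."""
    return [total_points - total_points * day / sprint_days
            for day in range(1, sprint_days + 1)]


# Hypothetical 10-day sprint with 40 committed points.
actual = burn_down(40, [0, 5, 3, 8, 2, 6, 4, 0, 5, 4])
ideal = ideal_line(40, 10)

# Days (1-based) on which the team is behind the ideal line.
behind = [day + 1 for day, (a, i) in enumerate(zip(actual, ideal)) if a > i]

# Work left when the sprint ends; the sprint still ends on time.
unfinished = actual[-1]
```

In this made-up sprint some points are still open on the last day. Those items are not completed in this sprint, which matches the rule that the sprint ends on time regardless.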

STEP 3: Product Release (Incremental Release)

  • Release a working version of the product with the features committed to as part of the Sprint.

STEP 4: Sprint Review Meeting

  • Review work completed and not completed versus committed items for the Sprint.

  • Present work to stakeholders as a demo.

STEP 5: Sprint Retrospective

  • Team members review the sprint.

  • Discuss what worked well and what needs improvement.

  • Basically a self-corrective session (lessons learned) to incorporate process improvements in preparation for the next Sprint.

So we can say Agile is just a philosophy and Scrum is an implementation of that philosophy.


What we mean by “data lake” and DW “augmentation”

When we say “data lake,” we’re referring to a centralized repository, typically in Hadoop, for large volumes of raw data of any type from multiple sources. It’s an environment where data can be transformed, cleaned and manipulated by data scientists and business users. A “managed” data lake is one that uses a data lake management platform to manage ingestion, apply metadata and enable data governance so that you know what’s in the lake and can use the data with confidence.

“Augmentation” means enhancing what you already have, not starting from scratch. With a data warehouse augmentation, you keep your data warehouse and existing BI tools, but add a complementary, integrated data lake. For example, you could use a complementary data lake to prepare datasets and then feed them back into a traditional data warehouse for business intelligence analysis, or to other visualization tools for data science, data discovery, analytics, predictive modeling and reporting.

Why DW augmentation? 

We find that companies typically consider a DW augmentation project for two scenarios: 

Blue sky – You want to be able to do new things, beyond the capabilities of the data warehouse. This could include supporting specific business use cases for more advanced big data analytics or data science to find new insights or generate revenue; for example, with new products and services or through improved, more personalized customer experience. 

Cut costs – You want to continue doing what you’re already doing with your data warehouse, but do it cheaper using commodity hardware.

Reference Architecture

What could a data warehouse augmentation look like in your environment? Let’s review some architecture diagrams.

The differences between a traditional data warehouse architecture and data lakes are significant. A DW is fed data from a broad variety of enterprise applications. Naturally, each application’s data has its own schema, requiring the data to be transformed to conform to the DW’s own predetermined schema. Designed to collect only data that is controlled for quality and conforming to an enterprise data model, the DW is capable of answering only a limited number of questions. Further, storing and processing all data in the data warehouse is cost prohibitive.

Typically an organization will augment a data warehouse with a data lake in order to enjoy a reduction in storage costs. The data lake is fed information in its native form, and little or no processing is performed to adapt the structure to an enterprise schema. The data can be stored on commodity hardware, rather than expensive proprietary hardware. Required data can be pulled from the lake for use in the data warehouse. While this model provides significant cost savings, it does not take advantage of the strategic business improvements that a data lake can provide.

Reference Diagram #3: Data Warehouse Augmentation and Offload Diagram

The biggest advantage of data lakes is flexibility. By allowing the data to remain in its native format, a far greater stream of data is available for analysis. When an organization enlists the data lake to offload expensive data processing, in addition to storage, the entire business can benefit from more timely access to more data.




DML Error Logging

Have you ever tried to update 30 million records, only to have the update fail after twenty minutes because one record in 30 million fails a check constraint? Or, how about an insert-as-select that fails on row 999 of 1000 because one column value is too large? With DML error logging, adding one clause to your insert statement would cause the 999 correct records to be inserted successfully, and the one bad record to be written out to a table for you to resolve.

In the past, the only way around this problem was to process each row individually, preferably with a bulk operation using FORALL and the SAVE EXCEPTIONS clause.


The syntax for the error logging clause is the same for INSERT, UPDATE, MERGE and DELETE statements.

LOG ERRORS [INTO [schema.]table] [('simple_expression')] [REJECT LIMIT integer|UNLIMITED]

The optional INTO clause allows you to specify the name of the error logging table. If you omit this clause, the first 25 characters of the base table name are used along with the "ERR$_" prefix.

The REJECT LIMIT is used to specify the maximum number of errors before the statement fails. The default value is 0 and the maximum value is the keyword UNLIMITED. For parallel DML operations, the reject limit is applied to each parallel server.


-- Create a destination table.
CREATE TABLE dest (
  id           NUMBER(10)    NOT NULL,
  code         VARCHAR2(10)  NOT NULL,
  description  VARCHAR2(50)
);

-- Create the error logging table.
BEGIN
  DBMS_ERRLOG.create_error_log (dml_table_name => 'dest');
END;
/

PL/SQL procedure successfully completed.

The error table is created with a name made of the "ERR$_" prefix followed by the first 25 characters of the base table name.

SQL> DESC err$_dest
 Name                              Null?    Type
 --------------------------------- -------- --------------
 ORA_ERR_NUMBER$                            NUMBER
 ORA_ERR_MESG$                              VARCHAR2(2000)
 ORA_ERR_ROWID$                             ROWID
 ORA_ERR_OPTYP$                             VARCHAR2(2)
 ORA_ERR_TAG$                               VARCHAR2(2000)
 ID                                         VARCHAR2(4000)
 CODE                                       VARCHAR2(4000)
 DESCRIPTION                                VARCHAR2(4000)


INSERT INTO dest
SELECT *
FROM source;

ERROR at line 2:
ORA-01400: cannot insert NULL into ("TEST"."DEST"."CODE")


The failure causes the whole insert to roll back, regardless of how many rows were inserted successfully. Adding the DML error logging clause allows us to complete the insert of the valid rows.

We will create a unique ID to query the errors associated with the insert below. The unique ID is stored in the ora_err_tag$ column.

l_unique_number := i_batch_id || i_chunk_id;

INSERT INTO dest
SELECT *
FROM source
LOG ERRORS INTO err$_dest (l_unique_number) REJECT LIMIT UNLIMITED;

99998 rows created.


The rows that failed during the insert are stored in the ERR$_DEST table, along with the reason for the failure.

COLUMN ora_err_mesg$ FORMAT A70
SELECT ora_err_number$, ora_err_mesg$
FROM err$_dest
WHERE ora_err_tag$ = l_unique_number;

ORA_ERR_NUMBER$ ORA_ERR_MESG$
--------------- ---------------------------------------------------------
           1400 ORA-01400: cannot insert NULL into ("TEST"."DEST"."CODE")
           1400 ORA-01400: cannot insert NULL into ("TEST"."DEST"."CODE")

2 rows selected.

We can use the unique number to query the error log table and decide whether to ROLLBACK or COMMIT.

The same can be done with UPDATE, DELETE and MERGE statements.

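MERGE INTO dest a
USING source b
ON (a.id = b.id)
WHEN MATCHED THEN
  UPDATE SET a.code = b.code,
             a.description = b.description
WHEN NOT MATCHED THEN
  INSERT (id, code, description)
  VALUES (b.id, b.code, b.description)
LOG ERRORS INTO err$_dest (l_unique_number) REJECT LIMIT UNLIMITED;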

99998 rows merged.
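To make the semantics of the clause concrete, here is a minimal Python sketch that emulates the behaviour with the standard sqlite3 module. It is not Oracle's implementation: the function insert_with_error_log, the err_dest table layout and the tag value are all assumptions invented for the illustration, and the rows are processed one at a time, which is exactly what LOG ERRORS lets you avoid in Oracle.

```python
import sqlite3


def insert_with_error_log(conn, rows, tag, reject_limit=None):
    """Insert rows one by one, logging rejected rows instead of aborting.

    reject_limit=None stands in for REJECT LIMIT UNLIMITED.
    Returns (inserted_count, rejected_count).
    """
    inserted = rejected = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO dest (id, code, description) VALUES (?, ?, ?)", row
            )
            inserted += 1
        except sqlite3.IntegrityError as exc:
            rejected += 1
            if reject_limit is not None and rejected > reject_limit:
                # Simplification: this discards everything in the open
                # transaction, including errors logged so far.
                conn.rollback()
                raise
            # Keep the failing row, the reason and the caller's tag.
            conn.execute(
                "INSERT INTO err_dest (err_mesg, err_tag, id, code, description)"
                " VALUES (?, ?, ?, ?, ?)",
                (str(exc), tag, *row),
            )
    return inserted, rejected


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dest (id INTEGER NOT NULL, code TEXT NOT NULL, description TEXT)"
)
conn.execute(
    "CREATE TABLE err_dest "
    "(err_mesg TEXT, err_tag TEXT, id TEXT, code TEXT, description TEXT)"
)

# Hypothetical source rows; two of them violate the NOT NULL constraint on code.
source = [
    (1, "A", "first"),
    (2, None, "missing code"),
    (3, "C", "third"),
    (4, None, "also missing"),
]

inserted, rejected = insert_with_error_log(conn, source, tag="batch1-chunk1")
logged = conn.execute(
    "SELECT err_mesg FROM err_dest WHERE err_tag = ?", ("batch1-chunk1",)
).fetchall()
dest_count = conn.execute("SELECT COUNT(*) FROM dest").fetchone()[0]
```

After the call, the valid rows are in dest and the rejected rows can be found through the tag. The caller can then inspect the logged rows and decide whether to commit or roll back.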

DML Error Logging Handles:

  1. Too-large column values  (Except for LONG, LOB, or object type columns)
  2. Constraint violations (NOT NULL, unique, referential, and check constraints)
    1. Except for:
      1. Violated deferred constraints
      2. Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation
      3. Any UPDATE or MERGE operation that raises a unique constraint or index violation
      4. Violation of a constraint on a LONG, LOB, or object type column
  3. Trigger execution errors 
  4. Type conversion errors arising from type conversion between a column in a subquery and the corresponding column of the table
  5. Partition mapping errors 
  6. A specific MERGE operation error (ORA-30926: Unable to get a stable set of rows)

The Secrets of Oracle Row Chaining and Migration

The following was published by Martin Zahn, and the examples are from Tom Kyte's website.


If you notice poor performance in your Oracle database, row chaining and migration may be one of several causes, but we can prevent some of them by properly designing and/or diagnosing the database.

Row Migration & Row Chaining are two potential problems that can be prevented. By suitably diagnosing, we can improve database performance. The main considerations are:

  • What is Row Migration & Row Chaining?
  • How to identify Row Migration & Row Chaining?
  • How to avoid Row Migration & Row Chaining?

Migrated rows affect OLTP systems which use indexed reads to read singleton rows. In the worst case, you can add an extra I/O to all reads which would be really bad. Truly chained rows affect index reads and full table scans.

Oracle Block

The operating system block size is the minimum unit of operation (read/write) by the OS and is a property of the OS file system. While creating an Oracle database we have to choose the «Data Base Block Size» as a multiple of the operating system block size. The minimum unit of operation (read/write) by the Oracle database is this «Oracle block», not the OS block. Once set, the «Data Base Block Size» cannot be changed during the life of the database (except in case of Oracle 9i). To decide on a suitable block size for the database, we take into consideration factors like the size of the database and the expected number of concurrent transactions.

The database block has the following structure (within the whole database structure)


Header

The header contains general information about the block, i.e. the block address and the type of segment (table, index, etc.). It also contains information about the table and the addresses of the actual rows that hold the data.

Free Space

Space allocated for future update/insert operations. Generally affected by the values of the PCTFREE and PCTUSED parameters.


Data

Actual row data.


While creating / altering any table/index, Oracle uses two storage parameters for space control, together with the FREELIST structure.

  • PCTFREE - The percentage of space reserved for future updates of existing data.
  • PCTUSED - The minimum percentage of used space below which the block accepts new row data again. This value determines when the block gets back into the FREELIST structure.
  • FREELIST - Structure where Oracle maintains a list of all free available blocks.

Oracle first searches for a free block in the FREELIST and then inserts the data into that block. The availability of the block in the FREELIST is decided by the PCTFREE value. Initially an empty block is listed in the FREELIST structure, and it remains there until its free space drops to the PCTFREE value.

When the free space reaches the PCTFREE value, the block is removed from the FREELIST. It is re-listed in the FREELIST structure when the volume of data in the block falls below the PCTUSED value.

Oracle uses the FREELIST to improve performance: for every insert operation, Oracle needs to search for free blocks only in the FREELIST structure instead of searching all blocks.
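The on/off behaviour of a block with respect to the FREELIST can be sketched as a small state machine. This is a deliberately simplified model, not Oracle's actual space management: the Block class, the 1000-byte block size and the PCTFREE 10 / PCTUSED 40 settings are assumptions chosen to keep the arithmetic easy.

```python
class Block:
    """Toy model of one data block and its FREELIST membership."""

    def __init__(self, size=1000, pctfree=10, pctused=40):
        self.size = size
        self.pctfree = pctfree
        self.pctused = pctused
        self.used = 0
        self.on_freelist = True  # an empty block starts on the freelist

    def _update_freelist(self):
        used_pct = 100 * self.used / self.size
        free_pct = 100 - used_pct
        if self.on_freelist and free_pct <= self.pctfree:
            # Free space has dropped to PCTFREE: stop accepting inserts.
            self.on_freelist = False
        elif not self.on_freelist and used_pct < self.pctused:
            # Data volume has fallen below PCTUSED: accept inserts again.
            self.on_freelist = True

    def insert(self, nbytes):
        if not self.on_freelist:
            raise RuntimeError("block is not on the freelist")
        if self.used + nbytes > self.size:
            raise ValueError("row does not fit in this block")
        self.used += nbytes
        self._update_freelist()

    def delete(self, nbytes):
        self.used = max(self.used - nbytes, 0)
        self._update_freelist()


block = Block()
history = []
block.insert(500); history.append(block.on_freelist)  # 50% used
block.insert(400); history.append(block.on_freelist)  # 90% used, 10% free
block.delete(300); history.append(block.on_freelist)  # 60% used
block.delete(300); history.append(block.on_freelist)  # 30% used
```

Note the gap between the two thresholds: after the first delete the block is only 60% full but is still not re-listed, because the data volume has not yet fallen below PCTUSED.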

Row Migration

We will migrate a row when an update to that row would cause it to not fit on the block anymore (with all of the other data that exists there currently).  A migration means that the entire row will move and we just leave behind the «forwarding address». So, the original block just has the rowid of the new block and the entire row is moved.

Full Table Scans are not affected by migrated rows

The forwarding addresses are ignored. We know that as we continue the full scan, we'll eventually get to that row so we can ignore the forwarding address and just process the row when we get there.  Hence, in a full scan migrated rows don't cause us to really do any extra work -- they are meaningless.

Index Read will cause additional IO's on migrated rows

When we Index Read into a table, then a migrated row will cause additional IO's. That is because the index will tell us «goto file X, block Y, slot Z to find this row». But when we get there we find a message that says «well, really goto file A, block B, slot C to find this row». We have to do another IO (logical or physical) to find the row.
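The difference between the two access paths can be illustrated with a toy model. Everything here is hypothetical: the table is a plain dictionary keyed by (file, block, slot), and the functions only count block visits; they do not reflect how Oracle is implemented.

```python
# rowid -> slot content. A slot holds either row data or a forwarding address.
table = {
    ("file1", 10, 0): ("row", {"id": 1}),
    ("file1", 10, 1): ("fwd", ("file1", 42, 0)),  # row 2 has migrated
    ("file1", 42, 0): ("row", {"id": 2}),         # new home of row 2
}


def index_read(table, rowid):
    """Follow a rowid taken from an index; return (row, block_visits)."""
    kind, payload = table[rowid]
    if kind == "fwd":
        # The original slot only holds the forwarding address,
        # so one more block visit is needed to reach the row.
        _, row = table[payload]
        return row, 2
    return payload, 1


def full_scan(table):
    """Visit every block once; forwarding addresses are simply skipped."""
    blocks = {(file_no, block_no) for (file_no, block_no, _slot) in table}
    rows = [payload for kind, payload in table.values() if kind != "fwd"]
    return rows, len(blocks)


normal_row, normal_visits = index_read(table, ("file1", 10, 0))
migrated_row, migrated_visits = index_read(table, ("file1", 10, 1))
scanned_rows, scan_visits = full_scan(table)
```

The index read of the migrated row costs two block visits instead of one, while the full scan reads each block exactly once and finds both rows without following the forwarding address.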

Row Chaining

Row chaining occurs when a row is too large to fit into a single database block. For example, if you use a 4KB block size for your database and you need to insert a row of 8KB into it, Oracle will use 3 blocks and store the row in pieces. Some conditions that cause row chaining are: tables whose row size exceeds the block size; tables with LONG and LONG RAW columns, which are prone to having chained rows; and tables with more than 255 columns, which will have chained rows because Oracle breaks wide tables up into pieces. So, instead of just having a forwarding address on one block and the data on another, we have data on two or more blocks.
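The reason an 8KB row needs 3 blocks rather than 2 is that part of every block is taken up by overhead such as the block header. A rough back-of-the-envelope sketch, where the 100-byte overhead is a placeholder assumption rather than a real Oracle figure:

```python
import math


def blocks_for_row(row_bytes, block_bytes, overhead_bytes):
    """Number of blocks needed to store one row, given per-block overhead."""
    usable = block_bytes - overhead_bytes
    if usable <= 0:
        raise ValueError("overhead leaves no usable space in the block")
    return math.ceil(row_bytes / usable)


# 8KB row, 4KB blocks, assumed 100 bytes of overhead per block.
with_overhead = blocks_for_row(8 * 1024, 4 * 1024, 100)

# Same row if blocks had no overhead at all (not realistic).
without_overhead = blocks_for_row(8 * 1024, 4 * 1024, 0)
```

Any non-zero overhead pushes the row past two blocks, which is why the example above ends up with three pieces.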

Chained rows affect us differently. Here, it depends on the data we need. If we had a row with two columns that was spread over two blocks, the query:

SELECT column1 FROM table

where column1 is in Block 1, would not cause any «table fetch continued row». It would not actually have to get column2, it would not follow the chained row all of the way out. On the other hand, if we ask for:

SELECT column2 FROM table

and column2 is in Block 2 due to row chaining, then you would in fact see a «table fetch continued row»


Migrated rows affect OLTP systems which use indexed reads to read singleton rows. In the worst case, you can add an extra I/O to all reads which would be really bad. Truly chained rows affect index reads and full table scans.

  • Row migration is typically caused by UPDATE operations.

  • Row chaining is typically caused by INSERT operations.

  • SQL statements that create or query these chained/migrated rows will degrade performance because of the additional I/O work.

  • To diagnose chained/migrated rows, use the ANALYZE command or query the V$SYSSTAT view.

  • To remove chained/migrated rows, set a higher PCTFREE and rebuild the table using ALTER TABLE MOVE.





Collections - Visual