Wednesday, April 23

Really Simple Integration

Companies are challenged these days. So are individuals. The amount of information available, and being created every minute, is growing so fast. How can anyone turn it into something meaningful?

Along comes aggregation. It's already common for blog and news feeds, with sites like AllTop covering the thousands of "top bloggers".

But what about organizations and integrating their information needs? Along comes SnapLogic with an open source RESTful architecture integration app. I spoke with Chris Marino, CEO, and John Bennett, Director of Marketing. They explained how they empower organizations through self-service... to "loosely couple a federation of systems" for enterprise mashups of data.

Here's a partial quote from their press release today:

"Really Simple Integration is a new approach to data integration" that "enables enterprises to quickly and easily make core IT data from data warehouses, Master Data Management data marts, SaaS apps, SOA Web Services and other sources."

Be sure to check them out if you're attending O'Reilly's Web 2.0 Expo in San Francisco this week. In Web 2.0/open source fashion, they have started a publicly available collection of free components, including a screencast showing a mashup of LinkedIn and

So where does this put Business Intelligence, and more specifically ETL tools and the static nature of data warehouses? I think this is another step towards the end of the ETL era as we know it. Products like SnapLogic provide transformation functionality but also have access to more than internal data sources. Ever heard of an ETL tool able to scrape data off a public website to merge with your sales data?

Business Intelligence has typically been limited to an organization's internal databases, such as finance, CRM, marketing, and sales. But we're in the age of the Internet now (actually we've been here for quite some time), and to be competitive you need access to the vast amounts of information in external sources, such as SaaS applications and information websites.

The small to mid-sized market is where SnapLogic is positioned today. Tiny companies can leverage the open source community, while IT staff at mid-sized companies would deploy SnapLogic for efficiency. Once a data warehouse is built, users and departments start asking for the data in different ways (the static issue with a DW), or want to marry DW data with data that is not in the DW, and they shouldn't have to wait 18 months for an additional DW implementation to address that need. IT departments can shine by offering an easy-to-use tool like SnapLogic.

Sometimes there is no time to wait for the perfect enterprise dimensional model to be designed. Organizations are organic and need to stay competitive and ever changing to keep ahead. Access to information is key.

Tuesday, April 8

10 Questions for Miriam Tuerk

It has been said by many, including Gartner and Forrester, that the next big innovation for BI & DW will most likely come from the data warehouse side. Sure, visualization is a hot topic lately, but the "pain points" for many clients are on the back end.

A few minutes into our conversation, Miriam mentioned that their tool produced results from "three billion rows of data and resolved queries in seconds". That caught my attention. And when Miriam Tuerk, CEO of Infobright, mentions a client roster that includes the likes of the Royal Bank of Canada, Xerox, and TradeDoubler, you know they are onto something.

Our conversation continued as we discussed Infobright's innovative solution, which I'm sharing with you.

Question 1: Hi Miriam, let’s start with your statement “Research shows that the volume of the world's data approximately doubles every three years... 92% of new information is stored on hard-disks.” Do you think Infobright can help organizations analyze data faster in a more flexible way?

Answer: We have proved it at our customer sites. Our customers have wrestled with the problem of how to extract valuable information from the huge volume of data they collect. They know, as we do, that being able to quickly access key information about their business or their customers can be the difference between business success and business failure. That is why recent studies confirm that Business Intelligence is the #1 investment area for CIOs today. Infobright built an analytic data warehouse solution from the ground up, specifically designed to provide fast answers to ad hoc, complex analytic queries without burdening IT with lengthy, resource-intensive projects.

Question 2: Okay, let's get right to the heart of Infobright's business. Why do I need your DW solution when I know I can already build dimensional models, cubes, reports, etc?

Answer: Three reasons – time, money, and the unknown. Given enough time and money, IT can develop a system perfectly designed to answer any question quickly – as long as they know the question. In today’s changing business world, however, business people don’t know in advance all of the questions they will need answers to in the future. They want the answers today, but most systems require a lot of manual work on the part of IT and database administrators to set up and maintain environments each time the business users want to perform new and different analytics on their business. Today, providing fast access to massive amounts of data requires a lot of IT resources and time. Infobright’s solution eliminates all of that work by IT, and doesn’t require buying lots of servers and storage as other products do. Instead, we developed a simple but very powerful solution that provides business users access to all of the data they need to get fast answers to unpredictable questions.

Question 3: How did you and Infobright get started? Was it a grass-roots entrepreneurial effort?

Answer: Infobright was born out of pioneering work done by a group of internationally recognized mathematicians in the emerging science of Rough Set Mathematics. They realized that they could use information about the data itself to quickly provide answers to complex queries, rather than require IT to do extensive work up front or rely on brute force from massive amounts of hardware. Seeing the benefits of this approach, RBC Capital and Flybridge Capital Partners (formerly IDG Ventures) funded the company and brought in an experienced management team to turn raw technology into industry-leading products and services. Over the past year we have expanded the capabilities of our software while growing our customer base and establishing Infobright as an emerging player in the market. For example, Infobright is the first analytic data warehouse provider to be named a MySQL Certified Storage Engine Partner. The combination of Infobright’s solution and MySQL provides organizations an analytic data warehouse that delivers unprecedented scalability, performance and ease-of-use.

Question 4: Where do you envision the DW market going, especially with recent consolidation of BI vendors?

Answer: In the “old” days (the 1990s, which is really not so long ago), smaller volumes of data, smaller and less diverse sets of users, fewer subject areas and simpler queries allowed vendors to recommend one data warehouse solution to meet all of the business needs of the users. Just like hardware, where once we had only the mainframe, the market is evolving and maturing such that there is no longer a “one-stop shopping” solution to meet the BI needs of businesses today. There are really two different types of workload in a data warehouse:

  1. One requirement is where you have a lot of users running the same query over and over. An example would be a customer data warehouse used to support a call center for a cell phone company. Every time a customer calls in, the customer profile is pulled from the data warehouse. This is a repetitive, OLTP-like query, and for this a highly designed and engineered system, optimized and tuned for the specific and repetitive queries, is the best solution.

  2. A second requirement for data warehousing is analytics. Here, marketing, finance, sales, compliance, risk management, and operations groups in companies are performing ad hoc, changing and unknown queries such as:

  • “How did our 2007 Christmas sales campaign perform as compared to our 2006 campaign? Was the customer retention higher? Did more of those customers buy the value-add services?” or

  • “Let’s do a trend analysis to understand why there are more mortgage defaults in this area than previously. Let’s run a trend analysis of the last 12 months versus the last five years. Can we identify any indicators that would allow us to re-estimate/extrapolate what the defaults will be through the end of 2008?”

These parts of the business use the data warehouse to design marketing and sales campaigns and to understand what the risk, compliance and security issues may be, and use that to operate and manage the business. Today, IT needs to assign resources and do manual work in support of all of these queries. And every day the business has new or different queries, IT must do more work. Business users need to have a “Google-like” experience for this type of data warehouse workload. They need to be able to just run the query against the data warehouse without manual intervention by IT.

This is the really big, hidden story in BI, that the growing analytic requirement is causing IT to drown under the workload it requires. They need a way to make things simpler and really change how things are run. Some technology companies are delivering value by consolidating platforms and creating integrated solutions. We have chosen to focus on the analytic use case and deliver the only product in the market that effectively solves that problem.

Question 5: What are the benefits of Brighthouse for an organization or manager looking for information?

Answer: Brighthouse delivers fast response to ad hoc, complex analytical queries across a large volume of data. It does so without requiring IT to spend time and effort to create new schemas, create indices or partition data. It also lowers total cost of ownership through industry-leading compression that significantly reduces the amount of storage needed to support all this data. Business users get the answers they need quickly, and IT can meet high service levels with minimal effort or cost.

Question 6: You mentioned the phrase, "use the intelligence of the data." Can you share what you mean by that?

Answer: When data is loaded into the Brighthouse system, it is tightly compressed and stored in “data packs.” The Knowledge Grid automatically creates a highly compact set of metadata, which stores information about the relationship between packs and statistical information about the contents.

When a query is initiated, Brighthouse searches the grid to intelligently decide which data packs, if any, are required to resolve the query. The Knowledge Grid is created on the fly as data is loaded, which dramatically speeds up data loads and eliminates the need for specialized data partitioning and indexing.
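
To make the idea of "deciding which data packs are required" concrete, here is a minimal sketch in Python. It is not Infobright's implementation: it assumes the simplest possible per-pack metadata (minimum, maximum, and row count for one numeric column), and every name in it (`PackStats`, `build_packs`, `count_greater_than`) is hypothetical. It only illustrates how summary statistics can answer part of a query without opening the underlying data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackStats:
    """Hypothetical summary metadata for one pack of column values."""
    lo: int          # smallest value in the pack
    hi: int          # largest value in the pack
    count: int       # number of rows in the pack
    rows: List[int]  # stands in for the compressed pack on disk


def build_packs(values: List[int], pack_size: int) -> List[PackStats]:
    """Split a column into fixed-size packs and record stats at load time."""
    packs = []
    for start in range(0, len(values), pack_size):
        chunk = values[start:start + pack_size]
        packs.append(PackStats(min(chunk), max(chunk), len(chunk), chunk))
    return packs


def count_greater_than(packs: List[PackStats], threshold: int) -> Tuple[int, int]:
    """Count rows with value > threshold.

    Returns (matching_rows, packs_opened). A pack is only opened
    when its metadata cannot settle the question by itself.
    """
    total = 0
    opened = 0
    for pack in packs:
        if pack.hi <= threshold:
            continue               # no row can match: skip the pack
        if pack.lo > threshold:
            total += pack.count    # every row matches: use metadata only
            continue
        opened += 1                # mixed pack: must read the actual rows
        total += sum(1 for v in pack.rows if v > threshold)
    return total, opened


packs = build_packs(list(range(1, 13)), pack_size=4)
print(count_greater_than(packs, 6))  # (6, 1)
```

In this toy run, only one of the three packs has to be read. The other two are resolved from their statistics alone, which is the general effect the answer above describes.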

Question 7: So, what about competitors, like BI appliances, database vendors, and such? Are you taking data warehousing one step further?

Answer: The fact that there have been new entrants into the market in recent years is a clear indication that current technologies do not meet the needs of the business today. They are also an indication of high demand across a very broad spectrum of new requirements. Traditional solutions are very expensive, take a lot of time to build, and in fact, are not well suited to support the analytic queries of the business. That is why IT struggles to keep up with the demand of the business users. Many of the newer products on the market are very good at what they are designed to do – provide very fast query performance to predictable queries – but are not designed for ad hoc, unpredictable complex queries. What’s more, they all require substantial work on the part of database administrators and IT to implement and maintain.

Infobright’s solution is markedly different – it is incredibly simple to implement and maintain. Rather than extract-transform-load data, our solution is Load and Go. No new schemas, no index creation, no data partitioning. Brighthouse is simple, powerful and extremely cost effective – the best solution if you need fast answers to evolving business questions.

Question 8: How can your Brighthouse product fit within an existing BI/DW solution? Or is it better to use your product at the beginning of designing a new BI system?

Answer: When we built our product strategy, this was a very important question for us. Organizations have invested millions into their existing data warehouses and BI infrastructure. Offering a solution that leverages those investments and works within that environment was key to us and is a big part of our go-forward product road map. Brighthouse is very well suited to be added to an existing data warehouse environment. Because of our “just load it and go” capability, you can re-use all of the data modeling, ETL, and BI reports that you have already built. Many of our customers have large data warehouses already, but they aren’t able to support business users’ requests for ad hoc queries due to the performance impact on other users or the high cost. In that case, they’ll implement Brighthouse as a complementary warehouse to provide the services their business users need.

In other cases, Brighthouse is implemented as the sole data warehouse for the company.

Question 9: During our call, you mentioned several amazing results seen by clients. Care to share some of those with readers?

Answer: I’d be glad to. A good example is the use of our technology in support of online advertising. Companies that advertise online want to track how well marketing campaigns are attracting their target audiences as well as detailed ROI of these campaigns. The marketing analytics providers depend on being able to rapidly run complex, ad hoc queries against huge amounts of click stream data and provide this to their customers.

Using Brighthouse, one digital marketer found that it could load 3.2 billion rows of data at an average rate of more than 300,000 rows per second! Brighthouse also compressed all-important fact tables at a ratio of 40:1 – meaning that 40 GB of raw data resulted in only 1 GB of storage, leading to huge savings in storage costs as well as improved performance.

Another user, a company that manages major online customer loyalty and incentive programs, found that Brighthouse returned query results 15 times faster than an existing solution. Brighthouse also surpassed this solution’s ability to compress data, reducing the footprint of fact tables by ratios of 35:1 to 43:1.
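
To put the quoted figures in perspective, here is a small back-of-the-envelope calculation in Python. The row count, load rate, and compression ratios come from the answer above. The 1,000 GB fact table is a made-up size used only for illustration, and the helper names are hypothetical.

```python
def load_time_hours(rows: float, rows_per_second: float) -> float:
    """Hours needed to load `rows` at a steady `rows_per_second`."""
    return rows / rows_per_second / 3600


def stored_gb(raw_gb: float, ratio: float) -> float:
    """Storage needed for `raw_gb` of raw data at a `ratio`:1 compression."""
    return raw_gb / ratio


# 3.2 billion rows at 300,000 rows per second. The quoted rate is
# "more than" 300,000, so this is an upper bound on the load time.
print(round(load_time_hours(3.2e9, 300_000), 2))  # 2.96

# 40:1 compression: 40 GB of raw data becomes 1 GB on disk.
print(stored_gb(40, 40))  # 1.0

# Hypothetical 1,000 GB fact table at the 35:1 and 43:1 ratios.
RAW_GB = 1000  # assumption, not a figure from the interview
for ratio in (35, 43):
    print(f"{ratio}:1 -> {stored_gb(RAW_GB, ratio):.1f} GB")
```

In other words, the 3.2 billion rows would have loaded in under three hours at the quoted rate.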

Question 10: Excellent talking with you, Miriam. Do you have any additional links or information about Infobright you want to share?

Answer: For your readers who are looking for more information about what is new in data warehousing, we have an excellent white paper on our web site written by Claudia Imhoff, a well-known expert in the field. Those people interested in finding out more about Brighthouse can also find additional information on our web site at, or contact us at any time via email at