The Way from Data to Information

Data Mining



Top Stories

Quick quiz! What’s the first thing that comes to mind when you hear the following phrases? Artificial grass, artificial sweeteners, artificial flavors, artificial plants, artificial flowers, artificial diamonds and jewelry, artificial (fake) news. These phrases probably evoke thoughts such as “fake,” “not real,” or even “shabby.” Artificial is such a harsh adjective. The word “artificial” is defined as “imitation; simulated; sham” with synonyms such as fake, false, mock, counterfeit, bogus, phony and factitious. The word “artificial” may not be the right term to use to describe “Artificial Intelligence,” because “artificial intelligence” is anything but fake, false, phony, or a sham. Maybe a better term is “Augmented Human Intelligence,” or a phrase that both highlights the importance of augmenting the human’s intelligence and alleviates the fears that AI means ... (more)

In-Stream Processing | @CloudExpo @robinAKAroblimo #BigData #AI #BI #DX

Most of us have moved our web and e-commerce operations to the cloud, but we are still getting sales reports and other information we need to run our business long after the fact. We sell a hamburger on Tuesday, you might say, but don't know if we made money selling it until Friday. That's because we still rely on batch processing, where we generate orders, reports, and other management-useful pieces of data when it's most convenient for the IT department to process them, rather than in real time. That was fine when horse-drawn wagons made our deliveries, but it is far too slow for today's world, where stock prices and other bits of information circle the world (literally) at the speed of light. It's time to move to In-Stream Processing. You can't - and shouldn't - keep putting it off. [Figure 1, courtesy of the Grid Dynamics Blog] This diagram may look complicate... (more)

The Real Time Infrastructure Ultimatum

Infrastructure 2.0 Journal For months the infrastructure 2.0 blog has talked about the automation of IT from a network perspective, including the automation of the network itself. While few may question the need for network automation, most businesses today still run their networks like they ran their “supply chains” decades ago, before the network. This great irony is about to change. Here’s why: As virtualization entered the data center it became an accidental standard bearer for network automation. The power of virtualization helped to drive a cultural (including x as a service) shift in expectations, just as Nicholas Carr was declaring war on traditional “old world” IT with the help of Google, Amazon and a host of other cloud (and not so cloud) players. IT directors watched operations pros create VMs in seconds while network teams could take hours (or days) to si... (more)

Cloud vs Grid

IBM Session at Cloud Expo I’ve been getting a lot of email, and there have been some posts on this blog, regarding CEP based services in the cloud. So as we go down this road, I thought it would behoove us to examine what a cloud is and isn’t and what a grid is and isn’t. I found this introductory article from IBM. It’s a good start. I’ll be looking for some more articles – if you’ve got anything you’d like to add to this conversation, please feel free to contact me or add your comments below. Personally, my experience with grid comes from Capital Markets and calculating prices & greeks for derivatives. I think that some of the people reading this blog might be mistaking what I’m talking about when I say, “Current CEP products don’t seem to be a good fit in the cloud.” I think that those people with Capital Markets experience think grid when I say cloud, and think ... (more)

Reconciling Big Data & Data Residency in the Cloud

I recently blogged about “big data” and the value that big data analytics can bring to companies using this type of business intelligence to garner insights and develop competitive advantages. Numerous industry experts have highlighted ways that the cloud is actually enabling a transformation toward data mining, pattern recognition, and predictive analytics to enhance executive decision-making. For example, Booz|Allen|Hamilton’s December 2011 report, Massive Data Analytics and the Cloud, asserts that data cloud-based intelligence analysis will have an unprecedented, long-lasting, and far-reaching impact on business strategy development. ... (more)

Internet of Things Maturity Model By @TonyShan | @ThingsExpo [#IoT]

The Internet of Things (IoT) is booming. The “Software for the Internet of Things (IoT) Developer Survey” report, published by Embarcadero Technologies last month, shows that 77% of development teams will have IoT solutions in active development in 2015, with almost half (49%) of IoT developers anticipating their solutions will generate business impacts by the end of this year. The IoT Maturity Model (IoTMM) is a qualitative method to gauge the growth and increasing impact of IoT capabilities in an IT environment from both business and technology perspectives. It comprises a set of criteria, parameters and factors that can be used to describe and measure the effectiveness of IoT adoption and implementation. Five levels of maturity are defined: Advanced, Dynamic, Optimized, Primitive, and Tentative (ADOPT). The definitions of these 5 levels are specified below: Level Desc... (more)

Citizen Data Scientist, Jumbo Shrimp | @CloudExpo @Schmarzo #BigData

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify those variables and metrics that might be better predictors of performance, I do not expect that the business stakeholders are going to be able to create and generate analytic models. I do not believe, nor do I expect, that the business stakeholders are going to be proficient enough with tools like SAS or R or Python or Mahout or MADlib to 1) create or generate the models, and then 2) be profi... (more)

A Hybrid Data Pipeline | @CloudExpo @ProgressSW #BigData #AI #DataLake

Building a Hybrid Data Pipeline for Salesforce and Hadoop My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge is that IT was not in a position to provide support until we proved value and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at Progress, will discuss how the key to delivering on this was using standard interfaces using a bi-directional data pipeline to connect the systems. On the Salesforce side, we were able to get frictionless access to the data lake using clicks-not-code via OData. On the Hadoop side,... (more)

CFP: 3rd International Conference on Adaptive Business Information Systems

[ our apologies should you receive this message more than one time ] 3rd International Conference on Adaptive Business Information Systems, Leipzig, Germany, 23-25 March. http://siwn.org.uk/2009leipzig/ABIS09.htm Call for Papers: Second SIWN Congress (SIWN 2009), collocated with SOFTWARE, AGENTS, AND SERVICES FOR BUSINESS, RESEARCH, AND E-SCIENCES (SABRE 2009). Papers of the conference will be invited to publish their revised versions in a journal issue of the International Journal Communications of SIWN (CoSIWN) (ISSN 1757-4439). Overview: Information Technologies in their broad sense have been profoundly changing the ways, the processes and the philosophies of businesses. Adaptive business applications support processes whose workflows, use... (more)

Algorithms of the Intelligent Web

I have recently finished writing "Algorithms of the Intelligent Web" and it should hit the bookshelves in a few weeks. I would like to tell you what the book is about and why I wrote it -- to save some typing, hereafter, I will refer to the book as "AIW", "the AIW book", etc. The code for the book is hosted on Google Code here. The AIW book includes topics from the areas of machine learning, data mining, statistics, and discovery in knowledge bases. The literature on these topics is vast but it is, almost exclusively, academic and heavy in mathematical jargon. Nevertheless, the main ideas of the algorithms can be grasped and used by nearly every software engineer with a minimum of mathematical formalism and a little bit of effort. In fact, one of the goals that I set for the book was to describe every algorithm without writing a single mathematical equation; a ... (more)

Kognitio Celebrates 20 Years of WX2 Data Warehousing Platform

CHICAGO, May 6, 2009 — Kognitio today announced the 20th anniversary of its WX2 analytical database.  Since its introduction, WX2 has been a constant source of innovation in the field of data warehousing. It was the first platform to offer large-scale data warehousing, the first to enable Data Warehousing as a Service (DaaS) for organizations and one of the first to move onto an entirely software-based model. Kognitio has been responsible for innovating and advancing numerous data warehousing concepts and practices that are considered the leading edge in the field of business intelligence, such as the ability to query hundreds of terabytes of information within seconds instead of weeks, and enabling companies of all sizes to more easily take advantage of advanced data analytics at lower cost. The announcement was made at The Data Warehousing Institute’s (TDWI) World... (more)