The Way from Data to Information

Data Mining


Top Stories

Most of us have moved our web and e-commerce operations to the cloud, but we are still getting sales reports and other information we need to run our business long after the fact. We sell a hamburger on Tuesday, you might say, but don't know if we made money selling it until Friday. That's because we still rely on Batch processing, where we generate orders, reports, and other management-useful pieces of data when it's most convenient for the IT department to process them, rather than in real time. That was fine when horse-drawn wagons made our deliveries, but it is far too slow for today's world, where stock prices and other bits of information circle the world (literally) at the speed of light. It's time to move to In-Stream Processing. You can't - and shouldn't - keep putting it off. [Figure 1, courtesy of the Grid Dynamics Blog] This diagram may look complicate... (more)
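
The contrast the teaser above draws between batch and in-stream processing can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SALES` records, their prices and costs, and the names `batch_profit` and `StreamingProfit` do not come from the article. The batch function answers only after all events have been collected, while the streaming class keeps a running answer that is current after every sale.

```python
from collections import defaultdict

# Hypothetical sales events: (day, item, revenue, cost). Illustrative values only.
SALES = [
    ("Tue", "hamburger", 5.00, 3.50),
    ("Tue", "fries", 2.00, 0.75),
    ("Wed", "hamburger", 5.00, 3.50),
]


def batch_profit(events):
    """Batch style: wait until all events are collected, then compute in one pass."""
    totals = defaultdict(float)
    for _day, item, revenue, cost in events:
        totals[item] += revenue - cost
    return dict(totals)


class StreamingProfit:
    """In-stream style: update running totals as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, event):
        """Fold one event into the totals and return the item's profit so far."""
        _day, item, revenue, cost = event
        self.totals[item] += revenue - cost
        return self.totals[item]
```

Both approaches end with the same totals; the difference is when the answer becomes available. A production system would normally use a stream-processing platform rather than an in-memory class like this one.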

Demystifying #DataScience | @CloudExpo #BigData #AI #ArtificialIntelligence

[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues and ultimately customer satisfaction. He keeps hearing from his business customers how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of Data Science start dancing in his head… [Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start simple with a simple definition of data science: Data science is a... (more)

Data Unification at Scale | @CloudExpo #BigData #DataLake #AI #Analytics

The term Data Unification is new in the Big Data lexicon, pushed by a variety of companies such as Talend, 1010Data, and TamR. Data unification deals with the domain known as ETL (Extraction, Transformation, Loading), initiated during the 1990s when Data Warehousing was gaining relevance. ETL refers to the process of extracting data from inside or outside sources (multiple applications typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into end target databases, more specifically, an operational data store, data mart, or data warehouse. These are read-only databases for analytics. Initially the analytics was mostly retroactive (e.g. how many shoppers between ages 25 and 35 bought this item between May and July?). This was like driving a car looking at the ... (more)
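
The extract, transform, load sequence described in the teaser above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the approach of any vendor named there: the two source lists, their field names, the `purchases` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the data warehouse. The final function runs the kind of retroactive query the teaser mentions.

```python
import sqlite3

# Hypothetical records from two separate applications (illustrative data only).
WEB_ORDERS = [
    {"customer": "Ann", "age": "31", "item": "kettle", "month": "06"},
    {"customer": "Bob", "age": "52", "item": "kettle", "month": "06"},
]
STORE_ORDERS = [
    {"name": "Cy", "age": 27, "sku": "KETTLE", "month": 5},
    {"name": "Di", "age": 29, "sku": "KETTLE", "month": 9},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for row in WEB_ORDERS:
        yield "web", row
    for row in STORE_ORDERS:
        yield "store", row


def transform(source, row):
    """Transform: map each source's fields and types onto one shared schema."""
    if source == "web":
        return (row["customer"], int(row["age"]), row["item"].lower(), int(row["month"]))
    return (row["name"], int(row["age"]), row["sku"].lower(), int(row["month"]))


def load(conn, rows):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE purchases (customer TEXT, age INTEGER, item TEXT, month INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run all three steps against an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, [transform(source, row) for source, row in extract()])
    return conn


def shoppers_25_to_35(conn, item, first_month, last_month):
    """Retroactive query: shoppers aged 25 to 35 who bought `item` in the month range."""
    cursor = conn.execute(
        "SELECT COUNT(*) FROM purchases "
        "WHERE item = ? AND age BETWEEN 25 AND 35 AND month BETWEEN ? AND ?",
        (item, first_month, last_month),
    )
    return cursor.fetchone()[0]
```

Calling `shoppers_25_to_35(run_etl(), "kettle", 5, 7)` answers the May-to-July question over the unified table. The transform step is where the unification happens: two sources with different field names and types end up in one schema that a single query can cover.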

$1B Opportunity: #DarkData | @CloudExpo #BigData #BI #AI #ML #DataScience

Every organization collects, stores and retains portions of dark data. It's the digital equivalent of emotional baggage which hangs around after every user interaction, transaction, and customer engagement. In fact, not using data effectively is costing United Airlines almost $1 Billion annually in lost revenue. Gartner Inc. describes dark data as "information assets that organizations collect, process and store in the course of their regular business activity, but fail to use for other purposes." For travel companies with a strong online presence, dark data represents a sizable portion of all data stored. Such examples might include:
How many times a user resets their password
IP address when a user logs into your website/app
Last email communication date to your customers
Mobile handset type, or web browser version
Free text feedback on a hotel stay or recent flig... (more)

Citizen Data Scientist, Jumbo Shrimp | @CloudExpo @Schmarzo #BigData

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense
Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify those variables and metrics that might be better predictors of performance, I do not expect that the business stakeholders are going to be able to create and generate analytic models. I do not believe, nor do I expect, that the business stakeholders are going to be proficient enough with tools like SAS or R or Python or Mahout or MADlib to 1) create or generate the models, and then 2) be profi... (more)

Algorithms of the Intelligent Web

I have recently finished writing "Algorithms of the Intelligent Web" and it should hit the bookshelves in a few weeks. I would like to tell you what the book is about and why I wrote it -- to save some typing, hereafter, I will refer to the book as "AIW", "the AIW book", etc. The code for the book is hosted on Google Code here. The AIW book includes topics from the areas of machine learning, data mining, statistics, and discovery in knowledge bases. The literature on these topics is vast but it is, almost exclusively, academic and heavy in mathematical jargon. Nevertheless, the main ideas of the algorithms can be grasped and used by nearly every software engineer with a minimum of mathematical formalism and a little bit of effort. In fact, one of the goals that I set for the book was to describe every algorithm without writing a single mathematical equation; a ... (more)

Reconciling Big Data & Data Residency in the Cloud

I recently blogged about “big data” and the value that big data analytics can bring to companies using this type of business intelligence to garner insights and develop competitive advantages. Numerous industry experts have highlighted ways that the cloud is actually enabling a transformation toward data mining, pattern recognition, and predictive analytics to enhance executive decision-making. For example, Booz|Allen|Hamilton’s December 2011 report, Massive Data Analytics and the Cloud, asserts that data cloud-based intelligence analysis will have an unprecedented, long-lasting, and far-reaching impact on business strategy development. ... (more)

The CTOvision Disruptive IT List: Firms we believe all enterprise technologists should be tracking

By BobGourley
Disruptive IT List
The Disruptive IT List is our assessment of the technology firms with the greatest potential for virtuous disruption of enterprise IT. Our goal is to provide enterprise CTOs with advance notice of firms they should be evaluating now for use in transforming their technology base. We believe the firms here meet a threshold of significance that warrants special attention.
The Disruptive IT List includes:
10gen: Production support for MongoDB
10gen’s comprehensive range of services enables you to get the most out of commercial-grade deployments of MongoDB. 10gen develops MongoDB, and offers production support, …
Actifio: Radically Simple
Actifio solutions are deployed in physical, virtual or hybrid IT environments in enterprise IT organizations across all vertical markets and in managed or cl... (more)

A Tale of BI – What Data Visualization Can Tell You about Your Business

You've heard the great promise of business intelligence: data visualization will completely change the way you think about your business. We strive for those life-altering "aha" moments that completely shift the way we think, and help move us forward in a way we just couldn't before due to a lack of knowledge, clarity or inspiration. But how can data visualization promise to give us more of those moments? Data visualization, the graphical representation of data and information, can help you discover trends, patterns and correlations in your business that might otherwise go unseen. It makes meaningful insights easier to spot and epiphanies easier to come by, and helps you create a business story that you can use as a springboard for smart change and success.
Seeing with Kaleidoscope Eyes
Taking large amounts of data and putting it into a business analysis system ... (more)

Internet of Things Maturity Model By @TonyShan | @ThingsExpo [#IoT]

The Internet of Things (IoT) is booming. The “Software for the Internet of Things (IoT) Developer Survey” report, published by Embarcadero Technologies last month, shows that 77% of development teams will have IoT solutions in active development in 2015, with almost half (49%) of IoT developers anticipating their solutions will generate business impacts by the end of this year. The IoT Maturity Model (IoTMM) is a qualitative method to gauge the growth and increasing impact of IoT capabilities in an IT environment from both business and technology perspectives. It comprises a set of criteria, parameters and factors that can be used to describe and measure the effectiveness of IoT adoption and implementation. Five levels of maturity are defined: Advanced, Dynamic, Optimized, Primitive, and Tentative (ADOPT). The definitions of these 5 levels are specified below: Level Desc... (more)

Big Data Business Model Maturity Index and IoT | @ThingsExpo #BigData #IoT #M2M #API #Wearables

Big Data Business Model Maturity Index and the Internet of Things (IoT)
Antonio Figueiredo (@afigueiredo) recently challenged me on Twitter with an interesting question: How would the Big Data Business Model Maturity Index (BDBMMI) change to support the Internet of Things (IoT)? My hope is that the BDBMMI would not need to change to support IoT, and that it could be used to guide any industry that is going through a data and analytics-driven transformation, such as what is happening to many industries due to IoT. Let’s see how one could use the BDBMMI to help organizations exploit the IoT. But before we start that exercise, let’s start with some key definitions: The Big Data Business Model Maturity Index (BDBMMI) is a framework to measure how effective an organization is at leveraging data and analytics to power the business (see Figure 1). We ... (more)