The Way from Data to Information

Data Mining


Top Stories

Quick quiz! What’s the first thing that comes to mind when you hear the following phrases? Artificial grass, artificial sweeteners, artificial flavors, artificial plants, artificial flowers, artificial diamonds and jewelry, artificial (fake) news. These phrases probably evoke thoughts such as “fake,” “not real,” or even “shabby.” Artificial is such a harsh adjective. The word “artificial” is defined as “imitation; simulated; sham,” with synonyms such as fake, false, mock, counterfeit, bogus, phony and factitious. “Artificial” may not be the right term to describe “Artificial Intelligence,” because artificial intelligence is anything but fake, false, phony, or a sham. Maybe a better term is “Augmented Human Intelligence,” or a phrase that highlights the importance of augmenting human intelligence while alleviating the fears that AI means ... (more)

In-Stream Processing | @CloudExpo @robinAKAroblimo #BigData #AI #BI #DX

Most of us have moved our web and e-commerce operations to the cloud, but we are still getting sales reports and other information we need to run our business long after the fact. We sell a hamburger on Tuesday, you might say, but don't know if we made money selling it until Friday. That's because we still rely on batch processing, where we generate orders, reports, and other management-useful pieces of data when it's most convenient for the IT department to process them, rather than in real time. That was fine when horse-drawn wagons made our deliveries, but it is far too slow for today's world, where stock prices and other bits of information circle the world (literally) at the speed of light. It's time to move to In-Stream Processing. You can't - and shouldn't - keep putting it off. [Figure 1, courtesy of the Grid Dynamics Blog] This diagram may look complicate... (more)
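To make the batch-versus-stream contrast concrete, here is a minimal Python sketch (the event shape and field names are hypothetical, and this is not the pipeline shown in Figure 1) of handling each sale the moment it arrives instead of in a nightly job:

    # Minimal in-stream processing sketch: revenue is updated as each order event
    # arrives, rather than waiting for a batch job to run after the fact.
    from collections import defaultdict
    from typing import Dict, Iterable

    def process_order_stream(orders: Iterable[dict]) -> Dict[str, float]:
        """Consume order events one at a time and keep a running revenue per product."""
        revenue = defaultdict(float)
        for order in orders:  # each event is handled the moment it shows up
            revenue[order["product"]] += order["price"] * order["quantity"]
            # In a real deployment this running state would feed live dashboards.
        return dict(revenue)

    events = [
        {"product": "hamburger", "price": 4.50, "quantity": 2},
        {"product": "fries", "price": 2.00, "quantity": 1},
        {"product": "hamburger", "price": 4.50, "quantity": 1},
    ]
    print(process_order_stream(events))  # {'hamburger': 13.5, 'fries': 2.0}

In production the loop would read from a message bus rather than a list, but the point is the same: the numbers you need on Tuesday exist on Tuesday.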

Application Performance Management Done Right

What is Application Performance Management (APM)? Like a lot of good questions, it depends on your business needs. What is the goal of an ideal APM? Does it mean 99.999% availability? Perhaps it is a favorable overall end-user experience when using the application, but as compared to what? My point is that Application Performance Management / Monitoring means different things to different businesses, and it can even depend on the application involved. What is the Goal of APM? “Begin with the goal in mind.” I wish I could take credit for that quote. What is the goal of the APM? Have you listed the objectives you hope to achieve from your APM strategy? This approach will help your team ensure satisfaction with the final solution chosen. Here are some examples: a minimum of 99.999% availability with lower Mean Time To Know (MTTK) and Mean Time To Repair (MTTR); less ... (more)
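To put a target like 99.999% in perspective, here is a quick back-of-the-envelope calculation in Python (illustrative arithmetic only, not from the article):

    # Downtime budget implied by an availability target.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

    def downtime_budget_minutes(availability: float) -> float:
        """Maximum yearly downtime, in minutes, allowed by an availability target."""
        return (1.0 - availability) * MINUTES_PER_YEAR

    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%} availability -> {downtime_budget_minutes(target):.2f} minutes/year")
    # 99.900% availability -> 525.60 minutes/year (~8.8 hours)
    # 99.990% availability -> 52.56 minutes/year
    # 99.999% availability -> 5.26 minutes/year

Five nines leaves a little over five minutes of downtime per year, which is why the MTTK and MTTR objectives matter as much as the availability figure itself.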

Demystifying #DataScience | @CloudExpo #BigData #AI #ArtificialIntelligence

[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues, and ultimately customer satisfaction. He keeps hearing from his business customers how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of data science start dancing in his head… [Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard, here to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start with a simple definition of data science: Data science is a... (more)

Algorithms of the Intelligent Web

I have recently finished writing "Algorithms of the Intelligent Web," and it should hit the bookshelves in a few weeks. I would like to tell you what the book is about and why I wrote it -- to save some typing, hereafter I will refer to the book as "AIW", "the AIW book", etc. The code for the book is hosted on Google Code here. The AIW book includes topics from the areas of machine learning, data mining, statistics, and discovery in knowledge bases. The literature on these topics is vast, but it is almost exclusively academic and heavy in mathematical jargon. Nevertheless, the main ideas of the algorithms can be grasped and used by nearly every software engineer with a minimum of mathematical formalism and a little bit of effort. In fact, one of the goals that I set for the book was to describe every algorithm without writing a single mathematical equation; a ... (more)

Enterprise Cloud Computing: Mumbai – November 18 – 19, 2013

Enterprise Cloud Computing: Mumbai – November 18–19, 2013. Monday, November 18, 2013 – Tuesday, November 19, 2013, Mumbai, India. Venue TBD. Price: Rs31,900.00 (including $6000 early discount) [converted to 570.00 USD]. Offered in partnership with C C & C Solutions. We offer additional discounts for groups of three or more people, government or non-profit employees, people who’ve taken a ZapThink class before, or individuals who are paying out of their own pocket. Please email us for a discount code you can use when registering. ZapThink Enterprise Cloud Computing Course: The Leading Vendor-Independent, Architecture-Focused Cloud Training. The Enterprise Cloud Computing course is an intensive, two-day “fire hose” of information that prepares you to leverage the Cloud to achieve real business value. We cut through the hype and separate what really wor... (more)

Federal Big Data Community Convenes 6 Feb at Apache Hadoop Forum

By Bob Gourley. If you are an analyst, executive, or architect engaged in the analysis of big data, this is a “must attend” event. We will be there engaging with the community and hope to see you there. Registration is now open for the third annual Federal Big Data Apache Hadoop Forum! Register here. Join us on Thurs., Feb. 6, as leaders from government and industry convene to share Big Data best practices. This is a must-attend event for any organization or agency looking to be information-driven and to give more resources and applications access to more data. During this informative event you will learn: key trends in government today and the role Big Data plays in driving transformation; how leading agencies are putting data to good use to uncover new insight, streamline costs, and manage threats; and the role of an Enterprise Data Hub, and how it is a game... (more)

Citizen Data Scientist, Jumbo Shrimp | @CloudExpo @Schmarzo #BigData

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense. Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify those variables and metrics that might be better predictors of performance, I do not expect that the business stakeholders are going to be able to create and generate analytic models. I do not believe, nor do I expect, that the business stakeholders are going to be proficient enough with tools like SAS or R or Python or Mahout or MADlib to 1) create or generate the models, and then 2) be profi... (more)

The Future Is Intelligent Apps | @ThingsExpo #IoT #M2M #BigData #Analytics

I have seen the future! Of course, I seem to say that every other month (maybe that’s because the future keeps changing?), but this is a good one. The future is a collision between big data (and data science) and application development that will yield a world of “intelligent apps.” These “intelligent apps” combine customer, product and operational insights (uncovered with predictive and prescriptive analytics) with modern application development tools and user-centric design to create a more compelling, more prescriptive user experience. These intelligent apps not only know how to support or enable key user decisions, but they continually learn from the user interactions to become even more relevant and valuable to those users. Several developments and posts by industry leaders over the past few weeks have started to add some substance to this intelligent apps tre... (more)
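As a toy illustration of “continually learn from the user interactions” (a hypothetical Python sketch, not drawn from any of the posts referenced here), consider an app that re-ranks its offers as click feedback arrives:

    # Hypothetical "intelligent app" sketch: offers are re-ranked per user,
    # with scores nudged up or down as click feedback arrives.
    from collections import defaultdict

    class OfferRanker:
        def __init__(self, learning_rate: float = 0.1):
            # user -> offer -> learned affinity score
            self.scores = defaultdict(lambda: defaultdict(float))
            self.lr = learning_rate

        def recommend(self, user: str, offers: list) -> list:
            """Order offers by the user's current learned scores."""
            return sorted(offers, key=lambda o: self.scores[user][o], reverse=True)

        def record_feedback(self, user: str, offer: str, clicked: bool) -> None:
            """Learn from the interaction: reward clicks, penalize ignored offers."""
            self.scores[user][offer] += self.lr if clicked else -self.lr

    ranker = OfferRanker()
    ranker.record_feedback("alice", "insurance-upsell", clicked=True)
    ranker.record_feedback("alice", "credit-card", clicked=False)
    print(ranker.recommend("alice", ["credit-card", "insurance-upsell"]))
    # ['insurance-upsell', 'credit-card']

A real intelligent app would use proper predictive models rather than a score table, but the shape is the same: serve a decision, observe the interaction, and fold what was learned back into the next recommendation.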

A Hybrid Data Pipeline | @CloudExpo @ProgressSW #BigData #AI #DataLake

Building a Hybrid Data Pipeline for Salesforce and Hadoop. My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge was that IT was not in a position to provide support until we proved value, and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at Progress, will discuss how the key to delivering on this was a bi-directional data pipeline that connects the systems through standard interfaces. On the Salesforce side, we were able to get frictionless access to the data lake using clicks-not-code via OData. On the Hadoop side,... (more)
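For a sense of what OData-based access looks like from the consuming side, here is a minimal Python sketch against a hypothetical OData endpoint (the service URL, entity set, and field names are invented for illustration; this is not the pipeline described in the session):

    # Minimal OData read sketch; the endpoint, entity set, and fields are placeholders.
    import requests

    BASE_URL = "https://example.com/odata"  # hypothetical OData service root

    def fetch_opportunities(min_amount: float):
        """Query an 'Opportunities' entity set using standard OData query options."""
        params = {
            "$filter": f"Amount ge {min_amount}",  # OData $filter syntax
            "$top": "100",
            "$format": "json",
        }
        resp = requests.get(f"{BASE_URL}/Opportunities", params=params, timeout=30)
        resp.raise_for_status()
        return resp.json().get("value", [])  # OData JSON responses wrap rows in "value"

    for row in fetch_opportunities(min_amount=10000):
        print(row.get("Name"), row.get("Amount"))

Because OData is just HTTP plus a standard query grammar, the same entity sets that tools consume with clicks can also be scripted when needed.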

Web-Based Data Mining and Agile Reporting with AJAX

Izenda, a leader in the development of web-based reporting tools for business users, has announced the release of Izenda Ad Hoc version 4.0. Leveraging the speed and responsiveness of AJAX technology, Ad Hoc 4.0 brings advanced reporting capabilities to non-technical users through a simple, web-based tool. Modern companies use databases to store huge amounts of raw data, such as sales results, financial reports, and operational metrics. Reporting systems allow them to extract and summarize the information they need to make better decisions. For example, a retailer that had millions of transactions last month would not want to look over every transaction. Instead, they may be interested in specific figures like the top ten products, sales volume, and profit margins. But because traditional reporting systems are extremely complex and require very s... (more)
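To illustrate the kind of summary such a report produces, here is a small Python/pandas sketch (the column names and figures are hypothetical and unrelated to Izenda's product) that rolls raw transactions up into a top-products view:

    # Illustrative report sketch: aggregate raw transactions into top products by revenue.
    # Column names and sample figures are hypothetical.
    import pandas as pd

    transactions = pd.DataFrame({
        "product":    ["Widget A", "Widget B", "Widget A"],
        "unit_price": [9.99, 4.50, 9.99],
        "quantity":   [3, 10, 1],
    })
    transactions["revenue"] = transactions["unit_price"] * transactions["quantity"]

    top_ten = (transactions.groupby("product")["revenue"]
               .sum()
               .sort_values(ascending=False)
               .head(10))
    print(top_ten)
    # product
    # Widget B    45.00
    # Widget A    39.96

A reporting tool like the one described here builds the equivalent of this group-and-sort interactively, so a business user never has to touch the underlying query.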