The Way from Data to Information

Data Mining

Top Stories

What Tomorrow's Business Leaders Need to Know About Machine Learning

Sometimes I write a blog just to formulate and organize a point of view, and I think it’s time that I pull together the bounty of excellent information about Machine Learning. This is a topic with which business leaders must become comfortable, especially tomorrow’s business leaders (a tip for my next-semester University of San Francisco business students!). Machine learning is a key capability that will help organizations drive optimization and monetization opportunities, and recent developments will place basic machine learning capabilities into the hands of the lines of business. By the way, there is an absolute wealth of freely available material on machine learning, so I’ve included a sources section at the end of this blog for folks who want more details on machine lea...

Data Unification at Scale | @CloudExpo

The term Data Unification is new in the Big Data lexicon, pushed by a variety of companies such as Talend, 1010Data, and TamR. Data unification deals with the domain known as ETL (Extract, Transform, Load), which emerged during the 1990s as data warehousing was gaining relevance. ETL refers to the process of extracting data from internal or external sources (multiple applications, typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into target databases, more specifically an operational data store, data mart, or data warehouse. These are read-only databases for analytics. Initially, analytics was mostly retrospective (e.g., how many shoppers aged 25-35 bought this item between May and July?). This was like driving a car looking at the ...
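The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Talend, 1010Data, or TamR implement data unification: the two source record layouts, the `sales` table, and the helper names are all hypothetical, and an in-memory SQLite database stands in for the data mart. The final query answers the retrospective question from the text.

```python
import sqlite3
from datetime import date

# Hypothetical source extracts: two separately maintained systems that
# record the same kind of sale with different field names and formats.
POS_ROWS = [  # point-of-sale system: ISO date strings, age as int
    {"cust_age": 28, "item": "SKU-1", "sold_on": "2016-05-14"},
    {"cust_age": 41, "item": "SKU-1", "sold_on": "2016-06-02"},
]
WEB_ROWS = [  # web store: date as (year, month, day) tuple, age as string
    {"age": "33", "sku": "SKU-1", "date": (2016, 7, 20)},
    {"age": "30", "sku": "SKU-2", "date": (2016, 6, 11)},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("pos", r) for r in POS_ROWS] + [("web", r) for r in WEB_ROWS]


def transform(tagged_rows):
    """Transform: map each source's layout onto one (age, sku, iso_date) schema."""
    for source, r in tagged_rows:
        if source == "pos":
            # Parse to validate the date, then re-emit it in ISO form.
            yield (int(r["cust_age"]), r["item"], date.fromisoformat(r["sold_on"]).isoformat())
        else:
            yield (int(r["age"]), r["sku"], date(*r["date"]).isoformat())


def load(rows, conn):
    """Load: write the unified rows into the target (an in-memory data mart)."""
    conn.execute("CREATE TABLE sales (age INTEGER, sku TEXT, sold_on TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def shoppers_25_35(conn, sku, start, end):
    """Count purchases of `sku` by shoppers aged 25-35 between two ISO dates."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE sku = ? AND age BETWEEN 25 AND 35 "
        "AND sold_on BETWEEN ? AND ?",
        (sku, start, end),
    ).fetchone()
    return n


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(shoppers_25_35(conn, "SKU-1", "2016-05-01", "2016-07-31"))  # prints 2
```

Normalizing dates to ISO strings during the transform step is what lets the final query compare dates from both sources with a plain `BETWEEN`; without that step the two sources could not be queried together.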

Demystifying Data Science | @CloudExpo

[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues, and ultimately customer satisfaction. He keeps hearing from his business customers about how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of Data Science start dancing in his head… [Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard, here to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start with a simple definition of data science: Data science is a...

R Tops Data Mining Software Poll

For the past 12 years, KDNuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)". In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents. Microsoft Excel was second, at 29.8%. Rapidminer, which took the #1 spot over R in 2011 and 2010, ranked third. And as Bob Muenchen notes, four of the top five data mining solutions in this year's poll are open source. R was also ranked in this poll as the most popular language for implementing data mining applications, beating out SQL and Java. The complete list of tools ranked in the poll is in the KDNuggets results post:

KDNuggets: Poll Results: Top Analytics, Data Mining, Big Data software used ...

Demystifying Big Data | @BigDataExpo

The Dean of the University of San Francisco School of Management, Elizabeth Davis, recently asked me to sit on a Big Data panel at the Direct Sales Association conference. I was given a 5-minute slot to “demystify” Big Data for a non-technical group of about 1,000 people, to help them understand where and how this thing called “Big Data” could help them. Well, if you know me, I can barely introduce myself in 5 minutes. This was particularly challenging for me because I’m used to talking about Big Data with organizations that have at least some level of Big Data experience or understanding (maybe they should get my second book, the “Big Data MBA,” and start there!). So I accepted the challenge, and here is what I said (and yes, I did it within the 5-minute window).

Myth #1: Every business needs a Big Data strategy.
Reality #1: You don’t need a Big Data strategy; you need ...

A Hybrid Data Pipeline | @CloudExpo @ProgressSW

Building a Hybrid Data Pipeline for Salesforce and Hadoop

My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop data lake. One challenge was that IT was not in a position to provide support until we proved value, and marketing did not have the experience, so we took on the project ourselves within the product marketing team for our line of business at Progress. In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at Progress, will discuss how the key to delivering on this was using standard interfaces in a bi-directional data pipeline to connect the systems. On the Salesforce side, we were able to get frictionless access to the data lake using clicks-not-code via OData. On the Hadoop side,...
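OData is a standard REST protocol, so a "clicks-not-code" consumer ultimately issues plain HTTP reads whose shape is fixed by the spec. The sketch below only builds such a request URL using the standard `$select`, `$filter`, and `$top` system query options; it does not describe how Progress's connector or Salesforce generate requests. The service root, the `Opportunities` entity set, and the field names are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def odata_query_url(service_root, entity_set, select=None, filter_expr=None, top=None):
    """Build an OData read URL from standard system query options.

    service_root and entity_set are placeholders for whatever endpoint
    exposes the data lake; only the $-prefixed options come from OData.
    """
    params = {}
    if select:
        params["$select"] = ",".join(select)  # columns to return
    if filter_expr:
        params["$filter"] = filter_expr       # row filter evaluated server-side
    if top is not None:
        params["$top"] = str(top)             # cap on the number of rows
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if params:
        # quote (rather than the default quote_plus) encodes spaces as %20;
        # '$' and ',' are left literal so option names and lists stay readable.
        url += "?" + urlencode(params, safe="$,", quote_via=quote)
    return url


print(odata_query_url(
    "https://example.com/odata/v4/", "Opportunities",
    select=["Id", "Name", "Amount"], filter_expr="Amount gt 10000", top=5,
))
# prints https://example.com/odata/v4/Opportunities?$select=Id,Name,Amount&$filter=Amount%20gt%2010000&$top=5
```

Because filtering and projection travel in the URL, the data lake side does the heavy lifting and the CRM only receives the rows and columns it asked for.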

It All Comes Down to YOU – The User

One of my favorite security writers, Bruce Schneier, had an interesting entry last week called “Reacting to Security Vulnerabilities,” where he discusses the recent reports about the security flaw in the SSL protocol and how we as users should relax and, essentially, ‘do nothing.’ “What?!? Do nothing??” Yup, and he has some good reasons why. Usually, new exploits, threats, breaches, and the typical security stuff that garners the headlines make security folks jump: jump to search the internet for anything related, jump to see if our systems are infected or vulnerable, jump to put an action plan in place to reduce the risk. These are reactionary behaviors when gloom gets delivered and we don’t fully understand the risk. I’m not saying to ignore warnings or to stop planning for the worst, but since several new ‘weaknesses’ seem to get published on a monthly basis, you do need ...

Data Mining Taken to a New Level

By Marcus Williams

Some hot topics we are tracking:

Data Mining Taken to a New Level

During a recent expo, Raytheon, the 5th-largest defense contractor, demonstrated how its Rapid Information Overlay Technology (RIOT) could collect data on a user. RIOT was designed to search well-known social media sites such as Facebook, Twitter, and Foursquare to gather information that could be linked to a person’s everyday activity by the hour.

Budget Cuts That Target Data Centers Across the Nation

The initiative to close 400 data centers by October is predicted to save $5 billion by 2015. Consolidating government data centers and utilizing the cloud could decrease costs and increase productivity over the next two years.

Department of Defense and the Intelligence Community Searching for a Similar Solution

Both entities are constructing a compreh...

Citizen Data Scientist, Jumbo Shrimp | @CloudExpo @Schmarzo

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense

Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify the variables and metrics that might be better predictors of performance, I do not expect business stakeholders to be able to create and generate analytic models. I do not believe, nor do I expect, that business stakeholders are going to be proficient enough with tools like SAS, R, Python, Mahout, or MADlib to 1) create or generate the models, and then 2) be profi...

F5's BIG-IP with Oracle Access Manager to Enhance SSO and Access Control

Learn how F5's BIG-IP LTM/APM works in conjunction with Oracle Access Manager to centralize web application authentication and authorization services, streamline access management, and reduce infrastructure costs. Watch how BIG-IP APM can reduce TCO, lower deployment risk, and improve operational efficiency for customers, while providing a unified point of enforcement that simplifies auditing and control of changes to application access settings.

ps twitter: @psilvas

$1B Opportunity: Dark Data | @CloudExpo

Every organization collects, stores, and retains some amount of dark data. It's the digital equivalent of emotional baggage that hangs around after every user interaction, transaction, and customer engagement. In fact, not using data effectively is costing United Airlines almost $1 billion annually in lost revenue. Gartner Inc. describes dark data as "information assets that organizations collect, process and store in the course of their regular business activity, but fail to use for other purposes." For travel companies with a strong online presence, dark data represents a sizable portion of all data stored. Examples might include:

- How many times a user resets their password
- IP address when a user logs into your website/app
- Last email communication date to your customers
- Mobile handset type, or web browser version
- Free text feedback on a hotel stay or recent flig...
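Several of the examples above (password resets, login IP address, browser version) typically already sit in raw event logs; putting them to use can start with simply aggregating those events into per-user attributes. The sketch below assumes a hypothetical event-log format (the `user`, `type`, `ip`, and `browser` fields are illustrative, and the IPs come from documentation-only address ranges); it is not drawn from any airline's or travel company's actual pipeline.

```python
from collections import defaultdict

# Hypothetical raw event log of the kind that is collected and stored
# but rarely analyzed. Events are assumed to be in chronological order.
EVENTS = [
    {"user": "u1", "type": "login", "ip": "203.0.113.5", "browser": "Firefox 115"},
    {"user": "u1", "type": "password_reset"},
    {"user": "u2", "type": "login", "ip": "198.51.100.7", "browser": "Safari 17"},
    {"user": "u1", "type": "password_reset"},
    {"user": "u1", "type": "login", "ip": "203.0.113.9", "browser": "Firefox 115"},
]


def profile_users(events):
    """Aggregate raw events into per-user attributes.

    Because events are processed in order, the last login seen for a
    user supplies that user's most recent IP address and browser.
    """
    profiles = defaultdict(
        lambda: {"password_resets": 0, "last_login_ip": None, "browser": None}
    )
    for event in events:
        profile = profiles[event["user"]]
        if event["type"] == "password_reset":
            profile["password_resets"] += 1
        elif event["type"] == "login":
            profile["last_login_ip"] = event["ip"]
            profile["browser"] = event["browser"]
    return dict(profiles)


for user, profile in sorted(profile_users(EVENTS).items()):
    print(user, profile)
```

Once the events are in this shape, attributes such as a high password-reset count can be joined with customer records and used like any other feature.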