The Way from Data to Information

Data Mining

Top Stories

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense

Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify those variables and metrics that might be better predictors of performance, I do not expect that the business stakeholders are going to be able to create and generate analytic models. I do not believe, nor do I expect, that the business stakeholders are going to be proficient enough with tools like SAS or R or Python or Mahout or MADlib to 1) create or generate the models, and then 2) be profi...

Demystifying Big Data | @BigDataExpo #IoT #Cloud #BigData

The Dean of the University of San Francisco School of Management, Elizabeth Davis, recently asked me to sit on a Big Data panel at the Direct Sales Association conference. I was given a 5-minute slot to “demystify” Big Data for a non-technical group of about 1,000 people and to help them understand where and how this thing called “Big Data” could help them. Well, if you know me, I can barely introduce myself in 5 minutes. But this was particularly challenging for me, as I’m used to talking about Big Data with organizations that have at least some level of Big Data experience or understanding (maybe they should get my second book, the “Big Data MBA”, and start there!). So I accepted the challenge, and here is what I said (and yes, I did it within the 5-minute window). Myth #1: Every business needs a Big Data strategy. Reality #1: You don’t need a Big Data strategy; you need ...

Big Data Business Model Maturity Index and IoT | @ThingsExpo #BigData #IoT #M2M #API #Wearables

Antonio Figueiredo (@afigueiredo) recently challenged me on Twitter with an interesting question: How would the Big Data Business Model Maturity Index (BDBMMI) change to support the Internet of Things (IoT)? My hope is that the BDBMMI would not need to change to support IoT, and that it could be used to guide any industry going through a data- and analytics-driven transformation, such as the one IoT is bringing to many industries. Let’s see how one could use the BDBMMI to help organizations exploit the IoT. But before we start that exercise, let’s begin with some key definitions: The Big Data Business Model Maturity Index (BDBMMI) is a framework to measure how effective an organization is at leveraging data and analytics to power the business (see Figure 1). We ...

De-mystifying the Big Data Business Model Maturity Index

I know that I’ve hit the big (data) time when concepts that I developed start to appear as infographics. Today I am very proud to announce the launch of the demystified Big Data Business Model Maturity Index (BDBMMI) infographic! Using clear, simple language paired with practical examples that illustrate each stage of the big data maturity journey, the infographic aims to demystify the BDBMMI: to make it easier for customers (and readers) to understand what the BDBMMI is and how to use it to successfully leverage data and analytics to power their business models. Figure 1: Big Data Business Model Maturity Index Infographic. Today you’ll learn more about Dave’s story and how he navigated through the stages of the Big Data Business Model Maturity Index and put big data to work for his organization. Before we get star...

New Approaches for New Big Data Insights | @BigDataExpo #BigData

By Melvin Greer

Business intelligence has matured into a core competency necessary to sustain competitive advantage. Organizations of every size and industry are generating valuable data with each interaction, and that data can be captured, analyzed, and turned into business insight. These organizations are using analytics tools such as dashboards, advanced visualization, and data warehousing, along with other technologies, to achieve their strategic business objectives. Many companies are taking a hybrid cloud approach to data analysis. Leveraging a hybrid cloud environment as part of a big data analytics strategy enables businesses to take advantage of cloud elasticity. This allows organizations to process data across clusters of computers, enabling analysis to occur across multiple cloud compute environments. As organizations' need for...
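
Processing data “across clusters of computers” usually follows a scatter-gather pattern: partition the data, process each partition independently, then merge the partial results. The sketch below is a minimal illustration under loud assumptions: threads in a single process stand in for separate cluster nodes or cloud environments, `count_events` and `scatter_gather` are hypothetical names, and the click-stream data is invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_events(partition: list[str]) -> Counter:
    """Map step: each worker (a stand-in for one node) counts events in its partition."""
    return Counter(partition)


def scatter_gather(partitions: list[list[str]], workers: int = 3) -> Counter:
    """Send each partition to a worker, then merge the partial counts (reduce step)."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_events, partitions):
            total.update(partial)  # adds counts rather than replacing them
    return total


# Hypothetical click-stream data already split across three environments.
partitions = [
    ["view", "click", "view"],
    ["purchase", "view"],
    ["click"],
]

totals = scatter_gather(partitions)
print(totals.most_common(1))  # [('view', 3)]
```

Because each partition is processed without reference to the others, more workers (or more cloud capacity) can be added as data grows, which is the elasticity argument in miniature.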

When Is A Copy A Backup?

Ocarina's Carter George continued the conversation on backups, asking whether the conventional backup paradigm is obsolete and whether file copies could serve the same purpose. As mentioned in our "What Is a Backup?" post, this is the same question EMC's Scott Waterhouse posed recently.

Putting Copies To The Test

George suggests a copy-based scenario: "Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time coherent views?" To determine whether this is truly a backup, let's apply our new rules for when a copy becomes a backup: A copy is, by definition, a copy of a set of data. These copies are not described as protected or offline, which worries the IT admin in me. Could they be overwritten or corrupted? Would they disappear along with the primary data...
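
George's copy-based scenario (files placed on a separate storage tier, kept in their native format, and organized in time-coherent views) might look roughly like the sketch below. It assumes a local directory stands in for the second tier, copies files rather than moving them, and names each view with a UTC timestamp; `snapshot_copy` is a hypothetical helper. Marking the copies read-only speaks to the overwrite concern only: the copies remain online, so this alone does not make them offline protection.

```python
import os
import shutil
import stat
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Read-only for owner, group, and others.
READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def snapshot_copy(source: Path, tier_root: Path,
                  when: Optional[datetime] = None) -> Path:
    """Copy every file under `source` into a time-stamped view under `tier_root`.

    Files keep their native format and relative paths. The copies are then
    marked read-only, which guards against accidental overwrites but is NOT
    the same as offline or off-site protection.
    """
    when = when or datetime.now(timezone.utc)
    view = tier_root / when.strftime("%Y%m%dT%H%M%SZ")
    shutil.copytree(source, view)  # raises FileExistsError if this view exists
    for path in view.rglob("*"):
        if path.is_file():
            os.chmod(path, READ_ONLY)
    return view


# Usage on throwaway directories, with a fixed timestamp so the view name is predictable.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data"
    (src / "sub").mkdir(parents=True)
    (src / "sub" / "report.csv").write_text("a,b\n1,2\n")
    fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    view = snapshot_copy(src, Path(tmp) / "tier2", fixed)
    copied = view / "sub" / "report.csv"
    view_name = view.name
    copied_text = copied.read_text()
    copied_writable = bool(copied.stat().st_mode & stat.S_IWUSR)

print(view_name, copied_writable)  # 20240102T030405Z False
```

Each run produces a new, self-consistent view of the data as of one moment, which is what “time coherent” suggests; whether such views count as a backup still depends on where the tier lives and who can modify it.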

Will You Comply or Just Check the Box?

Some of both, apparently. A recent Ponemon Institute PCI-DSS compliance survey revealed that 71% of companies actually admitted that data security is not a top priority, and 55% say they protect only credit card data and not other sensitive information such as bank account information, Social Security numbers, and driver's license data. Additional statistics show that a minuscule 28% of smaller companies (501-1,000 employees) are PCI-DSS compliant, while around 70% of large companies (>75,000 employees) say they meet the regulations. The one that jumps out for me is the small-merchant stat. I understand that cost is a large barrier to PCI compliance for smaller companies, but just imagine how many companies and industries fall into the 501-1,000 employee category. And that doesn’t count all the even smaller ‘Family Owned’ restaurants, auto repair shops or any other servi...

Revolution Analytics: R Language Features

R is an incredibly comprehensive statistics package. Even if you just look at the standard R distribution (the base and recommended packages), R can do pretty much everything you need for data manipulation, visualization, and statistical analysis. And for everything else, there are more than 5,000 packages on CRAN and other repositories, plus the big-data capabilities of Revolution R Enterprise. As a result, making a list of everything R can do is a difficult task. But we've made an effort in this list of R Language Features, a new section on the Revolution Analytics website. It's broken up into four main sections (analytics, graphics and visualization, R applications and extensions, and programming language features), each with its own subsections: ANALYTICS: Basic Mathematics, Basic Statistics, Probability Distributions, Big Data Analytics*, Machine Learning, Opt...

Bob Gourley on the Ethics, Analytics and Future of Big Data

Bob Gourley, editor of CTOvision as well as founder and CTO of Crucial Point, LLC, was recently interviewed by WashingtonExec, where he shared his views on emerging information technology, government needs, and Big Data. The interview originally appeared on WashingtonExec and is reproduced below: How does Bob Gourley, founder and CTO of Crucial Point, LLC, and editor of the popular tech blog CTOVision.com, define big data? WashingtonExec caught up with Gourley to talk about what he learned from his time in government that has guided him in the private sector and what he predicts will be “the next big thing” for the IT industry; he also gave an update on the 2012 Big Data Solutions Awards. Securing IP addresses, banking apps and predictive analytics were also discussed in this interview. WashingtonExec: Could you start out by telling us a little about your background?...

The Cure for the Common Cloud-Based Big Data Initiative

There is no doubt that Big Data holds enormous promise for a range of industries. Better visibility into data across various sources enables everything from saving electricity to improving agricultural yields to placing ads on Google. But when it comes to deriving value from data, no industry has been doing it as long or with as much rigor as clinical research. Unlike other markets that are delving into Big Data for the first time and don't know where to begin, drug and device developers have spent years refining complex processes for asking very specific questions with clear purposes and goals. Whether using data to design an effective and safe treatment for cholesterol, or collecting and mining data to understand the proper dosage of cancer drugs, life sciences has had to dot every "i" and cross every "t" in order to keep people safe and for new therapi...

Big Data Capability Model | @ThingsExpo #BigData #IoT #InternetOfThings

A capability model is a structure that represents the core abilities and competencies of an entity (department, organization, person, system, or technology) to achieve its objectives, especially in relation to its overall mission and functions. The Big Data Capability Model (BDCM) is defined as the key functionalities for dealing with Big Data problems and challenges. It describes the major features, behaviors, practices, and processes in an organization that can reliably and sustainably produce the outcomes required by Big Data demands. BDCM consists of the following elements: Collection: collect raw data, sources, formats, discovery, protocols, staging; ELT: extract, load, and transform data; Store: NoSQL repository, key-value, column-based, document-oriented, graph, Hadoop, MPP, in-memory, cache; Integration: data movement, messaging, consumption, access, connector; Processing...
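
The ELT element (extract, load, then transform) differs from classic ETL in that raw data lands in the store first and is transformed there. A minimal sketch, assuming an in-memory SQLite database stands in for the Big Data store and a small hypothetical sensor CSV stands in for collected raw data; `extract`, `load`, and `transform` are illustrative helpers, not part of the BDCM itself.

```python
import csv
import io
import sqlite3

# Hypothetical raw sensor data; note the missing reading for d2.
RAW_CSV = """device_id,reading_c,ts
d1,21.5,2024-01-01T00:00:00
d1,22.5,2024-01-01T01:00:00
d2,,2024-01-01T00:00:00
d2,19.0,2024-01-01T01:00:00
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse the raw records without cleaning them."""
    return list(csv.DictReader(io.StringIO(text)))


def load(conn: sqlite3.Connection, rows: list[dict[str, str]]) -> None:
    """Load: land the raw rows, untyped and unvalidated, in a staging table."""
    conn.execute("CREATE TABLE staging (device_id TEXT, reading_c TEXT, ts TEXT)")
    conn.executemany(
        "INSERT INTO staging VALUES (:device_id, :reading_c, :ts)", rows
    )


def transform(conn: sqlite3.Connection) -> list[tuple]:
    """Transform: clean and aggregate inside the store, after loading."""
    conn.execute(
        """
        CREATE TABLE device_avg AS
        SELECT device_id,
               AVG(CAST(reading_c AS REAL)) AS avg_c,
               COUNT(*) AS n
        FROM staging
        WHERE reading_c <> ''
        GROUP BY device_id
        """
    )
    return conn.execute(
        "SELECT device_id, avg_c, n FROM device_avg ORDER BY device_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract(RAW_CSV))
result = transform(conn)
print(result)  # [('d1', 22.0, 2), ('d2', 19.0, 1)]
```

Because the untouched rows remain in `staging`, the transformation can be rerun or revised later without re-collecting the source data, which is the usual argument for ELT over ETL.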