The Way from Data to Information

Data Mining



Top Stories

The term Data Unification is new in the Big Data lexicon, pushed by a variety of companies such as Talend, 1010Data, and TamR. Data unification deals with the domain known as ETL (Extraction, Transformation, Loading), which emerged during the 1990s when Data Warehousing was gaining relevance. ETL refers to the process of extracting data from inside or outside sources (multiple applications typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into end target databases: more specifically, an operational data store, data mart, or data warehouse. These are read-only databases for analytics. Initially the analytics were mostly retrospective (e.g. how many shoppers aged 25-35 bought this item between May and July?). This was like driving a car looking at the ... (more)
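The extract, transform, and load steps described in the teaser above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory sources (`POS_SALES`, `WEB_ORDERS`), their field names, the `sales` table, and the function names are hypothetical and do not come from any of the vendors mentioned. SQLite stands in for the target warehouse.

```python
import sqlite3

# Hypothetical records from two separate applications with different schemas.
POS_SALES = [
    {"customer_age": 29, "item": "A-100", "sold_on": "2015-05-14"},
    {"customer_age": 41, "item": "A-100", "sold_on": "2015-06-02"},
]
WEB_ORDERS = [
    {"age": "33", "sku": "a-100", "date": "2015-07-20"},
    {"age": "27", "sku": "a-100", "date": "2015-09-01"},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for row in POS_SALES:
        yield "pos", row
    for row in WEB_ORDERS:
        yield "web", row


def transform(source, row):
    """Transform: map each source's fields onto one common (age, item, date) schema."""
    if source == "pos":
        return (int(row["customer_age"]), row["item"].upper(), row["sold_on"])
    return (int(row["age"]), row["sku"].upper(), row["date"])


def load(conn, rows):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (age INTEGER, item TEXT, sold_on TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run all three steps against an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, [transform(source, row) for source, row in extract()])
    return conn


def shoppers_25_to_35(conn, item, start, end):
    """Retrospective query: shoppers aged 25-35 who bought `item` in [start, end]."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sales "
        "WHERE item = ? AND age BETWEEN 25 AND 35 AND sold_on BETWEEN ? AND ?",
        (item, start, end),
    ).fetchone()
    return count
```

Calling `shoppers_25_to_35(run_etl(), "A-100", "2015-05-01", "2015-07-31")` answers the kind of after-the-fact question quoted above. Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly.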

In-Stream Processing | @CloudExpo @robinAKAroblimo #BigData #AI #BI #DX

Most of us have moved our web and e-commerce operations to the cloud, but we are still getting sales reports and other information we need to run our business long after the fact. We sell a hamburger on Tuesday, you might say, but don't know if we made money selling it until Friday. That's because we still rely on batch processing, where we generate orders, reports, and other management-useful pieces of data when it's most convenient for the IT department to process them, rather than in real time. That was fine when horse-drawn wagons made our deliveries, but it is far too slow for today's world, where stock prices and other bits of information circle the world (literally) at the speed of light. It's time to move to In-Stream Processing. You can't - and shouldn't - keep putting it off. [Figure 1, courtesy of the Grid Dynamics Blog] This diagram may look complicate... (more)
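The contrast between batch and in-stream processing drawn in the teaser above can be shown with a small sketch. It is only an illustration under stated assumptions: the `Sale` record, the `RunningProfit` class, and `batch_profit` are hypothetical names, and a real deployment would sit on a streaming platform rather than a Python loop.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    """One sale event (hypothetical shape: item name, selling price, unit cost)."""
    item: str
    price: float
    cost: float


def batch_profit(sales):
    """Batch style: the answer is only available once the whole period is collected."""
    return sum(s.price - s.cost for s in sales)


class RunningProfit:
    """In-stream style: the aggregate is updated as each event arrives."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, sale):
        """Fold one sale into the running total and return the up-to-date profit."""
        self.total += sale.price - sale.cost
        self.count += 1
        return self.total


def stream_totals(sales):
    """Feed events one at a time and record the profit known after each one."""
    running = RunningProfit()
    return [running.update(sale) for sale in sales]
```

Both approaches arrive at the same final figure. The difference is when it is known: `batch_profit` reports once at the end of the period, while `RunningProfit.update` has a current answer after every sale.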

Demystifying #DataScience | @CloudExpo #BigData #AI #ArtificialIntelligence

[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues and ultimately customer satisfaction. He keeps hearing from his business customers how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of Data Science start dancing in his head… [Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard, here to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start with a simple definition of data science: Data science is a... (more)

Citizen Data Scientist, Jumbo Shrimp | @CloudExpo @Schmarzo #BigData

Citizen Data Scientist, Jumbo Shrimp, and Other Descriptions That Make No Sense

Okay, let me get this out there: I find the term “Citizen Data Scientist” confusing. Gartner defines a “citizen data scientist” as “a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics.” While we teach business users to “think like a data scientist” in their ability to identify those variables and metrics that might be better predictors of performance, I do not expect that the business stakeholders are going to be able to create and generate analytic models. I do not believe, nor do I expect, that the business stakeholders are going to be proficient enough with tools like SAS or R or Python or Mahout or MADlib to 1) create or generate the models, and then 2) be profi... (more)

$1B Opportunity: #DarkData | @CloudExpo #BigData #BI #AI #ML #DataScience

Every organization collects, stores and retains portions of dark data. It's the digital equivalent of emotional baggage which hangs around after every user interaction, transaction, and customer engagement. In fact, not using data effectively is costing United Airlines almost $1 Billion annually in lost revenue. Gartner Inc. describes dark data as "information assets that organizations collect, process and store in the course of their regular business activity, but fail to use for other purposes." For travel companies with a strong online presence, dark data represents a sizable portion of all data stored. Such examples might include: how many times a user resets their password; the IP address when a user logs into your website/app; the last email communication date to your customers; mobile handset type or web browser version; free text feedback on a hotel stay or recent flig... (more)

The #InternetOfThings Will Generate Terabytes of Data | @ThingsExpo #IoT #M2M #API #Microservices

The Internet of Things Will Generate Terabytes of Data. What Will We Do with All of It?

By Elle Wood

In less than 5 years, "the Internet of Things will transform the data center," says Gartner. This transformation is predicted to trickle across industries and affect business models and how we market products, and even to inspire new technology developments. With a sensor on absolutely everything, from cars and houses to your family members, it goes without saying there will be some challenges with these massive amounts of data. Furthermore, there are many uncertainties associated with IoT because of this data. Is it even useful? How do we use it? And, one of the more important questions, how secure is the data in the cloud anyway? Fortunately, the development of management tools to hone all of this data has helped to answer several of these questions. Data mining for fast... (more)

Kognitio Celebrates 20 Years of WX2 Data Warehousing Platform

CHICAGO, May 6, 2009 — Kognitio today announced the 20th anniversary of its WX2 analytical database.  Since its introduction, WX2 has been a constant source of innovation in the field of data warehousing. It was the first platform to offer large-scale data warehousing, the first to enable Data Warehousing as a Service (DaaS) for organizations and one of the first to move onto an entirely software-based model. Kognitio has been responsible for innovating and advancing numerous data warehousing concepts and practices that are considered the leading edge in the field of business intelligence, such as the ability to query hundreds of terabytes of information within seconds instead of weeks, and enabling companies of all sizes to more easily take advantage of advanced data analytics at lower cost. The announcement was made at The Data Warehousing Institute’s (TDWI) World... (more)

When Is A Copy A Backup?

Ocarina's Carter George continued the conversation on backups, asking if the conventional backup paradigm was obsolete, and if file copies could serve the same purpose. As mentioned in our "What Is a Backup?" post, this is the same question posed by EMC's Scott Waterhouse recently.

Putting Copies To The Test

George suggests a copy-based scenario: "Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time coherent views?" To determine whether this is truly a backup, let's apply our new rules to determine when a copy becomes a backup: A copy is, by definition, a copy of a set of data. This copy is not mentioned as being protected or offline, which worries the IT admin in me. Could they be overwritten or corrupted? Would they disappear along with the primary data... (more)

Breach Is The Word, Is The Word, Is The Word That You Heard

…to the tune of $6.6 Mil per-r-r Breach. Yup – according to Ponemon Institute the average cost of a data breach is $6.6 million and they also report that it costs about $215 per compromised record (pdf). McAfee estimates $1 trillion in losses yearly, due to data theft – that’s 10 to the 12th dollars. Imagine if IT budgets could get that back? The past two years saw a significant increase in large scale attacks with the January 2007 TJX breach starting the massive flurry. As of October 2007, TJX said that more than 94 million accounts were affected at a cost of over $256 million. At the time it was the largest data loss incident to date. The crooks kept it up, however. Hannaford Grocers was hit in Dec 2007 but they didn’t discover it until February 2008 and announced in March 2008 that 4.2 million cards had been exposed, leading to over 1800 cases of fraud. In ... (more)

SAS to Analyze Social Media Fraud

To prevent 2009 from becoming the “Year of the Phish” in Asia Pacific (as opposed to the Year of the Ox), SAS, the leader in business analytics, has launched innovative new software to fight fraud, but also to better understand customer sentiment. SAS Social Network Analysis and SAS’ social media analysis solution help organizations uncover hidden relationships between individuals and data; detect patterns and trends; and mine text and other unstructured data. From phishing and skimming to claims fraud and money laundering, increasingly sophisticated fraud techniques are causing huge losses and harming customer relationships. So too are negative online comments that quickly spread through Web sites, blogs, Twitter and other channels, resulting in lost business and damaged brands. The rapid adoption of social media has given the public a pla... (more)

Security – Still in the Driver’s Seat

A couple of recent surveys reveal that for 2010, Security is back at the top of IT’s focus. It seemed for a while there that Cloud Computing was starring in most questionnaires that asked about future IT spending plans. If you remember, Security was still riding shot-gun slamming on the imaginary brakes in the passenger seat. ‘Hey Cloud, You still can’t turn down that alley without my presence,’ Security would constantly nag from the navigator position. Don’t get me wrong, Cloud Computing is still a powerful IT resource but according to a recent Infonetics survey, ‘Security upgrades, both for IT security and physical security, was the #1 change named by respondent organizations when asked what major changes they planned for their data centers over the next two years……For those who are expecting ‘the cloud’ to be a savior of... (more)