The Way from Data to Information

Data Mining



Top Stories

What Tomorrow's Business Leaders Need to Know About Machine Learning

Sometimes I write a blog just to formulate and organize a point of view, and I think it’s time that I pull together the bounty of excellent information about Machine Learning. This is a topic with which business leaders must become comfortable, especially tomorrow’s business leaders (tip for my next semester University of San Francisco business students!). Machine learning is a key capability that will help organizations drive optimization and monetization opportunities, and there have been some recent developments that will place basic machine learning capabilities into the hands of the lines of business. By the way, there is an absolute wealth of freely-available material on machine learning, so I’ve included a sources section at the end of this blog for folks who want more details on machine lea... (more)

Data Unification at Scale | @CloudExpo #BigData #DataLake #AI #Analytics

The term Data Unification is new in the Big Data lexicon, pushed by a variety of companies such as Talend, 1010Data, and TamR. Data unification deals with the domain known as ETL (Extraction, Transformation, Loading), which emerged during the 1990s when Data Warehousing was gaining relevance. ETL refers to the process of extracting data from inside or outside sources (multiple applications typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into end target databases, more specifically, an operational data store, data mart, or a data warehouse. These are read-only databases for analytics. Initially the analytics was mostly retroactive (e.g. how many shoppers between age 25-35 bought this item between May and July?). This was like driving a car looking at the ... (more)
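As a rough illustration of the extract, transform, load sequence described in this excerpt, here is a minimal Python sketch. It is not taken from the article: the sample rows, the `purchases` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a data mart. The final query mirrors the retroactive question quoted above.

```python
import sqlite3
from datetime import date

# Hypothetical source records, standing in for rows pulled from separate applications.
SOURCE_ROWS = [
    {"shopper_age": "29", "item": "widget", "purchased": "2016-06-03"},
    {"shopper_age": "41", "item": "widget", "purchased": "2016-05-17"},
    {"shopper_age": "33", "item": "widget", "purchased": "2016-08-01"},
    {"shopper_age": "25", "item": "gadget", "purchased": "2016-07-30"},
]


def extract():
    """Extract: copy the raw records out of the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: convert ages to integers and validate dates as ISO strings."""
    return [
        (
            int(r["shopper_age"]),
            r["item"],
            date.fromisoformat(r["purchased"]).isoformat(),
        )
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (age INTEGER, item TEXT, purchased TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


def shoppers_25_to_35(conn, item, start, end):
    """Retroactive query: count buyers aged 25-35 of an item within a date range."""
    row = conn.execute(
        "SELECT COUNT(*) FROM purchases "
        "WHERE item = ? AND age BETWEEN 25 AND 35 AND purchased BETWEEN ? AND ?",
        (item, start, end),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    print(shoppers_25_to_35(conn, "widget", "2016-05-01", "2016-07-31"))
```

ISO-formatted date strings sort in calendar order, which is why a plain `BETWEEN` on text is enough for the date range in this sketch.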

In-Stream Processing | @CloudExpo @robinAKAroblimo #BigData #AI #BI #DX

Most of us have moved our web and e-commerce operations to the cloud, but we are still getting sales reports and other information we need to run our business long after the fact. We sell a hamburger on Tuesday, you might say, but don't know if we made money selling it until Friday. That's because we still rely on Batch processing, where we generate orders, reports, and other management-useful pieces of data when it's most convenient for the IT department to process them, rather than in real time. That was fine when horse-drawn wagons made our deliveries, but it is far too slow for today's world, where stock prices and other bits of information circle the world (literally) at the speed of light. It's time to move to In-Stream Processing. You can't - and shouldn't - keep putting it off. [Figure 1, courtesy of the Grid Dynamics Blog] This diagram may look complicate... (more)
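To make the contrast between batch and in-stream processing concrete, the following Python sketch computes profit on a handful of sales both ways. It is only an illustration under stated assumptions: the `SALES` records, the cent amounts, and both function names are invented here and do not come from the article or from any streaming product.

```python
# Hypothetical sales events; amounts are integer cents to avoid rounding issues.
SALES = [
    {"item": "hamburger", "price_cents": 500, "cost_cents": 350},
    {"item": "hamburger", "price_cents": 500, "cost_cents": 400},
    {"item": "fries", "price_cents": 200, "cost_cents": 80},
]


def batch_profit(sales):
    """Batch style: wait until every sale is collected, then compute once."""
    return sum(s["price_cents"] - s["cost_cents"] for s in sales)


def stream_profit(sales):
    """In-stream style: emit an updated running profit as each sale arrives."""
    total = 0
    for s in sales:
        total += s["price_cents"] - s["cost_cents"]
        yield total


if __name__ == "__main__":
    print("batch result, available only at the end:", batch_profit(SALES))
    for running in stream_profit(SALES):
        print("running profit after this sale:", running)
```

Both approaches arrive at the same final number; the difference is that the generator exposes an up-to-date figure after every sale instead of only once the whole batch has been processed.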

Demystifying #DataScience | @CloudExpo #BigData #AI #ArtificialIntelligence

[Opening Scene]: Billy Dean is pacing the office. He’s struggling to keep his delivery trucks at full capacity and on the road. Random breakdowns, unexpected employee absences, and unscheduled truck maintenance are impacting bookings, revenues and ultimately customer satisfaction. He keeps hearing from his business customers how they are leveraging data science to improve their business operations. Billy Dean starts to wonder if data science can help him. As he contemplates what data science can do for him, he slowly drifts off to sleep, and visions of Data Science start dancing in his head… [Poof! Suddenly Wizard Wei appears]: Hi, I’m your data science wizard to help alleviate your data science concerns. I don’t understand why folks try to make the data science discussion complicated. Let’s start with a simple definition of data science: Data science is a... (more)

A Hybrid Data Pipeline | @CloudExpo @ProgressSW #BigData #AI #DataLake

Building a Hybrid Data Pipeline for Salesforce and Hadoop

My team embarked on building a data lake for our sales and marketing data to better understand customer journeys. This required building a hybrid data pipeline to connect our cloud CRM with the new Hadoop Data Lake. One challenge is that IT was not in a position to provide support until we proved value and marketing did not have the experience, so we embarked on the journey ourselves within the product marketing team for our line of business within Progress. In his session at @BigDataExpo, Sumit Sarkar, Product Marketing Engineer at Progress, will discuss how the key to delivering on this was using standard interfaces using a bi-directional data pipeline to connect the systems. On the Salesforce side, we were able to get frictionless access to the data lake using clicks-not-code via OData. On the Hadoop side,... (more)

Reconciling Big Data & Data Residency in the Cloud

I recently blogged about “big data” and the value that big data analytics can bring to companies using this type of business intelligence to garner insights and develop competitive advantages. Numerous industry experts have highlighted ways that the cloud is actually enabling a transformation toward data mining, pattern recognition, and predictive analytics to enhance executive decision-making. For example, Booz|Allen|Hamilton’s December 2011 report, Massive Data Analytics and the Cloud, asserts that data cloud-based intelligence analysis will have an unprecedented, long-lasting, and far-reaching impact on business strategy development. ... (more)

Kognitio Celebrates 20 Years of WX2 Data Warehousing Platform

CHICAGO, May 6, 2009 — Kognitio today announced the 20th anniversary of its WX2 analytical database.  Since its introduction, WX2 has been a constant source of innovation in the field of data warehousing. It was the first platform to offer large-scale data warehousing, the first to enable Data Warehousing as a Service (DaaS) for organizations and one of the first to move onto an entirely software-based model. Kognitio has been responsible for innovating and advancing numerous data warehousing concepts and practices that are considered the leading edge in the field of business intelligence, such as the ability to query hundreds of terabytes of information within seconds instead of weeks, and enabling companies of all sizes to more easily take advantage of advanced data analytics at lower cost. The announcement was made at The Data Warehousing Institute’s (TDWI) World... (more)

Decade Old Data Centers

Most data centers are now hitting their teens when it comes to age. How do I know this? I used to work for Exodus, The Data Center Company back at the turn of the century (actually wearing an old EXDS t-shirt as I write this.) The ‘heyday’ of the Co-Location. ‘Daddy, what was the datacenter like when you were a kid?’ Well, we’d find a somewhat remote location and build these massive non-descript buildings, some more than 200,000 sq. ft., all over the world. The walls were Kevlar lined. We had multiple internet carriers dropping fiber at all sides of the building along with power from distinct sub-stations. There were multiple, huge CAT power generators that would kick in to keep the place running during an outage – even had contracts with fuel vendors to replenish the diesel for non-stop service. We had racks and racks of DL380’s & Sun Sparcs humming through... (more)

Dumpster Diving vs. The Bit Bucket

Which is safer – a digital shopping cart or a metal shopping cart? Most (or many...some?) of us take great care to keep our personal Identity information safe. We make sure we send sensitive info over an encrypted tunnel, we use strong passwords for our various digital vaults, and we take other protective measures when navigating the treacherous Internet. But you might not have known that stolen wallets and physical documents account for 43% of all identity theft (pdf), which means we also need to shred our printed materials. Many might feel uncomfortable entering their credit card for online purchases but have no problem handing that same credit card to a stranger (who then walks away with it) to pay for a meal at a restaurant, even though online methods only accounted for 11% of all Identity Theft. There were almost 10 million Identity Theft victims in 2008, up 22% from... (more)

Can my PAN ride the LAN out the WAN?

In 2005, a Preventsys (now McAfee) and Qualys survey found that 52% of companies rely on a ‘Moat & Castle’ approach to Network Security but also admitted, at the time, that once the perimeter is penetrated, they are at risk. I haven’t been able to find a more recent statistic but I’m still betting that once a network is breached, it’s at risk. Networks are evolving, expanding and exploding with more data than ever before which means they also need to be smarter about who and what they allow on. They have become Application Delivery Networks and soon, truly Identity Aware. At the same time, many Enterprise networks are making interconnections with other Corporate networks enabling Federation or trust between the two to create an extended network. The good news/bad news about this is that according to Verizon Business’ “2009 Data Breach Investigations Report (pdf)” 32... (more)

Don’t Say a Word

This will probably be a short post since there are not that many security terms that begin with the 17th letter of our alphabet. However, keeping Quiet is a common theme in security. As mentioned numerous times, locking passwords, logins, and other sensitive information in your mouth vault keeps them from leaking to others. Social Engineering has always been about compromising that vault. Recently there was a post by Roger Thompson, AVG’s Chief Research Officer, which actually suggested that you Write Down your passwords, especially complex, hard-to-remember passwords. While this practice has been frowned upon for many years – as in the ever-popular Post-its stuck to laptops – there is some sense in creating (and writing down) difficult passwords that are extremely hard to guess. Just put that paper in a safe location. Our own Alan Murphy offered some advice about... (more)