Saturday 28 February 2015

Big Data, Hadoop Standards Group: Who's In, Who's Missing?

All eyes in the big data world are on the Open Data Platform -- a new association that strives to promote big data technologies and open source platforms like Hadoop. While promising and backed by big names like GE and IBM, the Open Data Platform initiative is also missing some key names. Here's a reality check.

Read about it here on +Information Management

10 Modern Statistical Concepts Discovered by Data Scientists

You sometimes hear from some old-fashioned statisticians that data scientists know nothing about statistics, and that they - the statisticians - know everything. Here we prove that actually it is the exact opposite: data science has its own core of statistical science research, in addition to data plumbing, statistical APIs, and business / competitive intelligence research.

Continue reading +Vincent Granville's blog here

Friday 27 February 2015

Microsoft makes its Hadoop on Azure service available to Linux users

Microsoft is making available a preview version of its Azure HDInsight (Hadoop on Azure) service running on Linux as of February 18.

Like its Windows counterpart, the HDInsight on Linux service is built on top of the Hortonworks Data Platform (HDP). HDInsight includes full compatibility with Apache Hadoop, as well as integration with Microsoft's own business-intelligence tools, such as Excel, SQL Server and PowerBI. And as it does with the Windows version, Microsoft plans to contribute back code it developed for the Linux HDInsight version to the Apache community, company officials said.

Read about it here on +ZDNet

How to Lie with Data Visualization

Data visualization is one of the most important tools we have to analyze data. But it’s just as easy to mislead as it is to educate using charts and graphs. In this article we’ll take a look at 3 of the most common ways in which visualizations can be misleading.

Read all about it here on Heap's data blog

Thursday 26 February 2015

Network structure and dynamics in online social systems

I rarely work with social network data, but I’m familiar with the standard problems confronting data scientists who work in this area. These include questions pertaining to network structure, viral content, and the dynamics of information cascades.

Continue reading here on +O'Reilly

A million rows isn’t cool. You know what’s cool? A billion rows.

If you’re just getting started doing analytic work with SQL on Hadoop, a table with a million rows might seem like a good starting point for experimentation. Isn’t that a lot of data? While you can exercise the features of a traditional database with a million rows, for Hadoop it’s not nearly enough. Think billions of rows instead.

Continue reading here from +O'Reilly
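
A rough back-of-the-envelope sketch of why a million rows is small for Hadoop. The 100-byte row size and the 128 MiB block size are assumptions for illustration (128 MiB is a common HDFS default, but clusters vary), and `blocks_needed` is a hypothetical helper, not part of any Hadoop API.

```python
import math

# Assumed HDFS block size of 128 MiB; real clusters may be configured differently.
HDFS_BLOCK_BYTES = 128 * 1024 * 1024


def blocks_needed(rows, bytes_per_row, block_bytes=HDFS_BLOCK_BYTES):
    """Return how many storage blocks a table of this size would span."""
    total_bytes = rows * bytes_per_row
    return max(1, math.ceil(total_bytes / block_bytes))


if __name__ == "__main__":
    # Assumed average row size of 100 bytes, purely for illustration.
    for rows in (10**6, 10**9):
        print(f"{rows:>13,} rows -> {blocks_needed(rows, 100)} block(s)")
```

Under these assumptions the million-row table fits inside a single block, so there is little work to spread across a cluster, while the billion-row table spans hundreds of blocks.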

Wednesday 25 February 2015

Anomaly Detection is the New Black

In a smooth-running business, something that stands out from normal usually is not good. But even if it’s a happy accident, you still need to look at it.

Sounds simple, but with huge amounts of data this can be challenging, and the volume of incoming data is growing fast.

Continue reading here on +Data Informed
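
As a minimal sketch of what "standing out from normal" can mean in practice, the snippet below flags values using the modified z-score, which is based on the median and the median absolute deviation (MAD). This is one common technique chosen here for illustration and is not necessarily what the linked article describes. The function name `find_anomalies`, the sample data, and the 3.5 cut-off are assumptions.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical daily order counts with one unusual day.
    daily_orders = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
    print(find_anomalies(daily_orders))
```

Using the median rather than the mean keeps a single extreme value from hiding itself by inflating the baseline, which matters more as data volumes grow and manual inspection stops being practical.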

Big data vendors back open source tools

IBM, GE, Teradata, Infosys, VMware, Pivotal, SAS and others will develop on and test out Apache Hadoop open source tools. These major players in the big data space are joining forces to support open source software with the creation of an industry association, the Open Data Platform (ODP).

Continue reading here on +CIO


Tuesday 24 February 2015

IBM Accelerates Data Science Success for the Enterprise

IBM shifts data science into high gear today with the announcement of new In-Hadoop analytics technologies to accelerate the conversion of data into valuable insight for the business. IBM is delivering machine learning, R, and many new features that can run over large-scale data in the new IBM BigInsights for Apache Hadoop.

Continue reading the announcement here from +IBM Big Data & Analytics

You can also read this article about it from +ZDNet, which I think explains it a bit more clearly.

Tales from the big data front line: Are skills and size really holding companies back?

Italian companies are starting to embrace big data - analyzing vast amounts of data produced at speed by numerous sources - in order to boost their revenues and better understand their customers. But there's still a long way to go before the approach becomes mainstream.

Continue reading here on +ZDNet

Monday 23 February 2015

Big Data Will Lift Chief Data Officers (CDOs) Into Boardroom Seats

Fully 92 percent of CIOs believe Chief Data Officers (CDOs) will be the new keepers of data strategy and data quality within large enterprises -- with CDOs grabbing corporate boardroom seats by 2020, according to an Experian survey of 254 CIOs.

Continue reading here on +Information Management

SLIDESHOW: 10 Chief Data Officer (CDO) Trends

Why are chief data officers (CDOs) in high demand – and how might their roles evolve within large enterprises worldwide? Some intriguing answers surfaced in “Dawn of the Chief Data Officer (CDO)” – a survey and research report from Experian. The survey involved 254 CIOs worldwide. Here are 10 key takeaways.

Look at the slideshow here on +Information Management

Sunday 22 February 2015

9 Generic Big Data Use Cases to Apply in Your Organization

Big Data means something different for every organization and every industry. What Big Data can do for your organization depends on the type of company, the amount of data that you have, the industry that you are in and a whole lot of other variables.

Continue reading here on +Datafloq

Big data digest: The backlash begins

Big data may have just crested the wave of inflated expectations and be barrelling towards the trough of disillusionment, at least if you’re following along with the Gartner Hype Cycle.

Continue reading here on IT World

Saturday 21 February 2015

Optimizing SQL Server I/O

In I/O terms, SQL Server should be running on a platform that allows it to achieve single-figure latency, measured in milliseconds (ms), from the underlying storage.

This blog looks at various considerations and settings to help make I/O quicker.

Getting Data Governance and Legal to Work Together

As data governance becomes increasingly part of the business (rather than an offshoot of IT), it must proactively seek to build links with other areas of the enterprise.

Continue reading here on +Information Management

Friday 20 February 2015

Data Science’s Most Used, Confused, and Abused Jargon

Big data is hot. A global system of networked devices now generates terabytes of data each second. Affordable storage makes it possible to record seemingly arbitrary amounts of information. And machine learning algorithms, together with distributed computing, increasingly rise to the task of extracting actionable intelligence from this information. But what precisely does "big data" mean?

Continue reading here on +KDnuggets

Improve the Customer Experience with Data Science as a Service

Many enterprises struggle with the concept of how using data can affect customers’ impression of the brand and their long-term loyalty. But clients that are extremely happy with your service are unlikely to try out a competitor.

Continue reading here on +Data Informed

Thursday 19 February 2015

Free Report on Real-World Active Learning

Machine learning isn't a set-it-and-forget-it operation. Even with solid examples, ML algorithms can still fail and end up blocking important emails, filtering out useful content, and causing a variety of other problems.

In this report, industry analyst Ted Cuzzillo examines real-world examples of active learning, and you'll discover that the point at which algorithms fail is precisely where there's an opportunity to insert human judgment to actively improve the algorithm's performance.

Get your free report here from +O'Reilly

There’s No Such Thing as Anonymous Data

About a decade ago, a hacker said to me, flatly, “Assume every card in your wallet is compromised, and proceed accordingly.” He was right. Consumers have adapted to a steady thrum of data breach notifications, random credit card charges, and out-of-the-blue card replacements.

Continue reading here from +Harvard Business Review

2015 Survey of Data Scientists Reveals Strategic Insights

CrowdFlower Report Reveals: Data Scientists Say Messy Data and Lack of Time for Analysis Are Their Top Career Obstacles.

2015 Survey of Data Scientists Uncovers What's Holding Them Back in Their Jobs and How Organizations Can Empower Them to Deliver Greater Strategic Value.

Read the blog entry from +Renette Youssef

Wednesday 18 February 2015

Free eBook - Practical Data Cleaning

19 Essential Tips to Scrub Your Dirty Data

Collecting and cleaning data can be very time consuming, but following a few simple rules can make the process much less painful.

Get this free e-book by +Lee Baker here

Predictive Analytics or Data Science?

I caught up with an old grad school friend a few weeks back. He's a top-notch statistician who's built a successful career working in quants departments of large insurance and health care companies. With little simplification, I'd characterize his role over the last 20 years as a predictive modeling expert. His work is primarily “big iron” -- revolving around Teradata, Oracle and SAS. Besides being a senior statistician, he's also a more-than-capable data integration and statistical programmer.

Continue reading here on +Information Management

Tuesday 17 February 2015

400 Categorized Job Titles For Data Scientists

Job titles for data scientists, including details about the simple but powerful classifier used to categorize these job titles. This analysis provides a breakdown per job category, and granular reports that you can download for free (job titles broken down per company, category and level), as well as NLP (natural language processing) source code. It is based on analysing connections from multiple LinkedIn profiles – totalling more than 10,000 professionals. The first study was published in June 2013.

Read it here on +BusinessIntelligence.com

Big data studies come with replication challenges


The truth can be hard to find with millions of data points and lots of room for error.  As researchers pan for nuggets of truth in big data studies, how do they know they haven't discovered fool's gold?

Read all about the challenges here on +Science News


Monday 16 February 2015

Free eBook - SQL Server Source Control Basics

If you want to implement database source control but aren't sure where to start, this free eBook gives a detailed walkthrough of database source control concepts, with code samples and clear examples.

Download here from +SQLServerCentral

How Big Data Improves Care at Children’s Healthcare of Atlanta

What began as a limited use of Hadoop at Children’s Healthcare of Atlanta is becoming a full-fledged big data initiative that is helping the organization provide better care for patients and deliver information that could potentially help citizens of Georgia avoid health problems in the future.

Read about it here on +Information Management

Sunday 15 February 2015

SLIDESHOW - 10 Phrases That Kill Big Data Projects

What are the 10 dangerous phrases and mindsets that CIOs need to refute? You’re about to find out. The phrases can also undermine big data projects – especially in organizations where emerging chief data officers (CDOs) face entrenched, risk-averse leaders who fear change. Here's the top 10 risky phrase list from Gartner, along with +Information Management’s big data perspectives mixed in.

What’s the Biggest Big Data Challenge?

Doug Laney is a research VP at Gartner Research covering business analytics, big data use cases, “infonomics,” and other data-governance-related issues. He has also led a number of international analytics and information-management-related projects. Before joining Gartner, he launched the Meta Group's Enterprise Analytics Strategies research and advisory service and established and co-led the Deloitte Analytics Institute. He recently talked with +Information Management’s editorial director John McCormick about one of the biggest big data challenges – the variety of data. What you can find here is an edited transcript of their conversation.

Saturday 14 February 2015

10 Predictions For The Big Data Analytics Space

The end of one year and the beginning of the next are always filled with trends and predictions, and I have one more interesting infographic that I would like to share with you. This infographic has been developed by Aureus Analytics and covers 10 different Big Data Analytics trends. Let’s briefly discuss a few of them here on +BusinessIntelligence.com

How Big Data Pieces, Technology, and Animals fit together

A great summary of Big Data from +KDnuggets, pulling together information from a variety of sources to explain what the names of all the component technologies mean. I hadn't realised just how all of these connect, or how many of the names are similar to those of animals.

Read it here on +KDnuggets

Friday 13 February 2015

What’s the Biggest Big Data Challenge?

Doug Laney is a research VP at Gartner Research covering business analytics, big data use cases, “infonomics,” and other data-governance-related issues. He has also led a number of international analytics and information-management-related projects. Before joining Gartner, he launched the Meta Group's Enterprise Analytics Strategies research and advisory service and established and co-led the Deloitte Analytics Institute. He recently talked with Information Management’s editorial director John McCormick about one of the biggest big data challenges – the variety of data. What follows is an edited transcript of their conversation.

Read here on +Information Management

Big Data vs. Fast Data

When you hear the term "fast data," your first thought is probably the velocity of the data. That's not unusual in the realm of big data -- where velocity is one of the V's everyone talked about. However, fast data encompasses more than a data characteristic; it is about how quickly you can get and use insight.

Continue reading here on +Information Management

Thursday 12 February 2015

6 Tips For Being An Awesome Data Scientist

In 2012, Harvard Business Review cited data scientist as the sexiest job of the 21st century. Just two months ago LinkedIn shared the “25 Hottest Skills that Got People Hired in 2014” – guess what type of workers possessed these skills? This attention has been followed with a slew of articles telling budding analysts the skills they’ll need to get to the top of the data scientist food chain. We all know the usual list: strong background in statistics and other maths, programming skills, analytical skills, and so on. But what are the things that make an analyst great?

Read here on +BusinessIntelligence.com

Cheaper to Keep ’Em: Use Data to Reduce Customer Churn

If you think about all the costs involved in turning prospects into customers, including launching marketing campaigns, generating and following up on leads, etc., it's easy to see that keeping customers on board is critical to your company’s success. High customer turnover, also known as churn, not only consumes more resources on the marketing and sales side, but also can hurt your brand and keep your company from achieving its true growth potential.

Read more here on +Data Informed

Wednesday 11 February 2015

MapR to Offer Free Online Hadoop Training

Hadoop distribution provider MapR Technologies announced today that it is sponsoring a free online training and certification program in Hadoop for developers, analysts, and administrators.

The program is designed to expand worldwide adoption of Hadoop as a big data analytics technology.

Read more here on +Data Informed

Data Science 102: K-means clustering is not a free lunch

K-means is a widely used method in cluster analysis, but what are its underlying assumptions and drawbacks? David Robinson examines what happens for non-spherical data and unevenly sized clusters.

Read here on +KDnuggets
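
To make the non-spherical case concrete, here is a small self-contained sketch. It is not taken from David Robinson's article: the `kmeans` function is a plain implementation of Lloyd's algorithm with a farthest-first starting point, and the two synthetic datasets (`two_blobs`, `two_rings`) and the `agreement` score are hypothetical helpers invented for this illustration.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    # Farthest-first start (an assumption of this sketch): pick a random point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def two_blobs(n, rng):
    """Two round, well-separated Gaussian clusters."""
    pts = [(rng.gauss(-5, 1), rng.gauss(0, 1)) for _ in range(n)]
    pts += [(rng.gauss(5, 1), rng.gauss(0, 1)) for _ in range(n)]
    return pts, [0] * n + [1] * n


def two_rings(n, rng):
    """Two concentric rings: clearly distinct groups, but not spherical ones."""
    pts = []
    for radius in (1.0, 5.0):
        for _ in range(n):
            angle = rng.uniform(0, 2 * math.pi)
            r = radius + rng.gauss(0, 0.1)
            pts.append((r * math.cos(angle), r * math.sin(angle)))
    return pts, [0] * n + [1] * n


def agreement(pred, truth):
    """Share of points labelled consistently, ignoring which cluster is called 0 or 1."""
    match = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(match, 1 - match)


if __name__ == "__main__":
    rng = random.Random(42)
    for name, make in (("blobs", two_blobs), ("rings", two_rings)):
        pts, truth = make(300, rng)
        print(name, round(agreement(kmeans(pts, 2), truth), 2))
```

With two clusters, k-means always divides the plane with a straight line between the two centroids. That works for the round blobs, but no straight line can separate an inner ring from the ring around it, so agreement on the rings stays far from perfect however long the algorithm runs.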

Tuesday 10 February 2015

Top 5 tips for startups to leverage Big Data successfully in 2015

When we talk about Big Data, we mostly refer to large enterprises, capable of affording big data products that are quite expensive or resource-intensive. Small businesses and startups are often left out of the picture. Yet budding business ventures can adopt an effective, data-driven big data strategy with huge success. Here is how!

Continue reading here on +Big Data Made Simple

7 Data Management Tips and Tricks You Need to Learn Today

Data management is not just about backing up files or storing data in the cloud. Those activities are part and parcel of the fundamentals you should know about keeping your files in place, but that’s certainly not the end of it. You should also ensure that your data is properly protected and easily retrievable.

Continue reading this blog post by +Infinit Datum

Monday 9 February 2015

Visually Map Data with Google Fusion Tables: Tutorial

Ever wanted to quickly share some data visually with your colleagues or with the world, but struggled with the tools available? And after sharing the data, what if the viewer wanted to zoom in on a specific location, city or town to see what's going on there?

Google Fusion Tables is a free tool to show your data on a map & allow viewers to zoom in on specific areas that they want to explore further. vHomeInsurance, a data driven home insurance analysis service, has detailed location data on home insurance rates & has used their data to create a guide to using Google Fusion Tables to represent home insurance rates visually on a map.

Continue to read this blog post by +Vincent Granville

Data Governance Initiative expands the Hadoop ecosystem

Hadoop has, for the most part, moved beyond the proof-of-concept phase and the initial chasm of adoption. More and more organizations are putting the open-source framework to work on mountains of complex Big Data. The next step in Hadoop’s evolution is getting a handle on governance.

Continue reading here on +SD Times

Sunday 8 February 2015

3 Questions to Get the Most Out of Your Company’s Data

Our world is sentient. Websites watch where we look. Mobile applications keep track of our response times. Companies learn which buttons we like to press and which we don’t.  With cameras, microphones, and thermometers, the human race is giving inanimate objects everywhere eyes, ears, and skin. And with all this observation, we’ve created a massive new layer of information.

Continue to read here on +Harvard Business Review

Mobile BI Success

Between 2012 and 2014, mobile BI adoption shot up: Forrester survey data shows that the percentage of technology decision-makers who make some BI applications available on mobile devices has nearly quadrupled, and the percentage who state that BI is delivered exclusively via mobile devices has risen from 1% in 2012 to 7% in 2014.

Read the blog here on +Information Management

Saturday 7 February 2015

How Predictive Analytics Reinvents These Six Industries

Predictive analytics serves that very purpose by driving mass-scale processes empirically, guiding them with predictions generated from data. Millions of predictions a day improve decisions as to whom to call, mail, approve, test, diagnose, warn, investigate, incarcerate, set up on a date, and medicate.

Read the article here on +Information Management

IBM Named Hadoop Leader, Launches Big Data Mainframe

IBM's huge investments in Big Data initiatives seem to be paying off, as it was recently named a leader in Hadoop accessibility by a research firm and this week launched a new Hadoop-capable mainframe computer it describes as "one of the most sophisticated computer systems ever built."

Continue reading here on the Application Development Times website

Friday 6 February 2015

Textbook Examples: How Big Data Is Shaping Education Today

Big data has infiltrated almost every aspect of our day-to-day lives and education is no exception. Big data is changing the way students learn and receive instruction in and out of the classroom. Algorithms analyze behavior as well as performance on tests, quizzes, papers and all other aspects of school.

Read here on +BusinessIntelligence.com

Building Your Big Data Team in 2015 – Top 5 Pieces of Real-World Advice

There’s lots of advice out there on building a big data team, from industry or expert analysts and leading publications. But we wanted to see how this is being implemented in real life, so we talked to the real world big data mavericks – those who've faced the challenge of gaining true business value from big data and succeeded.

Click here to read this blog from +Pentaho

Thursday 5 February 2015

Shining Light on Dark Data

Dark Data is the ever-present, relatively unknown, and unmanaged volumes of data that exist in every corner of one’s business. Here’s what to do with it.

Read about it here on +KDnuggets

Top SlideShare Presentations on Big Data, updated

REST APIs and crawling offer two different ways to gather big data presentations from SlideShare, but they provide different results and lead to a very different view of the data. +KDnuggets examines why and finds a useful data science lesson.

Read here.

Wednesday 4 February 2015

Data scientists: How to hire and how to get the best from them

Data scientists are among the most in-demand tech workers: CIOs reveal where they find the best candidates and how they use them.

Read more here on +ZDNet

Meet the ASF's newest Top-Level Projects: Apache BookKeeper and Apache Samza

Two more projects are graduating from the Apache Software Foundation’s (ASF) incubator this week. The organization has announced that both Apache BookKeeper and Apache Samza have become Top-Level Projects (TLPs).

Read more here on +SD Times 

Tuesday 3 February 2015

Update on the US DATA Act

Well, it’s now been about nine months -- and time to check in on the gestation of the US DATA Act.  But before we start on what’s happened since the law passed on May 9, 2014, let’s take a quick look at what it is, and what government organizations have to work with.

Read more on +Information Management

The Free 'Big Data' Sources Everyone Should Know

+Bernard Marr always makes the point that data is everywhere – and that a lot of it is free. Companies don’t necessarily have to build their own massive data repositories before starting with big data analytics. The moves by companies and governments to put large amounts of information into the public domain have made large volumes of data accessible to everyone.

Read his blog here.

Monday 2 February 2015

How India's BJP used data analytics to swing voters

The Bharatiya Janata Party (BJP) won a landslide victory earlier this year on the back of a carefully scripted communication and public relations strategy. While it is no secret that it used social media effectively to communicate its message, very little is known about what went on behind the scenes. Arvind Gupta, the national technology head of India's ruling party, finally reveals how his team used analytics to influence voters in the 2014 General Elections.

Read here on +PR Week

Microsoft to offer a paid version of its internal Cosmos big-data service

Microsoft is developing an Azure-hosted, paid version of its internal-facing Cosmos big-data computation, analysis and storage service.

Mary Jo Foley speculated last August that Microsoft was poised to make Cosmos one of its next big Azure service offerings. Based on information shared with her by sources, it appears this, indeed, is happening.

Read more here on +ZDNet

Sunday 1 February 2015

Microsoft Buys Revolution Analytics for Big Data Effort

Microsoft has acquired Revolution Analytics, which develops R-based predictive analytics solutions in the big data market.

"Revolution Analytics provides an enterprise-class platform for the development and deployment of R-based analytic solutions that can scale across large data warehouses and Hadoop systems."

Read here on +Information Management

SLIDESHOW - 10 Best Paying Big Data Career Skills

Big data and cloud experts earned the best U.S. technology paychecks in 2014, according to Dice.com’s 2014 salary survey results. But which 10 technology skill sets deliver those big salaries? Here are the top 10.

View here on +Information Management