Monday 30 November 2015

Microsoft's Graph wants to turn user data into business intelligence it can sell via @pcworld

How does data become information? Through context. And that’s what Microsoft’s new Microsoft Graph aims to do: Collect data points about you, then turn around and sell it to apps and services–with your permission, of course.

Interesting article from PCWorld.

Analyzing 1.1 billion NYC taxi and Uber trips, with a vengeance via @todd_schneider

An open-source exploration of the city's neighbourhoods, nightlife, airport traffic, and more, through the lens of publicly available taxi and Uber data by Todd Schneider

Great example of an analysis using public data; well worth going through.

Sunday 29 November 2015

Secrets from winners of the @AnalyticsVidhya best ever Data Hackathon!

An excellent learning for beginners, this compilation is an exciting reference for all aspiring & professional Data Scientists to gear up!  Well worth a read.

Beginner’s guide to Web Scraping in Python (using BeautifulSoup) via @AnalyticsVidhya

In this post on AnalyticsVidhya, Sunil Ray takes us through how to web scrape in Python using BeautifulSoup. Nice clear article with code; a great place to start if you ever need to do this but don't know how.
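To give a flavour of what the tutorial covers, here is a minimal sketch of pulling links out of a page. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra installs; with BeautifulSoup the equivalent step would be roughly `BeautifulSoup(html, "html.parser").find_all("a")`. The sample HTML and the names `LinkCollector` and `extract_links` are made up for illustration and are not taken from the article.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <h1>State capitals</h1>
  <a href="/wiki/Delhi">Delhi</a>
  <a href="/wiki/Mumbai">Mumbai</a>
  <a>no href here</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be missing.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Return the list of (href, link text) pairs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


for href, text in extract_links(SAMPLE_HTML):
    print(href, text)
```

The anchor without an `href` is skipped, which is the kind of small cleaning decision any real scraping job ends up needing.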

A recommendation system for blogs: Setting up the prerequisites [1/3] via @m__technologist

This is the first in a series of three blog posts where Thom Hopmans will elaborate on how we can build a recommendation engine for the readers on The Marketing Technologist (TMT). TMT currently has over fifty blog posts covering varying topics from Data Science to coding in ReactJS. Browsing through all the blog posts is time consuming, especially as the number of posts is still increasing. Also, chances are readers are only interested in a select few blog posts that lie in their area of interest. If a recommendation engine is able to select those articles a user is interested in, then this can definitely be classified as creating value from data and preventing information overload.

Nice start to the set of three blogs, which takes you through some of the thought steps to follow as well as comments on how to do it in Python. Good to look at even if you are not into Python.
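As a rough illustration of the idea (not the method used in the series, which this digest does not describe), one simple way to recommend posts is to compare their word counts and suggest the ones most similar to what a reader has just read. The post titles, texts, and function names below are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini-corpus: title -> post text.
POSTS = {
    "Intro to Spark": "spark cluster data processing with python",
    "Spark streaming": "spark streaming data processing in real time",
    "ReactJS components": "building user interface components with reactjs",
}


def tokenize(text):
    """Lower-case the text and keep purely alphabetic words."""
    return [word for word in text.lower().split() if word.isalpha()]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(read_title, posts, top_n=2):
    """Return the titles most similar to the post the reader just read."""
    vectors = {title: Counter(tokenize(body)) for title, body in posts.items()}
    target = vectors[read_title]
    scored = [
        (cosine(target, vec), title)
        for title, vec in vectors.items()
        if title != read_title
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_n]]


print(recommend("Intro to Spark", POSTS, top_n=1))
```

Raw word counts are the crudest possible choice; weighting schemes such as TF-IDF are a common refinement, but that is beyond this sketch.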

Saturday 28 November 2015

Kaggle Bike Sharing Demand Prediction – How I got in top 5 percentile of participants? via @AnalyticsVidhya

From AnalyticsVidhya, here's one of the top 5 percentile solutions to the Kaggle Bike Sharing Demand Prediction competition; take it as a reference for your next competition.

Includes R code. Definitely worth a bookmark and a look the next time you enter a competition on Kaggle.

Stream Processing with Apache Flink via @brakmic

Great blog entry from Harris explaining what Apache Flink is and how to process stream data with it. Includes examples, and the source code is on GitHub.

Friday 27 November 2015

Get ignited with Apache Spark – Part 2 via @kumarchinnakali

Great post by Kumar Chinnakali discussing the basics of Spark: concepts like Resilient Distributed Datasets, Shared Variables, SparkContext, Transformations, and Actions, plus the advantages of using Spark, with examples and guidance on when to use it.
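The split between transformations and actions is the part that tends to trip people up, so here is a toy sketch of it in plain Python. This is not Spark and not its API: `LazyDataset` is a made-up class that only mimics the behaviour that transformations such as `map` and `filter` are recorded lazily, while an action such as `collect` or `count` actually runs the chain.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records steps, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source          # any re-iterable, e.g. a range or list
        self._steps = tuple(steps)     # recorded (kind, function) pairs

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        """Build the iterator pipeline from the recorded steps."""
        items = iter(self._source)
        for kind, fn in self._steps:
            # Inside a method, map/filter here are the Python built-ins.
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger the computation and return a plain Python value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every element the map function actually touches


def square(x):
    calls.append(x)
    return x * x


odd_squares = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
print("work done before any action:", len(calls))
print("collected:", odd_squares.collect())
print("work done after collect:", len(calls))
```

In this toy, each action re-runs the whole chain from the source. Spark behaves similarly unless a dataset is explicitly cached or persisted, which is one reason the distinction matters in practice.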

The Data Science Industry: Who Does What via @DataCamp

Great Infographic from DataCamp.

Laetitia Van Cauwenberge has made the point on Data Science Central that one of the core competencies of the data scientist is to automate the process of data analysis, as well as to create applications that run automatically in the background, sometimes in real-time, e.g. to find and bid on millions of Google keywords each day (eBay and Amazon do that, and most of these keywords have little or no historical performance data, so keyword aggregation algorithms - putting keywords in buckets - must be used to find the right bid based on expected conversion rates), buy or sell stocks, monitor networks and generate automated alerts sent to the right people (to warn about a potential fraud, etc.) or to recommend products to a user, identify optimum pricing, manage inventory, or identify fake reviews (a problem that Amazon and Yelp have failed to solve to this day).

Great blog post from DataCamp.

Thursday 26 November 2015

WEBINAR: Understand the Use Cases and Enterprise Requirements for Big Data in the Cloud - 2 December 2015



We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 1: Understand the Use Cases and Enterprise Requirements for Big Data in the Cloud

Wednesday, December 2, 2015 

Cloud computing is an ideal match for big data projects: in the cloud, computing time and data storage are commodities. Yet, decision makers in charge of big data projects face important decisions, especially around which cloud option is the right choice for their needs.

This session will provide an overview of various use cases for big data on cloud, including a detailed review of enterprise requirements. You’ll then take a deep dive through a rich set of advanced analytics capabilities that allow you to analyze massive volumes of structured and unstructured data in their native formats. 


Register here

WEBINAR: Data Sharing & Governance - 2 December 2015


Data Sharing & Governance
Complimentary Web Seminar
December 2, 2015
12 PM ET/9 AM PT
Brought to you by Information Management
Data sharing and governance … is your enterprise’s data getting shared everywhere without any controls?
Sharing data is one of the great challenges that organizations face today. We all know that this data is valuable, and that it has uses in many different scenarios. However, the reputational risk and possible legal exposure that comes with using and sharing data inappropriately cannot be underestimated. These issues exist in all types of organizations, and even between parts of the same organization, especially as they cross national boundaries. Governing the sharing of data is one of the critical challenges for today’s data professionals.
Attend this session to learn how the Chief Data Steward at one of High Tech’s largest and fastest growing companies rolled out its data sharing and governance program. Learn how Peggy McCoy and the NetApp Enterprise Data Management team developed a successful data sharing process along with insight into the critical tasks, relationships and policies that make this program successful and sustainable.
Featured Presenters:
Speaker:
Peggy McCoy
Chief Data Steward
NetApp Inc.
Speaker:
Aaron Zornes
Chief Research Officer
The MDM Institute & Conference Chairman at MDM & Data Governance Summit


Register here


Introduction to Spark with Python via @KDnuggets

Get a handle on using Python with Spark with this hands-on data processing tutorial. By Srini Kadamati, Data Scientist at Dataquest.io on KDnuggets.

Interesting tutorial. Please note it is 3 pages long.

The Power, Promise and Pitfalls of Big Data via @infomgmt

Big data’s limitless potential only can be realized if people are capable of managing information, interpreting it correctly, and acting wisely.

Interesting viewpoint article by Joe Lodewyck on Information Management.  I think that even with all the tools, software and staff with the skills to use the data, we still need to make sure that the data is of good quality, and well organised, so we can actually use it and make correct conclusions/decisions based upon it.

Wednesday 25 November 2015

WEBINAR: Big Data and the Connected Car - 2 December 2015

Azure is the most open, broad and flexible cloud platform for every customer's needs, regardless of the application, framework, data source or operating system their solution may require. Whether you’re interested in various Linux flavors, Docker, MongoDB, Hadoop or languages like Java, Python, PHP and Ruby, you will find first-class support for all.

Recent innovations in the Internet-enabled connected cars that we drive today have spawned a whole new set of opportunities and challenges for automakers. The opportunities come from the ability to capture detailed, current data on how drivers operate their vehicles and how those vehicles respond to that use. Join this webinar to learn how this data can become critical in uses such as preventative maintenance, product development, manufacturing optimization, infotainment & paid content, as well as recall avoidance. As usual, very hands-on approach, with lots of demos!

Register now for this live webinar to learn about Big Data and the Connected Car on Azure!



Dave Russell

Solution Engineer, Horton

WEBINAR: Jump over the Data Preparation Hurdle with Spark - 1 December 2015



Overview
Title: Jump over the Data Preparation Hurdle with Spark
Date: Tuesday, December 01, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary
Jump over the Data Preparation Hurdle with Spark
Data scientists don’t scale. In using them to do manual data preparation, you’re missing a huge opportunity to extract the most value from your intellectual assets.
The good news? By automating and accelerating much of this raw data crunching and ETL work, you enable non-data scientists to do data preparation rapidly and simply—and ask their own questions and find their own answers. What’s more, in this new Big Data Discovery environment, answers come in minutes, not months. Data scientists are able to focus on Spark-driven advanced analytics that yield game-changing answers.
In this next DSC webinar, you will learn:
  • How to automate your data integration process to set up your organization to be truly data-driven
  • How to manage your data as a self-service feature at the speed of thought
  • How to effectively unearth big insights that effectively impact the bottom line in the most efficient cycles.
Speaker: Josh Och -- Platfora
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central
Register here



Is the 'Internet of Things' the Most Over-Hyped Trend In IT? via @infomgmt

Beecham Research analysts are warning companies planning to get into the Internet of Things (IoT) market “not to believe all the hype and over optimistic predictions.”

Interesting article by David Weldon on Information Management. From my perspective the potential is good for this technology and analytics, but IoT items are not well enough established to make the use of this data effective enough yet.

7 Must Watch Documentaries on Statistics and Machine Learning via @AnalyticsVidhya

This list is released by Manish Saraswat on Analytics Vidhya.  These movies reveal the smart use of data and machine learning to make our lives better.

It's a good idea to watch these as they might make you think more or even confirm something you've always believed.

Tuesday 24 November 2015

How Engineering Company Siemens Creates Value for Their Customers Using Big Data Analytics via @Datafloq

Siemens is a 168-year-old engineering company that has prepared itself for the future. Recently, they have really moved forward and combined their engineering capability with great new analytical capabilities to really help their customers perform better. In this article they look at three examples of how they are changing the game and creating lasting value for their customers.


Goodbye Big Data, Hello Thick Data via @GreenBook @scribbett

Big Data is here to stay, but it’s only half the job - Thick Data fills the gaps and enables truly people-shaped or human-centred development and visceral business.

Interesting blog by Stephen Cribbett on Green Book. I can definitely see the need for data to be more people focussed - that's how to get the best value from it for sure.

Monday 23 November 2015

“Shrinking bull’s-eye” algorithm speeds up complex modeling from days to hours via @mit

Algorithm may be applied to a broad range of complicated problems.

Great article from MIT News.  This sounds very exciting - I can't wait to see the results of its use in a wider context.

Pinterest And Facebook Take Big Data To Another Level via @Forbes @BernardMarr

Big data is a critical cornerstone of most social media businesses and Pinterest is no exception. The underlying algorithms that make Pinterest successful and fun - the ones that suggest new pins you might like based on things you've liked before, for example - are an example of big data at its finest.

Great article (as always) from Bernard Marr on Forbes. Please note this is a 2 page article.

Sunday 22 November 2015

Business Intelligence and The Curve of Predictive Analytics via @intelligent_app

Business Intelligence basically connects the computers and intelligent algorithms (IA) to work in tandem and produce predictions using historic data. They help the business owners make good decisions. It used to be a simple task at some point in the past, and making such analyses were easy to do.

Continue reading here on the Intelligent Analytics website

Mind your Database Ps and Qs via @drsql

A heartfelt plea to all those that create or share code/databases/anything else - documentation, structure and comments are vital if you want anyone else to be able to understand what you have done.

Great blog by Louis Davidson.  I would go a bit further - if you have an ego and want everyone to think the best of you and that what you have done is great/brilliant/amazing then structure it well, leave comments in the code, provide clear documentation and then it will be easy and obvious to see just how clever you really are.

Saturday 21 November 2015

11 Essential Tips for Effective Data Collection @chi2innovations

We live in an increasingly rich world of data – the amount of data that currently exists doubles every 18 months. That’s a phenomenal rate of growth and we’re just at the beginning of an incredible journey creating awesome intelligent applications that can handle these unimaginable amounts of data automatically.

Interesting blog. I don't necessarily agree with everything (not sure you need to start on paper), but it points out a number of things that should be obvious but might not be.

Facebook M — The Anti-Turing Test via @arikaleph

Facebook's new AI, called M, is said to have capabilities that far exceed those of competing AIs. Some people claim that it's actually human-assisted but M insists it's an AI. A Turing Test won't work here because M's objective is precisely to not pass a Turing test. This is a fun exploration of how to test what's really going on behind the scenes.

Very interesting post from Arik Sosman which can be found here.

Friday 20 November 2015

Computer, respond to this email via @googleresearch

Last week, Google announced that the Inbox by Gmail mobile apps for Android and iOS will now include a free tool, Smart Reply, which uses AI to scan the contents of messages, pick three of a possible 20,000 common responses and suggest them to you. Sadly, Smart Reply no longer has an overwhelming tendency to suggest the response "I love you" to almost anything.

Great research blog from Google. I love the way the use of machine intelligence is going.

Needed: More women in data science via @Stanford

A recent gathering at Stanford on the emerging science of big data turned the usual gender ratio of science conferences on its head.  What big data needs now is for more women to move into the field, said Persis Drell, dean of Stanford's School of Engineering. Drell and other female scientists said they've long had the experience of being surrounded by men on all sides at science conferences. So the inaugural Women in Data Science conference held at the Arrillaga Alumni Center Nov. 2 was noteworthy not only for its depth of experts but also because it was a rare all-female science meeting.

Great report on the conference from Stanford.

Thursday 19 November 2015

A Call For An Analytics Web API Standard via @infomgmt

Wouldn't it be a dream to assemble Big Data, IoT and analytics in the cloud using different software vendors?

Interesting article from Information Management pointing out we have a pressing need for a standard for web-based APIs.  Let's hope one can be agreed on soon to avoid rework for essentially the same thing.

Analytics Challenge Celebrates Top College Data Analytics Talent and Luring Top Campus Talent to the Field of Data Analytics via @infomgmt

Many universities and technology vendor companies are taking steps to make the field of data analytics more appealing, and offering students more real-world exposure to the power of data and business intelligence. A case in point is the Adobe Analytics Challenge.

A two-part article from Information Management focussing on how to get students into data and analytics.

Analytics Challenge Celebrates Top College Data Analytics Talent can be found here

Luring Top Campus Talent to the Field of Data Analytics can be found here


Wednesday 18 November 2015

WEBINAR: Best Fit Engineering for SQL on Hadoop - 24 November 2015


Overview
Title: Best Fit Engineering for SQL on Hadoop
Date: Tuesday, November 24, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary

Best Fit Engineering for SQL on Hadoop
Join us for our latest DSC Webinar series as we discuss how enterprises have increasingly large volumes of structured and semi-structured data generated by all sorts of applications.  Much of that data is increasingly finding its way into Hadoop clusters for analytics because of its versatility and the economical, linear scalability of both data storage and compute.  And SQL is still the best option for querying it:
  • SQL is the universal connector to many BI tools and technologies
  • Prevalent SQL skills overcome the Hadoop skills gap
  • Hadooponomics enables more analytics on more data at a much lower cost
Forrester recently concluded that organizations need to choose more than one SQL-on-Hadoop tool to satisfy all requirements. Hortonworks and Teradata agree in this “best fit engineering” approach designed to match the benefits of each tool set to map to actual workload requirements, while remaining true to 100% open source innovation. 
You will learn about SQL on Hadoop best practices, including:
  • A brief history of SQL on Hadoop
  • Architecture and use cases for Hive and Presto
  • Technical deep dive and futures for Hive and Presto 
Speakers: 
Mark Shainman, Program Manager -- Teradata
Mark Lochbihler, Director, Partner Engineering -- Hortonworks
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central


Register here




Big science problems, big data solutions via @radar

Scientists from across the world, industries, and disciplines, are coming together to solve the frontier scientific problems affecting the world, from astronomical to organismal, from molecular all the way down to subatomic physics, using big data analytics solutions. This is a fascinating look at how Lawrence Berkeley National Lab’s supercomputing centre is tackling 10 data analytics problems across the sciences.

Interesting article from the O'Reilly Radar.  Please note access to the article is free but you will have to register to see it. I particularly like problems 1, 6, and 7 and would love to see the results of them.

New Milestones in Artificial Intelligence Research via @facebook

Facebook CTO Mike Schroepfer announced some milestones achieved in Facebook’s long-term artificial intelligence research (FAIR) including: the intersection of natural language understanding and image recognition technology; predictive learning goals, and a system that taught machines to distinguish objects in a photo 30 percent faster with 10 times less training. FAIR plans to release a paper outlining the system in December.

Interesting developments in this newsroom article.

Tuesday 17 November 2015

WEBINAR: Empowering the Citizen Data Scientist with Self-Service Advanced Analytics - 19 November 2015






Empowering the Citizen Data Scientist with Self-Service Advanced Analytics
Date: Thursday, November 19th
Time: 11 a.m. ET (60 min)
The tsunami of valuable data has hampered the ability of the few highly skilled technologists to take advantage of it. To bridge the gap, more and more companies are providing advanced analytics tools to their super users – a group Gartner calls “citizen data scientists.” In this live, one-hour webinar, you’ll learn how democratizing data can help:
  • quickly place massive data sets into a business context and provide critical insight
  • more easily tap into predictive modeling and other sophisticated analyses once reserved for data technologists
  • explore new methods of analyzing data that can lead to cutting-edge decision making.
Presenters: Dan Donovan, Lead Technology Evangelist, Lavastorm
Dan Donovan is the Lead Technology Evangelist at Lavastorm and served in several strategic roles including Head of Partner Development and Director of Customer Solutions. Earlier in his career he held various technical roles at Telution, IKON and FreeDrive. With over 15 years of software experience, Mr. Donovan has a deep understanding of the evolving big data market and the challenges that enterprises face with blending complex, large-volume data sets, often from disparate sources, in order to produce accurate data insights for the business. He earned a BS in chemistry and computer science from the University of Illinois.


Register here

Prepare Now for the IoT Revolution via @Data_Informed

Gadi Lenz of AGT International discusses the challenges that companies will face in handling the new type of data that the Internet of Things will deliver.

Good article by Gadi Lenz on Data Informed.  I agree with him - we need to work out the how and the what of all this data we are going to get from IoT.

SLIDESHOW: Gartner’s Top 10 Technology Trends for 2016 via @infomgmt

David Cearley, vice president and Gartner Fellow at Gartner Group, shares his thoughts on the Top 10 Strategic Technology Trends that will impact IT leaders and data analytics in 2016.

Look at the slideshow here on Information Management.  A few things we expected and a few new ones as well.  Good to know the direction everything is going.

Monday 16 November 2015

Categorizing Numeric Variables -- a Cautionary Tale via @infomgmt

Three takeaways from Frank Harrell's lectures I've always kept close are to be attentive to both non-linearity and interaction effects among independent variables, and to be wary of categorizing continuous, numeric attributes.

Good article from Information Management.  Please read carefully as there are some links embedded in the article text that take you to some further information.

Turning a numeric variable into a categorical one is a risky thing to do and can change the direction of an analysis. While cleaning the data before the analysis, I would be tempted to change them all to a character value just to make sure there are no errors or misuse later on.
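One way to see the risk is to measure how much signal is lost when a continuous variable is cut into two groups. The sketch below uses entirely made-up data (a hypothetical `age` variable and an `outcome` that depends on it linearly plus noise) and compares the correlation before and after a median split. It illustrates the general caution rather than reproducing anything from the article.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dichotomise(xs):
    """Median split: 1.0 for values at or above the median, else 0.0."""
    median = sorted(xs)[len(xs) // 2]
    return [1.0 if x >= median else 0.0 for x in xs]


# Made-up data: outcome rises with age, with Gaussian noise on top.
rng = random.Random(0)  # fixed seed so the run is repeatable
age = [rng.uniform(20, 80) for _ in range(2000)]
outcome = [0.5 * a + rng.gauss(0, 10) for a in age]

r_continuous = pearson(age, outcome)
r_binned = pearson(dichotomise(age), outcome)

print("correlation with continuous age:", round(r_continuous, 3))
print("correlation with age split at the median:", round(r_binned, 3))
```

The binned version treats a 21-year-old and a 49-year-old as identical, so it cannot recover as much of the relationship as the original numbers do.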

10 traits and tenets of Big Data via @intelligent_app

In this article, Intelligent Applications describe some of the traits or tenets of Big Data that we should all be thinking about when researching Big Data. Well worth reading, and worth checking that you have covered all of them when you have implemented, or are going to implement, some kind of Big Data solution.

I see Validity as key: if the data is not valid, any insight or result you gain from it is worthless.

Sunday 15 November 2015

Google Offers Free Software in Bid to Gain an Edge in Machine Learning via @nytimesbits

Google is making much of its machine-learning technology (TensorFlow) freely available as open-source software. This sounds great on the face of it, but it hinges on several factors:

  • How will it be maintained?
  • Who will maintain it?
  • How good actually is it?

I look forward to the answers to those questions.

Get ignited with Apache Spark – Part 1 via @kumarchinnakali

Great article containing an overview of Spark and its components from Kumar Chinnakali on Big Data Made Simple. Well worth a read to refresh yourself if nothing else. Make sure you return to get the next part from him.

Saturday 14 November 2015

An intelligent approach to AI via @sdtimes

There are approximately 11 million developers in the world, creating all manner of software for all manner of business. Much of this software would benefit from data intelligence to gain efficiencies and greater revenue for the business. However, there are approximately 100,000 data scientists in the world, and they are expensive.

Interesting article from SD Times.

Facebook touts advancements in AI via @sdtimes

Facebook wants to be more than just a social media platform. The company has been dabbling in virtual reality, software development, open-source software, Internet access, and more recently artificial intelligence.

“Many people think of Facebook as just the big blue app, or even as the website, but in recent years we've been building a family of apps and services that provide a wide range of ways for people to connect and share,” wrote Facebook CTO Mike Schroepfer in a blog post. “From text to photos, through video and soon VR, the amount of information being generated in the world is only increasing. The best way I can think of to keep pace with this growth is to build intelligent systems that will help us sort through the deluge of content.”

Read about it here

Friday 13 November 2015

Top 5 Business Intelligence Myths Revealed via @BigData_Review

A recent article in Information Age discusses the top five myths about Business Intelligence software, and since the main purpose is to help you make the best possible decisions on enterprise-class solutions for your organization, Tim King has summarised the post and shared his own thoughts.

Great article which should make you think a bit harder about the 5 myths he has written about.  I completely agree with all of his points.


Streaming Analytics at Heart of Toyota's New AI and Robotics Labs via @infomgmt

Toyota Motor Corp. will spend $1 billion to form a research institute focused on artificial intelligence and robotics, as the world’s largest auto maker looks to elevate its role in reducing traffic fatalities.

Interesting news - I can't wait to see what comes out of all this investment.

Thursday 12 November 2015

WEBINAR: How Data Science is Preventing College Dropouts and Advancing Student Success - 17 November 2015

Overview
Title: How Data Science is Preventing College Dropouts and Advancing Student Success
Date: Tuesday, November 17, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary
How Data Science is Preventing College Dropouts and Advancing Student Success
Educational institutions have a wealth of data around student demographics, admissions, academic performance and more. In addition to these structured data sources, they also have unstructured data from sources such as student activity on academic discussion forums, campus network access and ID card usage. All of these data sources can be brought together in an institutional data lake to predict and influence student behavior – including attendance patterns, academic performance and time to graduate.
In this next DSC webinar, two Pivotal data scientists will discuss a recent collaborative project with a top university, in which many data sources were used to build a 360-degree profile of student activity on campus and help predict student success. The session will also provide an overview of the data science pipelines that were developed for training and scoring multiple models in parallel, in-database. These pipelines are now being used to predict student metrics (such as GPA, course grade and time to graduate), and even as intervention tools to help prevent students from dropping out.
Speakers: 
Regunathan Radhakrishnan, Principal Data Scientist -- Pivotal
Srivatsan Ramanujam, Principal Data Scientist -- Pivotal
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central
Register here



Big Data Investments Pay Off Big, But Oh Those Costs via @infomgmt

Big data investments are paying off in a big way this year, with organizations that are investing in the ‘big-four’ technology trends experiencing up to 53 percent higher revenue growth.

I found the reasons why people are not investing in Big Data, given at the end of the article, particularly interesting.

TED Playlist for Artificial Intelligence via @TEDTalks

Fascinating playlist of TED AI talks for smart, non-experts. There are 7 talks to choose from that range from 10-20 minutes each.

If you have some time, I highly recommend you spend it watching them.  I guarantee you will learn something.

Wednesday 11 November 2015

WEBINAR: Improving Efficiency and Accessibility in the Cloud - 19 November 2015


Improving Efficiency and Accessibility in the Cloud
Complimentary Web Seminar
November 19, 2015
1 PM ET/10 AM PT
45 Minutes
Brought to you by Information Management
Businesses today can gain efficiencies as well as save time and money working across the boundaries of computing platforms. This webinar will explore best practice strategies on how to achieve simple, seamless, and secure access to all your mainframe data as you build out new cloud and mobile apps.
Additionally, we will discuss how to attain the right processes to ensure the power of the mainframe is as easy as accessing any database, object, document, or spreadsheet ultimately bringing the agility needed to speed app development.

Register here


America's Big Data Obesity Problem via @infomgmt

Our data centers are obese, bloated with terabytes of big data with minimal “nutritional value”, and splitting at their seams. More importantly, it’s negatively impacting your bottom line.

Strange article from Information Management.  It does however highlight some important points.


  • We are all guilty of storing and keeping too much data - be that at home on our computer, in our filing cabinet (both at home and at work), or within our company.  Yes I know some data has to be kept for a specified number of years for regulatory reasons, but do we need it all available immediately?  
  • Do we have an archive strategy?  I know the cost of storage is going down but it is not an excuse for keeping everything.
  • Do we even have valid/correct data?  There is no point doing analyses on incorrect data which either wastes time, money or both and can subsequently lead to bad decisions.





5 Best Machine Learning APIs for Data Science via @KDnuggets @dezyreonline

Machine Learning APIs make it easy for developers to develop predictive applications. Here we review 5 important Machine Learning APIs: IBM Watson, Microsoft Azure Machine Learning, Google Prediction API, Amazon Machine Learning API, and BigML.

Very useful list worth reading and understanding.  Post from KDnuggets by Khushbu Shah.

Tuesday 10 November 2015

Adoption of Self-service Analytics Tools Remains Stagnant via @infomgmt


Despite general agreement that self-service analytics tools are important to the organization, adoption of these tools remains stagnant.

Article with some statistics to prove their point from Information Management.  Part of me is surprised - often business users get frustrated so I would anticipate they would embrace the chance to have some freedom and choice, and part of me is not - they probably lack some necessary skills and knowledge to make use of it.

Predictive Analytics Captures Value in Every Business Sector via @Data_Informed @gilesnelson

Giles Nelson of Software AG discusses the role that predictive analytics can play in improving services and revenues in various industries.

Very interesting article by Giles from Data Informed. I can see a few potential holes in what he can foresee.  Some people are a bit paranoid, so they tick boxes to opt out of contact or don't enter information, and also don't have location turned on on their mobile phone.  I do however see the benefits of predictive analytics as a means to work out future customer patterns and behaviour.

Monday 9 November 2015

WEBINAR: Use Real-Time Data to Make Better Decisions - 12 November 2015




Use Real-Time Data to Make Better Decisions
Complimentary Web Seminar
November 12, 2015
2 PM ET/11 AM PT
Brought to you by Information Management
Organizations want to benefit and make better decisions from enterprise data by combining it with new real-time streams of information: Web logs, customer behaviors, social media and the Internet of things. But many businesses are unsure how to go about this, what is involved or even if they can do it without lots of third-party help.
Join us as Data Management domain experts Matthew Magne and Clark Bradley discuss trends in self-service data preparation, runtime flexibility, and how SAS can help organizations manage their data where it lives for improved governance and productivity.
Featured Presenters:
Moderator:
Jim Ericson
Consultant
Editor Emeritus
Information Management
Speaker:
Matthew Magne
Principal Product Marketing Manager of Data Management
SAS
Speaker:
Clark Bradley
Principal Technical Architect, Global Data Management Practice
SAS


Register here

35 Metrics You Should Use to Monitor Data Governance via @Datafloq @lhen71

Data Governance (DG) is the most difficult area of the business to work in. When all is working and no issues are causing problems, your efforts go unnoticed. When issues arise and things go wrong Data Governance is the first area to blame – processes don’t work, you are not controlling this properly! Sound familiar? Well, here is a list of 35 metrics that will help you improve your Data Governance.

Great list of metrics - well worth a read as it will start you thinking.

The Ultimate Comparison: Data Science vs Analytics via @Datafloq @kogodbizonline

Depending on how much you know about big data, you may be surprised to learn that a data scientist and a business analyst don’t provide the same results. If that’s the case, then you’re not alone—since these two professions are often confused with one another. To learn more about the differences between data scientists and business analysts, check out the infographic to make sure you’re hiring the right type of professional to meet your unique business needs.

Great article with some good points.

Sunday 8 November 2015

Despite Best Intentions, Most Organizations Misinterpret, Misuse Data via @infomgmt

Organizations for the most part agree on the great value of corporate data. Unfortunately, for the most part data professionals believe their organizations do a poor job of interpreting and using that data.

Interesting article from Information Management.

Insights You Weren't Expecting from Big Data via @7wdata

Many organizations often don’t realize how much value there might be in all of the content in their various information sources. But those organizations that do recognize this potential are generating insights from their own internal big data and advanced analytics techniques. Read more on 7wData.

Saturday 7 November 2015

7 tools in every data scientist’s toolbox via @crossentropy

Nice collection of statistical and machine learning concepts that are widely used and consistently useful in a large variety of domains and problem settings.  Good to see them in the same place as it encapsulates some of the main tools used by a Data Scientist.

Google Turning Its Lucrative Web Search Over to AI Machines via @business

Google has been one of the biggest corporate sponsors of AI and has invested heavily in it for videos, speech, translation, and, of course, search. Here's an overview of the recently announced RankBrain, including a high-level view of how it works and how it fits into Google's search algorithms.

Fascinating insight into Google and their use and expansion into AI.

Friday 6 November 2015

WEBINAR: How to Develop a Recommendation Engine in Spark - 10 November 2015

WEBINAR: Microsoft Developer Division’s Journey to DevOps - 2 December 2015

Microsoft
Microsoft Developer Division’s Journey to DevOps
DATE: Wednesday, December 2, 2015
TIME: 1PM ET
In 2010, Microsoft’s Developer Division began Visual Studio Online, the Software-As-A-Service (SaaS) offering based on Team Foundation Server. This is the story of moving a traditional software business to SaaS, Cloud-First Development and Seven Habits of Effective DevOps:
  • Team Autonomy and Enterprise Alignment
  • Management of Technical Debt
  • The Flow of Customer Value
  • Hypothesis-Driven Development
  • Evidence Gathered in Production
  • Production-First Mindset
  • Managing Infrastructure as a Flexible Resource
Underneath, the technologies included enterprise git, a modern release pipeline, automated testing, usage and performance monitoring, log analysis, a data-driven backlog, lean cycle metrics, and public cloud hosting. The talk combines the cultural transformation, experiences and practices, technical choices, metrics, and shows how you can apply them to start your journey.
FEATURED SPEAKER:
Sam Guckenheimer, Product Owner, Microsoft

Register here


The Human Element of Data Science via @KDnuggets

By Derek Steer, CEO at Mode. WrangleConf 2015 posted on KDnuggets.  About 100 data scientists gathered for the first ever WrangleConf (hosted by Cloudera) to explore the topic of the importance of humans throughout the data science process.

Really interesting post which makes you think.  Well worth a read and some reflection afterwards.

Using Linear Regression to Predict Energy Output of a Power Plant via @datascienceplus

Teja Kodali describes how to use linear regression to predict the energy output of a power plant on Data Science +

Contains R code and a worked example.
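
If you want a feel for the idea before opening the article, here is a minimal sketch in Python (the article itself uses R). It fits a straight line by ordinary least squares with a single predictor. The temperature and output readings are invented for illustration and are not taken from the article, and the function name fit_line is my own.

```python
# Simple least-squares fit of y = intercept + slope * x, standard library only.
# The readings below are made up for illustration; they are not from the article.

def fit_line(xs, ys):
    """Return (intercept, slope) minimising squared error for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the shared 1/n factor cancels in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical ambient temperature (deg C) and plant output (MW)
temperature = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
output_mw = [490.0, 480.0, 470.0, 460.0, 450.0, 440.0]

intercept, slope = fit_line(temperature, output_mw)
predicted = intercept + slope * 18.0  # predicted output at 18 deg C
print(f"intercept={intercept:.1f}, slope={slope:.1f}, prediction={predicted:.1f}")
```

Real data will not sit exactly on a line as these made-up numbers do, and with several predictors you would reach for a proper statistics library rather than hand-rolled sums.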

Thursday 5 November 2015

WEBINAR: Best Practices for Delivering End-user Governed Data - 11 November 2015




Best Practices for Delivering End-user Governed Data

Gain a competitive advantage and improve data quality,
all while managing risk with Forrester Analyst, Michele Goetz


It is no longer enough for companies to integrate, blend and optimize their data for analytics.  Today, it is equally important to ensure that data is delivered to end-users in a highly governed way.   But, many organizations are scrambling to just keep up with evolving business demands for data and are stretched for time to focus on quality and security.

Based on a Pentaho-commissioned study of 164 business and IT leaders, guest Forrester Analyst, Michele Goetz will discuss what is required to ensure data quality, accuracy, consistency, and ultimately usability of data with the right mix of data governance, technology and process.  Join our webinar on 11/11 at 8 am PST/ 11 am EST to hear about the four key factors that your business needs to consider to create a streamlined process for delivering data to your end-users, including:


  • Managing the various data sources involved
  • Maintaining data quality and properly securing data
  • Keeping up with the needs of your business
  • Collaborating with key business stakeholders cross-functionally


Guest Speaker: Michele Goetz, Forrester Analyst
Moderator: Chuck Yarbrough, Pentaho Director of Product Marketing
Date: Wednesday, November 11 at 8 am PST/ 11 am EST

Register here








WEBINAR: From Visual Analysis to Presentation - 10 November 2015




Overview
Title: From Visual Analysis to Presentation
Date: Tuesday, November 10, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary

From Visual Analysis to Presentation
While most of us understand how to analyze our data using visualization quite well, and are well aware of the overall importance of using visualization to present our findings, the process of going from one to the other is still often a challenge and otherwise poorly understood.
In our next DSC Webinar Series, Data Visualization Research Scientist Robert Kosara will walk through an example of an analysis project to illustrate the process of analyzing data, finding the key insights, and then turning them into an effective presentation.
Speaker: 
Robert Kosara, Data Visualization Research Scientist -- Tableau Software
Hosted by: 
Bill Vorhies, Editorial Director -- Data Science Central
  
Register here

WEBINAR: Three Things You Need to Know About Document Data Modeling in NoSQL - 12 November 2015

sdtimes


Three Things You Need to Know About Document Data Modeling in NoSQL
DATE: Thursday, November 12, 2015
TIME: 1PM ET
We’re all familiar with modeling data the relational way. When we move to a document database we need to think about things a little differently. In this webinar, we’ll show you how to best plan, model and maintain your data using a NoSQL document database.
Join Matthew Revell, Dir. Developer Advocacy at Couchbase, as he takes you through some brand new NoSQL capabilities that will allow you more flexibility when modeling your data including:
  • Basic hygiene for modelling sustainable JSON documents
  • Creating manual secondary indexes
  • Automating secondary indexing with views
  • Modelling documents for the best level of query-ability

FEATURED SPEAKER:
Matthew Revell
Dir. Developer Advocacy, Couchbase

Register here
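
For anyone wondering what the "manual secondary indexes" item in the outline above might look like, here is a rough sketch in Python. It uses a plain dictionary as a stand-in for a key-value document store, so none of this is the Couchbase API. The key prefixes (user::, email::) and the function names are my own assumptions for illustration.

```python
import json

# A plain dict standing in for a key-value document store (not a real database client)
store = {}

def save_user(user_id, name, email):
    """Store the user document, plus a lookup document keyed by email."""
    doc_key = f"user::{user_id}"
    store[doc_key] = json.dumps({"type": "user", "name": name, "email": email})
    # Manual secondary index: a second entry whose value is the primary key
    store[f"email::{email}"] = doc_key

def find_user_by_email(email):
    """Follow the lookup document to the user document, or return None."""
    doc_key = store.get(f"email::{email}")
    if doc_key is None:
        return None
    return json.loads(store[doc_key])

save_user(1, "Ada", "ada@example.com")
found = find_user_by_email("ada@example.com")
missing = find_user_by_email("nobody@example.com")
print(found, missing)
```

The trade-off is that the application has to keep the lookup entry in step with the main document, for example when an email address changes.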

Beyond algorithms: Optimizing the search experience via @radar

Daniel Tunkelang describes why augmented intelligence trumps artificial intelligence in O'Reilly Radar.  Good to understand this.

Visual Information Theory via @ch402

Chris Olah's post is an interesting read on a visual approach to probability and information theory. It's worth a look.

I really like this article and the clear way it explains everything. My advice would be to get a hot drink and sit and read this carefully - it needs your full attention and commitment to read through to the end.

Wednesday 4 November 2015

Michael Dell: What our $67bn merger means for Dell and EMC via @ZDNet @NickJHeath

Michael Dell reveals his plans for the future of the soon-to-be-merged Dell/EMC -- touching on what will happen to overlapping products and services and how the company will serve the growing cloud market.

Great write-up from ZDNet.  I'm genuinely excited to see what happens going forward for the IoT side of things as well as the Cloud area.  As with all mergers there will be some pain, but they seem to have the right attitude to start with, seeing the strengths of each side where there appears (on the face of it) to be an overlap.

WEBINAR: Facing the Future of Data Modeling 10 November 2015



Great Scott! Dealing with New Datatypes
Tuesday, November 10, 2015
11:00am Pacific / 2:00pm Eastern
Data modeling is going back to the future! No, it doesn't include a hoverboard (yet), but it does include some new datatypes that capture temporal and spatial information. In the past, datatypes were used to classify various types of data, whether integers, characters, or alphanumeric strings. With the technologies introduced in recent years, these basic datatypes can’t address everything – data modelers now need more specialized datatypes for specific needs and new formats.
Multiple database platforms have introduced new datatypes that can make it easier to support more advanced data concepts in physical data models. If you do not know about what new things are happening in the physical data modeling world, or what to do with them, Karen Lopez will discuss using a variety of new datatypes including:
  • Temporal, such as period, with keywords
  • Spatial, including geospatial
  • Others, incorporating JSON/BSON/UBJSON usage



About Karen Lopez
Karen López is Senior Project Manager and Architect at InfoAdvisors. She has more than twenty years of experience in helping organizations implement large, multi-project programs. She is a Microsoft MVP, SQL Server, and a Dun and Bradstreet MVP.
InfoAdvisors is a Toronto-based data management consulting firm. We specialize in the practical application of data management. Our philosophy is based on assessing the cost, benefit, and risk of any technique to meet the specific needs of our client organizations.
Register here

Analytics and Big Data: The Skeptics vs. the Enthusiasts via @infomgmt

A closer look at the advocates and the naysayers of business analytics -- and what's at stake for both parties.  I think some of this is really looking at Big Data not Analytics, but it is a good read and may provoke some thought.

Random vs Pseudo-random – How to Tell the Difference via @KDnuggets

Statistical know-how is an integral part of Data Science. Explore randomness vs. pseudo-randomness in this explanatory post from KDnuggets with examples.

We all need to read posts like this - if for no other reason than to remind us of things we may have forgotten.
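
As a quick reminder of the distinction, here is a small Python sketch of my own (it is not taken from the KDnuggets post). A seeded pseudo-random generator replays the same sequence every time, whereas the secrets module draws on the operating system's entropy source and cannot be seeded. The helper name seeded_sequence is an assumption for illustration.

```python
import random
import secrets

def seeded_sequence(seed, count=5):
    """Draw `count` integers from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(count)]

# Pseudo-random: the same seed reproduces the same sequence
first = seeded_sequence(42)
second = seeded_sequence(42)
print(first == second)

# OS-backed values: no seed to set, so there is nothing to replay
token_a = secrets.token_hex(8)
token_b = secrets.token_hex(8)
print(token_a, token_b)
```

Reproducibility is exactly what you want for simulations and tests, and exactly what you do not want for passwords or tokens. Strictly speaking the operating system source is usually itself a generator fed by hardware noise, so "unpredictable in practice" is a fairer description than "truly random".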

Tuesday 3 November 2015

Sqoop vs. Flume – Battle of the Hadoop ETL tools via @dezyreonline

Good and detailed article by DeZyre on the differences between Sqoop and Flume. Includes the features, how they actually work, and who actually uses them.  Well worth a read even if you have never heard of these before.

Data Science for Losers, Part 3 via @brakmic

In part 3, Harris takes us through Apache Spark.  He explains the structure of Spark and begins to describe how to use it.  I'm looking forward to more on this in a later blog entry.

You can find the blog article for part 3 here

Monday 2 November 2015

How Applications of Big Data Drive Industries via @simplilearn

Comprehensive analysis of 10 industry areas and their use or potential use of Big Data by Maryanne Gaitho on Simplilearn.

Where's The Money In Data (Part V) via @infomgmt

The final entry in Anne Buff's five-part data monetisation series focuses on new business models. Contains links to the previous 4 parts if you missed them.

Well worth a read - makes you think about data in a slightly more focussed way.