Data: May 2017

Wednesday, 31 May 2017

Breaking the data prep barrier by Eliot Knudsen via @infomgmt

Unification technologies, like those created at MIT’s Computer Science and AI Lab, are helping companies make better use of analytics.

I like this well-thought-out, well-written article. I love the concept of Data Unification and its potential benefits - just read this and think about how it could benefit your organisation.

Tuesday, 30 May 2017

Top 15 Python libraries for data science by @ibobriakov via @Medium

Here's a list of Python libraries for working with data, broken down by: core libraries (like NumPy and pandas), visualisation, machine learning, natural language processing, data mining, and statistics.

If you are learning Python or already use it, this is an essential list and worth checking in case you missed one.

Monday, 29 May 2017

The Guerrilla Guide to Machine Learning with R by Matthew Mayo via @kdnuggets

This post is a lean look at learning machine learning with R. It is a complete, if very short, course for the quick study hacker with no time (or patience) to spare.

This is a great article and includes many instructional videos and links to find out more.

Sunday, 28 May 2017

How You Can Improve Customer Experience With Fast Data Analytics by @Ronald_vanLoon and @jKoolCloud via @DataScienceCtrl

Using Fast Data Analytics you can take your data mining and analytics to the next level to improve customer service and your business’ overall customer experience faster than you ever thought possible.

This is a great article and really gives examples of what is possible if you use fast data analytics. Definitely something that needs to be investigated and incorporated into your plans even if you aren't in a position to do this right now.

Saturday, 27 May 2017

The information management challenges of data monetisation by David M. Raab via @infomgmt

Organisations must adjust to blurred lines between known and anonymous customer identity information.

I agree with this article - nowadays the available data will include records that can be connected to known customers and records that cannot. All organisations need a strategy for this data. I recall that in the past the unknown data was ignored and deleted. Nowadays that means missing a valuable source of information and potential customers. My advice would be to have default values for all fields so that you have no empty fields and do not find that joins fail or have to be outer ones. That way you can use as much of the data as possible.
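To illustrate the default-value advice, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the 'UNKNOWN' placeholder key and the sample rows are all hypothetical, chosen only to show how a NULL key drops rows from an inner join and how a default value keeps them.

```python
import sqlite3

# Hypothetical tables and values, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT, name TEXT);
    CREATE TABLE events (event_id INTEGER, customer_id TEXT);
    INSERT INTO customers VALUES ('C1', 'Alice'), ('UNKNOWN', 'Anonymous visitor');
    INSERT INTO events VALUES (1, 'C1'), (2, NULL), (3, NULL);
""")

# With NULL keys, an inner join silently drops the anonymous events.
raw_join = conn.execute("""
    SELECT e.event_id, c.name
    FROM events e
    JOIN customers c ON c.customer_id = e.customer_id
    ORDER BY e.event_id
""").fetchall()

# Substituting a default key keeps every event in a plain inner join.
defaulted_join = conn.execute("""
    SELECT e.event_id, c.name
    FROM events e
    JOIN customers c ON c.customer_id = COALESCE(e.customer_id, 'UNKNOWN')
    ORDER BY e.event_id
""").fetchall()

print(raw_join)        # [(1, 'Alice')]
print(defaulted_join)  # [(1, 'Alice'), (2, 'Anonymous visitor'), (3, 'Anonymous visitor')]
```

In practice the default would more likely be applied when the data is loaded rather than at query time, so that every downstream join benefits from it.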

Friday, 26 May 2017

WEBINAR: Advancing your business intelligence with location analytics - 1 June 2017

Jun. 01, 2017 | 2 PM ET/11 AM PT
Hosted by Information Management

Nearly 70 percent of business data contains some level of location information. But business analysts rarely use this data within their BI and analytics workflows.

Location analytics is a powerful way to put this trapped information to use in visualisations and reports that identify patterns and opportunities to make better business decisions.

Esri and SAS will demonstrate the role that location analytics can play, including:

The right uses for location analytics
Skill sets and resources required to get the most from location data
Best practices and strategies

Featured Presenters:


AJ Rice, Global Alliance Manager, Esri (Presenter)
Rick Styll, Senior Manager, Visual Analytics Product Management, SAS (Presenter)
Jim Ericson, Consultant, Editor Emeritus, Information Management (Moderator)

Cloud storage keeps data out of reach of criminals by @peternowak via @globeandmail

As chief information security officer for Amazon Web Services, Stephen Schmidt is surprised by how many businesses still fail to see the dangers of storing information on computers and servers in their offices rather than in the cloud.

Important points have been made in this well thought out article by Peter which I think can be used as another justification for moving data to the cloud.

Thursday, 25 May 2017

WEBINAR: Unlocking Big Data Insights with Machine Learning and Spark - 31 May 2017

Summary

A new era of Data Science and Big Data access, processing, and visualisation is finally here.

On May 31st at 1 PM EDT (10 AM PT) join Dean Abbott, internationally recognised data mining and predictive analytics expert, and Dr. Mamdouh Refaat, Senior Vice President and Chief Data Scientist at Angoss, for a discussion on:

Big Data technologies and trends
Democratising Big Data analytics
Big Data and the Cloud
Importance of Data Science: Predictive Analytics, Machine Learning, AI, and BI
Overcoming Big Data challenges associated with: data access, data processing, data visualisation, and deployment
Application of Data Science and Big Data in: Banking, Insurance, Retail, and Telco
Operationalising your business and eliminating costs associated with proprietary database warehouse appliances and numerous analytics applications
Finding a collaborative data science platform that unifies infrastructure, technology, and data science teams

Abstract:

It is inevitable that organisations will continue to accrue vast amounts of data, not only from traditional sources that are product level focused but also from digital outlets such as mobile devices, social media networks or the Internet of Things. Accumulation of data collected from these sources is also known as Big Data. Regardless of where the data comes from organisations have instinctively determined that Big Data is a precious asset, one that can positively shape the direction of the business.

As organisations make the shift towards Big Data they direct their focus towards:

Adopting Big Data technologies like Hadoop & Spark
Implementing open standards and libraries
Using new data sources for decision making
Creating heterogeneous analytics teams that are comprised of various skills and tools
Integrating with Enterprise applications and BI tools
IT centralisation
Increasing focus on Security and Governance

Don’t miss the May 31st A New Era of Data Science - Unlocking Big Data Insights with Machine Learning and Spark Webcast with Dean Abbott and Dr. Mamdouh Refaat as they take you on a journey that explores the potential of Big Data and showcases how effortless analytics on Big Data can be, using a single, fully-integrated Data Science Platform.

Speakers

Dean Abbott
Co-Founder and Chief Data Scientist
SmarterHQ

Dean Abbott is Co-Founder and Chief Data Scientist of SmarterHQ.

Mr. Abbott is an internationally recognised data mining and predictive analytics expert with three decades of experience applying advanced data mining algorithms, data preparation techniques, and data visualisation methods to real-world problems, including customer analytics, fraud detection, risk modelling, text mining, survey analysis, planned giving, and many more.

Mr. Abbott is the author of Applied Predictive Analytics (Wiley, 2014) and co-author of IBM SPSS Modeler Cookbook (Packt Publishing, 2013). He is a highly-regarded and popular keynote and technical track speaker at Predictive Analytics and Data Mining conferences worldwide, and is on the Advisory Boards for the UC/Irvine Predictive Analytics Certificate as well as the UCSD Data Mining Certificate programs.

He has a B.S. in Mathematics of Computation from Rensselaer (1985) and a Master of Applied Mathematics from the University of Virginia (1987).

Dr. Mamdouh Refaat
Senior Vice President and Chief Data Scientist
Angoss Software

As Senior Vice President and Chief Data Scientist, Mamdouh manages the company’s Global Analytics Center of Excellence, including managing pre-sales technical support, training and data modelling services delivery for customers.

Mamdouh is an expert and published author with over 20 years of experience in predictive analytics and data mining, having led numerous projects in the areas of marketing, CRM and credit risk for Fortune 500 companies in North America and Europe.

Mamdouh holds a PhD in Engineering from the University of Toronto and an MBA from the University of Leeds.

How Thick Data Can Unleash the True Power of Big Data by @sudheer_kiran via @Datafloq

While Big Data helps us find answers to well-defined questions, Thick Data connects the dots and gives us a more realistic picture.

I'd never heard of thick data so this was very interesting to me.

Wednesday, 24 May 2017

Why AI is the Catalyst of IoT by @BanafaAhmed via @Datafloq

Businesses across the world are rapidly leveraging the Internet-of-Things to create new products and services that are opening up new business opportunities and creating new business models. The resulting transformation is ushering in a new era of how companies run their operations and engage with customers. However, tapping into the IoT is only part of the story.

For companies to realise the full potential of IoT enablement, they need to combine IoT with rapidly-advancing Artificial Intelligence technologies, which enable ‘smart machines’ to simulate intelligent behaviour and make well-informed decisions with little or no human intervention.

This is a very well thought out article that is well worth reading. It has some great diagrams that I think make it easier for you to understand the points he is making.

Tuesday, 23 May 2017

Taser will use police body camera videos "to anticipate criminal activity" by @eyywa via @theintercept

Taser is bullish on body cams. Rebranded as Axon, the company claims that applying AI analysis to police video footage will enable real-time facial recognition, free up officers from video tagging, and eventually produce predictive insights into criminal behaviour (à la Minority Report and RoboCop). The Intercept explores the ethical implications.

Interesting but I can quite see the ethical implications this article points out.

Monday, 22 May 2017

A new AI algorithm summarises text amazingly well by Will Knight via @techreview

Training software to accurately sum up information in documents could have great impact in many fields, such as medicine, law, and scientific research.

This sounds great and a definite step forward.

Sunday, 21 May 2017

An open online grocery shopping dataset by Jeremy Stanley via @Instacart

Instacart has released an anonymized dataset containing a sample of over 3 million grocery orders from more than 200,000 users.

This is a great resource and I can only congratulate them for their generosity and foresight in making this data available.

Saturday, 20 May 2017

Predicting Customer Churn both with and without Machine Learning by @jmsmistral and @dbatalov

Customer churn is a fairly standard thing for any business to want to know. However, do you know how to calculate it correctly and how to interpret the results? Here are two excellent articles that can help whichever method you choose. Both are great, and I advise reading them both and choosing your method based on what you have access to.

Predicting Customer Churn with Amazon Machine Learning by Denis V. Batalov

and

Predicting Churn without Machine Learning by Jonathan Sacramento
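As a starting point for the calculation question raised above, here is a minimal sketch of one common definition of churn rate: customers lost during a period divided by customers at the start of that period. This definition and the figures are assumptions for illustration only, and neither linked article is confirmed to use exactly this formula.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the customers active at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical monthly figures: 2,000 customers on day one, 50 of them gone by month end.
rate = churn_rate(customers_at_start=2000, customers_lost=50)
print(f"{rate:.1%}")  # 2.5%
```

Customers acquired during the period are deliberately left out of the denominator here; whether to include them is one of the interpretation choices that changes the result.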

Friday, 19 May 2017

3 big open data trends in the United States by @sammcclenney via @opensourceway

What are the practical ramifications of city leadership, data standards, and data sharing across the country?

I think open data is a great thing and any step to remove sharing and access to that data is a retrograde step. Insight into data can come from anyone if the data is open, and that data can be combined with new data sources for new insights.

Thursday, 18 May 2017

10 Breakthrough Technologies 2017 via @techreview

You may be working on some of these 10 breakthrough technologies in the list from MIT.

A very interesting list to review and think about. There are a few items on this list I wasn't aware of.

Wednesday, 17 May 2017

Many firms struggle to use customer data effectively by David Weldon via @infomgmt

New study reveals large gap between consumer expectations and how organisations act on them.

There is so much data available, and it can be analysed to reach so many conclusions, but the key thing is to do the correct analytics and reach the right conclusions (then act on them).

Tuesday, 16 May 2017

WEBINAR: Success with data governance projects - 25 May 2017

May 25, 2017 | 2 PM ET/11 AM PT
Hosted by Information Management

One of the most important elements in any effective data management program is data governance. Data governance includes the rules, processes, and standards that ensure that data is created and stored in ways that anyone accessing it will have the same experience. Learn how data governance can help you easily find data you can trust.

Among the topics to be discussed:

What is the role of data governance in a data management or MDM program?
What are the needed elements of an effective data governance program?
Who should take ownership of a data governance program, and what does that mean?

Featured Presenters:

Moderator:
Lenny Leibmann
Contributing Editor
SourceMedia

Big Data Exposes Big Falsehoods by John Pollock via @techreview

Analysis by Semantic Visions reveals intriguing differences between Russian and Western commentary about the shooting down of an airliner in 2014.

I find this fascinating and it is yet another real world example of Big Data and a useful application of it. Great article with a lot of good information.

Monday, 15 May 2017

Opinion: Who owns the order data? by Andrew White via @infomgmt

A new agreement implies that the NYSE has ownership of transaction information submitted by a trader. The issue is the traders thought they owned that information. So who really does?

I think this is a crucial thing that has to be agreed and documented. You need to know the definition of the data, the data flows, the meaning, the system of record and the ownership. It all needs to be documented somewhere and be available to everyone in some format, so that there are no surprises and there is total understanding between all parties.

Sunday, 14 May 2017

How to Learn Machine Learning in 10 Days by @rasbt via @kdnuggets

10 days may not seem like a lot of time, but with proper self-discipline and time-management, 10 days can provide enough time to gain a survey of the basics of machine learning, and even allow a new practitioner to apply some of these skills to their own project.

Interesting. I enjoyed the videos and as he quite rightly says - 10 days is a little short if you want to learn it properly.

Saturday, 13 May 2017

What is a productive data engineering team? by @jessetanderson via @OReillyMedia

Jesse Anderson explains how to merge the gaps between data science and engineering and what each side can learn from the other.

I love this article, and the diagram (Figure 2 in the article) is something I think we can all learn from and use to analyse our own organisations.

Friday, 12 May 2017

Organisations take data warehousing to the cloud by David Weldon via @infomgmt

A growing number of firms are attracted to promises of unlimited scalability and low cost storage.

I find the concept of having a data warehouse in the cloud quite exciting. When you think about it, storing it this way can make economic sense.

Thursday, 11 May 2017

New Online Data Science Tracks for 2017 by Brendan Martin via @kdnuggets

In 2017 there are many new and revamped data science tracks that are much more comprehensive for beginners than ever before. The tracks are designed to give you the skills you need to grab a job in data science, and some even have a job guarantee.

This is a two-page article. Some great courses exist out there now. I have no idea how effective they are, but it is worth your time to look at them.

Wednesday, 10 May 2017

Why Customer Data Silos May Be Hindering Greater Market Success by @ricknotdelgado via @Datafloq

The premise is simple: use big data collected from customers to achieve better outcomes, but too much data leads to a lot of problems.

As this article says, it's key to join all this customer data together to get a total view of a customer, rather than lots of silos that help the business neither to make good decisions nor to reach useful conclusions.

Tuesday, 9 May 2017

WEBINAR: Apache® Spark™ MLlib 2.x: Productionize your Machine Learning Models - 16 May 2017

Overview

Title: Apache® Spark™ MLlib 2.x: Productionize your Machine Learning Models

Date: Tuesday, May 16, 2017

Time: 09:00 AM Pacific Daylight Time

Duration: 1 hour

Summary

Apache Spark has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes, how do I deploy these models to a production environment? How do I embed what I have learned into customer facing data applications?

In this latest Data Science Central webinar, we will discuss:

Best practices on how customers productionise machine learning models
Case studies with actual customers
Live tutorials of a few example architectures and code in Python, Scala, Java and SQL

Speaker: Richard Garris, Principal Solutions Architect -- Databricks Inc.

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Join here

Data validation with the assertr package by/via @tonyfischetti

The assertr package has some wonderful validation constructs: even if you don’t spend a lot of time in R, it’s worth reading this piece purely for its approach to scalable data validation.

The code examples and the comments with replies are a must-read too: the article, the examples and the questions together give the full value of this piece. Definitely worth a look for anyone who uses R, and it also offers some general help for anyone who validates data.
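For readers who do not use R, the general idea of validating data inline in a pipeline can be sketched in plain Python. This is not assertr's API. The verify helper, the sensor readings and the thresholds are hypothetical and only loosely inspired by the approach the article describes: each check either passes the data through unchanged or stops the pipeline with a message naming the offending rows.

```python
from typing import Callable

Row = dict
Check = Callable[[Row], bool]


def verify(rows: list[Row], description: str, check: Check) -> list[Row]:
    """Return rows unchanged if every row passes the check, else raise with the failing indices."""
    failures = [i for i, row in enumerate(rows) if not check(row)]
    if failures:
        raise ValueError(f"{description}: failed at rows {failures}")
    return rows


# Hypothetical sample data.
readings = [
    {"sensor": "a", "temp_c": 21.5},
    {"sensor": "b", "temp_c": 19.0},
]

# Checks are chained, so the data only reaches analysis if all of them pass.
validated = verify(
    verify(readings, "sensor id present", lambda r: bool(r.get("sensor"))),
    "temperature in plausible range",
    lambda r: -50.0 <= r["temp_c"] <= 60.0,
)
print(len(validated))  # 2
```

Because each check returns its input, adding another rule is just one more wrapping call, which is what makes this style easy to grow as the data changes.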

Monday, 8 May 2017

Vast majority of firms fear impact of GDPR non-compliance by David Weldon via @infomgmt

The measure goes into effect in 2018 and global corporations are concerned they're not ready to meet the new EU data management and retention regulation's requirements.

The clock is ticking and organisations need to be ready.

Sunday, 7 May 2017

16 Free and Open-Source Business Intelligence Tools by Samuel Scott via @DZone

Companies need to analyse all of the data that they collect — and that is where data science and business intelligence tools come in.

I have experience of using numbers 9, 10, 15 and 16 but know little about the others - it might be fun to try some of them to see if any are better.

Saturday, 6 May 2017

What IT Managers Should Know about Quantum Computing by Richard Hackathorn via @hackernoon

This article is directed to analytic-mature technology-savvy managers who deal with corporate IT infrastructure and strategy. The article explains quantum computing in terms relevant to IT managers and suggests future business opportunities to exploit this new technology.

I love this and it is very good at giving examples that you can relate to.

Friday, 5 May 2017

How to determine the quality of data by Sandeep Godbole via @infomgmt

There are specific parameters that can help organisations gauge the condition of their information stores.

This is great and very useful to read.

Thursday, 4 May 2017

Google’s Duelling Neural Networks Spar to Get Smarter, No Humans Required by Cade Metz via @WIRED

“What an AI cannot create, it does not understand.”

Very interesting and certainly added to my understanding of a few things.

Wednesday, 3 May 2017

SLIDESHOW: 10 top digital intelligence platforms by David Weldon via @infomgmt

Adobe, Google and IBM are among the market leaders, according to the new Forrester Wave report.

Some new companies in there for me that are worth investigating.

Tuesday, 2 May 2017

Which machine learning algorithm should I use? by Hui Li via @SASsoftware

Here's an introduction to machine learning algorithms that can help beginners determine which algorithms to use to solve their specific problems.

This is an incredible resource from SAS in one of their blogs. The cheat sheet is priceless and you really MUST bookmark this.

Monday, 1 May 2017

The Value of Exploratory Data Analysis by Chloe Mawer via @kdnuggets

In this post, the author will give a high level overview of what exploratory data analysis (EDA) typically entails and then describe three of the major ways EDA is critical to successfully model and interpret its results.

This is crucial and it's important that it is done properly.