Data: August 2018

Friday, 31 August 2018

WEBINAR: Getting Data Down to a Science – Code-free and Code-friendly ML - 5th September 2018

Data Science helps answer some of the most basic - and the most complex - business questions. In this latest Data Science Central webinar you will learn how to get data down to a science with code-free and code-friendly self-service analytics platforms. Decisive Data’s Lead Data Scientist Tessa Jones will use a sample data set from a global corporation to answer some of the most common data science questions applicable across businesses.

Learn how to use code-free and code-friendly Machine Learning:

Dive – Swim in the data and dive into a few common business questions with answers in data science including demand forecasting and customer segmentation.
Build – Walk through two data science models: a code-free time series model and a clustering machine learning model.
Customize – Incorporate custom R code into models.
Refine – Enhance your methods with rapid self-service techniques.
Display – Creatively display information visually in Tableau and tell a story that makes the findings clear and captivating using the Art + Data methodology.

Speakers:
Tessa Jones, Lead Data Scientist -- Decisive Data
Scott Trauthen, Director of Marketing -- Alteryx

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Getting Data Down to a Science – Code-free and Code-friendly Machine Learning
Date:	Wednesday, September 5th, 2018
Time:	9 AM - 10 AM PDT


Saturday, 18 August 2018

WEBINAR: Production ML for Data Scientists: What You Can Do and How to Make it Easy - 22 August 2018

Production ML for Data Scientists:
What You Can Do and How to Make it Easy
August 22, 2018 | 10am PT/1pm ET

For many data scientists in the enterprise, the deployment of machine learning into production environments has become a second job - and one that most do not want. Current IT and operations teams and tools can't account for the complexities of deploying, managing and scaling ML applications, leaving data science and data engineering teams on the hook for the success - or failure - of ML and AI initiatives.

In this webinar, data scientists will be introduced to MLOps - an approach for machine learning operationalization that:

Breaks down the silos between data science and IT
Streamlines deployment and orchestration
Adds advanced functionality like ML Health, governance and business metrics

Get Your ML Experiments to Production

On August 22 at 10am PT/1pm ET, join Nisha Talagala, CTO, and Craig Michaud, Sales Engineer, from ParallelM - the MLOps Company - for a look at how much easier machine learning can be with the right technology and processes in place. You'll see how to upload code from your existing data science platforms, run it in a sandbox against production data, conduct A/B tests and perform timeline captures.

Friday, 17 August 2018

WEBINAR: Harnessing the Power of AI with Azure Databricks - 21 August 2018

Harnessing the power of AI on streaming data generated by thousands of IoT devices is no easy task. Lennox International came to this realization as they set out to build a smarter HVAC system: analyzing large data sets combined with external data sources such as weather data, in order to predict equipment failure with a high level of accuracy and identify the patterns and parameters that influence it.

Join this latest Data Science Central webinar to learn how Lennox leveraged Azure Databricks and PySpark to solve their biggest data challenges and improve data science and engineering productivity, resulting in complex machine learning models that run in 40 minutes with minimal tuning and predict failures with accuracy of about 90%.

This webinar will cover:

The data orchestration challenges Lennox faced which impacted model accuracy levels and data processing times
How they use Azure Databricks to build the data engineering pipelines, appropriate machine learning models and extract predictions using PySpark
How they also implemented stacking and other ensemble methods using H2O Driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1,000 cores

Speaker: Prasad Chandravihar, Lead Data Scientist -- Lennox International

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Harnessing the Power of AI with Azure Databricks
Date:	Tuesday, August 21st, 2018
Time:	09:00 AM - 10:00 AM PDT

Wednesday, 15 August 2018

Tying an agile data management strategy to business goals by Rudraksh Bhawalkar via @infomgmt

As organisations evolve from regular business processes to digital businesses, those that do not have a sharp focus on data will fail to keep up with their more advanced peers.

I think Rudraksh is right: going digital does not change the need for accurate, correct data; it emphasises it. Make sure you allow time to sort out the data properly as part of your data initiatives.

Tuesday, 14 August 2018

Only one in three AI projects reported to succeed by Elliot M. Kass via @infomgmt

IT execs point to inconsistent data, incompatible technologies and organisational silos as major impediments

From my own perspective, I have observed the following common issues (even if they shouldn't be common).

Inconsistent formats used for the same data element in different systems.
Inconsistent definitions of the same data in different systems.
Inconsistent values of the same data in different systems.

This is why many organisations have data warehouses, and why people like me map the different source systems to the data warehouse: so that ETL code can bring the data into a common format, data type and set of values. Once that is done, the data can be joined easily and you can compare like with like.
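To make that mapping step a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two source systems, their field names (`customer_id`, `cust`, `signup`, `country_code` and so on) and the `COUNTRY_MAP` lookup are invented for illustration, and a real warehouse load would normally use an ETL tool rather than hand-written dictionaries. It shows the three inconsistencies above being conformed: different formats (date strings), different data types (IDs held as text vs integers) and different values for the same thing ("United Kingdom" vs "UK"). After conforming, the rows can be joined and any remaining disagreements flagged.

```python
from datetime import date, datetime

# Hypothetical extracts from two source systems holding the "same" customer data.
# System A: IDs as zero-padded text, dates as DD/MM/YYYY, countries as free text.
SYSTEM_A = [
    {"customer_id": "00042", "signup": "03/08/2018", "country": "United Kingdom"},
    {"customer_id": "00043", "signup": "15/07/2018", "country": "France"},
]
# System B: IDs as integers, ISO dates, countries as (non-standard) codes.
SYSTEM_B = [
    {"cust": 42, "signup_date": "2018-08-03", "country_code": "UK", "spend": 120.0},
    {"cust": 43, "signup_date": "2018-07-15", "country_code": "FR", "spend": 80.5},
]

# Hypothetical mapping of each system's country values onto one standard code set.
COUNTRY_MAP = {
    "United Kingdom": "GB", "UK": "GB", "GB": "GB",
    "France": "FR", "FR": "FR",
}

def conform_a(row):
    """Map a System A row to the warehouse's common format, type and values."""
    return {
        "customer_id": int(row["customer_id"]),  # common data type: integer
        "signup_date": datetime.strptime(row["signup"], "%d/%m/%Y").date(),
        "country": COUNTRY_MAP[row["country"]],
    }

def conform_b(row):
    """Map a System B row to the same common format."""
    return {
        "customer_id": row["cust"],
        "signup_date": date.fromisoformat(row["signup_date"]),
        "country": COUNTRY_MAP[row["country_code"]],
        "spend": row["spend"],
    }

def join_on_customer(a_rows, b_rows):
    """Join conformed rows on customer_id, flagging values that still disagree."""
    b_by_id = {r["customer_id"]: r for r in map(conform_b, b_rows)}
    joined = []
    for a in map(conform_a, a_rows):
        b = b_by_id.get(a["customer_id"])
        if b is None:
            continue  # no matching customer in System B
        consistent = a["signup_date"] == b["signup_date"] and a["country"] == b["country"]
        joined.append({**a, "spend": b["spend"], "consistent": consistent})
    return joined

if __name__ == "__main__":
    for row in join_on_customer(SYSTEM_A, SYSTEM_B):
        print(row)
```

Note that code like this only fixes formats, types and values; inconsistent definitions of the same data element still have to be resolved by agreeing what each element means before the mapping is written.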

Monday, 13 August 2018

Data veracity challenge puts spotlight on trust by Pat Sullivan via @infomgmt

The data veracity challenge is one that most businesses have yet to come to grips with, but if we’re to fully harness data for the full benefit to businesses and society, then this challenge needs to be addressed head on.

I think automating reports is great for businesses, but as Pat's article suggests, if your business is going to be run using that data, and investment based on it is not to be wasted, you absolutely have to be confident in your data. That means you can rely on its quality; you know the journey of the data from the original source to wherever your reporting uses it; you understand what the data means (data management); you can join it with other data and produce something useful; and any data analysis, visualisations or algorithms built on it are correctly defined and not biased.

Wednesday, 8 August 2018

How decision trees work by/via @_brohrer_

This is a fantastic overview of how decision trees work by Brandon Rohrer. It includes lots of diagrams, easy-to-follow descriptions and a short video if you'd rather watch.

I love that it tells you what to look out for, so that you hopefully won't fall into some of the common pitfalls. I strongly suggest you look at his other blog entries, which are incredibly useful and worth bookmarking.
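For anyone who wants to see the core step in code, here is a minimal Python sketch of how a tree can choose a split on one numeric feature. This is not taken from Brandon's post: it assumes Gini impurity as the splitting criterion (a common choice, used for example in CART), and the `hours`/`failed` data is a made-up toy example. A full tree repeats this search over every feature and then recurses into the left and right subsets until a stopping rule is met.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity.

    Returns (threshold, impurity); rows with x <= threshold go left, the rest go right.
    A threshold of None means no split improves on the parent node.
    """
    best = (None, gini(ys))  # baseline: impurity of the unsplit parent node
    n = len(ys)
    # Splitting at the largest value would leave the right side empty, so skip it.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical toy data: hours of machine use vs whether the machine failed.
hours = [1, 2, 3, 10, 11, 12]
failed = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(hours, failed))  # → (3, 0.0)
```

Here the split at 3 hours separates the two classes perfectly, so both child nodes are pure and the weighted impurity drops from 0.5 to 0.0.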

Tuesday, 7 August 2018

10 tips for making high availability more affordable in the public cloud by Dave Bermingham and Joey D'Antoni via @infomgmt

10 ways organisations can utilise public cloud services more cost-effectively while also maintaining appropriate service levels for all applications.

This is a great article. I would add that you should tune everything very carefully and try to make the best use of the resources you are paying for. If you understand how the cost structure works and how your code works, then being clever and careful when tuning your code can potentially save (or, done badly, waste) a lot of money.

Monday, 6 August 2018

How to spot bad data, and know the limitations when it's good by Kayla Matthews via @infomgmt

Accurate and reliable data can bring context to research studies, help people understand trends, aid business managers in knowing what’s working well for achieving company goals and much more. However, not all data is as beneficial as it seems at first.

I completely agree with Kayla's observations. If you have a large enough team, I think it would be good practice, in order to reduce the possibility of bias, to get a colleague to prepare the data for you with only a vague idea of what you need. I cannot stress enough that to guarantee the quality of your data you need to take it from the system of record, make sure it has not been modified before you receive it, and ensure that you really understand what the data elements mean.
