Friday 29 November 2019

Coding habits for data scientists by David Tan via @thoughtworks

Code to train ML models can get messy fast. This article identifies the bad habits that add complexity in code and suggests good habits to cultivate in order to declutter your code.

Some great advice in this article and some great examples of Python code - both good and bad.

Thursday 28 November 2019

WEBINAR: Automating Regulatory Compliance with Data Wrangling - 10 December 2019

Data Science Central Webinar Series Event
Automating Regulatory Compliance with Data Wrangling
Join us for the latest DSC Webinar on December 10th, 2019
Knowledge workers typically a) get information, b) perform logic on that information and c) finally reach a conclusion and take action. The time needed just to clean and prepare data for analysis can cause significant bottlenecks and delay any action. Automating this series of tasks mechanizes repetitive, manual work and frees up knowledge workers to focus on more value-added activities.

A series of lessons will be shared through a case study from the banking trenches, showing how data wrangling helped automate this series of tasks for a Risk Management use case and cut a 10,000-hour regulatory process down to 10 hours.

In this latest Data Science webinar you will learn:
  • How manual and repetitive tasks are costing organizations trillions of dollars in non-productive work
  • How to drive adoption of new technologies within your organization
  • How assembly line thinking has led to mistakes in the way we approach data pipelines
Featured Speakers:
Salah Khawaja, Managing Director, Automation Global Risk -- Bank of America
Raj Anand, Sr. Vice President, Automation Global Risk -- Bank of America
Will Davis, Head of Marketing -- Trifacta

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central
 
Title: Automating Regulatory Compliance with Data Wrangling
Date: Tuesday, December 10th, 2019
Time: 9 AM - 10 AM PST
 
Space is limited so please register early:
Reserve your Webinar seat now

Wednesday 27 November 2019

Quantum Computing Holds Promise for Banks, Executives Say by @SCastellWSJ via @WSJ

“In the universe of industries where there is a potential quantum advantage, you could argue that finance has got the shortest path to impact,” says Jeremy Glick, head of research-and-development engineering at Goldman Sachs. But first, we need to build hardware that doesn’t exist yet, and then we need to come up with a really good idea on how to use it.

I love the promise of this and can't wait to see it more widely used - the benefits will be massive and give a great advantage to those companies who utilise it fully.

Tuesday 26 November 2019

WEBINAR: Train & Tune Your Computer Vision Models at Scale - 5 December 2019

Data Science Central Webinar Series Event
Train & Tune Your Computer Vision Models at Scale
Join us for this latest DSC Webinar on December 5th, 2019
Whether you are training a self-driving car, detecting animals with drones, or identifying car damage for insurance claims, the steps needed to effectively train a computer vision model at scale remain the same.

In this latest Data Science Central webinar, we’ll walk through best practices for managing a computer vision project including staffing, budgeting, and roles and responsibilities. Learn how to collect and label the data that will train and tune your machine learning algorithm, and which types of data labeling best fit your project along with the tools that will get the job done.
In this webinar, you’ll learn how to:

  • Identify key success factors when scoping a computer vision project
  • Determine what kind of source data you need to make it successful
  • Select tools that best fit your project
  • Label your dataset so your algorithms can learn and perform as designed

Speaker: Meeta Dash, Director of Product -- Figure Eight

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title: Train & Tune Your Computer Vision Models at Scale
Date: Thursday, December 5th, 2019
Time: 9:00 AM - 10:00 AM PST

Space is limited so please register early:
Reserve your Webinar seat now

Monday 25 November 2019

Google denies it’s using private health data for AI research by Gerrit De Vynck via @infomgmt

Google’s deal with Ascension has been under scrutiny since the Wall Street Journal reported on Monday the company was collecting identifiable data on millions of patients and using it to build new products.

Interesting. I'm sure there is a data privacy issue there, although how would you know whether they have been using your data?

Friday 22 November 2019

Wednesday 20 November 2019

Getting better at predicting organised conflict by Tate Ryan-Mosley via @techreview

New techniques, machine learning, and better data gathering have made predictions both more useful and more granular. In this MIT Technology Review article, one predictive model is applied to look at violence in Ethiopia since the election of Abiy Ahmed, the new Nobel Peace Prize winner.

I loved this really insightful article which has some great diagrams that help with understanding.

Monday 18 November 2019

WEBINAR: 20 Predictions for 2020 from AI to Data Management - 21 November 2019

Data Science Central Webinar Series Event
20 Predictions for 2020 from AI to Data Management
Join us for the latest DSC Webinar on November 21st, 2019
AI, machine learning, cloud, self-service, data governance, etc...there is no shortage of buzzwords in data today. Every organization is seeking to outpace its competition by leveraging data to drive differentiation for their business. To win this race, companies are building up data science teams, investing in faster/more scalable cloud data platforms and utilizing the growing variety of publicly available datasets and algorithms. How do you stay ahead of what’s next and help drive the successful adoption of new technology and processes within your organization?

This latest Data Science Central webinar will be interactive and will review where we think data management, analytics and ML/AI are headed next. The session will also focus on how to use the predictions and data we share in the session to drive modernization efforts at your company.

In this webinar you can expect to learn:


  • Will cloud-native services & Kubernetes fundamentally change our approach to data infrastructure & application integration?
  • Will the buzz around machine learning continue or will the first ML initiatives stumble out of the gates?
  • How will the nature of self-service change with an increased focus on data governance & security?

Featured Speakers:
Will Davis, Head of Marketing -- Trifacta
Eric Kavanagh, CEO -- The Bloor Group
Evren Cakir, Senior Analyst -- The Bloor Group

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title: 20 Predictions for 2020 from AI to Data Management
Date: Thursday, November 21st, 2019
Time: 9 AM - 10 AM PST

Space is limited so please register early:
Reserve your Webinar seat now

Want a data science job? Use the weekend project principle to get it by @mrdbourke via @Medium

Online course certificates are great. But projects of your own are better.

My suggestions are to join Kaggle, Data Science Central or some other forum where you can access free data and do some analyses that show or prove something.

Friday 15 November 2019

New Survey: Nearly Two Thirds of Analytics Projects Are Jeopardised Due to Poor Access to the Right Data by/via @insideBigData

According to a recent survey, 57% of organizations have been unable to access real-time analytics or suffered inaccurate business intelligence because of a lack of access to the right data.

I think mirrors of production databases are useful places to run real-time data analytics against. You just need to be very careful to understand that data, so that your analysis reflects the full facts and not just a subset of them.

Thursday 14 November 2019

WEBINAR: Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture - 21 November 2019

Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture
 
November 21, 2019
10:00 AM PT

Hi there,

Many Hadoop customers struggle with its system complexity, unscalable infrastructure, and DevOps burden, and are exploring how to migrate their data and workloads to a modern cloud-based data platform to better meet their needs. Migration and modernization can help accelerate big data projects and open new frontiers around data science and machine learning.

Sign up for our webinar with Anand Venugopal, Migration Solutions Director at Databricks, to learn best practices for evaluating cloud migration and data platform modernization. We'll cover topics like how technology components map from on-premises to the cloud model, cost savings from the cost compute model, and new use cases enabled by modern data architectures.
 
Save Your Spot
 

Sincerely,
The Databricks Team

Wednesday 13 November 2019

Common Data Mistakes to Avoid by/via @geckoboard

“Statistical fallacies are common tricks data can play on you, which lead to mistakes in data interpretation and analysis.” Here’s a look at some of the common fallacies, with examples, a downloadable poster, and - more importantly - ways to avoid them.

This was a really useful reminder of all the potential mistakes you can make, and the downloadable poster is a great way to keep them in view. Definitely something to bookmark and keep.

Monday 11 November 2019

WEBINAR: Enterprise-ready Data Science and ML with Python - 19th November 2019

Data Science Central Webinar Series Event
Enterprise-ready Data Science and ML with Python
Join us for the latest DSC Webinar on November 19th, 2019
Many Data Scientists spend much of their time on laptops, working with familiar tools like Jupyter and Conda, on data that fits on their machine.

In this latest Data Science Central webinar we will discuss a laptop-like experience for Data Science and Machine Learning, supporting the same tools and workflows you have become accustomed to. We will highlight how Databricks augments that experience with collaborative features like co-editing and commenting, as well as enterprise-level security, scalability and reliability.


Speaker:
Clemens Mewald, Director of Product Management, Machine Learning and Data Science -- Databricks

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title: Enterprise-ready Data Science and ML with Python
Date: Tuesday, November 19th, 2019
Time: 09:00 AM - 10:00 AM PST

Space is limited so please register early:
Reserve your Webinar seat now

When it comes to data, why the 'garbage in, garbage out' doctrine is all wrong by Michael Kanellos via @infomgmt

The problem is that there’s way too much of it and it’s not organized in a way that makes it easy to understand. It doesn’t form beautiful crystalline patterns like salt: it’s more like a huge pile of gravel.

It's clear to me that you can check the quality of your data, but you shouldn't throw away anything that doesn't match your vision of correctness. Flag it as not being "right" but don't lose it - it could still give useful insights. Think of it this way - financial data must equal what is going into the financial ledgers. If you include the bad data it probably will. Just make sure you mark it in some way.
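As a rough illustration of the "flag it, don't lose it" idea, here is a minimal Python sketch. The records, field names and quality rules are all made up for the example (they are not from the article), so treat it as one possible way of marking questionable rows while keeping the totals reconcilable.

```python
from decimal import Decimal

# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"id": 1, "amount": Decimal("120.00"), "account": "4000"},
    {"id": 2, "amount": Decimal("-15.50"), "account": ""},  # fails both rules below
    {"id": 3, "amount": Decimal("64.25"), "account": "4100"},
]


def flag_quality(record):
    """Return a copy of the record with any quality issues attached."""
    issues = []
    # Assumed rules for this sketch: an account code is required
    # and amounts are expected to be non-negative.
    if not record["account"]:
        issues.append("missing account")
    if record["amount"] < 0:
        issues.append("negative amount")
    return {**record, "issues": issues, "is_clean": not issues}


# Nothing is dropped: every record is kept and simply marked.
flagged = [flag_quality(r) for r in transactions]

# The total over all records is what should reconcile with the ledger.
total_all = sum(r["amount"] for r in flagged)
# The clean-only total is still available when an analysis needs it.
total_clean = sum(r["amount"] for r in flagged if r["is_clean"])
```

Keeping the flag on the row means you can filter the questionable records out for one analysis and bring them back in for another, without ever losing the link to the ledger total.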

Friday 8 November 2019

Four people your data team needs to win the model deployment relay by Sarah Gates via @infomgmt

To be effective at model management you need a strong team. The good news is that you don’t need a lot of people to accomplish this. Just like a relay race, the right four people can manage the complete model lifecycle.

This is a great read if you are new to model management and want to know how many people you need to cover the whole model lifecycle. Very useful article.

Wednesday 6 November 2019

Why is a data governance business case hard to get approved? by Nicola Askham via @infomgmt

It can be a real struggle to get your data governance initiative approved in the first place. So I wanted to have a look at the reasons why this might be the case so that you can both plan for and mitigate them.

I agree - data governance is actually very important BUT it is almost treated as a last resort, approved if, and only if, there is time or there is enough of a benefit that can be clearly shown.

Tuesday 5 November 2019

WEBINAR: Real-Time Actionable Data Analytics - 13 November 2019

IoT Central Webinar Series Event
Real-Time Actionable Data Analytics
Join us for this latest IoTC Webinar on November 13th, 2019
In IoT, understanding the health of thousands of devices is critical for deployment at scale, especially when troubleshooting an issue. Customers need visibility into their devices with actionable data to reference in real time.

In this latest IoT Central webinar, learn how a fully-integrated IoT platform team built a metrics system on Telegraf, Kubernetes, and InfluxDB Cloud to deploy a customer-facing product that provides critical and relevant data analytics.

Speaker:
Cullen Murphy, Site Reliability Engineer -- Particle.io

Hosted by: David Oro, Editorial Director -- IoT Central

Title: Real-Time Actionable Data Analytics
Date: Wednesday, November 13th, 2019
Time: 9:00 AM - 10:00 AM PST

Space is limited so please register early:
Reserve your Webinar seat now

Monday 4 November 2019

This New Google Technique Help Us Understand How Neural Networks are Thinking by @jrdothoughts via @TDataScience


Interpretability remains one of the biggest challenges of modern deep learning applications. The recent advancements in computation models and deep learning research have enabled the creation of highly sophisticated models that can include thousands of hidden layers and tens of millions of neurons.

I found this fascinating and it is worth a read as well as a bookmark.

Friday 1 November 2019

3 tips on how to stop misusing or under-utilising corporate data by Alex Toews via @Infomgmt

Few organizations have assessed how their data can be put to work in the most productive way. This leaves them vulnerable to inefficiencies and can prevent important information from making its way 'to the top.'

A data model is a great place to start as you can begin to understand how the pieces of data relate to each other. The data dictionary is also useful as you can see which fields are repeated, which is crucial if you want to understand how you can join data together from different sources. Just pay attention to formats and whether any conversion needs to be done.
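To make the point about formats and conversion concrete, here is a small Python sketch. The two "systems", their field names and their formats are invented for illustration: one stores the customer key as a zero-padded string, the other as an integer with dates in day/month/year text, so both need converting before the data can be joined.

```python
from datetime import datetime

# Hypothetical extracts from two systems that share a customer key
# but store it (and the dates) in different formats.
crm_rows = [
    {"customer_id": "00042", "name": "Acme Ltd"},
    {"customer_id": "00107", "name": "Birch & Co"},
]
billing_rows = [
    {"cust_id": 42, "invoice_date": "01/11/2019", "amount": 250.0},
    {"cust_id": 107, "invoice_date": "15/11/2019", "amount": 99.5},
    {"cust_id": 42, "invoice_date": "28/11/2019", "amount": 10.0},
]


def normalise_id(value):
    """Convert either key format to a plain int so the sources can be matched."""
    return int(str(value).strip())


# Build a lookup from the normalised key to the customer name.
names_by_id = {normalise_id(r["customer_id"]): r["name"] for r in crm_rows}

# Join billing rows to CRM names, converting the date text to a real date.
joined = [
    {
        "customer_id": normalise_id(b["cust_id"]),
        "name": names_by_id.get(normalise_id(b["cust_id"])),
        "invoice_date": datetime.strptime(b["invoice_date"], "%d/%m/%Y").date(),
        "amount": b["amount"],
    }
    for b in billing_rows
]
```

Without the conversion step the string "00042" would never match the integer 42, which is exactly the kind of mismatch a data dictionary helps you spot before you start joining.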