Data: 2019

Friday 20 December 2019

What is data privacy, really, and what tools are required for it? by Ernest Martinez via @infomgmt

Data privacy requirements necessitate not only identifying the location and nature of impacted data, but also the flow and transformation that it takes throughout the application landscape.

A great explanation and worth a read just to make sure you really understand the topic.

Wednesday 18 December 2019

How to build pipelines with pandas using pdpipe by Tirthajyoti Sarkar via @TDataScience

This tutorial describes how to build intuitive and useful pipelines with pandas DataFrames using the pdpipe library.

A great tutorial which includes some code too. Definitely worth a bookmark.

Monday 16 December 2019

An introduction to Kubernetes by/via @jeremyjordan

This is a great blog which will tell you what it is, how to use it, and what it’s good for.

This is a perfect place to start learning about Kubernetes and thinking about what you can use it for. There are great code extracts as well as a list of useful links at the bottom.

Friday 13 December 2019

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead by Adrian Colyer via @kdnuggets

The two main takeaways from this paper: firstly, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and secondly some great pointers to techniques for creating truly interpretable models.

I enjoyed this article and his points, which are very relevant.

Wednesday 11 December 2019

The Problem with “Biased Data” by Harini Suresh via @Medium

Poorly defined terminology could actually play a role in biased data, says Harini Suresh. “The right terminology forms a mental framework, making it that much easier to identify problems, communicate, and make progress. The absence of such a framework, on the other hand, can be actively harmful, encouraging one-size-fits-all fixes for ‘bias,’ or making it difficult to see the commonalities and ways forward in existing work.”

I like this great article by Harini Suresh. I have noticed that you need an agreed set of definitions for all the data fields, the calculations, the methodologies, and even the data sources, because there are so many synonyms and opposing definitions for all of them. You need to measure like with like, in the same way, if you want to try to avoid bias - if you do not, you have already lost the battle.

Tuesday 10 December 2019

WEBINAR: From Degas to Dashboards: Lessons of the Great Masters - 17 December 2019

Data Science Central Webinar Series Event

From Degas to Dashboards: Lessons of the Great Masters
Join us for this latest DSC Webinar on December 17th, 2019

For over 30,000 years, we have expressed ourselves through visual art, and there are lessons we can draw from painting and apply them to viz. What do Impressionists teach us about dashboard interactivity? How does Cubism help us tell a data story?

Set against a canvas of art history, in this latest Data Science Central webinar we will learn a dozen specific techniques and tools for building meaningful, engaging, and visually striking dashboards.

Speaker:
Jeff Pettiross, User Experience Designer -- Tableau

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title:	From Degas to Dashboards: Lessons of the Great Masters
Date:	Tuesday, December 17th, 2019
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Monday 9 December 2019

Deep learning has hit a wall by Alex Woodie via @datanami

“The rapid growth in the size of neural networks is outpacing the ability of the hardware to keep up,” said Naveen Rao, vice president and general manager of Intel’s AI Products Group. Solving the problem will require rethinking how processing, network, and memory work together.

This sounds like a physical limitation that needs a two-pronged approach - one prong is hardware advances, and the other is adapting the tools and techniques used for AI and deep learning.

Friday 6 December 2019

How to Speed up Pandas by 4x with one line of code by @GeorgeSeif94 via @kdnuggets

Pandas is the go-to library for processing data in Python. It’s easy to use and quite flexible when it comes to handling different types and sizes of data. It has tons of different functions that make manipulating data a breeze.

I sure hope this works - I can certainly see what he means.

Wednesday 4 December 2019

Nordic data debacles tell story of numbers that aren’t true by Nick Rigillo and Catherine Bosley via @infomgmt

Scandinavia is offering a fresh case study this month in how even the world’s richest countries can struggle to measure their own economies and trust the data.

This is a lesson we should all learn from, and use to make absolutely sure of our numbers, the data source, and the methodology behind any calculation within analytics.

Tuesday 3 December 2019

WEBINAR - ML/AI Models: Continuous Integration & Deployment 11 December 2019

Data Science Central Webinar Series Event

ML/AI Models: Continuous Integration & Deployment

Join us for this latest DSC Webinar on December 11th, 2019

Some things are best learned through real-world experience. Machine learning is no different. Getting machine learning right requires evolving your analytics platform to support moving data science from research into operations. It all begins with repeatable data wrangling processes that support building and deploying models. It also requires collaboration between data scientists, engineers and business analysts. With the help of tools like SAS® Model Manager, these teams can continuously and automatically train models at scale and ensure the best models are put into production.

In this latest Data Science Central webinar we will discuss:

Model validation best practices
Various model deployment options including open source models
Model scoring and training services
Model performance monitoring
Orchestrating a continuous learning platform

Featured Speakers:
Wayne Thompson, Chief Data Scientist -- SAS
Lora Edwards, Principal Product Manager -- SAS

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title:	ML/AI Models: Continuous Integration & Deployment
Date:	Wednesday, December 11th, 2019
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

WEBINAR: Real-Time Analytics at Scale with High Velocity Data - 12 December 2019

Data Science Central Webinar Series Event

Real-Time Analytics at Scale with High Velocity Data
Join us for this latest DSC Webinar on December 12th, 2019

Performing analytics at the edge, in the data center or in the cloud, is needed in today’s distributed landscape. Edge Computing allows the flexibility of virtualized computation, network and storage resources to the edge, as an integrated solution combined with ML and AI libraries. At the heart of the solution is the open-source time series database, InfluxDB, and the data processing framework Kapacitor.

In this latest Data Science Central webinar, we will share how to build this point-and-click solution to help customers unlock the power of high-frequency data in real-time to become a data-driven organization.

Speakers:
Anil Joshi, CEO -- AnalyticsPlus, Inc.
Pankaj Bhagra, Co-Founder and Software Architect -- Nebbiolo Technologies

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title:	Real-Time Analytics at Scale with High Velocity Data
Date:	Thursday, December 12th, 2019
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Monday 2 December 2019

'Big data' and 'analytics' - Two of the top buzzwords everyone secretly hates by John-David McKee via @infomgmt

Buzzwords are frequently abused as an attempted credibility builder. A way of showing others that you're in the know.

I agree - they are often used out of context, and that just tells me the user doesn't properly understand the word or what it takes to actually deliver it. I think Artificial Intelligence is used too often, and too often as the fall guy by people who don't understand it.

Friday 29 November 2019

Coding habits for data scientists by David Tan via @thoughtworks

Code to train ML models can get messy fast. This article identifies the bad habits that add complexity in code and suggests good habits to cultivate in order to declutter your code.

Some great advice in this article and some great examples of Python code - both good and bad.

Thursday 28 November 2019

WEBINAR: Automating Regulatory Compliance with Data Wrangling - 10 December 2019

Data Science Central Webinar Series Event

Automating Regulatory Compliance with Data Wrangling
Join us for the latest DSC Webinar on December 10th, 2019

Knowledge workers typically a) get information b) perform logic on that information and c) finally reach a conclusion and can take action. The time to just clean and prepare data for analysis can cause significant bottlenecks and delay the ability to take any action. Automating this series of tasks mechanizes repetitive and manual work. This frees up knowledge workers to focus on more value-added activities.

A series of lessons will be shared with a case study from the banking trenches on how to leverage data wrangling to help automate this series of tasks for a use case within Risk Management and reduce a 10,000 hour regulatory process down to 10 hours.

In this latest Data Science webinar you will learn:

How manual and repetitive tasks are costing organizations trillions of dollars in non-productive work
How to drive adoption of new technologies within your organization
How assembly line thinking has led to mistakes in the way we approach data pipelines

Featured Speakers:
Salah Khawaja, Managing Director, Automation Global Risk -- Bank of America
Raj Anand, Sr. Vice President, Automation Global Risk -- Bank of America
Will Davis, Head of Marketing -- Trifacta

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title:	Automating Regulatory Compliance with Data Wrangling
Date:	Tuesday, December 10th, 2019
Time:	9 AM - 10 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Wednesday 27 November 2019

Quantum Computing Holds Promise for Banks, Executives Say by @SCastellWSJ via @WSJ

“In the universe of industries where there is a potential quantum advantage, you could argue that finance has got the shortest path to impact,” says Jeremy Glick, head of research-and-development engineering at Goldman Sachs. But first, we need to build hardware that doesn’t exist yet, and then we need to come up with a really good idea on how to use it.

I love the promise of this and can't wait to see it more widely used - the benefits will be massive and give a great advantage to those companies who utilise it fully.

Tuesday 26 November 2019

WEBINAR: Train & Tune Your Computer Vision Models at Scale - 5 December 2019

Data Science Central Webinar Series Event

Train & Tune Your Computer Vision Models at Scale

Join us for this latest DSC Webinar on December 5th, 2019

Whether you are training a self-driving car, detecting animals with drones, or identifying car damage for insurance claims, the steps needed to effectively train a computer vision model at scale remain the same.

In this latest Data Science Central webinar, we’ll walk through best practices for managing a computer vision project including staffing, budgeting, and roles and responsibilities. Learn how to collect and label the data that will train and tune your machine learning algorithm, and which types of data labeling best fit your project along with the tools that will get the job done.

In this webinar, you’ll learn how to:

Identify key success factors when scoping a computer vision project
Determine what kind of source data you need to make it successful
Select tools that best fit your project
Label your dataset so your algorithms can learn and perform as designed

Speaker: Meeta Dash, Director of Product -- Figure Eight

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title:	Train & Tune Your Computer Vision Models at Scale
Date:	Thursday, December 5th, 2019
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Monday 25 November 2019

Google denies it’s using private health data for AI research by Gerrit De Vynck via @infomgmt

Google’s deal with Ascension has been under scrutiny since the Wall Street Journal reported on Monday the company was collecting identifiable data on millions of patients and using it to build new products.

Interesting. I'm sure there is a data privacy issue there, although how would you know they have been using your data?

Friday 22 November 2019

30 Helpful Python Snippets That You Can Learn in 30 Seconds or Less by @FatosMorina via @TDataScience

Sometimes all you need is a code snippet.

Very useful code snippets, and a handy way to check your own coding knowledge.

Wednesday 20 November 2019

Getting better at predicting organised conflict by Tate Ryan-Mosley via @techreview

New techniques, machine learning, and better data gathering have made predictions both more useful and more granular. In this MIT Technology Review article, one predictive model is applied to look at violence in Ethiopia since the election of Abiy Ahmed, the new Nobel Peace Prize winner.

I loved this really insightful article which has some great diagrams that help with understanding.

Monday 18 November 2019

WEBINAR: 20 Predictions for 2020 from AI to Data Management - 21 November 2019

Data Science Central Webinar Series Event

20 Predictions for 2020 from AI to Data Management
Join us for the latest DSC Webinar on November 21st, 2019

AI, machine learning, cloud, self-service, data governance, etc...there is no shortage of buzzwords in data today. Every organization is seeking to outpace its competition by leveraging data to drive differentiation for their business. To win this race, companies are building up data science teams, investing in faster/more scalable cloud data platforms and utilizing the growing variety of publicly available datasets and algorithms. How do you stay ahead of what’s next and help drive the successful adoption of new technology and processes within your organization?

This latest Data Science Central webinar will be interactive and will review where we think data management, analytics and ML/AI are headed next. The session will also focus on how to use the predictions and data we share in the session to drive modernization efforts at your company.

In this webinar you can expect to learn:

Will cloud-native services & Kubernetes fundamentally change our approach to data infrastructure & application integration?
Will the buzz around machine learning continue or will the first ML initiatives stumble out of the gates?
How will the nature of self-service change with an increased focus on data governance & security?

Featured Speakers:
Will Davis, Head of Marketing -- Trifacta
Eric Kavanagh, CEO -- The Bloor Group
Evren Cakir, Senior Analyst -- The Bloor Group

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title:	20 Predictions for 2020 from AI to Data Management
Date:	Thursday, November 21st, 2019
Time:	9 AM - 10 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Want a data science job? Use the weekend project principle to get it by @mrdbourke via @Medium

Online course certificates are great. But projects of your own are better.

My suggestions are to join Kaggle, Data Science Central or some other forum where you can access free data and do some analyses that show or prove something.

Friday 15 November 2019

New Survey: Nearly Two Thirds of Analytics Projects Are Jeopardised Due to Poor Access to the Right Data by/via @insideBigData

According to a recent survey, 57% of organizations have been unable to access real-time analytics or suffered inaccurate business intelligence because of a lack of access to the right data.

I think mirrors of production databases are useful places to run real-time data analytics against. You just need to be very careful to understand that data, so that you are still working from the full facts and not a subset of them.

Thursday 14 November 2019

WEBINAR: Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture - 21 November 2019

Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture

November 21, 2019

10:00 AM PT

Hi there,

Many Hadoop customers struggle with its system complexity, unscalable infrastructure, and DevOps burden, and are exploring how to migrate their data and workloads to a modern cloud-based data platform to better meet their needs. Migrations and modernization can help accelerate big data projects and open new frontiers around data science and machine learning.

Sign up for our webinar with Anand Venugopal, Migration Solutions Director at Databricks, to learn best practices for evaluating cloud migration and data platform modernization. We'll cover topics like how technology components map from on-premise to the cloud model, cost savings from the cost compute model, and new use cases enabled by modern data architectures.

Save Your Spot

Sincerely,
The Databricks Team

Wednesday 13 November 2019

Common Data Mistakes to Avoid by/via @geckoboard

“Statistical fallacies are common tricks data can play on you, which lead to mistakes in data interpretation and analysis.” Here’s a look at some of the common fallacies, with examples, a downloadable poster, and - more importantly - ways to avoid them.

This is a really useful reminder of all the potential mistakes you can make. There is also a great downloadable poster covering all these points. Definitely something to bookmark and keep.

Monday 11 November 2019

WEBINAR: Enterprise-ready Data Science and ML with Python - 19th November 2019

Data Science Central Webinar Series Event

Enterprise-ready Data Science and ML with Python
Join us for the latest DSC Webinar on November 19th, 2019

Many Data Scientists spend much of their time on laptops, working with familiar tools like Jupyter and Conda, on data that fits on their machine.

In this latest Data Science Central webinar we will discuss a laptop-like experience for Data Science and Machine Learning, supporting the same tools and workflows you have become accustomed to. We will highlight how Databricks augments that experience with collaborative features like co-editing and commenting, as well as enterprise-level security, scalability and reliability.

Speaker:
Clemens Mewald, Director of Product Management, Machine Learning and Data Science -- Databricks

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title:	Enterprise-ready Data Science and ML with Python
Date:	Tuesday, November 19th, 2019
Time:	09:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

When it comes to data, why the 'garbage in, garbage out' doctrine is all wrong by Michael Kanellos via @infomgmt

The problem is that there’s way too much of it and it’s not organized in a way that makes it easy to understand. It doesn’t form beautiful crystalline patterns like salt: it’s more like a huge pile of gravel.

It's clear to me that you can check the quality of your data, but you shouldn't throw away anything that doesn't match your vision of correctness. Flag it as not being "right" but don't lose it - it could still give useful insights. Think of it this way - financial data must equal what is going into the financial ledgers. If you include the bad data it probably will. Just make sure you mark it in some way.
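
One way to picture the "flag it, don't lose it" idea is the small Python sketch below. The record layout, the `is_suspect` rule and the ledger total are all hypothetical, invented purely for illustration. The only point is that flagged rows stay in the data, so the total can still reconcile with the ledger.

```python
from decimal import Decimal

# Hypothetical transactions; fields and values are invented for illustration.
transactions = [
    {"id": 1, "account": "4000", "amount": Decimal("120.50")},
    {"id": 2, "account": "", "amount": Decimal("75.00")},      # missing account code
    {"id": 3, "account": "4010", "amount": Decimal("-30.25")},
    {"id": 4, "account": "4000", "amount": Decimal("0.00")},   # zero-value entry
]

# Assumed figure from the financial ledger that the data must reconcile to.
LEDGER_TOTAL = Decimal("165.25")


def is_suspect(row):
    """Hypothetical quality rule: no account code, or a zero amount."""
    return row["account"] == "" or row["amount"] == 0


# Flag questionable rows instead of deleting them.
flagged = [{**row, "suspect": is_suspect(row)} for row in transactions]

# Total over every row, and total over only the rows that passed the rule.
total_all = sum((row["amount"] for row in flagged), Decimal("0"))
total_clean = sum(
    (row["amount"] for row in flagged if not row["suspect"]), Decimal("0")
)

print("All rows reconcile with ledger:", total_all == LEDGER_TOTAL)
print("Clean rows only reconcile:", total_clean == LEDGER_TOTAL)
```

In this made-up data, dropping the suspect rows would break the reconciliation, while keeping them with a `suspect` marker lets an analyst include or exclude them as the question demands.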

Friday 8 November 2019

Four people your data team needs to win the model deployment relay by Sarah Gates via @infomgmt

To be effective at model management you need a strong team. The good news is that you don’t need a lot of people to accomplish this. Just like a relay race, the right four people can manage the complete model lifecycle.

This is a great read if you have no idea how to do this and want to know how many people are needed to do all of that. Very useful article.

Wednesday 6 November 2019

Why is a data governance business case hard to get approved? by Nicola Askham via @infomgmt

It can be a real struggle to get your data governance initiative approved in the first place. So I wanted to have a look at the reasons why this might be the case so that you can both plan for and mitigate them.

I agree - it is actually very important, BUT it is almost treated as a last resort, taken up if, and only if, there is time or there is enough of a benefit that can be clearly shown.

Tuesday 5 November 2019

WEBINAR: Real-Time Actionable Data Analytics - 13 November 2019

IoT Central Webinar Series Event

Real-Time Actionable Data Analytics
Join us for this latest IoTC Webinar on November 13th, 2019

In IoT, understanding the health of thousands of devices is critical for deployment at scale, especially when troubleshooting an issue. Customers need visibility into their devices with actionable data to reference in real time.

In this latest IoT Central webinar, learn how a fully-integrated IoT platform team built a metrics system on Telegraf, Kubernetes, and InfluxDB Cloud to deploy a customer-facing product that provides critical and relevant data analytics.

Speaker:
Cullen Murphy, Site Reliability Engineer -- Particle.io

Hosted by: David Oro, Editorial Director -- IoT Central

Title:	Real-Time Actionable Data Analytics
Date:	Wednesday, November 13th, 2019
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Monday 4 November 2019

This New Google Technique Help Us Understand How Neural Networks are Thinking by @jrdothoughts via @TDataScience

Interpretability remains one of the biggest challenges of modern deep learning applications. The recent advancements in computation models and deep learning research have enabled the creation of highly sophisticated models that can include thousands of hidden layers and tens of millions of neurons.

I found this fascinating and it is worth a read as well as a bookmark.

Friday 1 November 2019

3 tips on how to stop misusing or under-utilising corporate data by Alex Toews via @Infomgmt

Few organizations have assessed how their data can be put to work in the most productive way. This leaves them vulnerable to inefficiencies and can prevent important information from making its way 'to the top.'

A data model is a great place to start, as you can begin to understand how the different pieces of data relate to each other. The data dictionary is also useful, as you can see which fields are repeated, which is crucial if you want to understand how to join data together from different sources. Just pay attention to formats and whether any conversion needs to be done.
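
As a rough illustration of that last point, the Python sketch below compares two data dictionaries to find repeated fields, checks whether their formats agree, and converts them before joining. The source names, field names, formats and sample rows are all hypothetical, and a real data dictionary would hold far more detail.

```python
from datetime import date, datetime

# Hypothetical data dictionaries: field name -> documented format.
crm_fields = {"customer_id": "string", "signup_date": "DD/MM/YYYY", "region": "string"}
billing_fields = {"customer_id": "integer", "invoice_date": "YYYY-MM-DD", "amount": "decimal"}

# Fields that appear in both sources are candidate join keys.
shared = sorted(set(crm_fields) & set(billing_fields))

# Of those, pick out the ones whose documented formats differ.
needs_conversion = [f for f in shared if crm_fields[f] != billing_fields[f]]

# Invented sample rows from each source.
crm_rows = [{"customer_id": "0042", "signup_date": "03/11/2019", "region": "North"}]
billing_rows = [{"customer_id": 42, "invoice_date": "2019-11-20", "amount": "99.90"}]


def normalise_id(value):
    """Convert either representation of the key to a plain integer."""
    return int(value)


def parse_crm_date(text):
    """Parse the CRM's DD/MM/YYYY text into a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# Index one side by the normalised key, then join the other side to it.
crm_by_id = {normalise_id(r["customer_id"]): r for r in crm_rows}
joined = []
for bill in billing_rows:
    key = normalise_id(bill["customer_id"])
    if key in crm_by_id:
        customer = crm_by_id[key]
        joined.append({
            "customer_id": key,
            "region": customer["region"],
            "signup_date": parse_crm_date(customer["signup_date"]),
            "invoice_date": date.fromisoformat(bill["invoice_date"]),
        })

print("Shared fields:", shared)
print("Need conversion:", needs_conversion)
print(joined)
```

Without the conversion step, the text key "0042" and the integer key 42 would never match, and the join would silently return nothing.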

Wednesday 30 October 2019

WEBINAR: Continuous Integration/Continuous Deployment for Machine Learning - 6th November 2019

Live Webinar: CI/CD for Machine Learning

November 6th, 2019 @ 12pm EST

CI/CD (Continuous Integration/Continuous Deployment) has long been a successful process for most software applications. The same can be done with Machine Learning applications, offering automated and continuous training as well as continuous deployment of machine learning models. Using CI/CD for machine learning applications creates a truly end-to-end pipeline that closes the feedback loop at every step of the way, and maintains high performing ML models. It can also bridge science and engineering tasks, causing less friction from data, to modeling, to production and back again. Join CEO of cnvrg.io Yochay Ettun as he takes you through how to create a CI/CD pipeline for machine learning, and set up continuous deployment in just one click. With a depth of knowledge in all the latest research, Yochay will share with you today's top methods for applying CI/CD to machine learning.

Key webinar takeaways:

Configure and execute continuous training and continuous deployment for ML
Define dependencies and triggers
Automatically connect data pipeline, machine learning pipeline and deployment pipelines
Integrate model bias detection or fairness and accuracy validations
Build monitoring infrastructure to close the data feedback loop
Collect live data for improved model performances

Unable to attend? Register for a recording of the webinar and copy of the presentation following the live event.

Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI, IBM Watson by Olexander Kolisnykov via @topbots

The article will guide you through the best MLaaS platforms on the market and lists some infrastructural decisions to be made and some important considerations to keep in mind when choosing an MLaaS platform.

This has so much detail and is very, very useful. It is worth a bookmark and, if the platform allows applause, full kudos to the author, Olexander.

Monday 28 October 2019

3 Advanced Python Functions for Data Scientists by Dario Radečić via @TDataScience

Make your code cleaner and more readable by not reinventing the wheel.

Useful functions that are worth making a note of and trying to use more in your Python code.

Friday 25 October 2019

Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch by @jrdothoughts via @TDataScience

The new release of PyTorch includes some impressive open-source projects for deep learning researchers and developers.

Interesting new features that definitely call for some experimenting to see what they can really do.

Wednesday 23 October 2019

The 3 Missing Roles that every Data Science Team needs to Hire by @kesaritweets via @TDataScience

In a mad rush to hire Data Scientists, most companies overlook three key roles and this often leads to failure of projects.

The roles here will be missing from many organisations or strategies, so I found it really interesting to read about them. Certainly something to bear in mind and consider.

Tuesday 22 October 2019

WEBINAR: Demystifying AI & ML: Making Your Data Talk - 31 October 2019

Data Science Central Webinar Series Event

Demystifying AI & ML: Making Your Data Talk
Join us for the latest DSC Webinar on October 31st, 2019

Learn the basics of AI and Machine Learning and understand how to improve your organization’s experience and data optimization with the power of Augmented Intelligence. An overview of an AI strategy will be provided, and you will learn about our roadmap in AI, natural language and machine learning.

In this latest Data Science Central webinar we will review:

Basics of AI and machine learning
Importance of Augmented Intelligence vs Artificial Intelligence
A unique approach to AI that allows you to make the most of your data and AI investments

Featured Speaker:
Vinay Kapoor, Director, Product Management -- Qlik

Hosted by: Rafael Knuth, Contributing Editor -- Data Science Central

Title:	Demystifying AI & ML: Making Your Data Talk
Date:	Thursday, October 31st, 2019
Time:	9 AM - 10 AM PDT

Space is limited so please register early:

Reserve your Webinar seat now