Data privacy requirements necessitate not only identifying the location and nature of impacted data, but also the flow and transformation that it takes throughout the application landscape.
A great explanation and worth a read just to make sure you really understand the topic.
This is a blog containing data related news and information that I find interesting or relevant. Links are given to original sites containing source information for which I can take no responsibility. Any opinion expressed is my own.
Friday, 20 December 2019
Wednesday, 18 December 2019
How to build pipelines with pandas using pdpipe by Tirthajyoti Sarkar via @TDataScience
This tutorial describes how to build intuitive and useful pipelines with pandas DataFrames using the pdpipe library.
A great tutorial which includes some code too. Definitely worth a bookmark.
Labels:
DATA,
DATA SCIENCE,
NLTK,
PANDAS,
PIPELINE,
PYTHON,
SCIKIT-LEARN
Monday, 16 December 2019
An introduction to Kubernetes by/via @jeremyjordan
This is a great blog post which tells you what Kubernetes is, how to use it, and what it’s good for.
This is a perfect place to start learning about Kubernetes and thinking about what you can use it for. There are great code extracts as well as a list of useful links at the bottom.
Friday, 13 December 2019
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead by Adrian Colyer via @kdnuggets
The two main takeaways from this paper: firstly, a sharpening of my understanding of the difference between explainability and interpretability, and why the former may be problematic; and secondly some great pointers to techniques for creating truly interpretable models.
I enjoyed this article and his points, which are very relevant.
Wednesday, 11 December 2019
The Problem with “Biased Data” by Harini Suresh via @Medium
Poorly defined terminology could actually play a role in biased data, says Harini Suresh. “The right terminology forms a mental framework, making it that much easier to identify problems, communicate, and make progress. The absence of such a framework, on the other hand, can be actively harmful, encouraging one-size-fits-all fixes for ‘bias,’ or making it difficult to see the commonalities and ways forward in existing work.”
I like this great article by Harini Suresh. I have noticed that you need an agreed set of definitions for all the data fields, the calculations, the methodologies, and even the data sources. There are so many synonyms and opposing definitions for all of those that you need to measure like with like, in the same way, if you want to try and avoid bias - if you do not, you have already lost the battle.
Tuesday, 10 December 2019
WEBINAR: From Degas to Dashboards: Lessons of the Great Masters - 17 December 2019
Data Science Central Webinar Series Event
Monday, 9 December 2019
Deep learning has hit a wall by Alex Woodie via @datanami
“The rapid growth in the size of neural networks is outpacing the ability of the hardware to keep up,” said Naveen Rao, vice president and general manager of Intel’s AI Products Group. Solving the problem will require rethinking how processing, network, and memory work together.
This sounds like a physical limitation that needs a two-pronged approach - one prong is hardware advances and the other is adapting the tools and techniques used to do AI and deep learning.
Friday, 6 December 2019
How to Speed up Pandas by 4x with one line of code by @GeorgeSeif94 via @kdnuggets
Pandas is the go-to library for processing data in Python. It’s easy to use and quite flexible when it comes to handling different types and sizes of data. It has tons of different functions that make manipulating data a breeze.
I sure hope this works - I can certainly see what he means.
Wednesday, 4 December 2019
Nordic data debacles tell story of numbers that aren’t true by Nick Rigillo and Catherine Bosley via @infomgmt
Scandinavia is offering a fresh case study this month in how even the world’s richest countries can struggle to measure their own economies and trust the data.
This is a lesson we should all learn from, and use to make absolutely sure of our numbers, the data source, and the methodology behind any calculation within analytics.
Tuesday, 3 December 2019
WEBINAR: ML/AI Models: Continuous Integration & Deployment - 11 December 2019
Data Science Central Webinar Series Event
WEBINAR: Real-Time Analytics at Scale with High Velocity Data - 12 December 2019
Data Science Central Webinar Series Event
Monday, 2 December 2019
'Big data' and 'analytics' - Two of the top buzzwords everyone secretly hates by John-David McKee via @infomgmt
Buzzwords are frequently abused as an attempted credibility builder. A way of showing others that you're in the know.
I agree - they are often used out of context, and that just tells me that the user doesn't really understand the word or what it takes to deliver it properly. I think Artificial Intelligence is used too often, and too much as the fall guy, by people who don't understand it.
Friday, 29 November 2019
Coding habits for data scientists by David Tan via @thoughtworks
Code to train ML models can get messy fast. This article identifies the bad habits that add complexity in code and suggests good habits to cultivate in order to declutter your code.
Some great advice in this article and some great examples of Python code - both good and bad.
Thursday, 28 November 2019
WEBINAR: Automating Regulatory Compliance with Data Wrangling - 10 December 2019
Data Science Central Webinar Series Event
Wednesday, 27 November 2019
Quantum Computing Holds Promise for Banks, Executives Say by @SCastellWSJ via @WSJ
“In the universe of industries where there is a potential quantum advantage, you could argue that finance has got the shortest path to impact,” says Jeremy Glick, head of research-and-development engineering at Goldman Sachs. But first, we need to build hardware that doesn’t exist yet, and then we need to come up with a really good idea on how to use it.
I love the promise of this and can't wait to see it more widely used - the benefits will be massive and give a great advantage to those companies who utilise it fully.
Tuesday, 26 November 2019
WEBINAR: Train & Tune Your Computer Vision Models at Scale - 5 December 2019
Data Science Central Webinar Series Event
Monday, 25 November 2019
Google denies it’s using private health data for AI research by Gerrit De Vynck via @infomgmt
Google’s deal with Ascension has been under scrutiny since the Wall Street Journal reported on Monday the company was collecting identifiable data on millions of patients and using it to build new products.
Interesting. I'm sure there is a data privacy issue there, although how would you know they have been using your data?
Friday, 22 November 2019
30 Helpful Python Snippets That You Can Learn in 30 Seconds or Less by @FatosMorina via @TDataScience
Sometimes all you need is a code snippet.
Very useful code snippets, and handy for checking your own coding knowledge.
Wednesday, 20 November 2019
Getting better at predicting organised conflict by Tate Ryan-Mosley via @techreview
New techniques, machine learning, and better data gathering have made predictions both more useful and more granular. In this MIT Technology Review article, one predictive model is applied to look at violence in Ethiopia since the election of Abiy Ahmed, the new Nobel Peace Prize winner.
I loved this really insightful article which has some great diagrams that help with understanding.
Monday, 18 November 2019
WEBINAR: 20 Predictions for 2020 from AI to Data Management - 21 November 2019
Data Science Central Webinar Series Event
Want a data science job? Use the weekend project principle to get it by @mrdbourke via @Medium
Online course certificates are great. But projects of your own are better.
My suggestion is to join Kaggle, Data Science Central or some other forum where you can access free data and do some analyses that show or prove something.
Friday, 15 November 2019
New Survey: Nearly Two Thirds of Analytics Projects Are Jeopardised Due to Poor Access to the Right Data by/via @insideBigData
According to a recent survey, 57% of organizations have been unable to access real-time analytics or suffered inaccurate business intelligence because of a lack of access to the right data.
I think mirrors of production databases are useful places to run real-time data analytics against. You just need to understand that data very carefully so that you are still working from the full facts and not just a subset of them.
Thursday, 14 November 2019
WEBINAR: Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture - 21 November 2019
Wednesday, 13 November 2019
Common Data Mistakes to Avoid by/via @geckoboard
“Statistical fallacies are common tricks data can play on you, which lead to mistakes in data interpretation and analysis.” Here’s a look at some of the common fallacies, with examples, a downloadable poster, and - more importantly - ways to avoid them.
This was a really useful reminder of all the potential mistakes you can make. There is also a great poster that can be downloaded to keep these points in front of you. Definitely something to bookmark and keep.
Monday, 11 November 2019
WEBINAR: Enterprise-ready Data Science and ML with Python - 19 November 2019
Data Science Central Webinar Series Event
When it comes to data, why the 'garbage in, garbage out' doctrine is all wrong by Michael Kanellos via @infomgmt
The problem is that there’s way too much of it and it’s not organized in a way that makes it easy to understand. It doesn’t form beautiful crystalline patterns like salt: it’s more like a huge pile of gravel.
It's clear to me that you can check the quality of your data, but you shouldn't throw away anything that doesn't match your vision of correctness. Flag it as not being "right" but don't lose it - it could still give useful insights. Think of it this way - financial data must equal what is going into the financial ledgers. If you include the bad data it probably will. Just make sure you mark it in some way.
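To make the "flag it, don't lose it" idea concrete, here is a minimal sketch in plain Python. The records, the field names and the two quality rules are all made up for illustration (a negative amount may be a perfectly valid refund in your data), so treat it as one possible approach rather than a recipe.

```python
from decimal import Decimal

# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"id": 1, "amount": Decimal("120.00"), "currency": "GBP"},
    {"id": 2, "amount": Decimal("-15.50"), "currency": "GBP"},
    {"id": 3, "amount": Decimal("80.25"), "currency": ""},  # missing currency
]


def quality_issues(row):
    """Return the names of the (assumed) rules this row fails; empty means it passed."""
    issues = []
    if not row["currency"]:
        issues.append("missing_currency")
    if row["amount"] < 0:
        issues.append("negative_amount")
    return issues


# Flag every row instead of filtering, so nothing is thrown away.
flagged = [{**row, "issues": quality_issues(row)} for row in transactions]

# The total over ALL rows is the figure that should reconcile with the ledger.
total_all = sum(row["amount"] for row in flagged)

# The total over rows with no flags is still available for analysis.
total_clean = sum(row["amount"] for row in flagged if not row["issues"])

print(total_all, total_clean)
```

The point of keeping the `issues` list on each row is that the full total can still be checked against the ledger, while anyone doing analysis can decide for themselves which flags matter.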
Friday, 8 November 2019
Four people your data team needs to win the model deployment relay by Sarah Gates via @infomgmt
To be effective at model management you need a strong team. The good news is that you don’t need a lot of people to accomplish this. Just like a relay race, the right four people can manage the complete model lifecycle.
This is a great read if you have no idea how to do this and want to know how many people are needed to do all of that. Very useful article.
Wednesday, 6 November 2019
Why is a data governance business case hard to get approved? by Nicola Askham via @infomgmt
It can be a real struggle to get your data governance initiative approved in the first place. So I wanted to have a look at the reasons why this might be the case so that you can both plan for and mitigate them.
I agree - it is actually very important BUT it is almost treated as a last resort, done if, and only if, there is time or there is enough of a benefit that can be clearly shown.
Tuesday, 5 November 2019
WEBINAR: Real-Time Actionable Data Analytics - 13 November 2019
IoT Central Webinar Series Event
Monday, 4 November 2019
This New Google Technique Help Us Understand How Neural Networks are Thinking by @jrdothoughts via @TDataScience
Interpretability remains one of the biggest challenges of modern deep learning applications. The recent advancements in computation models and deep learning research have enabled the creation of highly sophisticated models that can include thousands of hidden layers and tens of millions of neurons.
I found this fascinating and it is worth a read as well as a bookmark.
Friday, 1 November 2019
3 tips on how to stop misusing or under-utilising corporate data by Alex Toews via @Infomgmt
Few organizations have assessed how their data can be put to work in the most productive way. This leaves them vulnerable to inefficiencies and can prevent important information from making its way 'to the top.'
A data model is a great place to start as you can begin to understand how the data items relate to each other. The data dictionary is also useful as you can see which fields are repeated, which is crucial if you want to understand how you can join data together from different sources. Just pay attention to formats and whether any conversion needs to be done.
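As a small illustration of why formats matter when joining on a repeated field, here is a sketch in plain Python. The two "systems", their field names and their formats are hypothetical: one holds the customer id as a zero-padded string with DD/MM/YYYY dates, the other as an integer with ISO dates. Both are converted to a common form before the join.

```python
from datetime import datetime

# Hypothetical extracts from two systems that share a customer id.
crm_rows = [
    {"customer_id": "00042", "signup_date": "01/11/2019"},  # padded id, DD/MM/YYYY
    {"customer_id": "00107", "signup_date": "15/10/2019"},
]
billing_rows = [
    {"cust_id": 42, "first_invoice": "2019-11-05"},  # integer id, ISO date
    {"cust_id": 107, "first_invoice": "2019-10-20"},
]


def normalise_crm(row):
    """Convert a CRM row to the common form: integer id and a date object."""
    return {
        "customer_id": int(row["customer_id"]),
        "signup_date": datetime.strptime(row["signup_date"], "%d/%m/%Y").date(),
    }


def normalise_billing(row):
    """Convert a billing row to the common form: same key name, date object."""
    return {
        "customer_id": row["cust_id"],
        "first_invoice": datetime.strptime(row["first_invoice"], "%Y-%m-%d").date(),
    }


# Index one side by the shared key, then look up each row from the other side.
billing_by_id = {r["customer_id"]: r for r in map(normalise_billing, billing_rows)}

joined = []
for crm in map(normalise_crm, crm_rows):
    match = billing_by_id.get(crm["customer_id"])
    if match is not None:
        joined.append({
            **crm,
            "first_invoice": match["first_invoice"],
            "days_to_invoice": (match["first_invoice"] - crm["signup_date"]).days,
        })

print(joined)
```

Without the conversion step, "00042" would never match 42 and the dates could not be compared, which is exactly the kind of thing a data dictionary helps you spot before you start.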
Wednesday, 30 October 2019
WEBINAR: Continuous Integration/Continuous Deployment for Machine Learning - 6 November 2019
Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI, IBM Watson by Olexander Kolisnykov via @topbots
The article will guide you through the best MLaaS platforms on the market and lists some infrastructural decisions to be made and some important considerations to keep in mind when choosing an MLaaS platform.
This has so much detail and is very, very useful. This is worth a bookmark and, if the platform allows applause, full kudos to the author, Olexander.
Monday, 28 October 2019
3 Advanced Python Functions for Data Scientists by Dario Radečić via @TDataScience
Make your code cleaner and more readable by not reinventing the wheel.
Useful functions that are worth making a note of and trying to use more in your Python code.
Friday, 25 October 2019
Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch by @jrdothoughts via @TDataScience
The new release of PyTorch includes some impressive open-source projects for deep learning researchers and developers.
Interesting new features that definitely call for some experimenting to see what they can really do.
Labels:
CAPTUM,
CRYPTEN,
DATA,
DETECTRON2,
MACHINE LEARNING,
ML,
MODEL,
PYTORCH
Wednesday, 23 October 2019
The 3 Missing Roles that every Data Science Team needs to Hire by @kesaritweets via @TDataScience
In a mad rush to hire Data Scientists, most companies overlook three key roles, and this often leads to failure of projects.
The roles here will be missing from many organisations or strategies, so I found it really interesting to read about them. Certainly something to bear in mind and consider.
Tuesday, 22 October 2019
WEBINAR: Demystifying AI & ML: Making Your Data Talk - 31 October 2019