Data: 2018

Monday, 31 December 2018

Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret by various authors via @nytimes

Smartphone location data is being sold. It’s supposed to be anonymous, but the data shows how personal it is.

This is scary and could well be something to be cautious with - maybe a time to revisit app permissions on your mobile?

Friday, 21 December 2018

8 top artificial intelligence and analytics trends for 2019 by Srinivasa Vegi via @infomgmt

Artificial intelligence will deliver approximately $2 trillion worth of business value worldwide over the next year. Companies that fail to adopt AI will lose out. Some industries may even be wiped out.

Interesting, and it ties in with my own thoughts quite well.

Wednesday, 19 December 2018

What Is a Data Frame? (In Python, R, and SQL) by/via @oilshellblog

This post introduces data frames and shows how they work by solving the same problem three ways: without data frames, with data frames in Python and R, and in plain SQL.

I love this because it lets you compare and contrast the method across all three, so you can see that the idea is the same but the implementation is different. Definitely worth a bookmark.
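As a rough illustration of that idea (this is not the code from the linked post), here is a minimal sketch that answers one question twice: total hits per URL, first by hand in plain Python and then as a SQL GROUP BY using the standard-library sqlite3 module. The table name, column names and sample rows are all invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (url, hits) rows, invented for this sketch.
rows = [
    ("/blog/", 10),
    ("/about/", 3),
    ("/blog/", 5),
    ("/contact/", 1),
    ("/about/", 4),
]


def totals_plain(rows):
    """Group and sum by hand, without a data frame."""
    totals = defaultdict(int)
    for url, hits in rows:
        totals[url] += hits
    # Sort by total hits, largest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def totals_sql(rows):
    """The same aggregation expressed as a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traffic (url TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO traffic VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT url, SUM(hits) AS total FROM traffic "
        "GROUP BY url ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result


print(totals_plain(rows))
print(totals_sql(rows))
```

Both functions should return the same ranking. With a data frame library such as pandas, the equivalent step would be something along the lines of df.groupby("url")["hits"].sum(), which is the "same idea, different implementation" point in a single line.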

Monday, 17 December 2018

Git Your SQL Together (with a Query Library) by/via @beeonaposy

Caitlin Hudon recommends tracking SQL queries in Git. Here she explains how she created a git repository for saving and sharing commonly (and uncommonly) used queries while tracking any changes made to these queries over time.

Good practice for sure. I either use Git or Google Drive. Either way it is good practice to save and keep records of SQL queries you have used.

Friday, 14 December 2018

6 predictions for the future of analytics by Beverly Wright via @infomgmt

The dynamic nature and improved capabilities for analytics continues to excite and enable companies and even individuals to do more and in better ways.

Some interesting predictions. Item 3 in her list has certainly been a priority for a while now, but I see this area becoming increasingly important, especially as we start to use AI and ML more and more.

3 key elements to make data monetisation possible by Matt Maccaux via @infomgmt

Businesses that are not realising the full potential value of data are leaving untapped opportunities on the table and are at real risk of being disrupted by companies that are driving forward with an analytics agenda.

I find it bizarre that a company can recognise the importance of its data but not have a strategy for it. Data is as much an asset for a company as any physical object, and it therefore deserves as much time, attention and strategy as anything else.

Wednesday, 12 December 2018

Facial recognition snares China’s air con queen Dong Mingzhu for jaywalking, but it’s not what it seems by @litaoscmp via @SCMPTech

Dong Mingzhu, chair of China's Gree Electric Appliances, found her face splashed on a huge screen erected along a street in the port city of Ningbo that displays images of people caught jaywalking by surveillance cameras. But local police say the facial recognition program actually nabbed an advertisement on the side of a moving bus.

OMG - what a spectacular failure of AI - something obviously went wrong in the testing of this technology. It seems that no matter how much you test something you will always find a curve ball that proves that you did not test it enough. Definitely an embarrassing fail.

Monday, 10 December 2018

Activation Regularisation for Reducing Generalisation Error in Deep Learning Neural Networks by/via @TeachTheMachine

This tutorial on activation regularisation for reducing generalisation error in deep learning neural networks will help you create better-learned representations and improve predictive models that make use of the learned features.

This is great and I recommend a bookmark to Jason's website as well as subscribing there. Everything he does and explains is very clear and easy to understand.
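To make the idea a little more concrete, here is a minimal sketch of what an activity (activation) penalty does to a loss value. It is not the tutorial's code: the function names, the numbers and the choice of an L1 penalty are assumptions made purely for illustration.

```python
def l1_activity_penalty(activations, lam):
    """L1 penalty: lam times the sum of absolute activation values."""
    return lam * sum(abs(a) for a in activations)


def penalised_loss(base_loss, activations, lam=0.001):
    """Task loss plus the activity penalty on one layer's outputs."""
    return base_loss + l1_activity_penalty(activations, lam)


# Hypothetical numbers, for illustration only: same task loss,
# but one layer output is dense and the other is sparse.
dense = [0.9, -1.2, 0.4, 0.0]
sparse = [0.9, 0.0, 0.0, 0.0]

print(penalised_loss(0.30, dense, lam=0.01))
print(penalised_loss(0.30, sparse, lam=0.01))
```

The sparse representation ends up with the lower total, which is the pressure the regulariser puts on the network during training. In Keras a penalty of this kind is usually attached through a layer's activity_regularizer argument rather than computed by hand.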

Friday, 7 December 2018

How better standards can decrease data security spending needs by Anna Johansson via @infomgmt

Recently, standardisation at the highest levels has opened new doors for companies seeking cyber security solutions that don’t cost a fortune and work better than current approaches.

Anna makes some good points - chaos with data, systems or interfaces adds needless complexity to an organisation and can create weaknesses that could be exploited. You need ordered data with tight controls.

Wednesday, 5 December 2018

Harvard researchers want to school Congress about AI by Karen Hao via @techreview

A tech boot camp will teach US politicians and policymakers about the potential, and the risks, of artificial intelligence.

This sounds like a great idea. Maybe they could repeat it for other countries' legislators too?

Tuesday, 4 December 2018

WEBINAR: Data Prep For Data Ops: How To Select & Deploy - 12 December 2018

Data Science Central Webinar Series Event

Data Prep For Data Ops: How To Select & Deploy
Join us for the latest DSC Webinar on December 12th, 2018

In recent years, a new term in data has cropped up more frequently: DataOps. As an adaptation of the software development methodology DevOps, DataOps refers to the tools, methodology and organisational structures that businesses must adopt to improve the velocity, quality and reliability of analytics. Widely recognised as the biggest bottleneck in the analytics process, data preparation is a critical element of building a successful DataOps practice by providing speed, agility and trust in data.

Join guest speaker, Forrester Senior Analyst Cinny Little, for this latest Data Science Central webinar focusing on how to successfully select and deploy a data preparation solution for DataOps. The presentation will include insights on data preparation found in the Forrester Wave™: Data Preparation Solutions, Q4 2018.

During this webinar you will learn:

Where does data preparation fit within DataOps
What are the key technical & business differentiators of data preparation solutions
How to align the right technologies, people and processes

Speakers:
Will Davis, Sr. Director of Product Marketing -- Trifacta
Cinny Little, Senior Analyst -- Forrester

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Data Prep For Data Ops: How To Select & Deploy
Date:	Wednesday, December 12th, 2018
Time:	9 AM - 10 AM PDT

Space is limited so please register early:

Reserve your Webinar seat now

After registering you will receive a confirmation email containing information about joining the Webinar.

Monday, 3 December 2018

The Big Data Game Board™ by William Schmarzo via @kdnuggets

Move aside “Monopoly,” “Risk,” and “Snail Race!” Time to teach the youth of the world of an important, career-advancing game: how to leverage data and analytics to change your life! Introducing the “Big Data Game Board™”!

This is great and well worth your investment in time to read and bookmark. I think this could provide you with a clear roadmap on what you need to do in your own organisation.

Friday, 30 November 2018

WEBINAR: Connected Intelligence Solutions with AI and ML - 11 December 2018

Data Science Central Webinar Series Event

Connected Intelligence Solutions with AI and ML
Join us for this latest DSC Webinar on December 11th, 2018

TIBCO Connected Intelligence solutions for energy and utility companies provide powerful capabilities. Our platform connects data, systems, processes, and people—and it delivers predictive analytics, AI, and data visualisations for all aspects of asset management, customer information, distribution, forecasting, production, and supply chain.

The TIBCO platform can help you reduce costs and downtime, increase output, and improve customer retention. It lets you embed machine learning into sensors, processes, and equipment for modernised grids and smarter oil fields.

Speaker: Michael O'Connell, Chief Analytics Officer -- TIBCO Software, Inc.

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Connected Intelligence Solutions with AI and ML
Date:	Tuesday, December 11th, 2018
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Thursday, 29 November 2018

Tips for protecting your data when losing an employee by Jason Park via @infomgmt

Most employers would be surprised to learn that departing internal employees can pose a much bigger threat to their business’s data security than external hackers.

These are really good guidelines. Some organisations take away access as soon as an employee tenders their resignation, or at least limit it. However, I would sound a small caution there: if someone is that keen to take a copy of data they will do so BEFORE they resign, so you need good auditing and tight control over data transfers and data sticks in your office.

Tuesday, 27 November 2018

Understanding the new ePrivacy Regulation and how it differs from GDPR by Christian Auty via @infomgmt

The ePR is expected to address electronic communications, including text messages, email, chat applications and IoT devices. Think of the ePR as the traffic cop for data as it travels between controllers and processors governed by GDPR.

This is an insightful article by Christian that I think is a good high level analysis of the differences between the two.

Friday, 23 November 2018

WEBINAR: AI Models And Active Learning - 4 December 2018

Data Science Central Webinar Series Event

AI Models And Active Learning

Join us for this latest DSC Webinar on December 4th, 2018

The increased availability of computer resources and the prevalence of high-quality training data combined with smart learning schemas, have resulted in a rise in successful AI deployments. However, many organisations simply have too much data, posing a challenge for data scientists: unless at least some of that data is labelled, it's essentially useless for any ML approach that relies on supervised or semi-supervised learning. So, which data needs to be labelled? How much of a dataset needs to be labelled for an ML application to be viable? How can we solve the problem of having more data than we can reasonably analyse?

One promising answer is active learning. Active learning is unique in that it can both solve this data labelling crisis and train models to be more accurate with less data overall. Join us for this latest Data Science Central webinar where we’ll cover:

The pros and cons of active learning as an approach
The three major categories of active learning
How your active learner should decide which rows need labelling
How to obtain those labels
How to tell if active learning is appropriate for your ML project

Speaker: Jennifer Prendki, VP of Machine Learning -- Figure Eight

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	AI Models And Active Learning
Date:	Tuesday, December 4th, 2018
Time:	9:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

Wednesday, 21 November 2018

Comparing the performance of machine learning models and algorithms using statistical tests and nested cross-validation by/via @rasbt

Sebastian Raschka compares the performance of machine learning models and algorithms using statistical tests and nested cross-validation.

This blog is great and very much worth a bookmark. Go and look through the entire series of articles - it is useful both for those new to data science and for those who are experienced.
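For anyone who has not met nested cross-validation before, here is a minimal sketch of the structure using only the standard library. It is not Sebastian's code: the toy data, the tiny 1-D nearest-neighbour classifier and all the function names are assumptions made for illustration. The inner loop picks a hyperparameter using the training part only, and the outer loop scores that choice on data the selection never saw.

```python
import random


def make_data(n=24, seed=0):
    """Toy 1-D data, invented for this sketch: label is 1 when x >= n / 2."""
    data = [(float(x), 1 if x >= n / 2 else 0) for x in range(n)]
    random.Random(seed).shuffle(data)  # shuffle so contiguous folds are mixed
    return data


def k_folds(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def split(data, fold):
    """Return (train, test) with the fold's indices held out as test."""
    held = set(fold)
    test = [data[i] for i in fold]
    train = [d for i, d in enumerate(data) if i not in held]
    return train, test


def cross_val(data, k, n_folds):
    """Plain cross-validation: mean accuracy for one value of k."""
    scores = [accuracy(*split(data, f), k) for f in k_folds(len(data), n_folds)]
    return sum(scores) / len(scores)


def nested_cv(data, candidates, outer=4, inner=3):
    """Outer loop estimates performance; inner loop selects k."""
    outer_scores = []
    for fold in k_folds(len(data), outer):
        train, test = split(data, fold)
        best_k = max(candidates, key=lambda k: cross_val(train, k, inner))
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores


print(nested_cv(make_data(), candidates=(1, 3, 5)))
```

The point of the design is that the outer test fold plays no part in choosing k, so the outer scores are a fairer estimate of how the whole "tune then fit" procedure performs.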

Tuesday, 20 November 2018

WEBINAR: Transforming 3rd Party Data Into Actionable Insights - 28 November 2018

The rise of third party or external data has given data scientists and organisations additional building blocks to discover breakthrough insights. But many data scientists struggle to understand what third party data is relevant and struggle further to efficiently access and transform that data.

In today’s Data Science Central webinar, we’ll explore innovative techniques to simplify third party data access and transformation.

You will learn:

Techniques for assessing third party data quality and relevance
Strategies for accessing third party data
Information about the third party data landscape as it applies to business outcomes

Speakers:
Mark Hookey, CEO -- DemystData
Richard Scioli, General Manager, Platform -- DemystData

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Transforming 3rd Party Data Into Actionable Insights
Date:	Wednesday, November 28th, 2018
Time:	09:00 AM - 10:00 AM PST

Space is limited so please register early:

Reserve your Webinar seat now

After registering you will receive a confirmation email containing information about joining the Webinar.

Monday, 19 November 2018

Managing risk in machine learning by Ben Lorica via @OReillyMedia

Machine learning models are becoming mission critical. Ben Lorica reveals data from a recent survey on ML adoption and discusses some important considerations for managing risk in machine learning.

This is really clear and easy to understand. A good place to start and it will give you something to think about. Maybe it will give you something to consider in your own processes?

Wednesday, 14 November 2018

Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data by @koehrsen_will via @Medium

Here's an explanation of Simpson's paradox and some interesting aspects of this statistical phenomenon, such as correlation reversal.

I love this - it's definitely worth a bookmark and some applause on Medium for an insightful and well written explanation of this important principle.
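A small worked example makes the reversal easier to see. The sketch below is not taken from the article: the counts are chosen for illustration and follow the often-quoted kidney stone treatment example. Treatment A has the higher success rate within each subgroup, yet treatment B looks better once the subgroups are pooled.

```python
from fractions import Fraction

# (successes, patients) per treatment and subgroup; illustrative counts.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(data):
    """Success rate for each treatment within each subgroup."""
    return {
        treatment: {group: Fraction(s, n) for group, (s, n) in groups.items()}
        for treatment, groups in data.items()
    }


def overall_rates(data):
    """Success rate for each treatment after pooling the subgroups."""
    rates = {}
    for treatment, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        patients = sum(n for _, n in groups.values())
        rates[treatment] = Fraction(successes, patients)
    return rates


for treatment, groups in subgroup_rates(data).items():
    for group, r in groups.items():
        print(f"{treatment} {group}: {float(r):.1%}")
for treatment, r in overall_rates(data).items():
    print(f"{treatment} overall: {float(r):.1%}")
```

The reversal happens because the two treatments were given to very different mixes of easy and hard cases, so the pooled rate is dominated by a different subgroup for each treatment.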

Monday, 12 November 2018

WEBINAR: Scaling Big Data Pipelines in Apache Spark, No Coding Required - 15 November 2018

Various companies across multiple industries collect and house vast amounts of data. However, most face the same challenge: the ability to process big data and quickly find insight within its framework. Introducing KnowledgeSTUDIO with Apache Spark, the ultimate solution for both data scientists and data analysts. The graphical user interface with Big Data capabilities allows organizations to build pipelines seamlessly.

Topic: Scaling Big Data Pipelines in Apache Spark, No Coding Required
Date: Thursday, November 15, 2018
Time: 2 p.m. ET/11 a.m. PT
Datawatch Speakers: Dr. Steve Walker, Sr. Data Scientist and Mike Rowley, Product Director

Join us and learn how users of KnowledgeSTUDIO for Apache Spark, a wizard-driven productivity tool for building Spark workflows, have overcome these challenges.

Learn how data science teams can:

Utilise interactive workflows with an automated design canvas for building, displaying, refreshing, and reusing analytic models
Automatically generate code that can be customised and incorporated into production scripts
Include manually written code within the graphical workflow
Leverage advanced modelling with open source packages such as Spark ML, Spark SQL
Avoid overhead costs of parallelisation when datasets are very small
Build, explore data segments, and discover relationships using patented Decision Tree technology

The Future of Cybersecurity: How to Protect Your Business from Great Data Risks by/via @Datafloq

A data breach can have severe consequences for your business (and your career). And a recent OTA report concluded that 93% of data breaches were entirely avoidable. Taking these steps to avoid a data breach can save you a lot of headaches down the road.

A good list of steps to be aware of and act on - definitely something to use as a lightweight checklist to take forward and expand on.

Wednesday, 7 November 2018

3 best practices for improving and maintaining data quality by Maxim Lukichev via @infomgmt

Organisations are increasingly relying on insights generated by data analysis, and they realise that insights are only as good as the data they come from.

Maxim makes some very good points in here. I think any data analysis with bad data is at best worthless and at worst destructive for your business as you will be making key decisions based on something which is not correct. It is important that you validate your data to make sure it is trustworthy and have a network of data stewards in your business to ensure that data is correct and processes and in some cases systems are updated to make sure that quality is improved and assured going forward.

Monday, 5 November 2018

How to build your own AlphaZero AI using Python and Keras by David Foster via @Medium

This tutorial shows you how to build a replica of the AlphaZero methodology to play the game Connect 4—and how to adapt the code for other games.

This looks really good and is worth following and trying.

Wednesday, 31 October 2018

Machine learning — Is the emperor wearing clothes? by Cassie Kozyrkov via @Medium

Cassie Kozyrkov, chief decision intelligence engineer at Google, offers a "behind-the-scenes look at how machine learning works."

This was really interesting and made me think about everything in a bit more detail.

Tuesday, 30 October 2018

9 Developments In AI That You Really Need to Know by John Welsh via @Forbes

Speakers at the World Summit AI offer up nine bits of advice for people working in AI.

This was really interesting and definitely worth a read.

Monday, 29 October 2018

Convolutional Neural Net in Tensorflow by Stephen Barter via @Medium

Here's a look at the fundamentals of convolutional neural nets and how you can create one yourself to classify handwritten digits.

This is a great guide and I think it is well worth a subscription to see what else the author has written on Medium - so much in this article to learn from.

Thursday, 25 October 2018

The Main Approaches to Natural Language Processing Tasks by Matthew Mayo via @kdnuggets

Let's have a look at the main approaches to NLP tasks that we have at our disposal. We will then have a look at the concrete NLP tasks we can tackle with said approaches.

Good lists of approaches with examples, useful for both the learner and the more experienced practitioner to keep on hand as a reminder.

Wednesday, 24 October 2018

8 ways agile methodologies can improve a firm’s culture by Greg Robinson via @infomgmt

Agile project management is becoming hugely popular. It's no wonder. Agile teams are proving that traditional project management strategies fall short. Startups and large corporations are adopting agile principles to stay competitive.

Seems Agile is the way to go if you want to have a more cohesive team that works better together and is happier while they are doing it.

Tuesday, 23 October 2018

Amazon's gender-biased algorithm is not alone by Cathy O'Neil via @infomgmt

Internet giant Amazon recently ran into a problem that eloquently illustrates the pitfalls of big data: It tried to automate hiring with a machine learning algorithm, but upon testing it realised that it merely perpetuated the tech industry’s bias against women.

I agree with Cathy here - Amazon should be congratulated for a) testing it properly and b) doing something about it when it was clear there was a problem. It cannot be acceptable to just use the excuse (for that is what it actually is) that you didn't know so cannot be liable. It really makes me mad when we all know that bias is a risk and we should all do the due diligence to test properly to make sure that we ensure it is no longer there. Please recognise bias as a risk and test carefully for it by using someone who is not on your team so they have fresh eyes.

Monday, 22 October 2018

The neural history of natural language processing by Sebastian Ruder via @_aylien

Here's a review of the last 15 years of natural language processing (NLP) research.

I love this and think it is worth a read, if only to remind yourself of how far we have already come and that, judging from the pace of change, great things are always possible and coming at some point in the future.

Tuesday, 16 October 2018

12 trends impacting the future of data management jobs by David Weldon via @infomgmt

Technologies such as artificial intelligence, the Internet of Things and augmented reality are changing how employees work and what skills employers need. Here are 12 top trends that will reshape the workforce over the next five years.

Some interesting thoughts. The most important thing I can suggest is that you read up on and keep up with new technologies and trends, so that you understand them and are ready to move into them whenever you are able to. After all, you might be able to save your employer money, improve processes and gain valuable skills all at the same time.

Monday, 15 October 2018

IoT analytics guide: What to expect from Internet of Things data by Bob Violino via @NetworkWorld

Data capture, data governance, and availability of services are among the biggest challenges IT will face in creating an IoT analytics environment.

Interesting article that definitely highlights some of the challenges involved in IoT data and reporting off it. This is definitely a new data source with its own challenges, and it will need you to rethink the kind of validation needed in order to make important decisions based upon it.

Friday, 12 October 2018

5 Data Science Projects That Will Get You Hired in 2018 by John Sullivan via @kdnuggets

A portfolio of real-world projects is the best way to break into data science. This article highlights the 5 types of projects that will help land you a job and improve your career.

As one of the comments on the article points out, these are skills that you need to be able to show. My suggestion is that you use Kaggle to provide a project, or at least the data for it, do the things in this article as part of a project, and store the code and results on GitHub so that they can easily be seen.

Thursday, 11 October 2018

A Concise Explanation of Learning Algorithms with the Mitchell Paradigm by Matthew Mayo via @kdnuggets

A single quote from Tom Mitchell can shed light on both the abstract concept and concrete implementations of machine learning algorithms:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
- Tom Mitchell, "Machine Learning"

I really like this thoughtful and clear article discussing the quote from Tom.

Wednesday, 10 October 2018

5 mistakes even the best organizations make with product and customer data by Grant Emison via @infomgmt

CIOs are responsible for the lifeblood of the enterprise, its information, and their purview reaches every corner of the organisation. Mistakes can lead to a loss in productivity, a damaged corporate reputation, security breaches, lawsuits and more.

I have to agree with Grant's last point. We can read it, work hard and try to not make any mistakes, but the chances are we WILL make a mistake - the important thing is to LEARN from the mistake so we don't keep making the same one over and over again.

Tuesday, 9 October 2018

How DeepMind's biggest AI project is fixing bad Android batteries by Matt Burgess via @WiredUK

Google's Android Pie operating system uses DeepMind's AI in a bid to improve your phone's battery life. But is it making any difference?

This sounds great and of course over time it will get even better.

Monday, 8 October 2018

Can We Make Artificial Intelligence Accountable? by @ctowersclark via @forbes

The ability to open the black box is the holy grail of AI—particularly for industries like law, healthcare, and finance that handle sensitive customer data. IBM may have an answer.

I love the sound of this bias detection software, as bias is one of those things that you have to watch out for but is hard to find in your own model. I usually recommend that someone not connected to your area checks for bias, as they bring fresh eyes, but if you can use code it will a) improve the detection and b) give some kind of audit trail to show that you don't have bias and did everything possible to ensure there was none.

Friday, 5 October 2018

6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study by John Sullivan via @kdnuggets

Writing a machine learning algorithm from scratch is an extremely rewarding learning experience. We highlight 6 steps in this process.

Great article with very clear steps to follow - I don't think I will be brave enough to do that yet - I need more free time and the courage to work through it all. It does however seem to be a great set of steps to work from - worth a bookmark I think.
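To give a feel for how small the core of a perceptron is, here is a minimal sketch of the classic perceptron learning rule trained on the AND truth table. It is not the article's code: the function names, the zero initialisation, the learning rate and the choice of the AND gate are assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by lr * error * input on each mistake."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass with no errors means we are done
            break
    return weights, bias


# AND gate: linearly separable, so the rule is guaranteed to converge.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```

The printed predictions should match the AND truth table. The same loop will never settle on something like XOR, which is not linearly separable, and that limitation is a good thing to discover by trying it yourself.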

Thursday, 4 October 2018

Why customer data research is more important than ever by Megan Harris via @infomgmt

The rise of social media and advancements in marketing software has resulted in an increase in purpose-driven marketing tactics that have changed the way companies interact with consumers forever.

Interesting article. As computing power has increased and the data a company holds on us increases they are doing more and more sophisticated data analyses in order to increase sales to existing customers. After all it costs less to sell to an existing customer than it does to get a new customer via marketing, special offers etc.

Wednesday, 3 October 2018

Building the ideal data quality team starts with these roles by Wilfried Lemahieu, Seppe vanden Broucke and Bart Baesens via @infomgmt

Poor data quality impacts organisations in many ways. At the operational level, it has an impact on customer satisfaction, increases operational expenses and will lead to lowered employee job satisfaction.

Great list of job roles and a blueprint of roles that we could all aim for if we understand each one of them.

Tuesday, 2 October 2018

Meet the little-known group inside of Google that's fighting terrorists and trolls all across the web by Julie Bort via @BIUK

Here's a look at the team at Alphabet that interviews ISIS defectors, protects news and political websites from distributed denial of service (DDoS) attacks, and combats radicalism.

This is a great initiative and something that is not widely known - you certainly don't see it publicised on the main news channels.

Monday, 1 October 2018

What If... you could inspect a machine learning model, with no coding required? by/via @GoogleAI

Building effective machine learning systems means asking a lot of questions. It's not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better.

Kudos to them - they really are doing great things - I can only hope that one day I could be good enough to join them.

Thursday, 27 September 2018

Hadoop for Beginners by Aafreen Dabhoiwala via @kdnuggets

An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.

A nice little overview of Hadoop, although I do agree with the first comment by Randy about relational databases.

Wednesday, 26 September 2018

Why organisations should regularly assess the KPIs they track by Kayla Matthews via @infomgmt

KPIs alone are not enough. Instead, it’s necessary to regularly re-evaluate all applicable KPIs to ensure they’re still providing information that’s relevant to the business at large and in line with its data governance strategies.

I definitely agree that it is important to adjust KPIs so that they stay relevant to your organisation and actually achieve what you want them to. Think of all the time and money that could be wasted by not adjusting them.

Tuesday, 25 September 2018

Artificial general intelligence: Dream goal, nightmare scenario or fantasy? by Herb Roitblat via @infomgmt

The quest for artificial general intelligence is the holy grail of artificial intelligence research, and, arguably, just as difficult to find. It may be a myth.

This is a really interesting piece by Herb and really makes you stop and think about what is and what is not possible. Definitely worth reading when you have time to stop and think a bit about it.

Monday, 24 September 2018

Getting to ROI with AI in the enterprise by Tom Wilde via @infomgmt

Despite its promise, and its growing adoption, there is still too little we can point to in terms of real business results from artificial intelligence.

Some great thoughts from Tom, although I'm certainly not convinced that many organisations have actually achieved something very tangible as a return on their investment in AI.

Saturday, 22 September 2018

Essential Math for Data Science: ‘Why’ and ‘How’ by Tirthajyoti Sarkar via @kdnuggets

It always pays to know the machinery under the hood (even at a high level) than being just the guy behind the wheel with no knowledge about the car.

This is really useful - you can teach yourself statistics if your own skills are not up to scratch.

Friday, 21 September 2018

5 top strategies to make development cycles more efficient by Charles Dearing via @infomgmt

Software development is fraught with all sorts of pitfalls. Adopting the principles of Agile software development is one way to combat these inevitable pitfalls.

Some useful advice.

Thursday, 20 September 2018

WEBINAR: 4 Ways to Tackle Common Data Prep Issues - 25 September 2018

Anyone who's ever analysed data knows the pain of digging in only to find that it is poorly structured, full of inaccuracies, or just plain incomplete. But "dirty data" isn't just a pain point for analysts; it can have a major financial and cultural impact on an organisation.

In this latest Data Science Central webinar, you will learn four actionable ways to overcome common data preparation issues, including how to establish a company standard for "clean data" and how to democratize data prep across your organisation.

Speaker: Louis Archer, London Manager -- Tableau

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	4 Ways to Tackle Common Data Prep Issues
Date:	Tuesday, September 25th, 2018
Time:	9:00 AM - 10:00 AM PDT

New open challenge seeks to promote ethics in the use of AI and the news by David Weldon via @infomgmt

Toward that goal, a new open call is offering $750,000 for ideas that will shape the impact artificial intelligence has on the field of news and information.

Something to think about entering - I'm sure we all have thoughts on this.

Tuesday, 18 September 2018

Key steps to ensure data protection amidst the growth of mobile apps by Nathan Sykes via @infomgmt

As data protection regulations grow and the laws become more stringent, it has also become much more difficult to follow them because of widespread mobile adoption.

Some useful pointers to steer you in the right direction.

Friday, 14 September 2018

WEBINAR: Columnar Databases: Best Choice for Real-Time Analytics - 19th September 2018

Business today often calls for analyzing millions or even billions of rows of data on demand and in real time. And although relational databases are unmatched for transactional workloads, columnar databases allow for faster and easier analytical queries.

In this latest Data Science Central webinar, we'll explore how columnar databases can:

Decrease the need to read from disk
Massively compress your dataset
Eliminate the need for indexes
Support ad hoc, on-demand queries at any scale

We'll use MariaDB AX, an open source columnar database for enterprises, to demonstrate how column-based storage can make your analytics more efficient, flexible and scalable – without sacrificing standard SQL.

Speaker: Shane Johnson, Senior Director of Product Marketing -- MariaDB

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Columnar Databases: Best Choice for Real-Time Analytics
Date:	Wednesday, September 19th, 2018
Time:	9:00 AM - 10:00 AM PDT

Thursday, 13 September 2018

AI Knowledge Map: How To Classify AI Technologies by Francesco Corea via @kdnuggets

What follows is then an effort to draw an architecture to access knowledge on AI and follow emergent dynamics, a gateway of pre-existing knowledge on the topic that will allow you to scout around for additional information and eventually create new knowledge on AI.

I love the diagram and explanations in this article - it is worth printing and keeping to hand.

Wednesday, 12 September 2018

How blockchain technology could aid key data challenges by Kevin Peek via @infomgmt

A variety of healthcare provider organisations and health insurers are just beginning to deploy distributed information to solve vexing data issues.

Yes, blockchain gives definite benefits, and I think organisations should look at it seriously to see whether it solves some of the problems they are experiencing.

Tuesday, 11 September 2018

Master data management is not the answer to GDPR compliance by Aaron Zornes via @infomgmt

By themselves, neither data governance nor MDM offer sufficient capabilities to meet GDPR requirements. Together, we are much more empowered as an organisation.

This article contains some really interesting points that I had not realised. Worth reading and thinking about - maybe they are true in your own organisation and you are not aware and may need to make some changes.

Monday, 10 September 2018

Digital 'fixation' causing firms to throw good money at bad projects by Bob Violino via @infomgmt

Organisations risk wasting millions of dollars in the next 12 months, as they rush into flawed digital projects, according to a new study.

I have to agree - you must find a benefit from each project, and it is also important to look afterwards to check that the project actually DID give the benefits that were suggested. It might mean you have a few failures, but in the long term it can only improve the process of providing evidence of potential benefits for proposed projects.

Sunday, 9 September 2018

Machine Learning Cheatsheet via @readthedocs

Brief visual explanations of machine learning concepts with diagrams, code examples and links to resources for learning more.

Definitely something to be bookmarked.

Saturday, 8 September 2018

Data Visualisation Cheat Sheet by @jschwabish via @kdnuggets

Core principles for successful data visualisation, including tips on how to reduce clutter, preattentive processing and how to integrate text within the graph.

This is so very useful and is worthy of a bookmark for sure.

Friday, 7 September 2018

How advanced OCR found new life in big data systems by Anna Johansson via @infomgmt

Today, optical character recognition, in combination with natural language processing, allows businesses to perform complex data extraction tasks.

A great idea - use OCR to scan in old paper documents to fill the gaps in your online data - you will never get accurate results on analytics if you are missing data.

Thursday, 6 September 2018

To succeed at digital transformation, do a better job of data governance by Darren Cooper via @infomgmt

To set the stage for initiatives like AI and machine learning, companies need a rock-solid governance framework.

Great suggestions by Darren in this article.

Wednesday, 5 September 2018

GDPR compliance the perfect opportunity to modernise data architecture by Amandeep Khurana via @infomgmt

Compliance with the data privacy and security mandate enables organisations to become more agile in their product and service development and rollouts, and more efficient and effective in their ability to respond to market trends and competitive threats.

Yes this is exactly right - everything has to be turned on its head and be data centric, not application centric. I think we need to concentrate on:

WHERE is the data created
WHERE is it also stored (so where is it interfaced to)
HOW is it updated
WHAT changes when it is updated
HOW do you delete the data in ALL systems?

I would suggest you do something like a data flow diagram so you can document all of this for every piece of data.
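Before drawing the diagram, one lightweight way to capture those answers is a small inventory record per data element. The sketch below is only an illustration: the class name, the field names and the system names (crm, billing, data_warehouse) are all made up, and a real inventory would need far more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElementFlow:
    """One entry in a hypothetical data-flow inventory."""
    name: str
    created_in: str                                              # WHERE is the data created
    copied_to: List[str] = field(default_factory=list)           # WHERE is it also stored
    update_method: str = ""                                      # HOW is it updated
    changes_on_update: List[str] = field(default_factory=list)   # WHAT changes when it is updated

    def systems_to_erase(self) -> List[str]:
        """Every system a deletion must reach, source first, no duplicates."""
        seen = []
        for system in [self.created_in, *self.copied_to]:
            if system not in seen:
                seen.append(system)
        return seen


# Hypothetical example entry.
email = DataElementFlow(
    name="customer_email",
    created_in="crm",
    copied_to=["billing", "data_warehouse", "crm"],
    update_method="nightly batch interface",
    changes_on_update=["billing contact", "marketing opt-in record"],
)

print(email.systems_to_erase())
```

Keeping the deletion list derived from the same record as the "where is it stored" answer means the two cannot drift apart.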

Tuesday, 4 September 2018

The bias problem with artificial intelligence, and how to solve it by Sanjay Srivastava via @infomgmt

AI bias may come from incomplete datasets or incorrect values. Bias may also emerge through interactions over time, skewing the machine’s learning. Moreover, a sudden business change, such as a new law or business rule, or ineffective training algorithms can also cause bias.

I agree - you need good quality and representative training data if you want to get good results from any AI and ML you want to use. My advice would be:

1. Take your time - rushing always leads to mistakes so be realistic with plans.
2. Be careful with the methodology you use to create and split your data into Training and Test sets.
3. Try to use separate teams to test the same piece of code - the hope being that it will help to avoid the bias. Think of it as a human version of a small parallel ML solution.
4. Check, check and check again.
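As an illustration of point 2, here is a minimal sketch of a stratified split, which keeps each label's share roughly the same in both sets so that a rare class is not accidentally left out of one of them. It uses only the Python standard library; the function name, the 80/20 ratio and the sample data are assumptions for the example, not something taken from the article.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.2, seed=42):
    """Split rows into (train, test), keeping each label's share similar in both."""
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for label in sorted(by_label):       # sorted so the result is deterministic
        group = by_label[label][:]
        rng.shuffle(group)
        # At least one test row per label, even for very small groups.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical data: 80 "ok" rows and 20 "fraud" rows.
rows = [("ok", i) for i in range(80)] + [("fraud", i) for i in range(20)]
train, test = stratified_split(rows, label_of=lambda r: r[0])

print(len(train), len(test))
print(sum(1 for r in test if r[0] == "fraud"))
```

With a plain random split the 20 "fraud" rows could land unevenly; splitting per label keeps the proportion fixed, and the seed lets a second team reproduce exactly the same split when checking the work.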

Monday, 3 September 2018

Community lenders tell big tech vendors to get up to speed by Nathan DiCamillo via @infomgmt

Small banks and credit unions say slow responses and outdated products from the establishment tech vendor can become a drag on their innovation efforts.

I partially agree with him - yes, large organisations move slowly (particularly when you are a small customer and your business is therefore not a big loss to them if you move on), but small ones are less stable and sometimes that can be an unacceptable risk to the business (particularly in the financial sector, where you just cannot afford an issue). So do really careful risk management and have SLAs to protect yourself.

Friday, 31 August 2018

WEBINAR: Getting Data Down to a Science – Code-free and Code-friendly ML - 5th September 2018

Data Science helps answer some of the most basic - and the most complex - business questions. In this latest Data Science Central webinar you will learn how to get data down to a science with code-free and code-friendly self-service analytics platforms. Decisive Data’s Lead Data Scientist Tessa Jones will use a sample data set from a global corporation to answer some of the most common data science questions applicable across businesses.

Learn how to use code-free and code-friendly Machine Learning:

Dive – Swim in the data and dive into a few common business questions with answers in data science including demand forecasting and customer segmentation.
Build – Walk through two data science models including code-free time series and clustering machine learning models.
Customize – Implement custom R code into models.
Refine – Enhance your methods with rapid self-service techniques.
Display – Creatively display information visually in Tableau and tell a story that makes the findings clear and captivating using the Art + Data methodology.

Speakers:
Tessa Jones, Lead Data Scientist -- Decisive Data
Scott Trauthen, Director of Marketing -- Alteryx

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Getting Data Down to a Science – Code-free and Code-friendly Machine Learning
Date:	Wednesday, September 5th, 2018
Time:	9 AM - 10 AM PDT


Saturday, 18 August 2018

WEBINAR: Production ML for Data Scientists: What You Can Do and How to Make it Easy - 22 August 2018

Production ML for Data Scientists:
What You Can Do and How to Make it Easy
August 22, 2018 | 10am PT/1pm ET

For many data scientists in the enterprise, the deployment of machine learning into production environments has become a second job - and one that most do not want. Current IT and operations teams and tools can't account for the complexities of deploying, managing and scaling ML applications, leaving data science and data engineering teams on the hook for the success - or failure - of ML and AI initiatives.

In this webinar, data scientists will be introduced to MLOps - an approach for machine learning operationalization that:

Breaks down the silos between data science and IT
Streamlines deployment and orchestration
Adds advanced functionality like ML Health, governance and business metrics

Get Your ML Experiments to Production

On August 22 at 10am PT/1pm ET, join Nisha Talagala, CTO, and Craig Michaud, Sales Engineer, from ParallelM - the MLOps Company - for a look at how much easier machine learning can be with the right technology and processes in place. You'll see how to upload code from your existing data science platforms, run it in a sandbox against production data, conduct AB tests and perform timeline captures.

Friday, 17 August 2018

WEBINAR: Harnessing the Power of AI with Azure Databricks - 21 August 2018

Harnessing the power of AI on streaming data generated by thousands of IoT devices is no easy task. Lennox International came to this realization as they looked to build a smarter HVAC system by analyzing large data sets, combined with external data sources such as weather data, and predicting equipment failure with high levels of accuracy along with their influencing patterns and parameters.

Join this latest Data Science Central webinar to learn how Lennox leveraged Azure Databricks and PySpark to solve their biggest data challenges and improve data science and engineering productivity, resulting in complex machine learning models that run in 40 minutes with minimal tuning and predict failures with accuracy of about 90%.

This webinar will cover:

The data orchestration challenges Lennox faced which impacted model accuracy levels and data processing times
How they use Azure Databricks to build the data engineering pipelines, appropriate machine learning models and extract predictions using PySpark
How they also implemented stacking, ensemble methods using H2O driverless AI and Sparkling Water on Azure Databricks clusters, which can scale up to 1000 cores

Speaker: Prasad Chandravihar, Lead Data Scientist -- Lennox International

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title:	Harnessing the Power of AI with Azure Databricks
Date:	Tuesday, August 21st, 2018
Time:	09:00 AM - 10:00 AM PDT

Wednesday, 15 August 2018

Tying an agile data management strategy to business goals by Rudraksh Bhawalkar via @infomgmt

As organisations evolve from regular business processes to digital businesses, those that do not have a sharp focus on data will fail to keep up with their more advanced peers.

I think Rudraksh is right: going digital does not change the need for accurate and correct data, it emphasises it. Make sure you add time to sort out the data properly as part of your data initiatives.

Tuesday, 14 August 2018

Only one in three AI projects reported to succeed by Elliot M. Kass via @infomgmt

IT execs point to inconsistent data, incompatible technologies and organisational silos as major impediments

From my own perspective I have observed the following common issues (even if they shouldn't be common).

Inconsistent formats used for the same data element in different systems.
Inconsistent definitions of the same data in different systems.
Inconsistent values of the same data in different systems.

This is why many organisations have data warehouses, and why people like me map between the different systems and the data warehouse: ETL code can then bring the data into a common format, data type and set of values, so that it can be joined easily and you can compare like with like.
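To make that concrete, here is a minimal sketch of the kind of mapping involved. The two source rows, the column names and the gender codes are all invented for the example; real ETL code would also have to handle missing and invalid values.

```python
from datetime import datetime

# Hypothetical extracts: the same customer held in two source systems.
crm_row = {"cust_id": "00123", "dob": "25/09/1980", "gender": "F"}
billing_row = {"customer": 123, "birth_date": "1980-09-25", "sex": "female"}

# Inconsistent values mapped onto one agreed set of codes.
GENDER_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def normalise(customer_id, birth_date, date_format, gender):
    """Map one source record onto the shared warehouse layout."""
    return {
        # Common data type: always an integer key.
        "customer_id": int(customer_id),
        # Common format: always an ISO 8601 date string.
        "birth_date": datetime.strptime(birth_date, date_format).date().isoformat(),
        # Common values: always "F" or "M".
        "gender": GENDER_MAP[gender.strip().lower()],
    }


from_crm = normalise(crm_row["cust_id"], crm_row["dob"], "%d/%m/%Y", crm_row["gender"])
from_billing = normalise(
    billing_row["customer"], billing_row["birth_date"], "%Y-%m-%d", billing_row["sex"]
)

print(from_crm == from_billing)
```

Once both rows are in the same shape they can be joined on customer_id and compared field by field.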

Monday, 13 August 2018

Data veracity challenge puts spotlight on trust by Pat Sullivan via @infomgmt

The data veracity challenge is one that most businesses have yet to come to grips with, but if we’re to fully harness data for the full benefit to businesses and society, then this challenge needs to be addressed head on.

I think automation of reports is great for businesses, but as this article from Pat suggests, you absolutely have to be confident in your data if your business is going to be run using it and the investment based on it is not to be wasted. That means you can rely on the quality of the data, you know its journey from the original source to wherever you use it in your reporting, you understand its meaning (data management), you can join it with other data to produce something useful, and any data analysis, visualisations or algorithms are correctly defined and not biased.

Wednesday, 8 August 2018

How decision trees work by/via @_brohrer_

This is a fantastic overview of how decision trees work by Brandon Rohrer. Includes lots of diagrams, easy to follow descriptions and a short video if you'd rather watch.

I love that it tells you what to look out for so that you hopefully won't fall into some of the common pitfalls. I really suggest you look at his other blog entries which are incredibly useful and worth bookmarking.

Tuesday, 7 August 2018

10 tips for making high availability more affordable in the public cloud by Dave Bermingham and Joey D'Antoni via @infomgmt

10 ways organisations can utilise public cloud services more cost-effectively while also maintaining appropriate service levels for all applications.

This is a great article. I would add that you should tune everything very carefully and try to make the best use of the resources you are paying for. If you understand how the cost structure works and how your code works, careful tuning can save a lot of money, and careless code can waste it.