Tuesday, 26 June 2018

The 5 Clustering Algorithms Data Scientists Need to Know by George Seif via @kdnuggets

In this article we’re going to look at 5 popular clustering algorithms that data scientists need to know and their pros and cons.

This is a great article that is worth bookmarking so you can refer back to it.
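To make the idea of clustering concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one of the most widely used clustering algorithms. It is written in plain Python on 2-D points so you can follow each step. The function name, the sample points and the seeded random start are my own illustrative choices and are not taken from the article.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

K-means needs you to choose k up front and works best on roughly round clusters. That is exactly the kind of pro and con the article compares across the different algorithms.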

Monday, 25 June 2018

Facebook's fight with fake news gets helping hand from robots by Natalia Drozdiak @business via @infomgmt

The firm is turning to machine-learning technologies to amplify the impact of human fact-checkers reviewing hoax news articles.

This is great news - I just hope it works properly.

Sunday, 24 June 2018

SLIDESHOW: 7 top challenges to working with data by David Weldon via @infomgmt

Data pros are dealing with a skyrocketing amount of data, created and gathered by ever-more devices. Here are the top challenges this is creating, according to a new study by Nexla.

From my own perspective this is a good list of pain points in the use of data. I would add the following to this list:

1. Data Sources - do you know the best place to get your data from? There could be better alternative sources for the same data.

2. System of Record - related to 1, make sure you understand where your data really comes from and whether the data is clean and pure or has been altered in some way.

3. Change control - I've been using another system's data to feed some of the data I was using, but its owners missed my feed in their change control, and I've suddenly had different data, or no data at all, arrive.

4. Data Management - are fields with the same name really the same?
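Points 3 and 4 can be partly automated. One option is to check every incoming feed against the schema you expect before you use it. The sketch below assumes a hypothetical feed of dictionaries with made-up field names (`customer_id`, `postcode`, `balance`). It flags an empty feed, missing or unexpected fields, and fields whose type has quietly changed.

```python
# Hypothetical expected schema for an incoming feed: field name -> Python type.
EXPECTED_SCHEMA = {"customer_id": int, "postcode": str, "balance": float}

def check_feed(records, expected=EXPECTED_SCHEMA):
    """Return a list of human-readable problems found in an incoming feed."""
    if not records:
        return ["feed is empty - no data arrived"]
    problems = []
    for n, record in enumerate(records):
        missing = expected.keys() - record.keys()
        extra = record.keys() - expected.keys()
        if missing:
            problems.append(f"record {n}: missing fields {sorted(missing)}")
        if extra:
            problems.append(f"record {n}: unexpected fields {sorted(extra)}")
        # A field with the expected name but a different type may not be "the same" field.
        for field, ftype in expected.items():
            if field in record and not isinstance(record[field], ftype):
                problems.append(
                    f"record {n}: {field} is {type(record[field]).__name__}, expected {ftype.__name__}"
                )
    return problems

print(check_feed([{"customer_id": "1", "postcode": "AB1 2CD"}]))
```

A matching type does not prove that two same-named fields mean the same thing, so this check supports point 4 but does not replace asking the owners of the data.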

Friday, 22 June 2018

How to know when data is 'right' for its purpose by Annette Wright via @infomgmt

The key to evaluating the accuracy of data is more about understanding the eventual use of it than any arbitrary or independent measure.

I agree with Annette, although I would draw your attention to some ways to help make sure that data is correct.

1. For codes, always provide a list of values to select from - yes, you cannot guarantee that the value chosen is the right one, but just ensuring that there is a finite list of values for that field is a major step forward.

2. For some fields, use publicly available data to limit data entry to valid values - examples could be master postal code lists, master lists of registered companies, and master lists of ISO values for items like a country code, language code, etc. Yes, you cannot guarantee that the correct value is selected, but you can at least make sure that the value selected is from a finite master list AND is a valid value.

3. Make sure that all customer-facing systems give the customer a mandatory chance to check and correct their data.
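Points 1 and 2 come down to validating each value against a finite list. Below is a minimal sketch. The code lists are deliberately tiny, hypothetical subsets: in practice you would load the full published ISO 3166-1 (country) and ISO 639-1 (language) lists, plus your own internal code lists.

```python
# Hypothetical, deliberately tiny subsets - load the full published lists in practice.
ISO_COUNTRY_CODES = {"GB", "IE", "FR", "DE", "US"}   # ISO 3166-1 alpha-2 examples
ISO_LANGUAGE_CODES = {"en", "fr", "de", "ga"}        # ISO 639-1 examples
ACCOUNT_STATUS = {"ACTIVE", "SUSPENDED", "CLOSED"}   # a hypothetical internal code list

ALLOWED_VALUES = {
    "country": ISO_COUNTRY_CODES,
    "language": ISO_LANGUAGE_CODES,
    "status": ACCOUNT_STATUS,
}

def validate(record):
    """Return an error for every field whose value is not in its master list."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}={value!r} is not in the allowed list")
    return errors

print(validate({"country": "UK", "language": "en", "status": "ACTIVE"}))
```

The example catches a classic mistake: "UK" looks plausible, but the ISO 3166-1 alpha-2 code for the United Kingdom is "GB".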

Update your processes to ensure that system design takes all of these things into account - time for a culture change to make sure data quality is a top priority in your organisation.

Thursday, 21 June 2018

WEBINAR: Embrace the Modern Analytics Lifecycle - 26th June 2018

Overview
Title: Embrace the Modern Analytics Lifecycle
Date: Tuesday, June 26, 2018
Time: 09:00 AM Pacific Daylight Time
Duration: 1 hour

How can organizations quickly discover insights in their data and develop deployable data science models? First step: understand how various components of their analytics ecosystem work together to achieve unprecedented value.

Join Radiant Advisors for a discussion surrounding new research that explores solutions and reference architectures for data science platforms on Azure – all in the context of a modern analytics lifecycle.

Ready to dive in and learn how to craft an environment that optimizes capability, efficiency, and stability? Register for this latest Data Science Central Webinar and you will learn:

  • The stages of the Modern Analytics Lifecycle that support enterprise analytics
  • The conceptual model for a Modern Data Platform to sustain analytic capabilities
  • The logical and physical architectures, including frameworks and solutions architecture components, that specify how technology components fit together in the data platform
  • A demo of Alteryx-based ecosystem architecture patterns that enable enterprise architects to deploy Alteryx on Azure, among other ways to support the modern analytics lifecycle
Register and learn how a leading analytics ecosystem and strategic partnership with IT can help permeate the value of self-service analytics throughout an entire organisation.

Speakers:
Hasan Hboubati, Solutions Engineer -- Alteryx
John O'Brien, Principal Advisor and CEO -- Radiant Advisors
Raman Kaler, Sr. Manager, Alliance Marketing -- Alteryx

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

How change data capture technology drives modern data architectures by Kevin Petrie via @infomgmt

When designed and implemented effectively, CDC can meet today’s scalability, efficiency, real-time and zero-impact requirements. Without it, organisations usually fail to meet modern analytics requirements.

I like the use of case studies, and the article is very clear. I have to say that, as the article describes, everywhere I have worked it's been a mishmash of different methods across the organisation.
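To show what "capturing changes" means, here is a simplified sketch that compares two keyed snapshots of a table and emits insert, update and delete events. This snapshot comparison is a stand-in chosen for clarity. Many CDC products instead read the database's transaction log, which is what lets them approach the low-impact, real-time requirements the article mentions. The table contents below are hypothetical.

```python
def capture_changes(before, after):
    """Compare two keyed snapshots and emit (operation, key, row) change events."""
    events = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            events.append(("insert", key, after[key]))
        elif key not in after:
            events.append(("delete", key, before[key]))
        elif before[key] != after[key]:
            events.append(("update", key, after[key]))
    return events

# Hypothetical customer table at two points in time, keyed by customer id.
yesterday = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
today = {1: {"name": "Ann"}, 2: {"name": "Rob"}, 3: {"name": "Cy"}}
print(capture_changes(yesterday, today))
```

Only the changed rows travel downstream, rather than the whole table. That is the efficiency argument for CDC, whichever capture method is used.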

Wednesday, 20 June 2018

Facebook said to have shared user data with select companies by David Weldon via @infomgmt

Some of these agreements were reportedly known as 'whitelists,' and enabled those firms to access information about a Facebook user’s friends.

It seems to me that this entire situation is just getting worse and worse.

Tuesday, 19 June 2018

How cloud computing changes data governance strategies by Mohit Sahgal via @infomgmt

The cloud often complicates data management by creating distributed, non-integrated data environments, which require more governance – not less.

This article has a great list of the ways data governance changes when you move to the cloud. I think you need to read it and think about it (preferably BEFORE you implement cloud computing).

Monday, 18 June 2018

For GDPR late-comers, data mapping, security are key first steps by Steve Weil via @infomgmt

Despite having two years to prepare, and the deadline to do so now past, many organisations are still struggling with how to comply with the data management mandate.

I would add that it is a great time to update or start some form of data management. It is also vital that you remember interfaces and reports - especially their output - in order to make sure that data is handled and deleted effectively across your whole business.
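One way to make sure interfaces and reports are not forgotten is to record your data flows as a simple map and then walk it. Starting from the system where a person's data enters, you can list everything downstream that may also need updating or deleting. The system names below are hypothetical. The walk is a plain breadth-first search.

```python
from collections import deque

# Hypothetical data map: each system lists the systems, interfaces and reports it feeds.
DATA_FLOWS = {
    "crm": ["billing", "marketing_extract"],
    "billing": ["finance_report", "data_warehouse"],
    "data_warehouse": ["monthly_kpi_report"],
    "marketing_extract": [],
    "finance_report": [],
    "monthly_kpi_report": [],
}

def downstream_of(source, flows=DATA_FLOWS):
    """Breadth-first walk listing the source and every system that receives data from it."""
    seen = {source}
    queue = deque([source])
    order = []
    while queue:
        system = queue.popleft()
        order.append(system)
        for target in flows.get(system, []):
            if target not in seen:  # avoid revisiting systems (and looping on cycles)
                seen.add(target)
                queue.append(target)
    return order

print(downstream_of("crm"))
```

The map is only as good as its maintenance, so it needs to be kept up to date whenever a new interface or report is added.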

Thursday, 14 June 2018

A Beginner’s Guide to the Data Science Pipeline by Randy Lao via @kdnuggets

On one end was a pipe with an entrance and at the other end an exit. The pipe was also labelled with five distinct letters: "O.S.E.M.N."

This is very clear and very useful for anyone who is just beginning their journey with data science and wants to understand the steps involved and the order in which they should be done. A useful list to print out and keep.
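The O.S.E.M.N. steps (Obtain, Scrub, Explore, Model, iNterpret) map naturally onto a chain of small functions. Below is a toy sketch of that shape. The data is hypothetical (spend vs. sales with some gaps), and the "model" is a simple least-squares straight line chosen for illustration, not something prescribed by the article.

```python
def obtain():
    # Hypothetical raw data: (advertising spend, sales), with some missing values.
    return [(1.0, 3.0), (2.0, 5.0), (None, 6.0), (3.0, 7.0), (4.0, None), (4.0, 9.0)]

def scrub(rows):
    # Drop incomplete rows (real scrubbing is usually far more involved).
    return [(x, y) for x, y in rows if x is not None and y is not None]

def explore(rows):
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": sum(xs) / len(xs), "mean_y": sum(ys) / len(ys)}

def model(rows, stats):
    # Ordinary least squares fit of y = a + b * x.
    mx, my = stats["mean_x"], stats["mean_y"]
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    a = my - b * mx
    return a, b

def interpret(a, b):
    return f"Each extra unit of spend is associated with about {b:.2f} more units of sales."

rows = scrub(obtain())
stats = explore(rows)
a, b = model(rows, stats)
print(interpret(a, b))
```

Keeping each step as its own function makes it easy to rerun just the part you change, which matches the article's point that the steps happen in order.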

Wednesday, 13 June 2018

Top 20 R Libraries for Data Science in 2018 by/via @activewizards

An infographic of the top 20 R packages for data science, which covers the libraries' main features and GitHub activity, as all of the libraries are open source.

This is a great infographic, and so useful that if you are likely to do anything in R you should bookmark it as well as print out a copy to write a few notes on.

Tuesday, 12 June 2018

Embracing agile software methodologies to improve workflows by Lisa Froelings via @infomgmt

The most efficient and effective method for creating software is keeping a realistic timetable. Having bouts of productivity is counter to the process. A steady, consistent pace is of the utmost importance.

From a personal perspective I have worked for an organisation that set impossible timescales and did not allow the right amount of time to develop software properly. Just like you need to embrace the methodology you also need to plan correctly and not give in to pressures on dates - if the date has to come in then scope should be removed.

Monday, 11 June 2018

Three models for determining the true value of data by Armen R. Kherlopian via @infomgmt

With proper targeted analysis, data scientists can uncover meaningful insights to guide decisions on products, customer experience, risk mitigation, processes and profitability.

This highlights the ways you can go with your data now. I would suggest that you can also combine your own data with public datasets and paid-for data in order to get more value from it. However, I will sound a note of caution here, as I always tend to: you need to have good quality data if you want to sell it or derive any decisions or insights from it. So you need careful validation and stewardship of the data.

Saturday, 9 June 2018

Is Open Data the Silver Bullet for Better Drugs? by @gunjan_gb via @Datafloq

Open data sharing within the pharmaceutical industry is crucial for opening the floodgates of innovation.

This is a great thing to do and it could give some really great results. Maybe this could be a working concept for some other industries too?

Friday, 8 June 2018

WEBINAR: Harness the Power of Big Data Analytics - 14 June 2018

Harness the Power of Big Data Analytics
Join us for the latest DSC Webinar on June 14th, 2018
From BI to AI, the need for Big Data and analytics is pervasive and transformational. However, Big Data technologies such as Hadoop or Spark are still quite complicated and not leveraged to their full capacity by business practitioners. New technologies are available to leverage the power of big data platforms for self-service data preparation and automated machine learning to help organizations get the most out of their analytics initiatives and unlock the full potential of their Big Data investments.

In this latest Data Science Central webinar you will learn the essentials you need for a modern data and analytics strategy, ways to expand your strategy development repertoire, and emerging approaches, as well as:

  • Why big data solutions like Hadoop and Spark are ideal for machine learning and advanced analytics initiatives
  • What Automated Machine Learning for Big Data is and how it can change your approach to ML
  • How Self-Service Data Preparation reduces the work required to deliver clean data at scale for predictive modeling
  • How to leverage Big Data platforms to rapidly deliver more accurate predictions for ML initiatives
Speakers:
Raju Penmatcha, PhD, Customer Facing Data Scientist -- DataRobot
Connor Carreras, Manager for Customer Success, Americas -- Trifacta

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Title: Harness the Power of Big Data Analytics
Date: Thursday, June 14th, 2018
Time: 9:00 AM - 10:00 AM PDT

Best practices for building an enterprise artificial intelligence platform by Rashed Haq and Brian Martin via @infomgmt

When designed well, this AI system facilitates faster, more efficient and more effective collaboration among AI scientists and engineers.

I like this article - especially the logical layers which make complete sense and could be the one thing you need to transform your AI implementations.

Thursday, 7 June 2018

Digital Transformation 3.0: Enter The IoT Blind Spot by Nadir Izrael via @forbes

Connected devices are everywhere -- in businesses, hospitals, manufacturing plants, power stations, aeroplanes and government buildings. The internet of things (IoT) is taking us through the biggest digital transformation the world has ever seen.

The figure of 40% quoted in this article for devices that can't be seen definitely shows that you cannot use IoT effectively in any organisation unless you get the infrastructure and basics in place so that it really is usable. I have visions of all these connected devices that can't be seen - what a catastrophic failure and a huge waste of money.

Wednesday, 6 June 2018

How Companies Can Use the Data They Collect to Further the Public Good by Edward L. Glaeser, Hyunjin Kim and Michael Luca via @HarvardBiz

The potential value of the large data sets being amassed by private companies raises new opportunities and challenges for managers making strategic data decisions.

I like the steps in the article, but just like when using any other data, be careful with correlation and causation, as it is very easy to mistake one for the other.
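Here is a small illustration of how easily the two get mixed up. The monthly figures below are hypothetical: both series rise in the summer because of the weather, not because one causes the other. Yet their Pearson correlation is close to 1. The function is a hand-written Pearson coefficient. Python 3.10+ also provides `statistics.correlation`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures from spring into summer: both are driven by the weather.
ice_cream_sales = [35, 46, 55, 72, 81, 93]
sunburn_cases = [4, 11, 20, 31, 37, 45]

r = pearson(ice_cream_sales, sunburn_cases)
print(f"r = {r:.3f}")  # very strong correlation, yet ice cream does not cause sunburn
```

Before acting on a strong correlation, ask what else could be driving both series.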

Tuesday, 5 June 2018

Why is machine learning 'hard'? by/via @zaydenam

Machine learning engineers command higher salaries than software engineers, probably because at its core, machine learning is a fundamentally harder debugging problem than standard software. It's not the math; it's that machine learning adds two additional (potentially bug-infested) dimensions: the model and the data.

Great article by Zayd, which I think needs to be shared. I think the key to fixing issues and getting ML working is that you need:
1. Focused concentration.
2. Knowledge of ML and DL.
3. A good level of understanding of the input data and the model.
4. Experience (sometimes you will have come across something like this before, so you just know it is likely to be the same thing).

Don't be discouraged - the more you do with ML the better you will be at debugging and fixing errors with any code you have developed.
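Because bugs can sit in the data as well as the model, a common first check is to compare your model against a trivial baseline. This sketch uses "always predict the most common label". If a model cannot beat that, suspect the data (for example, misaligned labels) as much as the model. The labels and predictions below are hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def beats_baseline(predictions, labels, margin=0.0):
    """Return (model accuracy, baseline accuracy, whether the model beats the baseline)."""
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    baseline = majority_baseline_accuracy(labels)
    return accuracy, baseline, accuracy > baseline + margin

# Hypothetical imbalanced labels and a "model" that only ever predicts 0.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
lazy_predictions = [0] * 10
print(beats_baseline(lazy_predictions, labels))
```

Here 80% accuracy looks respectable until you see that the baseline scores exactly the same.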

Monday, 4 June 2018

Data mapping: A key challenge in achieving GDPR compliance by Laszlo Dellei via @infomgmt

With the 25 May compliance deadline now upon us, there is simply not enough time for manual mapping. There are, however, some alternatives organisations can turn to.

I used to do this as my day job and it's really interesting understanding the journey of data through an organisation, how it is used, what it is called everywhere (so many synonyms) and how it is changed along that journey (text, number, decimal, number, etc.). Then you need to work out where that documentation is going to be recorded and who is going to have access to it.
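A data map entry can be as simple as a list of hops for one business item. Each hop records the system, the name the item goes by there, and its type. The sketch below uses hypothetical system and field names to show how a map like this surfaces the synonyms and type changes along the journey.

```python
# Hypothetical data-map entry: one business item (a customer number) across systems.
CUSTOMER_NUMBER_JOURNEY = [
    {"system": "web_form", "field": "CustNo", "type": "text"},
    {"system": "crm", "field": "customer_id", "type": "integer"},
    {"system": "billing", "field": "ACCT_REF", "type": "decimal"},
    {"system": "data_warehouse", "field": "cust_key", "type": "integer"},
]

def describe_changes(journey):
    """List every rename (synonym) and type change between consecutive systems."""
    notes = []
    for prev, curr in zip(journey, journey[1:]):
        hop = f"{prev['system']} -> {curr['system']}"
        if prev["field"] != curr["field"]:
            notes.append(f"{hop}: renamed {prev['field']} to {curr['field']}")
        if prev["type"] != curr["type"]:
            notes.append(f"{hop}: type {prev['type']} to {curr['type']}")
    return notes

for note in describe_changes(CUSTOMER_NUMBER_JOURNEY):
    print(note)
```

Each type change is a place where data can be truncated or mangled, so it is worth checking those hops first.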

Sunday, 3 June 2018

4 best practices for tapping the potential of prescriptive analytics by Peter Bull via @infomgmt

This technology considers business objectives, constraints and inputs to recommend the best action forward, showing the impact of each decision on relevant KPIs.

I agree with Peter - you don't need a maths degree to use prescriptive analytics. I really think you need to have clear requirements, to know exactly what you are trying to achieve and to have a clear way of judging the results. Just give it a try.
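At its simplest, "recommend the best action under constraints" is an optimisation problem. The sketch below uses hypothetical actions, costs, expected profits and a budget, and tries every combination of actions. It picks the one with the highest net profit (the KPI) that stays within budget. Brute force only works for a handful of options; real prescriptive tools typically use proper optimisation solvers such as linear or integer programming.

```python
from itertools import combinations

# Hypothetical candidate actions: (name, cost, expected extra profit).
ACTIONS = [
    ("email_campaign", 2_000, 5_000),
    ("price_discount", 5_000, 9_000),
    ("extra_stock", 4_000, 6_500),
    ("loyalty_points", 3_000, 4_000),
]
BUDGET = 8_000  # hypothetical constraint

def recommend(actions, budget):
    """Brute-force the affordable set of actions with the highest net profit."""
    best_plan, best_profit = (), 0
    for r in range(1, len(actions) + 1):
        for plan in combinations(actions, r):
            cost = sum(a[1] for a in plan)
            profit = sum(a[2] for a in plan) - cost  # net impact on the profit KPI
            if cost <= budget and profit > best_profit:
                best_plan, best_profit = plan, profit
    return [a[0] for a in best_plan], best_profit

print(recommend(ACTIONS, BUDGET))
```

The clear requirements matter more than the maths: you must decide what the objective is and which constraints apply before any tool can recommend anything.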

Saturday, 2 June 2018

Watching the continued convergence of analytics products and services by Boris Evelson via @infomgmt

Just like product vendors, consultants find it harder and harder to compete just on people and prices, so now they compete on products, too.

I have to agree - it's harder and harder to find a difference between the big consultancies. However they do need to find a way to differentiate so that people notice them and choose them - this could be where a small company could nip in and get the business from them.

Friday, 1 June 2018

SLIDESHOW: 5 top trends driving analytics and BI strategy by David Weldon via @infomgmt

Gartner analysts share their predictions on the most important strategies influencing data science and business intelligence for the next five years.

Some very good points, and I have to say I agree with all of them. I can definitely see all of them happening, and fairly quickly and painlessly. Doing without a dedicated data scientist is already a possibility, with tools like Tableau making bigger waves in the market.