Tuesday, 28 February 2017

A Dead Simple Tool To Find Out What Facebook Knows About You by Katharine Schwab via @FastCoDesign

Facebook builds complex profiles of each of its users so it can offer data points to advertisers for targeting ads. Some data points are obvious, like your age and interests, but many may surprise you. This free tool reveals the unsettling amount of information Facebook tries to deduce about you.

Wow - just wow.

Monday, 27 February 2017

Machine-learning model predicts remission, relapse in cancer patients by Greg Slabodkin via @infomgmt

Researchers were able to teach a standard 64-bit computer workstation running Windows to predict remission with 100 percent accuracy, while relapse was correctly predicted in 90 percent of relevant cases.

This is great news and points to the future.

Sunday, 26 February 2017

Automatically Segmenting Data With Clustering by @bilalmahmood via @kdnuggets

In this post, the author walks through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.

Useful and worth reading, even if you already know this, just to make sure you are clear on it.
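For anyone who wants a feel for the technique before clicking through, here is a minimal sketch of K-Means in plain Python. It is not the code from the linked post: the function names, the fixed random seed and the use of 2-D tuples are all my own assumptions. The `inertia` helper (within-cluster sum of squared distances) is one common way to measure how well a given number of segments fits, and comparing it across values of `k` is the usual "elbow" approach to choosing how many segments to keep.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its old centroid rather than dividing by zero.
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def inertia(centroids, clusters):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(sq_dist(p, c) for c, cluster in zip(centroids, clusters) for p in cluster)
```

To try it, call `kmeans(points, k)` with a list of `(x, y)` tuples for a few values of `k` and compare the `inertia` of each result; the point where the improvement flattens out is a reasonable candidate for the number of segments.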

Saturday, 25 February 2017

Making Python Speak SQL with pandasql by/via @YhatHQ

Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.

This is a great post and includes lots of code and examples - one to bookmark, and sign up for his updates while you are there!

Friday, 24 February 2017

WEBINAR: How to Identify and Track Customers Across Platforms with a Universal ID - 28 February 2017

Webinar Event Details

Date: Tuesday, February 28, 2017
Time: Noon ET / 9:00 am PT
Duration: 60 minutes (including Q&A)

What You'll Learn

In an economy where customer attention is fleeting, expectations are constantly changing, and competitors are just a click or a tap away, understanding customer behaviour is critical. Before you can dive into any kind of advanced customer behaviour analysis, however, you must ensure that you are counting each customer once and only once ― a difficult task, especially with customer touchpoints ranging across a variety of devices and website domains. So how do you accurately identify your customers?

In this webinar, Erin Franz, data analyst from Looker, and Julie-Jennifer Nguyen, Product Marketing Manager at Segment, show you how to determine if you are correctly counting your customers and how to create a universal user ID for customers across touchpoints.

You will learn:

• Which signals indicate that you are miscalculating your customers
• Why creating a universal ID is the backbone of customer analysis
• Best practices for identifying customers, including how to tie identities across anonymous and logged-in sessions, account for changed email addresses, and plan for cross-platform interactions
• How to derive a user table of universal IDs with SQL and LookML
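To make the universal ID idea concrete, here is a minimal sketch in Python rather than the SQL and LookML the webinar covers. It treats every identifier (anonymous session, logged-in user, email address) as a node and merges any two that are seen together, using a union-find structure. The class name, the `anon:`/`user:`/`email:` prefixes and the sample identifiers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers; each connected group is one customer."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative (universal) ID for an identifier."""
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving: point this node at its grandparent as we climb.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        """Record that two identifiers belong to the same customer."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the alphabetically smaller root so results are deterministic.
            self.parent[max(root_a, root_b)] = min(root_a, root_b)

    def customers(self):
        """Map each universal ID to the full set of identifiers it covers."""
        groups = defaultdict(set)
        for ident in list(self.parent):
            groups[self.find(ident)].add(ident)
        return dict(groups)


# Hypothetical observations of identifiers seen together.
graph = IdentityGraph()
graph.link("anon:a1", "user:42")                # anonymous session that later logs in
graph.link("user:42", "email:old@example.com")  # original email address
graph.link("user:42", "email:new@example.com")  # changed email address
graph.link("anon:b7", "user:42")                # same person on a second device
graph.link("anon:c3", "user:99")                # a different customer
```

Counting `len(graph.customers())` here gives the number of distinct customers rather than the number of raw identifiers, which is the "once and only once" count the webinar description is after.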

Presenters

Erin Franz is a data analyst technology lead at Looker. She focuses on partner technical integration and enablement. Prior to Looker, Erin worked in analytics at Accenture where she helped build out big data solutions for enterprise customers.

Julie-Jennifer Nguyen is a Product Marketing Manager at Segment where she helps advocate for the customer, support the product development lifecycle, and enable the sales team. Before Segment, she was in charge of CX Strategy and Analytics at Warby Parker.

Using Machine Learning to predict parking difficulty via @googleresearch

Google released a feature for Google Maps for Android in 25 US cities that predicts parking difficulty close to your destination so you can plan accordingly. To build it, Google had to overcome obstacles like the lack of real-time information about parking spots, high variability (day, time, work day, events, etc.), difficult-to-graph parking structures, and illegal parking. Google used a combination of crowdsourcing and machine learning to address those issues. Here's how they did it.

Posted by James Cook, Yechen Li, Software Engineers and Ravi Kumar, Research Scientist

Interesting feature that could be very useful in London or any other busy city. Let's hope they are able to roll it out.

Thursday, 23 February 2017

How Big Data and AI Help Us Tackle The World’s Biggest Problems in 2017 and Beyond by @BernardMarr via @Data_Informed

Can computers solve all our problems? Well, when combined with the creative power of humans, the answer is… maybe.

I find all of these examples to be very exciting and can't wait to see how they develop long term (as initial progress is not always an indicator of long term success).

Wednesday, 22 February 2017

Data Exchange, Analytics Remain Out of Reach for Many Providers via @HITAnalytics

Providers are still running into data exchange and interoperability roadblocks due to EHR shortcomings, leaving them unprepared to tackle value-based care.

Better design is needed to ensure that these kinds of issues don't happen in the future.

Tuesday, 21 February 2017

App Discovery with Google Play Parts 1, 2 and 3 via @googleresearch

This is a multi-part series on the Google Research Blog:

Part 1: Understanding Topics by Malay Haldar, Matt MacMahon, Neha Jha and Raj Arasu, Software Engineers

Part 2: Personalised Recommendations with Related Apps by Ananth Balashankar & Levent Koc, Software Engineers, and Norberto Guimaraes, Product Manager

Part 3: Machine Learning to Fight Spam and Abuse at Scale by Hsu-Chieh Lee, Xing Chen, Software Engineers, and Qian An, Analyst

These are great posts and this blog is well worth following.

Monday, 20 February 2017

How predictive analytics can tackle the opioid crisis by Sharif Hussein via @medcitynews

Every day in the U.S., there are about 650,000 opioid prescriptions dispensed, 3,900 people who begin abusing opioids, and 78 deaths from opioid-related overdoses. There are states where the number of yearly opioid prescriptions outnumbers the population.

Interesting use of predictive analytics and a very worthwhile cause.

Sunday, 19 February 2017

Serial Killers Should Fear This Algorithm by Robert Kolker via @BW

Thomas Hargrove is building software to identify trends in unsolved murders using data nobody’s bothered with before.

Link to the Murder Accountability Project

This sounds great and I find it hard to understand why his information was ignored. I also found it very interesting that their solve rate was down when the staffing levels were reduced in some states.

Saturday, 18 February 2017

Top 9 ethical issues in artificial intelligence by Julia Bossmann via @wef

Tech giants such as Alphabet, Amazon, Facebook, IBM and Microsoft – as well as individuals like Stephen Hawking and Elon Musk – believe that now is the right time to talk about the nearly boundless landscape of artificial intelligence. In many ways, this is just as much a new frontier for ethics and risk assessment as it is for emerging technology. So which issues and conversations keep AI experts up at night?

I found this to be a well-considered article that concentrates on the issues that need to be thought through and addressed with AI. When you read it, you can see that many of these issues are either starting to be experienced now or will be with us in the future.

Friday, 17 February 2017

6 areas of AI and Machine Learning to watch closely by @NathanBenaich via @kdnuggets

Artificial Intelligence is a generic term, and many fields of science overlap when it comes to building an AI application. Here is an explanation of AI and the 6 major areas to focus on going forward.

This is great and explains it in a new (to me) but very good way - well worth reading.

Thursday, 16 February 2017

WEBINAR: Defeating data chaos - 21 February 2017

February 21, 2017 | 2 PM ET/11 AM PT
Hosted by Information Management

Your business has more access to more data than ever before. But you can’t use that data to its full advantage if you don’t know what it is, where it is, and if it can be trusted.

Fortunately, effective data cataloguing and governance can ensure that all of your organisation's data stakeholders share a “single version of the truth” when it comes to data relevance, quality, and lineage. The result: You can surf the data tsunami with ease — instead of getting drowned by it.

• How leading companies use a data catalogue to bring coherence to data chaos
• What really promotes more impactful use of data by business stakeholders
• 3 steps you can take to get your company’s data governance on the right track

Victory over data chaos is within your reach. Invest a little time with us to discover how!

Featured Presenters:


Moderator: Lenny Liebmann, Founding Partner, Morgan Armstrong
Speaker: Michael Becker, Co-Founder & Managing Partner, mCordis

5 Career Paths in Big Data and Data Science, Explained by Matthew Mayo via @kdnuggets

Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.

I LOVE this article; it also has some great articles linked from it. I recommend reading it so that you can work out which role you most resemble and what you need to do in order to improve your skills.

Wednesday, 15 February 2017

China biotech turns to big data as next weapon in war on cancer by Natasha Khan via @infomgmt

China has made the precision medicine field a focus of its 13th five-year plan, and its companies have been embarking on ambitious efforts to collect a vast trove of genetic and health data.

This is really exciting and I can envisage the kinds of benefits that could be gained from understanding more about our DNA and markers for certain illnesses. However, I would like to add a word of caution: I personally have an autoimmune illness, and I know that whilst my DNA would show a propensity for autoimmune disease, it gives no guarantee a) that I will get one or b) which one it will be. Our understanding of DNA needs to improve a huge amount before we can have precision in some areas.

Tuesday, 14 February 2017

Predictive Analytics 101 by @data36_com

If you have basic R or Python skills, you can build a simple predictive model. These two posts show you how:

Part one

Part two

I recommend you sign up for his newsletter here

Monday, 13 February 2017

SLIDESHOW: Top technology trends we’ll see in 2017 by David Weldon via @infomgmt

Part one - From artificial intelligence, to predictive analytics, to ransomware security, to the changing role of the chief data officer, tech leaders sound off on the top technology trends that we will see in 2017.

Part two - Data management, machine learning, and cloud cost models will be top-of-mind for organisations this year. Tech leaders sound off on the other top technology trends that we will see in 2017.

Part three - Intelligent data management, the AI-driven Internet, and a billion dollar security breach are just some of the trends we can expect this year. Tech leaders share their predictions here on those topics, and more.

Interesting thoughts and ideas - some I like and some are just weird and scary.

Sunday, 12 February 2017

Vodafone Turns to Big Data Process Analytics to Make Procurement More Efficient by Bob Violino via @infomgmt

Telecommunications giant deploys system to maximise catalogue buying and speed the release of purchase orders to suppliers.

It's good to see a practical use from a big company.

Saturday, 11 February 2017

What effective information stewardship would have done by Andrew White via @infomgmt

Imagine that – information stewardship solutions (at some point in the future) might negate the need to collect and centralise data in order to govern it.

I think this is something great that is certain to be there in the future. If it doesn't start being done automatically, time will be wasted doing this, and there is still no guarantee of the quality of the data and the result.

Friday, 10 February 2017

WEBINAR: Improve Your Regression with CART and Gradient Boosting - 16 February 2017

Improve Your Regression with CART and Gradient Boosting

Join us for our upcoming webinar:

Date: Thursday, February 16, 2017

Time: 1 pm EST, 10 am PST

Can't make it at this time? Register to receive a recorded copy of the webcast and presentation slides, which we will email out a few days after the live event.

Duration: 55 minutes

Speaker: Charles Harrison, Marketing Statistician, Salford Systems

Cost: Free

Abstract: In this webinar we'll introduce you to a powerful tree-based machine learning algorithm called gradient boosting. Gradient boosting often outperforms linear regression, Random Forests, and CART. Boosted trees automatically handle variable selection, variable interactions, nonlinear relationships, outliers, and missing values.

We'll see that CART decision trees are the foundation of gradient boosting and discuss some of the advantages of boosting versus a Random Forest. We will explore the gradient boosting algorithm and discuss the most important modeling parameters like the learning rate, number of terminal nodes, number of trees, loss functions, and more. We will demonstrate using an implementation of gradient boosting (TreeNet® Software) to fit the model and compare the performance to a linear regression model, a CART tree, and a Random Forest.
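As a rough illustration of what the abstract describes, here is a minimal sketch of gradient boosting for regression in plain Python. It is not TreeNet and not the webinar's implementation: it assumes a single numeric feature, squared-error loss, and the smallest possible CART tree (a "stump" with two terminal nodes), and all function names are my own. The `n_trees` and `learning_rate` arguments correspond to two of the modelling parameters the abstract mentions.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree: pick the split with the lowest squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left leaf value, right leaf value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Fitting this to a curved relationship (for example `ys = [x * x for x in xs]`) and checking `mean_squared_error` as `n_trees` grows shows the basic trade-off: a smaller learning rate needs more trees to reach the same training error.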

The Rise of Cognitive Work Redesign by Thomas H. Davenport via @Data_Informed

Cognitive technologies are capable of transforming contemporary business processes, but they won’t do so without a concerted effort to redesign work around their capabilities.

Interesting, and he points out a few important things that need to be included as part of that redesign.

Thursday, 9 February 2017

The “Right To Be Forgotten” to Be Realized In the Coming GDPR! by Christer Jansson via @infomgmt

The GDPR is designed to harmonise data privacy laws across Europe, to ensure privacy protection to all EU citizens and to reshape the way organisations will approach fraud, cyber security, and data privacy.

This looks really interesting and needed.

Wednesday, 8 February 2017

SLIDESHOW: Top 10 Big Data Trends We’ll See in 2017 by Joe Caserta via @infomgmt

Last year was the year of ‘big data.’ This will be the year of ‘data intelligence,’ as organizations look for actionable insights from all that data. Here are 10 trends to expect.

Interesting list.

Tuesday, 7 February 2017

WEBINAR: How to Keep Your R Code Simple While Tackling Big Datasets - 14 February 2017

Overview

Title: How to Keep Your R Code Simple While Tackling Big Datasets

Date: Tuesday, February 14, 2017

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

R, TERR, Spark and Python are tools that benefit from larger systems. Software-Defined Servers enable data scientists to size their processing system to the size of a particular data problem. In this Data Science Central webinar you will learn how Software-Defined Servers work in practice for several common data science tools and will explore how removing core and memory constraints has multiple, profound and positive implications for application developers tackling big data problems of all kinds.

Speaker: Michael Berman, Vice President of Engineering -- TidalScale

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Monday, 6 February 2017

The state of Jupyter by Fernando Pérez and Brian Granger via @OReillyMedia

This describes how Project Jupyter got here and where we are headed.

Interesting if you are not up to date on what it actually is and how it came to be.

Sunday, 5 February 2017

The Rise of the Data Engineer by @mistercrunch via @freeCodeCamp

This great article describes the transition from business intelligence engineer to data engineer, and it makes so much sense. Definitely a must-read.

On a personal note, I feel I am probably closer to a data engineer than to a data scientist.

Saturday, 4 February 2017

SLIDESHOW: Gartner’s Top 10 MDM Solutions for 2017 by David Weldon via @infomgmt

Gartner Group has released its first Magic Quadrant for Master Data Management solutions. The report looks at 10 leading vendors in this space, reviewing their products and market strategies.

I have to admit to liking Informatica.

Friday, 3 February 2017

Big Results From Big Data Still Elude Most Organisations by David Weldon via @infomgmt

According to the survey, respondents generally believe that big data will provide their company with a competitive advantage. But they cite several challenges to achieving that goal.

This is said by organisations time and time again - it seems to me that Big Data needs to not be viewed as the solution to all problems but implemented slowly using specific questions to drive the implementation. That way you get as good a result as you can for as little spend as possible and as quickly as possible. I thought we had learnt this lesson with Data Warehousing.

Thursday, 2 February 2017

DOC: Google's Rules of Machine Learning by Martin Zinkevich

43 rules that Google engineers have learned while implementing some of the most sophisticated and widely used machine learning models in the world, as written by a Google engineer.

This is absolutely great and everyone should try to follow these.

Wednesday, 1 February 2017

Neural Networks and Modern BI Platforms Will Evolve Data and Analytics by Kasey Panetta via @Gartner_Inc

Machine conquered man when Google’s AlphaGo defeated the top professional Go player, but the evolution of deep learning didn’t end with the game. Baidu improved speech recognition from 89% to 99% and deep-learning jobs grew from practically zero jobs in 2014 to around 41,000 jobs today.

I found this really interesting and it's worth reading.