Data: January 2016

Sunday, 31 January 2016

VIDEO: Data modelling constructs and terminology via @radar

Data modelling constructs and terminology by Michael Blaha via @radar - Identification of data sources is the first step in warehouse development. In this free video training segment, Michael Blaha provides a framework by reviewing data modelling constructs and terminology, including dependent and independent entity types. Using IE (information engineering) notation and the ERwin tool, Michael walks you through a sample operational data model.

Quite basic but useful if you need an introduction

Lessons on Hadoop application architectures via @radar

Lessons on Hadoop application architectures by Shannon Cut on @radar - Here's a useful look at how to choose the right tools for your data stack.

I have to agree - using packages that you are familiar with helps with speed and cost, and makes it easier to maintain going forward when new releases are issued.

Saturday, 30 January 2016

Beginners Guide to learn about Content Based Recommender Engines via @AnalyticsVidhya

Beginners Guide to learn about Content Based Recommender Engines by Shuvayan Das via +Analytics Vidhya - Did you know that the algorithms behind the ‘recommendations’ we see, which use a person's past or current preferences to improve the user experience, are Recommender Systems? Let's explore one of their branches, Content Based Recommender Engines, through this guide.

Great guide with case studies.

Analytics Investments Often Depend on How the CIO Views Their Legacy via @infomgmt

Analytics Investments Often Depend on How the CIO Views Their Legacy by David Weldon via +Information Management - When it comes to investments in data analytics and business intelligence, just how committed is the typical CIO? It depends a lot on how the CIO views their role and on their individual management style, according to a recent study.

Interesting and worth a read for the insight you get into this area.

Friday, 29 January 2016

Learn Gradient Boosting Algorithm for better predictions (with codes in R) via @AnalyticsVidhya

Learn Gradient Boosting Algorithm for better predictions (with codes in R) by Tavish Srivastava via +Analytics Vidhya - If you find it tough to understand the concepts and complexities of the Gradient Boosting Algorithm, you need to read this great article. Includes R code.

Differentiate Between Hadoop And Data Warehousing via @AegisSoftTech

Differentiate Between Hadoop And Data Warehousing by Ethan Millar on +Aegis Soft Tech - The Hadoop environment has the same aim: to gather the maximum amount of interesting data from different systems, in a better way. Using such a radical approach, programmers can dump all data of interest into a big data store.

Great article which helps to resolve some confusion that exists about the two approaches.

Thursday, 28 January 2016

WEBINAR: Monetize My Data - 2 February 2016

Monetize My Data

Blueprint for Big Data Success

According to Gartner, 30% of businesses will be monetizing data assets by 2016. Learn how you can capitalize and create new and thriving revenue streams by leveraging the variety and volume of your big data.

Join our webinar on Tuesday, February 2nd at 8:30 am PST/11:30 am EST to hear about how to develop a competitive advantage and drive growth by fully leveraging your data assets with current customers and in new markets. We'll go over:

Market trends in data monetization and potential opportunities for your organization
How big data enhanced the data-driven applications that organizations can provide
The role of high-performance embedded BI capabilities for a seamless customer experience

Also see how Ruckus Wireless embedded Pentaho as part of their SmartCell Insight, a branded software application supporting massively scalable Wi-Fi analytics for mobile network operators and large enterprise networks.

Register here

WEBINAR: Software Release Orchestration and the Enterprise - 2 February 2016

Software Release Orchestration and the Enterprise: How ING Streamlined and Increased Software Deployments to Twice a Day

DATE: Tuesday, February 2, 2016

TIME: 11:00AM ET

Enterprises are realizing that doing DevOps right requires a streamlined Continuous Delivery pipeline that spans many groups beyond Dev and Ops. Finding a way to automate and control modern DevOps processes while maintaining visibility is a huge challenge.

Join Andreas (A.J.) Prins, IT Manager at ING and Andrew Phillips, VP of DevOps Strategy at XebiaLabs, as they discuss the challenges enterprises are facing and offer actionable advice on how to:

More easily manage complex, distributed releases across technical and non-technical teams
Gain better control and oversight of your DevOps automation and overall software delivery process
Provide visibility into your Continuous Delivery process for everyone involved in your DevOps initiative
Release more quickly, identify bottlenecks, reduce errors and lower the risk of release failures

Reserve your seat today!

Resolving 3 Crucial Bottlenecks of Data Processing in BI Software via @infomgmt

Resolving 3 Crucial Bottlenecks of Data Processing in BI Software by Eldad Farkash via +Information Management - While modern tools provide great performance for smaller and simpler datasets, they tend to buckle under the pressure of dealing with big data, disparate data sources, or many concurrent users.

A great feature article with some real solutions to common physical problems with using BI.

Auto Industry, U.S. Reach Agreement on Cybersecurity and Wired Vehicles

Auto Industry, U.S. Reach Agreement on Cybersecurity and Wired Vehicles by Jeff Plungis via +Information Management - The U.S. Transportation Department and 17 automakers have reached agreement on efforts to enhance safety, including sharing information to thwart cyber-attacks on their increasingly wired vehicles.

Great move and should enable this side of technology to expand safely moving forwards.

Wednesday, 27 January 2016

IoT Devices Are Exploding On the Market via @infomgmt

IoT Devices Are Exploding On the Market by Nigel Fenwick via +Information Management - Sensors can and will improve our lives – giving us more data and insight about our environment and allowing us to tailor experiences to be more finely tuned to our personal desires.

An interesting blog which has nothing to do with physically exploding IoT devices (which was how I read this headline).

Empirically-Based Approach to Understanding the Structure of Data Science via @bobehayes

Empirically-Based Approach to Understanding the Structure of Data Science by Bob Hayes via Business Over Broadway - In this post he has taken an empirically-based approach to understand the structure of data science skills. Based on a factor analysis of skill ratings, data science skills really do fall into three broad skill areas: subject matter expertise, technology/programming and math/statistics

Tuesday, 26 January 2016

Cloudera vs. Hortonworks vs. MapR - Hadoop Distribution Comparison via @dezyreonline

Cloudera vs. Hortonworks vs. MapR - Hadoop Distribution Comparison via @dezyreonline - A thorough analysis of how the different distributions of Hadoop, Cloudera, Hortonworks and MapR differ from each other.

A thoroughly done comparison between them all.

10 Things to Consider Before Diving Into the Hadoop Data Lake via @Datafloq

10 Things to Consider Before Diving Into the Hadoop Data Lake by Craig Lukasik via +Datafloq - Are you starting with the development of a data lake? Although there are a lot of benefits of using a Hadoop Big Data Lake at your organization, developing a successful data lake is not that easy. Use this Data Lake Considerations Checklist before you dive into the Hadoop Data Lake and avoid unnecessary mistakes.

A very useful checklist.

Monday, 25 January 2016

Why Big Data Is Still In Its Adolescence! via @Datafloq

Why Big Data Is Still In Its Adolescence! by Lakshmi Randall via +Datafloq - There is a belief that big data has achieved maturity, but there is still a lot of room to grow for organizations that want to start with big data. This year brings a great deal of hope for organizations and innovation through big data projects, as can be seen from these eight signs that indicate a bright and exciting future for Big Data across the globe.

I agree with her - it's not been around long enough yet to be classed as mature.

How Big Data Startups are allowing Companies to Experiment via @Datafloq

How Big Data Startups are allowing Companies to Experiment by Sankalan Prasad via +Datafloq - The eco-system of big data startups allows large companies that want to start with big data to vet their ideas before a larger investment is required. This results in an approach of start small and scale later, which before was a lot more difficult. How can organizations actually benefit from the increase in Big Data startups?

Great idea to do it this way and not waste too much budget and resources.

Sunday, 24 January 2016

5 Great Benefits of Big Data in Marketing in 2016

5 Great Benefits of Big Data in Marketing in 2016 by Nate Vickery via +Datafloq - Consumers create massive amounts of data. Now, with the introduction of advanced software made exclusively for big data analysis, companies have the chance to use all this information to improve their sales. Since big data analysis is already in use in many Fortune 500 companies and adequate software is easy to find online, there’s no reason for marketers not to benefit from big data.

Any company can use the data no matter the size of the company.

5 Cyber Risks Affecting the Internet of Things and How to Manage These Risks via @Datafloq

5 Cyber Risks Affecting the Internet of Things and How to Manage These Risks by Mark van Rijmenam via +Datafloq - The Internet of Things will offer organizations tremendous value and will provide consumers with fantastic benefits. However, the Internet of Things also comes with a wide variety of cyber risks that could harm organizations and consumers who work with the IoT. What are these risks and how can you manage them?

A must read if you are thinking of going into the IoT area.

Saturday, 23 January 2016

Big Data ROI In 2016: Put Up, Or Shut Up via @infomgmt

Big Data ROI In 2016: Put Up, Or Shut Up by David Weldon via +Information Management - Organizations will take a hard look at their data analytics investments in 2016, and expect to see some very real returns on those investments. If strong ROI can't be shown, some data initiatives may see the plug pulled.

I would go so far as to say that unless there were some quick wins already identified before the project started, then it probably should not have been started in the first place. Doing any of this is not cheap, so you need to have a level of certainty first.

Why Organizations Struggle to Get Data Cultures Right via @infomgmt

Why Organizations Struggle to Get Data Cultures Right by David Weldon on +Information Management - A growing number of organizations are embracing data analytics and business intelligence, but a large number of those organizations struggle with how to get full value from their data.

I agree with the article - you need people seeing the big data picture and you need to help the parts of the organisation to understand the data and what can be achieved with it.

Friday, 22 January 2016

WEBINAR: Applying Graphic Design Principles to Create Killer Dashboards - 26 January 2016

Overview

Title: Applying Graphic Design Principles to Create Killer Dashboards

Date: Tuesday, January 26, 2016

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Applying Graphic Design Principles to Create Killer Dashboards

Web and graphic design principles are tremendously useful for creating beautiful, effective dashboards. In this latest Data Science Central Webinar event, we will consider how common design mistakes can diminish visual effectiveness. You will learn how placement, weight, font choice, and practical graphic design techniques can maximize the impact of your visualizations.

Speaker: Dave Powell, Solution Architect -- Tableau

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Fears Over Data Security In the Cloud Easing, Says Study via @infomgmt

Fears Over Data Security In the Cloud Easing, Says Study by David Weldon via +Information Management - Concerns over data security continue to be the number one issue preventing many organizations from taking their data to the cloud, but the level of fear over cloud security is dropping, according to a new study.

It's still valid to have fears and check things out properly, but maybe they really have sorted it all out now.

MapR Partners Launch Free Test Drives in Collaboration with AWS via @infomgmt

MapR Partners Launch Free Test Drives in Collaboration with AWS via +Information Management - Several MapR partners have integrated technologies with the MapR data platform and made them freely available for education, demonstration, and evaluation purposes via Amazon Web Services (AWS) Test Drive for Big Data.

Sounds like a great opportunity to test out a few things.

Thursday, 21 January 2016

Why Your Organization Is Approaching Personalization Wrong via @infomgmt

Why Your Organization Is Approaching Personalization Wrong by Fiona Adler via +Information Management - Despite their reputations, we found that benchmark firms collected only very basic information around their customers. They were weak at applying advanced predictive analytics and overall fell short of providing customers with an experience they desired or expected.

There is no point asking for data if you are not going to use it, and you have to think about the effect asking for that data is going to have on your customers.

Running scalable Data Science on Cloud with R & Python via @AnalyticsVidhya

Running scalable Data Science on Cloud with R & Python by Kunal Jain via +Analytics Vidhya - Have you wondered why you would run Data Science on the Cloud? You might be quite surprised by the significant benefits it offers! So why wait? Learn how to run scalable Data Science on the cloud with R & Python.

So many other things are done via the cloud so why not this?

Wednesday, 20 January 2016

1.5 TB dataset of anonymized user interactions released by Yahoo via @DataScienceCtrl

1.5 TB dataset of anonymized user interactions released by Yahoo by Laetitia Van Cauwenberge via @DataScienceCtrl - The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate.

Great resource for research.

Daniel Tunkelang's 3 part series on Data Science as a Profession via @radar

Daniel Tunkelang's 3 part series on Data Science as a Profession via @radar - this is a great series and really needs to be read.

Part 1 - Beyond the Venn diagram
Part 2 - Data scientists: Generalists or specialists?
Part 3 - Where should you put your data scientist?

Tuesday, 19 January 2016

Learn to use Forward Selection Techniques for Ensemble Modeling via @AnalyticsVidhya

Learn to use Forward Selection Techniques for Ensemble Modeling by Tavish Srivastava on +Analytics Vidhya - Do you know the application of Forward Selection Techniques for Ensemble Learning using R Programming?

Great tutorial with R code - something worth reading and experimenting with as a means to reduce the chances of overfitting any model you produce.

Big Data and the Deep, Dark Web via @Data_Informed

Big Data and the Deep, Dark Web by +Bernard Marr via @Data_Informed - Bernard discusses the massive volumes of data that exist below the online surface, in the deep web and dark web.

I can see that this COULD be a source of useful data, but the criminal side of it is so large I'm not sure how you would find anything useful. I'm also not sure I would want law enforcement calling and taking all my computers because I accessed the wrong things.

Monday, 18 January 2016

Kudu and the Ongoing Evolution of Hadoop via @Data_Informed

Kudu and the Ongoing Evolution of Hadoop by Rajan Chandras via @Data_Informed - Rajan discusses the advantages and shortcomings of Kudu, its impact on Hadoop, and what its being offered to the Apache Software Foundation means for Cloudera.

Very interesting and puts some things into context for me.

Put Data Science Skills Before Big Data Infrastructure via @Data_Informed

Put Data Science Skills Before Big Data Infrastructure by David Johnston via @Data_Informed - Companies are placing too much emphasis on technology and should focus more on the data skills of their employees, writes ThoughtWorks Data Scientist Dr. David Johnston.

Sunday, 17 January 2016

The Enterprise of the Future: Competing on People Analytics via @Inc

The Enterprise of the Future: Competing on People Analytics by @johnrampton via +Inc. - This post analyses how people analytics will be the next big enterprise and how companies desperate for top talent need it.

I think that, as a gut reaction, someone with some knowledge is sometimes seen as someone with all knowledge. I would challenge that assumption: you have to know your limitations and act accordingly, otherwise disasters can happen.

Talent Shortage Makes Data 'Transparency' a Murky Topic for Many Firms via @infomgmt

Talent Shortage Makes Data 'Transparency' a Murky Topic for Many Firms by David Weldon via +Information Management - The shortage of data scientists will put growing focus on two trends: data analytics automation, and the tapping of more ‘business types’ to manage data initiatives.

I think this has been happening for a while. Certainly I've seen key business resources with some IT knowledge being pushed into semi Data Architect roles with the comment "You can do it - it looks really easy". So I'm sure other roles are seeing the same blurring.

Saturday, 16 January 2016

The Emerging Data Design: Bitemporal Data via @infomgmt

The Emerging Data Design: Bitemporal Data by Mike Lapenna via +Information Management - Many of us have been exposed to aspects of the bitemporal design by using time series data, temporal data, or historical data.

This is part 1. We should all be doing this in our designs, and it makes life easier in the long run to be able to show changes over time.
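To make the idea a bit more concrete, here is a minimal sketch of what a bitemporal record could look like. This is my own illustration and is not taken from the article: the `PriceVersion` class, the `price_as_of` helper and the sample data are all hypothetical, and I am assuming the usual two time axes (when something was true in the real world, and when the database recorded it).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sentinel for open-ended periods.
FOREVER = date.max


@dataclass(frozen=True)
class PriceVersion:
    """One version of a fact, tracked on two time axes (illustrative only)."""
    product: str
    price: float
    valid_from: date      # when the fact became true in the real world
    valid_to: date        # exclusive end of real-world validity
    recorded_from: date   # when this row was written to the database
    recorded_to: date     # exclusive; when this row was superseded


def price_as_of(rows: List[PriceVersion], product: str,
                valid_on: date, known_on: date) -> Optional[float]:
    """Price that applied on `valid_on`, as the database knew it on `known_on`."""
    for row in rows:
        if (row.product == product
                and row.valid_from <= valid_on < row.valid_to
                and row.recorded_from <= known_on < row.recorded_to):
            return row.price
    return None


rows = [
    # Recorded on 1 Jan: the price is 10.0 from 1 Jan onwards.
    PriceVersion("widget", 10.0, date(2016, 1, 1), FOREVER,
                 date(2016, 1, 1), date(2016, 1, 20)),
    # Correction recorded on 20 Jan: 10.0 only applied until 10 Jan ...
    PriceVersion("widget", 10.0, date(2016, 1, 1), date(2016, 1, 10),
                 date(2016, 1, 20), FOREVER),
    # ... and 12.0 applied from 10 Jan onwards.
    PriceVersion("widget", 12.0, date(2016, 1, 10), FOREVER,
                 date(2016, 1, 20), FOREVER),
]

# What did we believe on 16 Jan about the price on 15 Jan?
before_correction = price_as_of(rows, "widget", date(2016, 1, 15), date(2016, 1, 16))
# What do we believe on 25 Jan about the price on 15 Jan?
after_correction = price_as_of(rows, "widget", date(2016, 1, 15), date(2016, 1, 25))
```

Nothing is ever overwritten: the correction closes off the old row and adds new ones, so you can still answer both "what was true then?" and "what did we think was true then?".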

Will Corporations Drown In the Flood of Data Heading Our Way? via @infomgmt

Will Corporations Drown In the Flood of Data Heading Our Way? by Patrick van der Horst via +Information Management - Many companies are currently struggling with the data flood that is heading right for them. As the volume of unstructured data increases, the amount of storage available to preserve it all will decrease.

I agree with him - we need to concentrate on data that is useful (gives value) and not try to do everything with all possible data. I think this is part of the reason many big data projects fail - they forget to focus on something small that gives value to get the whole thing started (a bit like prototyping).

Friday, 15 January 2016

WEBINAR: Creating a Strong Business Advantage with Multi-Genre Advanced Analytics for IoT - 19 January 2016

Tuesday, January 19, 2016
9:00 AM PST

The Internet of Things (IoT) is an important topic in the world of analytics. There are many critical business problems that need to be addressed in real time in the context of IoT. However, many IoT solutions are siloing information into particular analytical genres.

What organizations need is a facility to break down those walls between analytical workloads in their IoT practices and create an environment where analytics techniques (e.g., Machine Learning, Statistics, Graph, and others) can be intelligently mixed to address critical business problems.

Webinar attendees can expect to learn more about:

A brief survey of critical business problems in the context of IoT
IoT trends today
How our technology can provide value in the collection, management, analysis, and insights delivery on IoT data
The important questions your organization needs to ask when implementing an IoT Analytics project

WEBINAR: Predict. Share. Deploy. With Open Data Science - 20 January 2016

Machine learning and predictive analytics open up many new opportunities to create business value. From predicting new customers to meaningfully optimizing the business, data scientists can unlock incredible business value.

But building, tuning, sharing, deploying, and scaling these models is challenging, and rarely covered in statistics class. How can you make data science work in the real world?

We are here to help - Continuum Analytics Data Scientist Christine Doig will teach you how to create, share, scale, and operationalise your models in our webinar on Wednesday, January 20th.

In this webinar, you'll learn to:

Build predictive models with Anaconda using Python packages, such as pandas and scikit-learn, in the Jupyter Notebook
Use Anaconda and R together in your data science workflow
Share and collaborate with your team using Jupyter Notebooks
Scale out your work across hundreds of nodes with Anaconda

Christine will also conduct a Q&A session after the webinar - so tune in and get your data science questions answered.

Register here

OpenAI

OpenAI is a non-profit artificial intelligence research company. Read all about it here on @open_ai

There is also this interview with Andrej Karpathy by Shelly Fan on +Singularity Hub which helps to flesh out some of the questions that are probably in the back of your mind.

I have to say this sounds exciting and I can't wait to see what great things they can come up with.

Stop making excuses and get your data distributed via @radar

Stop making excuses and get your data distributed by Patrick McFadin via @radar - "Being a 'data-driven' company is now the norm, and globalising your data - or making your data available to customers worldwide - is the basic entry fee for participating," says Patrick McFadin. Here's how to do it with Apache Cassandra.

The free courses mentioned at the end sound interesting and would help to understand the usefulness of this too.

Thursday, 14 January 2016

The current state of machine intelligence 2.0 via @radar

The current state of machine intelligence 2.0 via @radar - A year ago, Shivon Zilis mapped the machine intelligence ecosystem. But it's been a busy year in machine learning. In this post, she updates her original mapping and considers autonomous systems and focused startups among other major changes that have been seen in machine intelligence in the past year.

I found this fascinating.

4 Data in 2016 posts from @infomgmt

Here are 4 separate articles around Data in 2016 from +Information Management that I think combined give a good setting of the stage for this year:

Data in 2016: 5 Trends That Will Drive Big Data by Dave Weldon

Data in 2016: Big Data, Business Intelligence Still Top IT Concerns by Dave Weldon

Data in 2016: 6 Changes to Expect in Security, Cloud and Mobile Tech by Dave Weldon

Data in 2016: Top 10 CIO Concerns which is a Slideshow

I have to say I see more sensible big data implementations that give quick tangible benefits, instead of the huge projects we've seen in the past that have often failed to deliver anything apart from a massive spend, some training and skills for staff, and a few massaged egos.

Wednesday, 13 January 2016

Thriving On Data via @infomgmt

Thriving On Data by Ron Tolido on +Information Management - As organisations complete the latest wave of big data projects, plenty of opportunities and challenges remain. This is a 5 part article.

Part 1 - My data is bigger than yours
Part 2 - Real real time
Part 3 - Now you see me
Part 4 - Data apart together
Part 5 - Cognito Ergo Sum

Ten handy python libraries for (aspiring) data scientists

Ten handy python libraries for (aspiring) data scientists by Srinath Achanta on Big Data Made Simple - Data science has gathered a lot of steam in the past few years, and most companies now acknowledge the integral role data plays in driving business decisions. Python, along with R, is one of the most handy tools in a data scientist’s arsenal.

A very useful list.

Tuesday, 12 January 2016

Is Apache Hadoop the only option to implement big data? by @kumarchinnakali

Is Apache Hadoop the only option to implement big data? by @kumarchinnakali on Big Data Made Simple - No, Hadoop is not the only option for the big data problem. Hadoop is one of the solutions.

Very interesting article which has helped to clarify a few things for me in my own mind.

Perfect way to build a Predictive Model in less than 10 minutes via @AnalyticsVidhya

Perfect way to build a Predictive Model in less than 10 minutes by Tavish Srivastava on +Analytics Vidhya - Here is a trick to build a predictive model in as little as 10 minutes.

A very interesting blog with embedded R code.

Monday, 11 January 2016

6 Powerful Reasons Why Your Business Should Visualize Data In 2016 via Maptive

6 Powerful Reasons Why Your Business Should Visualize Data In 2016 via Maptive - “A picture is worth 1,000 words.” You know the phrase. Interesting article from Maptive.

How to make any plot in R using ggplot2? via r-statistics.co

How to make any plot in R using ggplot2? by Selva Prabhakaran via r-statistics.co - Simplified tutorial as the way you make plots in ggplot2 is very different from base graphics

This is a valuable resource if you don't fully understand how to do it. Has R code showing how to make the plots.

Sunday, 10 January 2016

Brad Hopper’s 2016 Enterprise Market BI Predictions via BI Solutions Review

Brad Hopper’s 2016 Enterprise Market BI Predictions via +Solutions Review - Business Intelligence vendors have a tendency to promote concepts such as “democratizing BI” and the like.

A very interesting read. So it's all about integration and everything being smoother. I'm all for that as long as it gives correct results that can be depended upon along the way.

5 Major Data Analytics Missteps and How to Avoid Them via @infomgmt

5 Major Data Analytics Missteps and How to Avoid Them by Sana Narani via +Information Management - She has noticed five major mistakes organizations often make when implementing self-service analytics. Read on to explore her view on the top BI mistakes and learn how to avoid them.

I particularly agree with #3 - it's an area that tends to have far too little attention paid to it as it is perceived as lost time/money, whereas really if you don't concentrate on it the rest of what you do with the data is potentially worthless.

Saturday, 9 January 2016

How the Internet of Things Can Help Against Global Warming via @Datafloq

How the Internet of Things Can Help Against Global Warming by Francisco Maroto via +Datafloq - The Internet of Things offers unique opportunities for addressing issues like clean water, landfill waste, deforestation, and air pollution and ultimately will help reduce the environmental effects of human activities. But can the Internet of Things be a game-changer in the fight against Global Warming due to a network of sensors to monitor the earth in real-time?

I'm hoping you feel as excited about the possibilities as I do after reading this article. It makes a great change for something to have the chance to be used for so much good.

How Artificial Intelligence Will Kickstart the Internet of Things via @Datafloq

How Artificial Intelligence Will Kickstart the Internet of Things by Ahmed Banafa via +Datafloq - The possibilities that IoT brings to the table are endless. IoT continues its run as one of the most popular technology buzzwords of the year, and now the new phase of IoT is pushing everyone to ask hard questions about the data collected by all devices and sensors of IoT. IoT will produce a tsunami of big data and the only way to keep up with this IoT-generated data and gain the hidden insights it holds is using Artificial Intelligence.

I found this article fascinating - it is well worth the investment in time reading it.

Friday, 8 January 2016

WEBINAR: Your Open Source Playbook for 2016 - 13 January 2016

Your Open Source Playbook for 2016

January 13, 2016 9:00 AM PST / 5:00 PM GMT

Is your company looking to transform digitally? More enterprise executives are realizing that an active open source software strategy is critical to transforming their business in the software-driven economy. While developers may rejoice, other parts of the organization may struggle with what this means for their processes and role requirements.

Join RedMonk analyst, Stephen O’Grady, and Pivotal’s Director of Open Source, Roman Shaposhnik, to learn:

Why open source is critical to digital transformation
What best practices to follow to avoid pitfalls
Where you should start to make changes and gain quick wins

By the end of this webinar you should have a high-level checklist for 2016 to make open source software support your digital transformation vision.

Speakers

Stephen O'Grady

PRINCIPAL ANALYST AND CO-FOUNDER, REDMONK

Based in Portland, Maine, Stephen O'Grady’s job is to help companies understand developers better and to help developers. He focuses on infrastructure software such as programming languages, operating systems and databases, as well as covering horizontal industry trends such as open source and cloud computing. Stephen is also a regular contributor to GearMonk, the RedMonk blog reviewing the latest gear. He is the author of “The New Kingmakers” and is regularly cited in publications such as the New York Times, BusinessWeek and the Boston Globe.

Roman Shaposhnik

DIRECTOR OF OPEN SOURCE, PIVOTAL

Roman Shaposhnik is leading the effort to transform Pivotal’s Data business into a 100% open source endeavor. He is a member of the Apache Software Foundation, committer on Apache Hadoop, and, as of late, a man behind the ODPi.org curtain. Roman has been involved in Open Source software for more than a decade and has hacked projects ranging from the Linux kernel to the flagship multimedia library called FFmpeg. He grew up at Sun Microsystems where he had an opportunity to learn from the best software engineers in the industry. Roman's alma mater is St. Petersburg State University, Russia (the school that almost turned him into a mathematician).

Moderator

Dormain Drewitz

DIRECTOR, PRODUCT MARKETING, PIVOTAL BIG DATA SUITE

Dormain Drewitz is Director of Product Marketing for Pivotal Big Data Suite at Pivotal. Prior to Pivotal, she was Director of Platform Marketing at Riverbed Technology. Prior to Riverbed, she spent over five years as a technology investment analyst, closely following enterprise infrastructure software companies and industry trends. Dormain holds a B. A. in History from the University of California at Los Angeles.

WEBINAR: Optimize the Data Warehouse - Blueprint for Big Data Success - 12 January 2016

Rising data volumes can cause your business to struggle due to data warehouse performance issues and impact your business's ability to meet SLAs. Learn how you can reduce strain on the data warehouse and existing infrastructure and improve the performance of information delivery by offloading less frequently used data to Hadoop.

Join our webinar on Tuesday, January 12th at 8:30 am PST/11:30 am EST to learn how your business can reduce costs and improve operational performance, including:

Reducing the need for expensive DW infrastructure
Saving management costs by using Hadoop
Extending the remaining useful life of your already established data warehouses
Meeting SLAs to deliver data on time to the end-user
Empowering your business users to meet goals on time
Satisfying compliance requirements

We'll also walk you through a real life customer example of a leading network storage and data management innovator who saw a 15x cost improvement and stronger performance against customer SLAs by leveraging Pentaho's easy to use, open and extensible data integration platform.

When to Ditch a Z-test for a T-test via linux-neophyte

When to Ditch a Z-test for a T-test via linux-neophyte - Two of the most important concepts in introductory statistics are hypothesis testing and confidence interval estimation in the one sample setting.

Great blog which helps to demystify how you work it out.
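
As a rough illustration of the choice the post describes, here is a minimal Python sketch of the one-sample case. It assumes the usual textbook rule (z statistic when the population standard deviation is known, t statistic when it has to be estimated from the sample). The function names and the sample values are made up for illustration and are not taken from the linked post.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_test(sample, mu0, sigma=None):
    """Return (test_name, statistic) for H0: population mean == mu0.

    A z statistic is used when the population standard deviation
    `sigma` is known; otherwise a t statistic is computed from the
    sample standard deviation.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    if sigma is not None:
        return "z", (mean - mu0) / (sigma / math.sqrt(n))
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return "t", (mean - mu0) / (s / math.sqrt(n))


def z_two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]  # made-up measurements
name_known, z_stat = one_sample_test(sample, mu0=5.0, sigma=0.3)
name_unknown, t_stat = one_sample_test(sample, mu0=5.0)
p_value = z_two_sided_p_value(z_stat)
```

The t statistic has to be compared against a t-distribution with n - 1 degrees of freedom, which the Python standard library does not provide, so this sketch stops at the statistic itself for that branch.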

20 questions to detect fake data scientists via @importio

20 questions to detect fake data scientists via +import.io - Now that the Data Scientist is officially the sexiest job of the 21st century, everyone wants a piece of the pie. That means there are a few data posers out there. People who call themselves Data Scientists, but who don't actually have the right skill set.

Try them and see if you can answer them all. I can answer some of them but not all of them.

Thursday, 7 January 2016

WEBINAR: The Past, Present and Future of Data Science – A Live Roundtable - 12 January 2016

Overview

Title: The Past, Present and Future of Data Science – A Live Roundtable

Date: Tuesday, January 12, 2016

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

The Past, Present and Future of Data Science – A Live Roundtable

Join us for our next Data Science Central Webinar, which will include a live roundtable discussion with members of the Pivotal Data Science team. The panel will feature three Principal Data Scientists – Sarah Aerni, Woo Jung and Rashmi Raghu. This unique event provides DSC Community members a full hour to interact with an elite team and learn from their years of real-world experience.

Topics discussed will include (but not be limited to):

How did you become a data scientist? What steps did you take?
What are the most important qualities of a data scientist?
How would you describe a typical work day?
What is the largest data set you have worked on?
Which tools/platforms do you use (R, Python, Hadoop etc.)?
How do you think the field of data science will evolve in the years to come?
What tools and techniques are you looking forward to using in the future?
In what ways do you think data science will continue to transform industries?

This is a rare opportunity to engage with this talented group in an open discussion, which will flow organically and as directed by those that attend.

Moderated by: Bill Vorhies, Editorial Director - Data Science Central

7 Killer Tools for Automating Competitor Intelligence Research via @Code_Fuel

7 Killer Tools for Automating Competitor Intelligence Research via @Code_Fuel - Competitor intelligence research is vital for any organization. This blog entry gives 7 tools which can automate any search - the results of which can feed into analytics.

Using these tools together with analytics, with both automated, would be a smart way to stay on top of a subject or competitor.

The Impact of IoT on Data Science and Analytics via @infomgmt

The Impact of IoT on Data Science and Analytics by Bart Schouw on +Information Management - Of all the trends that will shape data science and analytics over the next few years, the Internet of Things (IoT) promises to have the most profound impact of all.

Interesting thoughts that make you stop and think about the subject a little more than you already had.

Wednesday, 6 January 2016

We need open and vendor-neutral metadata services via @radar

We need open and vendor-neutral metadata services by Ben Lorica @radar - in this article he discusses Joe Hellerstein's recently sketched-out vision for open, vendor-neutral metadata services, which can give rise to many novel data products and applications, as well as lead to data-governance policies.

Article contains some great links to related documentation. I have to agree with him and hope that some of the vision becomes fact. When something is new we all accept that there will be some chaos, but as it becomes more mature we need standards in order to support us all going forward.

Balancing Personalized Services Without Threatening Data Privacy via @infomgmt

Balancing Personalized Services Without Threatening Data Privacy by Maggie Buggie via +Information Management - It’s a difficult question for retailers to address: are you personalizing services for your customers, or intruding on their privacy?

An interesting aspect to think about. If we put ourselves in the shoes of the customer and imagine it was Amazon (as an example) asking for this information, how would we feel? Comfortable or uneasy? I have to say I have abandoned setting up an account or making a purchase with an organisation just because of the information they wanted from me. I couldn't see why it was relevant to them, given that I was only a user/customer, and so felt I could not continue. Greed for information is one thing, but when an organisation has no idea why it needs the data or what it would do with it, why are customers being asked for it?

Tuesday, 5 January 2016

Data, Design or Algorithms? via @infomgmt

Data, Design or Algorithms? by Steve Miller via +Information Management is a blog that discusses which of these three areas is most important, and contains some interesting thoughts and conclusions. Worth a read to make you sit and consider the question to see if you agree with his thought processes and conclusions.

I have to agree that they are all important in order to make what you have useful, but also agree there are other factors that need to be included in this discussion. Without Data Management how do you know what you have and what it really means/relates to other data? Without data quality how do you know your answer is right? Without data security how do you know that your data really is yours? IMHO the list is quite long.

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources via @AnalyticsVidhya

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources via +Analytics Vidhya - by Manish Saraswat, this is a comprehensive list of excellent videos and resources to get you inspired and going on the subject. Most are YouTube videos, and there is a write-up with each to suggest the audience for it.

Monday, 4 January 2016

Will 2016 be the Year you Clean up your Dirty Data? via @dqmartindoyle

Will 2016 be the Year you Clean up your Dirty Data? via @dqmartindoyle on +Datafloq - in this article Martin Doyle says that, for what feels like forever, we've been warning about the dangers of low quality data. Our warnings have been reinforced and echoed by some of the world’s biggest think tanks. However, despite this, some organisations still haven't acted to improve the quality of their data. Will 2016 be the year that organisations will clean their dirty data?

I have to say the only place to fix this is at source: limit the allowed values in data fields, use drop-down lists to limit values, and use free-format text sparingly. Bad data pollutes any reporting or intelligence you look to gain from data, and removing bad data from reporting stops it matching the system of record that it came from (which is almost as bad).
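
To make the "fix it at source" idea concrete, here is a minimal Python sketch of checking a record against a fixed list of allowed values before it is stored. The field names, the allowed country codes and the 200-character limit are all hypothetical, chosen only for illustration.

```python
# Hypothetical allowed values, as they might appear in a drop-down list.
ALLOWED_COUNTRIES = {"GB", "US", "DE"}

# Hypothetical cap that keeps free-format text short.
MAX_NOTES_LENGTH = 200


def validate_record(record):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    if record.get("country") not in ALLOWED_COUNTRIES:
        allowed = ", ".join(sorted(ALLOWED_COUNTRIES))
        problems.append("country must be one of: " + allowed)
    if len(record.get("notes", "")) > MAX_NOTES_LENGTH:
        problems.append("notes must be at most %d characters" % MAX_NOTES_LENGTH)
    return problems


accepted = validate_record({"country": "GB", "notes": "Prefers email contact"})
rejected = validate_record({"country": "UK"})  # "UK" is not in the allowed list
```

Rejecting the record at entry time, rather than cleaning it later, means the reporting layer and the system of record never disagree about what was captured.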

Five Patterns of Big Data Integration via @LakshmiLJ

Five Patterns of Big Data Integration via @LakshmiLJ - in this article by Lakshmi Randall on +Datafloq she discusses that as our reliance on Hadoop and Spark for data management, processing and analytics grows, data integration strategies should evolve to exploit big data platforms in support of digital business, Internet of Things (IoT) and analytics use cases.

I can imagine many are stuck thinking of the "old" ways of integrating data, and many processes and standards have not been updated or adapted for this new world of data integration.

Sunday, 3 January 2016

Why You Should Keep Your Analytics Simple via @sisense

Why You Should Keep Your Analytics Simple via @sisense - in this article by Eran Levy in +Datafloq he discusses a research paper by renowned consultancy Aberdeen Group which reveals that “[data] complexity is often best answered with simplicity”. Several other surveys reveal interesting findings with regard to the costs and benefits of using an integrated tool for data preparation, querying and visualization, as opposed to the “assembly line” approach.

I think it is important to keep things as simple as you can to make it easier to understand and maintain going forward.

Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark' via @FortuneMagazine

Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark' via +Fortune Magazine - in this article by Derrick Harris he describes why Cloudera are moving from MapReduce to Spark and the process/effect of doing that.

Interesting article and well worth the read

Full Clinical Benefit of Electronic Patient Records In Question via @infomgmt

Full Clinical Benefit of Electronic Patient Records In Question via +Information Management - Greg Slabodkin says while studies on the electronic exchange of health information show some evidence of benefits for healthcare, the full impact of HIE on improving clinical outcomes and avoiding potential harms has been inadequately studied and needs additional research.

This is a US-based study and analysis. From the UK side, there was a project to implement a centralised patient records system, but it was cancelled in 2011 after countless delays and escalating costs. I have to say that, as a UK patient, there are several issues the centralised system could have resolved which I have personal experience of:

Patient records are currently stored in local data silos and there is no information sharing between them. I have MRI scans in 2 hospitals in different areas and there is no sharing of that data between them.
Communication between clinicians seems to be via paper letter, which is scanned in as a document, but there appears to be no OCR applied to it to add meaningful data to the computer system.

There has to be some form of communication - whether that be an interface, a centralised database, or something else.

Saturday, 2 January 2016

Buckle your Seat Belts. 2016 is the Year of Awesome Analytics via @infomgmt

Buckle your Seat Belts. 2016 is the Year of Awesome Analytics via +Information Management - according to Dan Graham, demand for data scientists, power users, and architects will soar even higher in 2016. Corporations will compete for these experts but will not fill their gaps.

I can imagine that happening, but we need to make sure the financial return is good enough for the investment. I realise that this is an invest-or-die kind of situation, but it needs to be a good enough investment, and not something done on the cheap.

Beginners Tutorial on Conjoint Analysis using R via @AnalyticsVidhya

Beginners Tutorial on Conjoint Analysis using R by Sray Agarwal on +Analytics Vidhya - A technique that allows companies to do more in limited budgets & used widely in product designing? It's known as "Conjoint Analysis". In this article Sray explores this new concept together with a case study, using R, for beginners to get a grip easily.

I love this idea and having the examples in R means I understand it better.

Friday, 1 January 2016

Happy New Year

Happy New Year to everyone. Wishing you all a successful 2016 where you find even more innovative uses for data and analytics.

M x