Data: February 2016

Monday, 29 February 2016

WEBINAR: Industry Practitioners Discuss the Merits of Graph-Based MDM - 3 March 2016

Industry Practitioners Discuss the Merits of Graph-Based MDM

Complimentary Web Seminar
March 3, 2016
2 PM ET/11 AM PT
Brought to you by Information Management

Customer data is at the core of all business processes. The promise of MDM is to provide a framework for that customer data to be considered a reusable asset for achieving a “consistent view.” Despite a high degree of importance, CIOs, and more recently CDOs, have struggled to demonstrate the value and justification for continued effort behind MDM programs given historically costly, lengthy timelines and limited insights. The introduction of a graph-based MDM, however, is helping a growing number of companies across different industries to leapfrog the inherent challenges found in more traditional MDM approaches and deliver contextually relevant views back to the business inside all systems of engagement and interactions.

Attend this roundtable discussion with industry practitioners as they discuss the merits of graph-based MDM, describe how it is different and take your questions.

In this webinar you’ll learn:

What is a graph database and what makes it different from other databases
How graph-based MDM is not only quicker to stand up, but can help deliver richer insights back to the business for better ROI
Real-life use-cases and war stories from industry practitioners who’ve implemented both traditional MDM solutions and graph-based MDM

Do Not Confuse Data Governance With Data Management via @infomgmt

Do Not Confuse Data Governance With Data Management by Henry Payret via +Information Management - A quick summary of the difference - useful as a reminder

Which jobs will AI (Artificial Intelligence) kill? via @DataScienceCtrl

Which jobs will AI (Artificial Intelligence) kill? by Vincent Granville on @DataScienceCtrl - great blog by Vincent Granville looking at what has already been replaced and what could be replaced going forward. Makes you stop and think about where you might want to focus in the future.

Sunday, 28 February 2016

The Elements of Python Style by Andrew Montalenti

The Elements of Python Style by Andrew Montalenti - This document goes beyond PEP8 to cover the core of what I think of as great Python style. It is opinionated, but not too opinionated. It goes beyond mere issues of syntax and module layout, and into areas of paradigm, organization, and architecture. I hope it can be a kind of condensed "Strunk & White" for Python code.

Amazing resource and useful for anyone at any level of Python coding.

Is my developer team ready for big data? via @oreilly

Is my developer team ready for big data? by Jesse Anderson via +O'Reilly - Big data and NoSQL technologies represent dramatic shifts in complexity compared to the technologies that came before. Do your team members have the skills to evaluate options, create the solution, and troubleshoot problems? Jesse Anderson provides a business leader’s guide to launching big data projects.

Great article and provides a starting point to avoid problems.

Saturday, 27 February 2016

Apache Cassandra for analytics: A performance and storage analysis via @oreilly

Apache Cassandra for analytics: A performance and storage analysis by Evan Chan via +O'Reilly - Evan Chan presents a performance and storage analysis of Apache Cassandra, comparing the effects of storage format, modeling/filtering, caching, and other effects on analytical query speed and storage cost.

Really excellent analysis with lots of data and code examples to help you dig into it further.

Predictive modeling: Striking a balance between accuracy and interpretability via @oreilly

Predictive modeling: Striking a balance between accuracy and interpretability by Patrick Hall via +O'Reilly - Here's how to strike a balance between accuracy and interpretability when you're using machine learning models in regulated industries.

Great article containing very good advice.

Friday, 26 February 2016

WEBINAR: Predictive Analytics Deployment to Mainframe or Hadoop - 3 March 2016

Predictive Analytics Deployment to Mainframe or Hadoop

Thursday, March 3, 2016 7:00:00 PM GMT - 8:00:00 PM GMT

The big challenge for analytics-driven organizations today is closing the gap between deriving an analytic result and getting the ROI. Organizations need a consistent and efficient way to deploy analytic results into everything from systems of record like mainframes to modern big data infrastructure.

Join James Taylor, CEO of Decision Management Solutions and Michael Zeller, CEO of Zementis, in this live webinar to learn how the Predictive Model Markup Language (PMML) provides an XML standard that streamlines the deployment of predictive analytic models. With PMML a model can be developed in one tool or language, whether open source like R or commercial predictive analytics products, and easily migrated to a wide range of operational systems including mainframes like IBM zSystems and new data infrastructure like Hadoop with Spark / Storm / Hive.

You will learn:
• The challenges of an increasingly complex analytic environment.
• How analytics increase the value of legacy systems of record, mainframes and big data infrastructure.
• Why PMML is the critical glue between heterogeneous analytics environments.

The presenters will use case studies to outline this proven, standard-based approach to analytics deployment in today's complex predictive analytics environments.

All registrants will receive a copy of the new paper by James Taylor, "Standards-based Deployment of Predictive Analytics - Using a standards-based approach to deploy predictive analytics on operational systems from mainframes to Hadoop."

Register today! This is a live webinar with a Q&A following. If you would like to attend but can't make the webinar time, please register to receive a copy of the white paper, presentation and a link to the recording.

Presenters:



James Taylor and Michael Zeller



James Taylor is the CEO of Decision Management Solutions, experts in decision management and decision modeling. He provides strategic consulting, working with clients to adopt decision modeling, predictive analytics and business rules. James is the author of multiple books and articles and writes a regular blog at JT on EDM. James is a contributor to the BABOK® Guide on decision modeling and is a co-submitter of the new Decision Model and Notation (DMN) standard.

Michael Zeller is the CEO and Co-Founder of Zementis. Mike has extensive experience in strategic implementation of technology, business process improvement and systems integration. He strives to provide customers with innovative business solutions tailored to their unique needs. He also serves on the Board of Directors of Tech San Diego and as Secretary/Treasurer on the Executive Committee of ACM SIGKDD, the premier international organization for data mining.

Thursday, 25 February 2016

WEBINAR: IBM SPSS Predictive Analytics series - 2, 9 and 16 March 2016

Webcasts

Select one or more of the following webcasts

What's New in Predictive Analytics Wednesday, March 02, 2016, 11:00 AM EST

What's New in IBM SPSS Statistics Wednesday, March 09, 2016, 11:00 AM EST

What's New in IBM SPSS Modeler Wednesday, March 16, 2016, 11:00 AM EDT

How The ETL Bottleneck Will Kill Your Business via @forbes

How The ETL Bottleneck Will Kill Your Business by Dan Woods ( @danwoodsearly ) via +Forbes - The landscape of data is growing rapidly. We now have access to new forms of big data, but also many high quality curated data sets from APIs etc. There is a crucial skill that used to go by the name of ETL that is highly undervalued and crucial to making all of this work.

Very interesting article. Please note it is 4 screens long.

Self-Service Analytics: This time, it’s Different via @Data_Informed

Self-Service Analytics: This time, it’s Different by Alvin Wong via @Data_Informed - Alvin Wong of Logi Analytics looks at the evolution of self-service BI tools and explains why, this time, self-service is here to stay.

I completely agree with him - it has to be a joint IT and Business effort to ensure that there is success - if one or the other is not on board it is doomed to failure.

Wednesday, 24 February 2016

Topic Modeling Large Amounts of Text Data via @Data_Informed

Topic Modeling Large Amounts of Text Data by Frank D. Evans via @Data_Informed - Exaptive Data Scientist Frank Evans discusses how to use Spark to glean insights from large sets of unstructured text data.

Really worthwhile read as we all struggle with unstructured data.

IoT Sensors Giving Low-Tech Industries High-Tech Benefits via @infomgmt

IoT Sensors Giving Low-Tech Industries High-Tech Benefits by Naniv Vardi on +Information Management - Industries such as manufacturing, commercial real estate, agriculture, and even waste management are being disrupted and revolutionised by IoT sensors.

A great look at the kinds of industries and uses that can utilise the IoT.

Tuesday, 23 February 2016

The Phases of Hadoop Maturity: Where Exactly Is it Going? via @Data_Informed

The Phases of Hadoop Maturity: Where Exactly Is it Going? by Chad Carson via @Data_Informed - With Hadoop marking 10 years of being in production, Chad Carson of Pepperdata discusses the various phases of Hadoop growth.

Amazing to think it has been around for that long.

The Graph Database Comes Into Its Own via @infomgmt

The Graph Database Comes Into Its Own by Emil Eifrem on +Information Management - For those of us in the field, we know graphs are long overdue mass attention, but we also know CIOs have been quietly taking advantage of them for some time.

A look at something we all know has been coming for a while. I guess this article means it is becoming more mainstream.

Monday, 22 February 2016

WEBINAR: Choosing the Right Graph Database to Succeed in Your Project - 25 February 2016

Thursday, February 25, 2016

8 am PT | 11 am ET | 4pm GMT

Graph databases differ in both their functional and their deployment capabilities, and the choice you make at the beginning may affect the success of your project at the end.

This webinar will focus on the five most important criteria you should consider:

1. What benefits can the graph database bring to your project - Typical use cases and most optimal data management solutions

2. What kind of data can be stored in it - Data modelling and third party datasets

3. How do you want to explore your data - Query and visualization capabilities

4. How does the database fit into your system - Integration, scalability and tools

5. Who will manage your database - Deployment, support, upgrade and infrastructure

At the end of this webinar you will walk away with a check-list of what to watch for while choosing a graph database.

In addition, you will get an overview of the typical use cases of semantic graph databases (i.e. heterogeneous data integration and “360 view” of enterprise data, information discovery, content enrichment and metadata management), and a map for a safe choice between flavours of one of the leading semantic graph databases – GraphDB by Ontotext – from the free edition and single-node deployments to high-availability clusters and fully managed database-as-a-service in the Cloud.

Don’t miss the opportunity to be a part of this live event where you can submit your own questions and hear from one of the pioneers in Smart Data analytics.

Recommendations – SlideShare Presentations on Data Science by @AnalyticsVidhya

Recommendations – SlideShare Presentations on Data Science by Kunal Jain on +Analytics Vidhya - Among all forms of media, presentations are probably the most crisp & to the point! Easier to learn and revise with, these Slideshare presentations in Data Science will make a good resource for you.

A great list from Kunal and well worth a look.

Finding the Right Colour Palettes for Data Visualisations via @7wdata

Finding the Right Colour Palettes for Data Visualizations via +Yves Mulkers (7wData) - While good colour palettes are easy to come by these days, finding the right colour palette for data visualisations is still quite challenging.

Interesting article which makes some very good points. Just don't forget the 4 to 10% of men who are colour blind, as referred to in rule 1.

Sunday, 21 February 2016

MapReduce Use Case-Youtube Data Analysis via @acadgild

MapReduce Use Case-Youtube Data Analysis via +ACADGILD - This blog is about analysing YouTube data. The whole analysis is performed in Hadoop MapReduce. The YouTube data is publicly available, and the data set is described below under the heading Data Set Description. Using that dataset we will perform some analysis and draw out some insights, such as which are the top 10 rated videos on YouTube and who uploaded the most videos.
By reading this blog you will understand how to handle data sets that do not have a proper structure and how to sort the output of the reducer.

Great code and a clear explanation of what the code is actually doing.
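
To give a feel for the idea without a Hadoop cluster, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It is not the code from the ACADGILD post (which uses Hadoop MapReduce); the tab-separated record layout of video id followed by uploader, the sample lines and all function names are assumptions made purely for illustration.

```python
from collections import defaultdict
from heapq import nlargest


def mapper(line):
    """Emit (uploader, 1) for a well-formed record; skip malformed lines."""
    # Assumed layout (hypothetical): video_id <TAB> uploader
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[1]:
        return []  # badly structured record: ignore it
    return [(fields[1], 1)]


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts emitted for one uploader."""
    return key, sum(values)


def top_uploaders(lines, n=10):
    """Run map, shuffle and reduce, then sort the reducer output by count."""
    pairs = [pair for line in lines for pair in mapper(line)]
    counts = [reducer(key, values) for key, values in shuffle(pairs).items()]
    return nlargest(n, counts, key=lambda kv: kv[1])


# Made-up sample records, including two malformed ones
sample = ["v1\talice", "v2\tbob", "broken-line", "v3\talice", "v4\t"]
result = top_uploaders(sample, n=2)
print(result)
```

The two points the post highlights both show up here: the mapper quietly drops records that do not fit the expected structure, and the final sort happens on the reducer output rather than inside the reducer.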

Big Data in the Sandbox: Learning to Play Better via @infomgmt

Big Data in the Sandbox: Learning to Play Better by Avi Kalderon via +Information Management - As these sandboxes have grown in popularity, the opportunity they present needs to be balanced with some responsible boundaries.

I completely agree - there has to be some kind of discipline - if it becomes a free-for-all it will be almost impossible to move it into a production state. I've worked in IT trying to pull in something from the business team and sometimes their assumptions are wrong and sometimes to test and document it is to break it, which negates the whole advantage and purpose.

Saturday, 20 February 2016

Free Hadoop and Spark Training Program Draws Over 50,000 Participants via @infomgmt

Free Hadoop and Spark Training Program Draws Over 50,000 Participants by David Weldon via +Information Management - The MapR Technologies program offers online courses on Hadoop, Spark, and other big data technologies.

Well worth a visit if you have the time.

Friday, 19 February 2016

Anodot Provides Anomaly Detection and Operational Intelligence

Anodot Provides Anomaly Detection and Operational Intelligence by Mark A. Smith via +Information Management - Unlike most vendors in the space, the company is delivering anomaly detection and operational intelligence through software-as-a-service (SaaS).

This is a really exciting offering and brings a great use of analytics to the masses via this SaaS.

3 Questions You Should Ask Your Analytics Vendor via @infomgmt

3 Questions You Should Ask Your Analytics Vendor by Paul Hofmann via +Information Management - Not all analytics solutions are equivalent or appropriate for specific needs. Making the best solution and vendor choice is paramount to the success of any analytics initiative.

I agree completely. In fact you may find you have to have more than one, because sometimes one size does not fit all.

Thursday, 18 February 2016

3 Themes for Data Governance in 2016 via @Data_Informed

3 Themes for Data Governance in 2016 by Jelani Harper via @Data_Informed - As the big data landscape evolves in terms of technologies and users, so too do the requirements for effective data governance.

Data Quality, Data Governance and Data Management are key to success. You need to know what you have, that it is correct, and that it will stay correct in order to rely on your data to make decisions and have a positive outcome - anything else is purely luck.

The 3 A’s of Enterprise Integration via @Data_Informed

The 3 A’s of Enterprise Integration by @rdharn1 via @Data_Informed - Modern organizations require real-time insight based on structured and unstructured data from an ever-growing number and variety of sources. Ravi Dharnikota of SnapLogic offers tips for identifying the right data integration platform for your organization’s needs.

Very useful way of thinking and should mean you can handle whatever comes next as you have covered it already as a possibility.

Wednesday, 17 February 2016

WEBINAR: Analysing Census Time-Series Data with Feature Extraction and Clustering - 23 February 2016

Overview

Title: Analyzing Census Time-Series Data with Feature Extraction and Clustering

Date: Tuesday, February 23, 2016

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Analysing Census Time-Series Data with Feature Extraction and Clustering

What's Really the Next Silicon Valley? In this latest Data Science Central Webinar event, Matt Coatney, a Data Scientist at Exaptive will discuss feature extraction and clustering of time series data, using city census data about businesses as fodder. He'll explain his approach for symbolically representing time series data with Symbolic Aggregate approXimation (SAX) and sorting them with Fourier transform.

Matt will also explain how algorithms from genetics can aid in clustering time series data. You will see the Python he wrote to implement the algorithms, and he will explore the results of the clustering with a visual application to find patterns in the census data.

Now let's finally settle what city is really going to be the next Silicon Valley!

Speaker: Matt Coatney, data scientist and VP Services at Exaptive

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Register here
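
For anyone who has not met Symbolic Aggregate approXimation (SAX) before, the sketch below shows the basic idea in plain Python: z-normalise the series, average it over equal-length segments, then map each segment average to a letter using breakpoints that split a standard normal distribution into equally likely bands. This is not the speaker's code; the function name, the four-letter alphabet and the requirement that the series length divides evenly into the segments are all assumptions for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Return a SAX word for `series` (hypothetical helper, stdlib only).

    Assumes len(series) is an exact multiple of n_segments.
    """
    mu, sigma = mean(series), pstdev(series)
    # Step 1: z-normalise (a constant series maps to all zeros)
    if sigma == 0:
        normalized = [0.0 for _ in series]
    else:
        normalized = [(x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation, one mean per segment
    seg_len = len(series) // n_segments
    paa = [
        mean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # Step 3: breakpoints that cut N(0, 1) into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Step 4: each segment mean becomes the letter of the region it falls in
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax(list(range(8)), n_segments=4)
print(word)
```

A steadily rising series of eight points reduced to four segments comes out as the word "abcd", which is the kind of compact symbolic form that can then be compared or clustered like text.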

The Analytics Diet: How Big Data Can Help You Lose Weight via @Data_Informed @BernardMarr

The Analytics Diet: How Big Data Can Help You Lose Weight by +Bernard Marr via @Data_Informed - Bernard Marr reviews the ways in which big data analytics is being applied to the areas of health monitoring and weight loss.

Never did I realise how many apps there are out there to help you lose weight. Not sure they are going to help you when you inevitably lose faith/interest in the whole process and give in.

What’s Changed: 2016 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms via @BigData_Review

What’s Changed: 2016 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms by Timothy King via @BigData_Review - This article gives a great summary of the changes in the 2016 Gartner Magic Quadrant.

A must read not only for confirmation that you chose right, but also gives you a clear idea which are on the way up and which are on the way down.

Tuesday, 16 February 2016

What happens when you have outliers in your data? via @mcpasin

What happens when you have outliers in your data? by Marco Pasin @mcpasin via analytics for fun - great example of the effect an outlier can have on your data. As it uses a small list of numbers it is very easy to see the impact.
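
In the same spirit as the post, here is a small sketch with made-up numbers (not the ones from Marco's article) showing how a single outlier drags the mean while the median barely notices. It uses only the Python standard library; the `summarize` helper is hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


clean = [10, 12, 11, 13, 12]   # made-up sample
with_outlier = clean + [100]   # same sample plus one extreme value

clean_summary = summarize(clean)
outlier_summary = summarize(with_outlier)

print("without outlier:", clean_summary)
print("with outlier:   ", outlier_summary)
```

With these numbers the median stays at 12 in both cases, while the mean more than doubles once the extreme value is added.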

50+ Data Science and Machine Learning Cheat Sheets via @kdnuggets

50+ Data Science and Machine Learning Cheat Sheets by Bhavya Geethika via +KDnuggets - There are thousands of packages and hundreds of functions out there in the Data science world! An aspiring data enthusiast need not know all. Here are the most important ones that have been brainstormed and captured in a compact few pages.

Great resource that should be bookmarked, if nothing else.

Monday, 15 February 2016

Internet-of-Things becomes Internet-of-Everything via @infomgmt

Internet-of-Things becomes Internet-of-Everything by Per Ricktun via +Information Management - What we see now is that we still gather data from remote devices and sensors, but the data can be used to trigger action. To execute business processes. Or influence already running processes.

Per makes a great point - the application of this data is expanding and being used to control processes.

Mini DataHack and the tactics of the three “Last Man Standing”! via @AnalyticsVidhya

Mini DataHack and the tactics of the three “Last Man Standing”! by Kunal Jain via +Analytics Vidhya - Here's what you would say a treasure of learning !! How our Signature Hackathon "Last Man Standing" Winners took up the challenge and fetch the Top Scores. MUST read as there's another challenge coming up for you all.

I would add that even if you have no intention of entering any of the hackathons it is a great place to learn and pick up new skills and techniques.

Sunday, 14 February 2016

The Machine Learning Revolution: How it Works and its Impact on SEO via @moz

The Machine Learning Revolution: How it Works and its Impact on SEO by Eric Enge via +Moz - Machine learning is becoming more and more prevalent in the SEO industry, driving algorithms on many major platforms. In this blog post Eric Enge will dive into a certain amount of technical detail about how it works, but will also discuss its practical impact on SEO and digital marketing.

Nice article worth reading.

Top 10 Machine Learning Algorithms via @dezyreonline

Top 10 Machine Learning Algorithms on +DeZyre - According to a recent study, machine learning algorithms are expected to replace 25% of the jobs across the world, in the next 10 years. With the rapid growth of big data and availability of programming tools like Python and R - machine learning is gaining mainstream presence for data scientists.

Great list which could also be a guide on which algorithms to try/learn/use next.

Saturday, 13 February 2016

How to implement these 5 powerful probability distributions in Python via Big Data Made Simple

How to implement these 5 powerful probability distributions in Python by Manu Jeevan on Big Data Made Simple - R is considered the de facto programming language for statistical analysis, right? But in this post, Jeevan will show you how to easily implement statistical concepts using Python.

Very clear explanation with code.
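
As a taster, here is a minimal sketch of two common discrete distributions written with nothing but the Python standard library. It is not the code from Jeevan's post, and the two distributions chosen here (binomial and Poisson) and the function names are my own picks for illustration rather than a summary of the five he covers.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


# Probability of each possible number of heads in 10 fair coin tosses
coin = [binomial_pmf(k, 10, 0.5) for k in range(11)]

for k, prob in enumerate(coin):
    print(f"{k:2d} heads: {prob:.4f}")
```

The probabilities across all possible outcomes sum to 1, which is a quick sanity check worth running on any hand-written distribution.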

Why Most Business Intelligence Tools Fail the 'Hadoop Test' via @infomgmt

Why Most Business Intelligence Tools Fail the 'Hadoop Test' by Sarah Gerweck on +Information Management - BI is still largely using 17th-century statistical techniques: counts, sums, averages and extrema. At most, we might use techniques that were used by Gauss and Galton in the 19th century.

Very interesting blog. I have to agree with her that BI is ok for lightweight reports, but doing proper statistics has, in the past, needed something specialist. If BI can adapt to enable use of Hadoop-like structures and more statistics-type reports/functions it will continue to be successful.

Friday, 12 February 2016

WEBINAR: Using Databases and Containers: From Development to Deployment - 24 February 2016

DATE: Wednesday, February 24, 2016

TIME: 1:00PM ET

In this talk we review what Docker is and why it's important to Developers, Admins and DevOps when they are using a NoSQL Database such as Aerospike, the high performance NoSQL Database. Persistence is a critical element for a successful multi-Container strategy.

We also cover the following topics:

Using Docker to Orchestrate a multi container application (Flask + Aerospike)
Injecting HAProxy and other production requirements as we deploy to production
Scaling the Web and Aerospike clusters to grow to meet demand

This presentation led by Alvin Richards, VP of Product at Aerospike includes an interactive demo showcasing the core Docker components (Machine, Engine, Swarm and Compose) along with Aerospike’s integration. We hope you will see how much simpler Docker can make building and deploying multi-node Aerospike based applications.

Register here

The Top A.I. Breakthroughs of 2015 via @kdnuggets

The Top A.I. Breakthroughs of 2015 by Richard Mallah on +KDnuggets - Learn about the biggest developments of 2015 in the field of Artificial Intelligence.

This is a three page post. Some really interesting developments. Be careful to click on next page and not next post.

How Big Data And Analytics Are Changing Hotels And The Hospitality Industry

How Big Data And Analytics Are Changing Hotels And The Hospitality Industry by +Bernard Marr via +Forbes - How hotels and the hospitality industry can use Big Data and analytics to their advantage.

This is a 2 page article. I agree and it is quite exciting thinking about the quantity of data they have (or could have).

Thursday, 11 February 2016

WEBINAR: Text Analytics Delivers Game-Changing Customer Insights - 16 February 2016

Text Analytics Delivers Game-Changing Customer Insights

Join us to learn how text analytics can help you discover the hidden social insights that can transform your business

Date: February 16, 2016
Time: 11 AM ET

To remain competitive, businesses need to operate at the speed of social. At least 80% of enterprise data is unstructured, contained in the myriad text-based social conversations that are happening every day. Unlocking the hidden value of text through predictive analytics is imperative for understanding customers’ opinions and needs to make better, more informed business decisions.

During this webinar RapidMiner and Aylien will explore the power of social content by analyzing data captured from thousands of tweets referencing Super Bowl 50 ads to determine viewer sentiments and predict potential trends in brand adoption.

Attend this webinar to:

Learn how to leverage predictive and text analytics for: understanding your clients, improving customer satisfaction, and optimizing marketing spend
Learn how to quickly make sense of social media data across thousands of responses using sentiment analysis and predictive modeling
Understand the impact of predictive and text analytics on business opportunities
Learn how to share and communicate customer insights through data visualization

Can’t attend? Register anyways, and we will send you the recording of the webinar after the event.

Register here

VIDEO: Learn Apache Spark in simple and easy steps via @Intellipaat

Learn Apache Spark in simple and easy steps via +Intellipaat - A beginner's tutorial containing complete knowledge of Apache Spark. If this has whetted your appetite you can go to their website to see what training they offer.

Finding the Right Roadmap for a Successful Analytics Journey via @infomgmt

Finding the Right Roadmap for a Successful Analytics Journey by Fred Bazzoli via +Information Management - Organisations launching their first data analytics program often make missteps early in the process that can set back the project just as it is getting started.

If only some organisations, before starting the journey into analytics, could read something like this, they could prepare themselves better for what lies ahead (and make it a greater success).

Wednesday, 10 February 2016

Why Emerging Leaders Must Get to Know Big Data via @Datafloq

Why Emerging Leaders Must Get to Know Big Data by Kate Rodriguez via +Datafloq - In business, there can be chasms between promise and fulfilment. Companies of all sizes are investing in technology to assemble unprecedented amounts of data from a wide array of sources. This goldmine of information, when properly analysed, can solve existing problems and reveal new opportunities. Yet many organizations are not reaping the rewards of their massive data collection efforts.

I agree with her - if senior management have not had some training/education on this subject how are they going to make decisions on priorities, budgets, projects. You need the whole picture to manage effectively.

Data Scientist or Data Science Team? That's the Question! via @Datafloq

Data Scientist or Data Science Team? That's the Question! by Michael Young via +Datafloq - Data Scientist or Data Science Team? Many businesses struggle to recruit the right talent they need. We see this particularly in fields like Data Science, where such skills are in demand to help business growth. The answer to this important question depends on the needs of your business, your big data strategy and how much you can afford to invest.

Interesting article. I'm tempted to say that a Team is better than a Data Scientist on the basis of economies of scale - a team can split the work and do the parts they are good at - a Data Scientist has the potential to become a bottleneck.

Tuesday, 9 February 2016

Framing 10,000 Tweets (Sentiment Analysis in R) by @juliasilge

Framing 10,000 Tweets (Sentiment Analysis in R) by Julia Silge @juliasilge - An excellent, and extensive, introduction to using R and ggplot2 to analyse twitter posts.

A great example including R code. Recommended if you are starting out with R and wondered how to do this.

Big Data Landscape 2016 Created by @Mattturk, @Jimrhao and @firstmarkcap

Big Data Landscape 2016 Created by @Mattturk, @Jimrhao and @firstmarkcap on Data Science Central

The failure to replicate scientific findings by Kaiser Fung

The failure to replicate scientific findings by @junkchart (Kaiser Fung) - Scientific reproducibility has been much discussed of late. Quartz goes so far as to say "Nearly all of our medical research is wrong." Much of this lack of reproducibility comes from cherry-picking statistically significant results from a larger batch of experimental results, a practice that has become known as P-hacking. If you find yourself falling into this trap, make sure to apply corrections to avoid it.

I love Randall's description of the problem.
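
One of the simplest corrections of the kind mentioned above is the Bonferroni correction: divide the significance threshold by the number of tests you ran. The sketch below uses made-up p-values and hypothetical function names; it is only an illustration of the principle, not the method from Kaiser Fung's post.

```python
def naive_hits(p_values, alpha=0.05):
    """Results that look significant when each test is judged on its own."""
    return [p for p in p_values if p <= alpha]


def bonferroni_hits(p_values, alpha=0.05):
    """Results that survive a Bonferroni correction for multiple tests."""
    threshold = alpha / len(p_values)  # stricter bar: alpha shared by all tests
    return [p for p in p_values if p <= threshold]


# Made-up p-values from five experiments run on the same data
p_values = [0.001, 0.02, 0.04, 0.049, 0.30]

naive = naive_hits(p_values)
corrected = bonferroni_hits(p_values)

print("uncorrected 'discoveries':", naive)
print("after Bonferroni:         ", corrected)
```

With these numbers, four of the five results pass the usual 0.05 bar, but only one survives once the threshold is tightened to 0.01 to account for the five tests.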

Monday, 8 February 2016

WEBINAR: Aster-R: Becoming An R Power User With Aster - 9 February 2016

Start Date: 2/9/2016
Start Time: 10:00 AM PST
Duration: 60 minutes

Abstract:
You will learn how data scientists can provide business value by using Aster-R without the usual disgruntle of “I do not want to learn that.” We will discuss several use cases from marketing analytics to text analytics developed in Aster-R that excel in ease of use and efficiency.

SPEAKERS

Diego Klabjan
Professor
Department of Industrial Engineering and Management Sciences Director, Master of Science in Analytics. Northwestern University

Diego Klabjan's research is focused on applying analytics in the areas of sustainability, airline management, railway industry, logistics, operations management and supply chain optimization, production planning, integer programming, and parallel computing. Among other companies, he has collaborated with United Airlines, American Airlines, Sabre Holdings, FedEx Express, General Motors, and NASA.

Diego Klabjan is a professor at the Northwestern University, Evanston, Illinois, Department of Industrial Engineering and Management Sciences. After obtaining his doctorate from the School of Industrial and Systems Engineering of the Georgia Institute of Technology in 1999, in the same year he joined the University of Illinois as an assistant professor in the former department of Mechanical and Industrial Engineering. After returning from his sabbatical leave of absence at the Massachusetts Institute of Technology, in 2006 he joined the Department of Civil and Environmental Engineering at the University of Illinois at Urbana-Champaign. In summer 2007 he accepted an associate professor position at Northwestern University. In 2012 he was promoted to a full professor at Northwestern University. He is the recipient of the first prize of the 2000 Transportation Science Dissertation Award and the Preseren's award for the outstanding undergraduate thesis.

He is the Director of the Master of Science in Analytics.

He is a former president of the INFORMS Aviation Applications Section and he is actively involved in AGIFORS.

John Thuma
Senior Director, Aster Strategy and Analytics
Teradata

A technology leader and evangelist, John Thuma is recognized in data warehousing, business intelligence, and advanced analytics for his next-level strategies. With nearly 30 years of practical experience, John has developed and implemented real world governance and international solution development programs across a variety of industries and disciplines.

He is a regular contributor to many media outlets and a faculty member of the International Institute of

Register here

How to manage your data before it manages you via @7wdata

How to manage your data before it manages you via @7wdata - Evolving technologies, more stringent compliance policies, and the need for big data analytics and long-term data retention, are all important factors that these professionals must consider when attempting to manage their data.

I think a good aim to have is that you must manage your data before it manages you. The less control you have (be that data quality, data management, data storage, data archiving, etc.), the less reliable your results from using that data will be, and the higher the overall cost of the data.

Data Quality Management Lacking Among Businesses via @eWEEKNews

Data Quality Management Lacking Among Businesses by Nathan Eddy via +eWEEK.com - Data quality must improve significantly if businesses are to reap benefits, according to a Blazent survey of 200 C-level and senior IT professionals.

This has always been the case, but as our use of data has become much more sophisticated and the volumes have increased, data quality has become more critical. After all, would you want the responsibility of justifying anything on the basis of incorrect data?

Sunday, 7 February 2016

3 great graph database blog entries around the subject of fraud from @Neo4j

3 great graph database blog entries around the subject of fraud from the team at +Neo4j - the World's Leading Graph Database:

First party bank fraud - linking people via addresses, phone numbers and accounts is just not possible in a standard hierarchical database. I've seen for myself the duplicates that can exist in a data warehouse when you load all your order management data: the different spellings, the duplicate addresses, etc. Bringing it all back to the one address (as an example) can be a major undertaking.

Insurance fraud - how a graph database can be used to map out the relationships between people and claims - I believe this can be done much earlier in the process now using a graph database as opposed to waiting for complicated analytics.

E-commerce fraud - how a graph database can be used efficiently to detect fraud - something that in the past has been rather slow and clunky.
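To make the first-party fraud point concrete, here is a minimal sketch in plain Python of the idea of linking people through shared addresses and phone numbers. It is not how Neo4j does it and not taken from the blog entries; it only stands in for the kind of relationship traversal a graph database offers natively. The function name `link_people`, the record layout and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def link_people(records):
    """Group people who share an address or phone, directly or via others.

    `records` is a list of (person, address, phone) tuples. This layout is
    a hypothetical one chosen for the example.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for person, address, phone in records:
        # Light normalisation so trivially different spellings match.
        clean_address = " ".join(address.lower().split())
        clean_phone = "".join(ch for ch in phone if ch.isdigit())
        union(("person", person), ("address", clean_address))
        union(("person", person), ("phone", clean_phone))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "person":
            groups[find(node)].add(node[1])
    return sorted(sorted(group) for group in groups.values())


# Made-up sample data: Ann and Bob share an address (spelled differently),
# Bob and Cat share a phone number, Dan is unconnected.
records = [
    ("Ann", "1 High St", "555-0100"),
    ("Bob", "1  high st", "555-0199"),
    ("Cat", "9 Low Rd", "555 0199"),
    ("Dan", "4 Mill Ln", "555-0142"),
]
print(link_people(records))
```

Ann and Cat share nothing directly, yet they end up in the same group because both are connected to Bob. That indirect link is the sort of thing that is awkward to find with joins and natural to find when the data is held as a graph.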

How much SQL is required to learn Hadoop? via @dezyreonline

How much SQL is required to learn Hadoop? via +DeZyre - With widespread enterprise adoption, learning Hadoop is gaining traction as it can lead to lucrative career opportunities. There are several hurdles and pitfalls students and professionals come across while learning Hadoop. This post provides detailed explanation on how SQL skills can help professionals learn Hadoop.

Interesting thoughts. I can cope with SQL fine, but I'm not as good at Java. Nice to read confirmation that I can survive with the skills I have.

Saturday, 6 February 2016

Type A Data Scientist vs. Type B Data Scientist via @dezyreonline

Type A Data Scientist vs. Type B Data Scientist via +DeZyre - In only 2 years, the role of data scientist has gained traction from various organizations leading to increased employment of data scientists. Data scientists come from various backgrounds and they can be accountants, programmers, mathematicians, business analysts, statisticians, visualization experts, machine learning practitioners, data miners, data engineers, etc. In this topology of different data scientists, it is necessary to understand the differences between the two - Type A Data Scientist and Type B Data Scientist - that bring value to an organization.

I'm definitely a Type B - I can do some statistics but I would be lost in a heavy session of it.

Top Differences between Hadoop 1.0 & Hadoop 2.0 via @greycampus

Top Differences between Hadoop 1.0 & Hadoop 2.0 by Jenny Brown via +Greycampus - There’s a lot that has been written with regards to Hadoop 1.0 & Hadoop 2.0. Here is a quick look at their main features and the differences that exist between the two.

Great summary from Grey Campus - well worth a read.

Friday, 5 February 2016

Big Data and the Progression toward Streaming Analytics via @Data_Informed

Big Data and the Progression toward Streaming Analytics by Apurva Dave on @Data_Informed - With today’s limitless data sets and business demands for real-time insight, IT administrators need a new toolkit for drawing insights and even a new language, writes Apurva Dave of Jut.

I agree with him - there is room in the marketplace for new streaming analytics platforms.

Predict the Winners of the Big Games with Machine Learning via @Data_Informed

Predict the Winners of the Big Games with Machine Learning by Nirmal Fernando via @Data_Informed - With the build-up to the Super Bowl underway, Nirmal Fernando explains how the WSO2 Machine Learner team built a machine-learning model to predict playoff winners and the Super Bowl champion.

A great example of how machine learning can get a good result - fingers crossed for the actual result.

Thursday, 4 February 2016

Data Mining Critical for Personalization, But Few Firms Do It Well via @infomgmt

Data Mining Critical for Personalization, But Few Firms Do It Well by David Weldon on +Information Management - A new study confirms the importance of data mining in managing customer relationships and creating personalization, but finds that most organizations fail to use the technique effectively.

This underlines the fact that if you do not do something properly you will not have the stated benefits.

4 Trends That Are Driving Business Intelligence via @infomgmt

4 Trends That Are Driving Business Intelligence by David Weldon on +Information Management - Many organizations have sung the praises of business intelligence for years, but many of those firms were not actually realizing the full benefits of it. That picture is beginning to change, as advanced analytics tools and techniques mature.

I agree with the last point - the speed at which BI can now be done has opened things up significantly, as in the past a lot of this simply wasn't possible.

Wednesday, 3 February 2016

The Imperative for Ethical Standards in Analytics via @infomgmt

The Imperative for Ethical Standards in Analytics by Scott Nestler via +Information Management - As businesses and organizations of all shapes and sizes continue to seize the rapidly growing opportunities presented by data and analytics, the risks associated with the unprincipled use of analytics grow even stronger.

A new quantum approach to big data via @EurekAlertAAAS @MIT

A new quantum approach to big data by +Massachusetts Institute of Technology (MIT) via @EurekAlertAAAS - MIT research has found that systems for handling massive digital datasets could make impossibly complex problems solvable.

This is really interesting and I look forward to reading more about it and hopefully seeing something about a prototype.

Tuesday, 2 February 2016

Best practice advice for moving to the cloud via ZDNet

Best practice advice for moving to the cloud by Mark Samuels on +ZDNet - So you want to move it all into the cloud? Then learn from these tips from someone who has already done it.

Really useful tips that should be reviewed as you may find something you hadn't thought of.

Introducing Kaggle Datasets via @kaggle

Introducing Kaggle Datasets via @kaggle - Hosting open datasets is nothing new, but Kaggle Datasets goes much further. With it, anyone can view raw data, analyze it, and view and discuss results. Want to know what the most gender-neutral baby names are in the US? Someone's already run that analysis.

Monday, 1 February 2016

Why 2016 might be the year of citizen data scientists via @7wdata

Why 2016 might be the year of citizen data scientists via @7wdata - Find out what citizen data scientists will do in this new data ecosystem. Also, know the potential benefits and trade-offs to leaning on these pros for analytics. The article author recently visited with Shawn Rogers, Chief Research Officer at Dell Statistica, a business unit within +Dell Software.

I agree with him - there are so many tools that make it easier to produce the kinds of analyses and visuals that traditionally a Data Scientist would have been producing.

Scheduling R Markdown Reports via @mcpasin

Scheduling R Markdown Reports via Email by Marco Pasin ( @mcpasin ) - Marco has written a great guide on how to do this here.

Recommended (and while you are there sign up for his posts and follow him).