Data: MAP REDUCE

Showing posts with label MAP REDUCE.

Friday, 14 October 2016

WEBINAR: Gain Extreme Agility and Performance Using a Spark-free Approach to Data Management - 20 October 2016

Date: Thursday, October 20, 2016
Time: Noon ET / 9:00 am PT
Duration: 60 minutes (including Q&A)

What You'll Learn

Businesses are clamoring to capture all data possible and harness it as a revenue driver. The challenge is bringing the data together. Companies that can capture and harness this data can benefit accordingly.

When it comes to data management in Hadoop, the architectural foundation makes all the difference for performance. Jake Dolezal shares his research into the performance of data quality and data management workloads on Hadoop clusters. He discusses a YARN-based approach to data management and outlines highly effective techniques for using IT resources to achieve extreme agility for organizations and performance gains in Hadoop.

What You Will Learn:

• Learn an effective method for democratizing data access and business intelligence
• Understand what it takes to break through the traditional trade-offs in managing big data and achieve both agility and performance without the use of code-based frameworks like Spark or MapReduce
• Discover how to achieve performance in Hadoop that is 5.5x faster than Spark and 19x faster than MapReduce
• Learn how to manage complex, high-volume data with identity and entity resolution in the most demanding applications, such as customer data quality

All attendees will receive a free copy of the report “Hadoop Data Integration Benchmark” published by MCG Global Services.

Presenters

Jake Dolezal,
Practice Lead,
McKnight Consulting Group Global Services

Todd Hinton,
Vice President of Product Strategy,
RedPoint Global

Saturday, 28 May 2016

How to Perform RDBMS CRUD Operations with Hadoop MapReduce Integration via @Datafloq

How to Perform RDBMS CRUD Operations with Hadoop MapReduce Integration by Ethan Millar via +Datafloq - This article shows how to perform RDBMS operations through Hadoop integration. Hadoop is a trending technology, and to follow the subject you need to be clear on some basic facts about it. In this post, the authors explain how to read RDBMS data, manipulate it with Hadoop MapReduce and write it back to the RDBMS, introducing a way to perform simple RDBMS read and write operations using Hadoop.

Great article and highly recommended.

Tuesday, 29 March 2016

Introduction To Bulk Deletion Of Column Values In Hadoop Development With MapReduce via @Datafloq

Introduction To Bulk Deletion Of Column Values In Hadoop Development With MapReduce by Evan Gilbort via +Datafloq - For all the technical readers among you, here is an article on how to delete column values in bulk using HBase bulk loading with Hadoop MapReduce. Experienced Hadoop developers share what is required for bulk column deletion in Hadoop development. You can follow their steps to see how they do it and benefit from their insights.

Very technical but very useful.

Sunday, 21 February 2016

MapReduce Use Case-Youtube Data Analysis via @acadgild

MapReduce Use Case-Youtube Data Analysis via +ACADGILD - This blog is about analysing YouTube data, with the whole analysis performed in Hadoop MapReduce. The YouTube data is publicly available, and the data set is described in the article under the heading Data Set Description. Using that data set, the authors perform some analysis and draw out insights such as which are the top 10 rated videos on YouTube and who uploaded the most videos.
By reading this blog you will understand how to handle data sets that do not have a proper structure and how to sort the output of the reducer.

Great code and a clear explanation of what the code is actually doing.

Saturday, 6 February 2016

Top Differences between Hadoop1.0 & Hadoop 2.0 via @greycampus

Top Differences between Hadoop1.0 & Hadoop 2.0 by Jenny Brown via +Greycampus - A lot has been written about Hadoop 1.0 and Hadoop 2.0. Here is a quick look at their main features and the differences between the two.

Great summary from Grey Campus - well worth a read.

Tuesday, 5 January 2016

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources via @AnalyticsVidhya

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources by Manish Saraswat via +Analytics Vidhya - This is a comprehensive list of excellent videos and other resources to get you inspired and going on the subject. Most are YouTube videos, and each comes with a write-up suggesting the audience for it.

Sunday, 3 January 2016

Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark' via @FortuneMagazine

Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark' via +Fortune Magazine - In this article, Derrick Harris describes why Cloudera is moving from MapReduce to Spark, and the process and effect of doing so.

Interesting article and well worth the read.

Saturday, 10 October 2015

Apache Spark vs. Hadoop MapReduce via @Intersog

The new Apache Spark has raised a buzz in the world of Big Data. It promises to be more than 100 times faster than Hadoop MapReduce with more comfortable APIs, which begs the question: could this be the start of the end for MapReduce?

Great article by Jenny Richards on Intersog. At the moment it really does depend on exactly what you want to do. Over time that might change, but for now there is definitely a place for both.

Thursday, 17 September 2015

Spark versus MapReduce: which way for enterprise IT? via @computerweekly

Interesting comparison between the two. I can't disagree with the conclusion.

Saturday, 1 August 2015

The truth about MapReduce performance on SSDs by @yanpeichen and @kashkamb via @radar

It is well-known that solid-state drives (SSDs) are fast and expensive. But exactly how much faster — and more expensive — are they than the hard disk drives (HDDs) they're supposed to replace? And does anything change for big data?

Great article by Yanpei and Karthik where they show that the cost-per-performance is approaching parity with HDDs.

Monday, 1 June 2015

Is Spark better than Hadoop Map Reduce?

For anyone who gets into the Big Data world, the terms Big Data and Hadoop become synonyms. As they learn the ecosystem along with the tools and their workings, people become more aware of what big data actually means, and what role Hadoop has in the big data ecosystem.

Thursday, 3 July 2014

5 steps to offload your Data Warehouse with Hadoop

This +TDWI whitepaper is produced by Syncsort.

It's all about finding the most costly ETL and replacing it with equivalents in MapReduce.

Thursday, 26 June 2014

Google launches Cloud Dataflow - says MapReduce is tired.

This article on +ZDNet covers Google's announcement that it now uses Cloud Dataflow for pipelines rather than MapReduce, which works better on single flows.

Seems that the world is changing again.

Monday, 2 June 2014

MapReduce - the concept behind Big Data. Links to resources and a summary of what it actually is.

Here is the link to the Google Research Publication on MapReduce.
Link to the Wikipedia page on MapReduce.
There is also an extensive set of documentation about MapReduce here.

Essentially the data has a MAP function (filter and sort) performed on it, then a REDUCE (summary) function performed on the result of the MAP function. These steps run in parallel (a bit like a multi-level tree structure) under the control of a framework, which makes the processing quick and provides fault tolerance through redundancy. This is similar to the structure of queries in Teradata, where the work is spread across AMPs running in parallel while the parsing engine coordinates the entire query.
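
As a rough illustration of the MAP then REDUCE flow described above, here is a minimal single-process sketch in Python that counts words. It is not how Hadoop or Google's implementation works internally: there is no parallelism or fault tolerance here, and the function names (map_phase, shuffle, reduce_phase, run_job) are made up for this example. It only shows the shape of the data as it moves from one step to the next.

```python
from collections import defaultdict


def map_phase(record):
    """MAP: turn one input line into (key, value) pairs, one per word."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the job a real framework does between MAP and REDUCE."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: summarise all the values collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run MAP over every record, group by key, then REDUCE each group."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    # Sorting the keys mimics the sorted order in which reducers see them.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    lines = ["big data big ideas", "data moves fast"]
    print(run_job(lines))
    # prints {'big': 2, 'data': 2, 'fast': 1, 'ideas': 1, 'moves': 1}
```

In a real cluster, each call to map_phase and reduce_phase could run on a different machine, with the framework handling the grouping step and rerunning any piece that fails.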

The possibilities of this approach to me are exciting as it enables us to process large amounts of data in a short amount of time and give an end result that is worthwhile.
