Showing posts with label HADOOP.

Friday, 31 July 2020

10 big data blunders businesses should avoid by Sara Brown via @MITSloan

Big data is a promising investment for firms, but embracing data can also bring confusion and potential minefields - everything from where companies should be spending money to how they should be staffing their data teams.

This was an interesting read and definitely a good list to use as a basis for what to avoid.

Thursday, 14 November 2019

WEBINAR: Hadoop-to-Cloud Migration: How to modernize your data and analytics architecture - 21 November 2019

November 21, 2019

10:00 AM PT

Hi there,

Many Hadoop customers struggle with its system complexity, unscalable infrastructure, and DevOps burden, and are exploring how to migrate their data and workloads to a modern cloud-based data platform to better meet their needs. Migration and modernization can help accelerate big data projects and open new frontiers around data science and machine learning.

Sign up for our webinar with Anand Venugopal, Migration Solutions Director at Databricks, to learn best practices for evaluating cloud migration and data platform modernization. We'll cover topics like how technology components map from on-premise to a cloud model, cost savings from the cost compute model, and new use cases enabled by modern data architectures.

Save Your Spot

Sincerely,
The Databricks Team

Wednesday, 11 September 2019

What happened to Hadoop? by @derrickharris via @Medium

It was the next big thing...until it wasn’t. Derrick Harris explains, “Hadoop’s path to ubiquity intersected a host of other technology shifts that as a whole would prove to be more impactful in the long run, in part by peeling off the most valuable promises of big data and making them more consumable.”

Definitely a question that needed to be answered! Thank you, Derrick, for the great answer. To thank him, give him applause and a follow.

Thursday, 27 September 2018

Hadoop for Beginners by Aafreen Dabhoiwala via @kdnuggets

An introduction to Hadoop, a framework that enables you to store and process large data sets in parallel and distributed fashion.

A nice little overview of Hadoop, although I do agree with the first comment by Randy about relational databases.
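To make "process large data sets in parallel and distributed fashion" a bit more concrete, here is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce model is built around. It is plain Python running on one machine, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are made up for illustration, and the thread pool only stands in for the cluster nodes that would each work on their own block of a file.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for a block of a large file held on a different node.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


# Hypothetical sample input, split into two "blocks".
chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
```

The point of the shape is that the map step never needs to see more than its own chunk, which is what lets the work be spread across many machines.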

Friday, 8 December 2017

Backing Up Big Data? Chances Are You’re Doing It Wrong by @PeterSmails via @datanami

The increasing pervasiveness of social networking, multi-cloud applications and Internet of Things (IoT) devices and services continues to drive exponential growth in big data solutions.

I like that this contains two case studies, and it makes perfect sense. You need to make sure that backups and recovery are always included in a big data project, and that you treat it as a big data project, not just a "normal" database.

Tuesday, 24 October 2017

Beyond Hadoop by James Ovendon via @iegroup

A company once synonymous with big data is on its way out, but what comes next?

Interesting. So people are starting to use alternatives to Hadoop, or are using it for other reasons.

Thursday, 14 September 2017

277 Data Science Key Terms, Explained by Matthew Mayo via @kdnuggets

This is a collection of 277 data science key terms, explained with a no-nonsense, concise approach. Read on to find terminology related to Big Data, machine learning, natural language processing, descriptive statistics, and much more.

This links to lots of articles, grouping the terms by their general classification, for example deep learning or predictive analytics.

Tuesday, 20 June 2017

5 Reasons That Business Intelligence on Hadoop Projects Fails by Remy Rosenbaum via @DZone

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity.

Some great points by Remy. Make sure you read this if you need to do something like this.

Sunday, 22 January 2017

What Is Fog Computing? And Why It Matters In Our IoT And Big Data World by Vincent Stokes via @datafloq

Fog computing is a disruptive technology that adds another level of complexity to Cloud computing, but also offers greater efficiency and lower costs.

There are some great links to other articles from this one to help build up the picture of what it is and why it is needed.

Wednesday, 16 November 2016

Big Data Is All Relative—or Relational by Mike Azevedo via @infomgmt

But as we foam at the mouth over the next great revelation that will emerge from the Hadoop cluster, a new wave of cloud-enabled applications are testing the limits of our traditional relational database systems.

RDBMSs are so ensconced in our systems that it has to be better for new innovations to be able to use them than to bear the expense of redeveloping everything. Yes, I admit things would work better and faster with the new databases, but sometimes there is neither the time nor the resources to move the data.

Friday, 14 October 2016

WEBINAR: Gain Extreme Agility and Performance Using a Spark-free Approach to Data Management - 20 October 2016

Date: Thursday, October 20, 2016
Time: Noon ET/ 9:00 am PT
Duration: 60 minutes (including Q&A)

What You'll Learn

Businesses are clamoring to capture all data possible and harness it as a revenue driver. The challenge is bringing the data together. Companies that can capture and harness this data can benefit accordingly.

When it comes to data management in Hadoop, the architecture foundation makes all the difference for performance. Jake Dolezal shares his research into the performance of data quality and data management workloads on Hadoop clusters. Jake discusses a YARN-based approach to data management and outlines highly effective IT resource utilization techniques to achieve extreme agility for organizations and performance gains in Hadoop.

What You Will Learn:

• Learn an effective method for democratizing data access and business intelligence
• Understand what it takes to break through the traditional trade-offs in managing big data and achieve both agility and performance without the use of code-based languages like Spark or MapReduce
• Discover how to achieve performance in Hadoop that is 5.5x faster than Spark and 19x faster than MapReduce
• How to manage complex, high-volume data with identity and entity resolution in the most demanding applications, such as customer data quality

All attendees will receive a free copy of the report “Hadoop Data Integration Benchmark” published by MCG Global Services.

Presenters

Jake Dolezal,
Practice Lead,
McKnight Consulting Group Global Services

Todd Hinton,
Vice President of Product Strategy,
RedPoint Global

Thursday, 15 September 2016

New Research - We’re In the Middle of a Data Engineering Talent Shortage by @jakestein via @stitch_data

We’ve all become accustomed to hearing about the rising demand for data scientists, but according to the latest research, the real talent crisis lies in data engineering. This report explains where the gaps are and where things are expected to go.

This is very interesting and adds weight to the fact that certain skills are essential. There is too much focus on becoming a Data Scientist, but anyone who is technical is probably much better off as a Data Engineer.

Thursday, 1 September 2016

Big Data, Big Growth, Big Promises by Jennifer Adams via @infomgmt

We expect non relational databases to be the fastest-growing sector within big data management solutions. We forecast that NoSQL will grow 25.0% and Hadoop will grow 32.9% annually over the forecast period.

Proof that relational databases are on the way out in the popularity stakes.
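To get a feel for what those annual rates mean once they compound, here is a small sketch. The 25.0% and 32.9% figures are the ones quoted above; the five-year period is purely my assumption, since the excerpt does not say how long the forecast period is, and `growth_multiple` is just an illustrative helper name.

```python
def growth_multiple(annual_rate_percent, years):
    """Return how many times larger a market is after compounding for `years`."""
    return (1 + annual_rate_percent / 100) ** years


YEARS = 5  # hypothetical: the excerpt does not state the forecast period

nosql_multiple = growth_multiple(25.0, YEARS)    # NoSQL at 25.0% a year
hadoop_multiple = growth_multiple(32.9, YEARS)   # Hadoop at 32.9% a year

print(f"NoSQL:  x{nosql_multiple:.2f} after {YEARS} years")
print(f"Hadoop: x{hadoop_multiple:.2f} after {YEARS} years")
```

Change `YEARS` to whatever the real forecast period is to see the actual multiples.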

Wednesday, 31 August 2016

A Recipe for Cooking with the Hadoop Ecosystem by David Menninger via @infomgmt

The open source model has had a major impact on the big data market, yet in some ways, the open source approach has succeeded despite its shortcomings.

Open source is definitely here to stay.

Sunday, 28 August 2016

Data Partitioning in Big Data Application with Apache Hive by Vijay Aegis via CodeInnovationsBlog

Big data consulting company professionals are introducing the concept of partitioning in big data applications. You need to read the post completely to understand how to do partitioning in such apps using Apache Hive. If you don’t know how to do it, experts will help.

Useful blog.
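For anyone who has not met partitioning before, here is a rough sketch of the idea rather than anything taken from the post itself. Hive stores each partition of a table in its own directory named `column=value`, and keeps the partition value in the path rather than in the data files, so a query that filters on that column only has to read the matching directory. The code below imitates that layout with a plain dictionary; the `orders` table, its rows, and the helper names are all hypothetical.

```python
from collections import defaultdict


def partition_path(table, column, value):
    """Build a Hive-style partition directory name: <table>/<column>=<value>."""
    return f"{table}/{column}={value}"


def partition_rows(rows, table, column):
    """Group rows by partition column; the value lives in the path, not the row."""
    partitions = defaultdict(list)
    for row in rows:
        remaining = {k: v for k, v in row.items() if k != column}
        partitions[partition_path(table, column, row[column])].append(remaining)
    return dict(partitions)


def read_partition(partitions, table, column, value):
    """Read only the matching partition instead of scanning every row."""
    return partitions.get(partition_path(table, column, value), [])


# Hypothetical sample data for an "orders" table partitioned by country.
rows = [
    {"order_id": 1, "country": "UK", "amount": 20.0},
    {"order_id": 2, "country": "US", "amount": 35.5},
    {"order_id": 3, "country": "UK", "amount": 12.25},
]
partitions = partition_rows(rows, "orders", "country")
uk_orders = read_partition(partitions, "orders", "country", "UK")
```

Choosing the partition column is the real design decision: it pays off when most queries filter on it, and it backfires if it has so many distinct values that you end up with thousands of tiny directories.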

Monday, 1 August 2016

WEBINAR: A Pragmatic Approach to Processing and Refining Big Data - 3 August 2016

A Pragmatic Approach to Processing and Refining Big Data

Wednesday, August 3, 2016 | 8 am PT/16:00 BST

Want to learn how to approach your Hadoop data processing and analytics projects without sacrificing governance and control?

In a big data world, business users need on-demand access to governed data sets on highly diverse sources, regardless of scale. By focusing on the right principles from both existing data warehousing approaches and emerging data lake use patterns, it is possible to drive automatic processing, refinement, and publishing of Hadoop data sets for immediate interactive analysis.

Join this webinar to learn how:

• Enterprise data warehouse and data lake design patterns fit today's analytic landscape
• Organizations can approach Hadoop data processing and analytics without sacrificing governance and control
• Pentaho provides an approach to delivering refined on-demand data marts to end users in a big data environment
• Pentaho customer FINRA was able to leverage Pentaho's big data capabilities to rapidly accelerate fraud detection

Speaker: Ben Hopkins, Sr. Product Marketing Manager - Big Data, Pentaho

Register here

Friday, 15 July 2016

Big Data Vendors See the Internet of Things Opportunity by Paul Miller via @infomgmt

We're moving from expensive and specialist analytics teams towards an environment in which processes, workflows, and decision-making throughout an organisation can become usefully data-driven.

I look forward to his report. I think we can all see that IoT is the next big thing after Big Data.

Tuesday, 12 July 2016

5 Data Management Lessons from LinkedIn Acquisition by Manish Sood via @Data_Informed

Reltio CEO Manish Sood writes that Microsoft’s acquisition of LinkedIn reveals important truths about the nature of data management.

Sunday, 3 July 2016

Hadoop Security Issues and Best Practices by Marry Tho via @Analyticbridge

The big data blast has given rise to a host of information technology software, tools, and abilities that enable companies to manage, capture, and analyse large data sets of unstructured and structured data for result-oriented insights and competitive success. But with this latest technology comes the challenge of keeping confidential information secure and private.

Great list of issues that are worth reading and checking through.

Saturday, 25 June 2016

How to Capitalise on the Data Landscape of Tomorrow via @Data_Informed

Tableau’s Marshall Daly examines where organisations are storing their data, choices and innovations based on today’s business demands that are shaping the data landscape of tomorrow, and how organisations can build a data workflow to keep pace with that innovation.

Interesting.