Data: SQL

Monday, 25 July 2022

Pivot Table Concepts by Derek Mortensen via @TDataScience

Report, Analyze, Tell Stories.

This was really interesting. Pivot tables reveal a great deal about a data set, so being able to build them outside of Excel is a real advantage.
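
As a rough illustration of a pivot table built outside Excel, here is a minimal sketch using Python's standard-library sqlite3 module and conditional aggregation in SQL. The sales table, its columns and its values are invented for the example, and this is not necessarily the approach the linked article takes.

```python
import sqlite3

# Hypothetical sales data: (region, quarter, amount).
rows = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
    ("South", "Q2", 120),
    ("South", "Q2", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# One output row per region and one column per quarter: the SQL
# counterpart of a pivot with region as rows and quarter as columns.
pivot_sql = """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY region
    ORDER BY region
"""
pivot = conn.execute(pivot_sql).fetchall()

for region, q1, q2 in pivot:
    print(region, q1, q2)
```

The quarter values are written into the query by hand here; with many categories you would probably generate the CASE expressions or use a pivot feature if your database has one.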

Friday, 22 April 2022

10 SQL Queries You Should Know as a Data Scientist by Uğur Savcı via @Medium

Learn the Most Used SQL Queries in 5 Minutes with Examples

You need to keep these somewhere so you can access them. I have in the past used text files in a directory or Evernote. It is really easy then to copy and edit the code.

Wednesday, 6 April 2022

101 DATA SCIENCE with Cheat Sheets (ML, DL, Scraping, Python, R, SQL, Maths & Statistics) by Anushka Bajpai via @Medium

Data Science is an ever-growing field, and there are numerous tools and techniques to remember. It is not possible for anyone to remember all the functions, operations and formulas of each concept. That’s why we have cheat sheets and summaries. They help us access the most commonly needed reminders for making our Data Science journey fast and easy.

This is a one-stop shop for cheat sheets - definitely worth a bookmark, a printout, or a note in Evernote, whichever way you prefer to preserve something important.

Wednesday, 2 March 2022

WEBINAR: Designing an Effective SQL Data Lakehouse - 10 March 2022

Date: Thursday, March 10, 2022, 9:00 AM PT / 12:00 PM ET

Speakers

Deepa Sankar
VP of Portfolio Marketing
Dremio

Kevin Petrie
Vice President of Research
Eckerson Group

SQL data lakehouses are a growing trend among organizations looking to run SQL queries for analytics and BI directly on their cloud data lakes. What's needed for success with this cloud data architecture?

Learn how to design an effective SQL data lakehouse with experts Kevin Petrie, VP of Research at Eckerson Group, and Deepa Sankar, VP of Portfolio Marketing at Dremio. You will learn:

Why organizations are adopting SQL data lakehouses
7 must-have characteristics and architectural components
How to build and execute a successful SQL data lakehouse strategy

Monday, 28 February 2022

6 Lesser-Known SQL Techniques to Save You 100 Hours a Month by @camwarrenm via @TDataScience

Use these simple techniques to make your analysis and data extracts easier.

I'm sure we all have our own library of useful SQL code (I certainly do), and I think these techniques would make good additions to it.

Friday, 7 January 2022

Query Pandas DataFrame with SQL by Edwin Tan via @TDataScience

Can you use SQL in Pandas? Yes and this is how.

I have to admit that, having used SQL for years, I am far more comfortable with it than with any other method.

Monday, 18 October 2021

Aggregations on time-series data with Pandas by @OlegZero13 via @TDataScience

Python Pandas and SQL - time aggregations and syntax explained.

This is a great reminder of the syntax and helped me to remember some things I had obviously forgotten.

Monday, 17 May 2021

Practical SQL for Data Analysis by/via @be_haki

In this epic post, Haki Benita shows how to use SQL to perform fast and efficient data analysis. Pivot tables, subtotals, linear regression, binning, and interpolation can all be done with SQL and in many cases, that's the best approach. There's a lot of detail here and a linked index makes it easy to jump around.

I love SQL and I am so much more comfortable writing code in it. I can, however, see times when Python and Pandas would work better.

Wednesday, 28 April 2021

Working With Time Series Using SQL by Michael Grogan via @kdnuggets

This article is an overview of using SQL to manipulate time-series data.

This is nice and clear. Working with times is fine as long as you have enough of the right data and you really understand what you are doing. Pay particular attention to time zones and daylight saving time. Also, consider where the data physically resides and which time zone that system or server is configured to use.
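
To make the time zone point concrete, here is a small sketch in Python (standard library only) that counts the same events per calendar day twice: once in UTC and once in an assumed reporting zone. The timestamps and the fixed UTC-7 offset are invented for the illustration; a fixed offset does not model daylight saving, which needs a real zone database.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical event timestamps, stored in UTC.
events_utc = [
    datetime(2021, 4, 28, 23, 30, tzinfo=timezone.utc),
    datetime(2021, 4, 29, 1, 15, tzinfo=timezone.utc),
    datetime(2021, 4, 29, 6, 45, tzinfo=timezone.utc),
    datetime(2021, 4, 29, 12, 0, tzinfo=timezone.utc),
]

# Assumed reporting zone: a fixed UTC-7 offset (no daylight saving rules).
reporting_zone = timezone(timedelta(hours=-7))


def events_per_day(events, tz):
    """Count events per calendar day as seen in the given time zone."""
    return Counter(e.astimezone(tz).date().isoformat() for e in events)


per_day_utc = events_per_day(events_utc, timezone.utc)
per_day_local = events_per_day(events_utc, reporting_zone)

print("UTC:  ", dict(per_day_utc))
print("Local:", dict(per_day_local))
```

The four events are identical in both runs, yet the daily totals differ, which is why it matters to know which zone a server or database applies before grouping by day.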

Wednesday, 13 January 2021

SQL vs NoSQL: 7 Key Takeaways by Alex Williams via @kdnuggets

People assume that NoSQL is a counterpart to SQL. Instead, it’s a different type of database designed for use-cases where SQL is not ideal. The differences between the two are many, although some are so crucial that they define both databases at their cores.

I enjoyed reading this thoughtful article. I think it helps to clear up some potential confusion, and the author's careful use of diagrams makes sure you really understand the differences.

Monday, 5 October 2020

4 SQL Tips for Data Scientists and Data Engineers by @SeattleDataGuy via @BttrProgramming

"Please, don’t average averages" is the first tip he has for us.

These are really valuable insights and I completely agree with his observations. I love that he has given you code segments as well, so there are no excuses for not understanding these. Some of these link seamlessly into the basic rules of data analytics and help make sure that you do not skew your results.
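
The "don't average averages" tip is easy to check with a small example. The sketch below uses Python's standard-library sqlite3 module with an invented orders table; it is an illustration of the general pitfall, not the code from the linked article.

```python
import sqlite3

# Hypothetical orders: store A has four small orders, store B one large order.
orders = [("A", 10.0), ("A", 10.0), ("A", 10.0), ("A", 10.0), ("B", 100.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (store TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# Average of the per-store averages: every store counts once,
# however many orders it has.
avg_of_avgs = conn.execute("""
    SELECT AVG(store_avg)
    FROM (SELECT AVG(amount) AS store_avg FROM orders GROUP BY store)
""").fetchone()[0]

# Average over all orders: every order counts once.
true_avg = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]

print("average of averages:", avg_of_avgs)
print("average of all orders:", true_avg)
```

The two figures differ because the groups are of unequal size, so the average of the store averages gives store B's single order the same weight as all four of store A's orders.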

Friday, 18 September 2020

WEBINAR: Lakehouse: The future of cloud data platforms - 2 parts, 22 and 29 September 2020

September 22nd & 29th

09:00 am - 11:00 am PT

The Lakehouse pattern is emerging as the successor to the data warehouse and data lake because it combines the advantages of both - with none of the shortcomings.

What is Databricks' vision for Lakehouse? You're invited to a 2-part series where you'll learn what's driving the development of the Lakehouse pattern and Delta Lake.

Part I: Lakehouse: The New Approach to Managing Data

Part II: Leveraging Delta Lake for High-Performance SQL and Analytics

You'll learn:

The challenges of managing data in the cloud
Why Databricks' vision for Lakehouse sets it apart
How Delta Lake and Delta Engine bring high performance to SQL and analytics

Monday, 27 May 2019

7 Steps to Mastering SQL for Data Science - 2019 Edition by Matthew Mayo via @kdnuggets

Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills and provide quality resources for each.

This is something that everyone who writes code against a data source needs to understand, and it is especially important for SQL code. The article contains a great visual and links to further information.

Wednesday, 19 December 2018

What Is a Data Frame? (In Python, R, and SQL) by/via @oilshellblog

This post introduces data frames and shows how they work by solving the same problem three ways: without data frames, with data frames in Python and R, and in plain SQL.

I love this because it lets you compare and contrast the method across all three, so you can see that the idea is the same but the implementation is different. Definitely worth a bookmark.

Monday, 17 December 2018

Git Your SQL Together (with a Query Library) by/via @beeonaposy

Caitlin Hudon recommends tracking SQL queries in Git. Here she explains how she created a git repository for saving and sharing commonly (and uncommonly) used queries while tracking any changes made to these queries over time.

Good practice for sure. I use either Git or Google Drive. Either way, it pays to save and keep records of the SQL queries you have used.

Saturday, 5 May 2018

Presto for Data Scientists – SQL on anything by Kamil Bajda-Pawlikowski via @kdnuggets

Presto enables data scientists to run interactive SQL across multiple data sources. This open source engine supports querying anything, anywhere, and at large scale.

I have to agree with Kamil - download a free version of it and try it - I think you will be pleasantly surprised.

Thursday, 9 November 2017

Pig vs Hive vs SQL – Difference between the Big Data Tools by Manisha Nandy Mazumder via @Hadoop360

Great article comparing the three tools.

This is great for understanding the differences and which one might be best for you.

Saturday, 25 February 2017

Making Python Speak SQL with pandasql by/via @YhatHQ

Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.

This is a great post and includes lots of code and examples - one for you to bookmark and sign up for his updates while you are there!!

Thursday, 15 September 2016

New Research - We’re In the Middle of a Data Engineering Talent Shortage by @jakestein via @stitch_data

We’ve all become accustomed to hearing about the rising demand for data scientists, but according to the latest research, the real talent crisis lies in data engineering. This report explains where the gaps are and where things are expected to go.

This is very interesting and adds weight to the argument that certain skills are essential. There is too much focus on becoming a Data Scientist, but anyone who is technical is probably much better off as a Data Engineer.

Friday, 9 September 2016

How to Become a (Type A) Data Scientist by Ajit Jaokar via @kdnuggets

This post outlines the difference between a Type A and a Type B data scientist, and prescribes a learning path for becoming a Type A.

I found this really interesting.