This is a blog containing data-related news and information that I find interesting or relevant. Links are given to the original sites containing the source information, for which I can take no responsibility. Any opinion expressed is my own.
Monday, 3 October 2022
Monday, 29 August 2022
Seven Killer Memory Optimization Techniques Every Pandas User Should Know by Avi Chawla via @TDataScience
Monday, 16 May 2022
How To Convert Pandas DataFrame Into NumPy Array by Giorgos Myrianthous via @TDataScience
Converting a pandas DataFrame into a NumPy array.
A great guide which shows just how easy it can be to do that conversion.
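To give a flavour of how short the conversion can be, here is a minimal sketch of my own (the data and variable names are made up and are not taken from the article). It assumes the standard `DataFrame.to_numpy()` method is what you want to use.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not from the linked article.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# to_numpy() returns the frame's values as a NumPy array.
# Because the columns mix int and float, the result is upcast to float64.
arr = df.to_numpy()

print(type(arr), arr.shape, arr.dtype)
```

Note that the column names and the index are dropped in the conversion, so keep the original DataFrame around if you still need them.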
Wednesday, 11 May 2022
How to Use Wikipedia as a Data Source by Alan Jones via @TDataScience
How to load information from Wikipedia into Pandas by finding the best team in the English Premier League.
I like that this worked example could be used to do all sorts of things.
Monday, 7 March 2022
D-Tale: One of the Best Python Libraries You Have Ever Seen by Ismael Araujo via @TDataScience
Here is his take on this must-have Python library and why you should give it a try.
I like this - it looks incredibly easy to use and very intuitive. Definitely one to add to your list of very useful Python libraries.
Monday, 14 February 2022
Top 10 Pandas Functions for Preparing Data by Holly Dalligan via @BttrProgramming
Because she wanted to create useful, accurate analysis with as little work as possible.
I found this really interesting and it looks very useful too - data preparation, if done well, can help you to produce much better results from your analysis.
Wednesday, 2 February 2022
Good-bye Pandas! Meet Terality — Its Evil Twin With Identical Syntax by Bekhruz (Bex) Tuychiev via @TDataScience
… but up to 30 times faster.
This looks very interesting to experiment with.
Friday, 7 January 2022
Query Pandas DataFrame with SQL by Edwin Tan via @TDataScience
Can you use SQL in Pandas? Yes and this is how.
I have to admit that, having used SQL for years, I am far more comfortable using that than any other method.
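As a rough illustration of the idea, here is a sketch of my own that runs a SQL query against a DataFrame. It is not necessarily the approach the article takes: I am assuming an in-memory SQLite database from Python's standard `sqlite3` module, and the table name `results` and the data are invented for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical example data, not from the linked article.
df = pd.DataFrame({"team": ["A", "B", "A", "C"], "points": [3, 1, 3, 0]})

# Copy the DataFrame into an in-memory SQLite table called "results".
conn = sqlite3.connect(":memory:")
df.to_sql("results", conn, index=False)

# Query the table with ordinary SQL and get a DataFrame back.
totals = pd.read_sql_query(
    "SELECT team, SUM(points) AS total "
    "FROM results GROUP BY team ORDER BY total DESC",
    conn,
)
conn.close()

print(totals)
```

The round trip through SQLite costs a copy of the data, so this is more about convenience for SQL users than about speed.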
Monday, 3 January 2022
A new Era of SPARK and PANDAS Unification by MA Raza, Ph.D. via @AnalyticsVidhya
Pyspark and Pandas. A practical guide: Spark 3.2.0.
I found this very interesting even though I didn't understand all of it.
Monday, 18 October 2021
Aggregations on time-series data with Pandas by @OlegZero13 via @TDataScience
Python Pandas and SQL - time aggregations and syntax explained.
This is a great reminder of the syntax and helped me to remember some things I had obviously forgotten.
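As a reminder to myself of the basic syntax, here is a minimal sketch of a time aggregation in Pandas. It is my own example with made-up data, and it assumes `resample()` on a datetime index is the kind of aggregation you are after.

```python
import pandas as pd

# Hypothetical daily series, not from the linked article.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
ts = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)

# Group the rows into consecutive 2-day bins and sum each bin,
# similar in spirit to a GROUP BY on a truncated date in SQL.
two_day = ts.resample("2D").sum()

print(two_day)
```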
Monday, 27 September 2021
Differences Between concat(), merge() and join() with Python by Amit Chauhan via @towards_AI
Combining data frames in Pandas.
This looks very useful and worth a bookmark, an addition to Evernote or a printout.
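For a quick side-by-side, here is a small sketch of my own showing the three functions on two toy data frames (the column names and values are invented, not taken from the article).

```python
import pandas as pd

# Hypothetical example frames, not from the linked article.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat(): stack the frames on top of each other; columns that are
# missing from one frame are filled with NaN.
stacked = pd.concat([left, right], ignore_index=True)

# merge(): SQL-style join on a column; "inner" keeps only matching keys.
merged = left.merge(right, on="key", how="inner")

# join(): joins on the index, so the key column is moved to the index first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(stacked, merged, joined, sep="\n\n")
```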
Monday, 20 September 2021
5 Common Excel Tasks Simplified with Python by Frank Andrade via @TDataScience
Using Pandas, OS, and the DateTime module to simplify Excel tasks with Python.
I really like this, which can be used to make your life easier as well as being a great learning experience.
Monday, 5 July 2021
9 Useful Pandas Methods You Might Have Not Heard About by Eryk Lewinson via @TDataScience
They can make your daily work easier and faster.
I really like that it contains code segments and that you can see what it can really do. I would suggest playing around with them and trying them out - some of them might be able to save you some time or even give you a better result.
Monday, 28 June 2021
All You Need to Know About Pandas Cut and Qcut Functions by @snr14 via @TDataScience
What exactly is the difference between them?
I found this interesting and think I would use this in the future.
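To make the difference concrete, here is a minimal sketch of my own with a deliberately skewed set of made-up values (not from the article): `cut` splits the value range into equal-width bins, while `qcut` puts roughly the same number of values in each bin.

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one large outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# cut(): two bins of equal width across the range 1..100,
# so the outlier ends up alone in the upper bin.
by_width = pd.cut(values, bins=2)

# qcut(): two bins split at the median, so each bin holds half the values.
by_count = pd.qcut(values, q=2)

print(by_width.value_counts())
print(by_count.value_counts())
```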
Friday, 21 May 2021
400x times faster Pandas Data Frame Iteration by Satyam Kumar via @TDataScience
Avoid using iterrows() function.
I liked his conclusion and he makes some good points - just adding a dictionary is probably a quick and easy change to code which can make a big difference whilst avoiding the reworking of code.
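Here is a small sketch of my own showing the kind of change I mean. I am assuming the dictionary approach refers to `DataFrame.to_dict("records")`; the data and column names are made up, and I have not reproduced the article's timings.

```python
import pandas as pd

# Hypothetical example data, not from the linked article.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Pattern to avoid: iterrows() builds a Series for every row.
slow_total = 0.0
for _, row in df.iterrows():
    slow_total += row["price"] * row["qty"]

# Small change: loop over plain dictionaries instead.
fast_total = 0.0
for record in df.to_dict("records"):
    fast_total += record["price"] * record["qty"]

# Where the logic allows it, a vectorised expression avoids the loop entirely.
vector_total = (df["price"] * df["qty"]).sum()

print(slow_total, fast_total, vector_total)
```

The loop body stays almost identical between the first two versions, which is why it is such an easy change to make to existing code.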
Saturday, 8 May 2021
3 Python Pandas Tricks for Efficient Data Analysis by @snr14 via @TDataScience
Explained with examples. Pandas is one of the predominant data analysis tools.
Some handy hints in Python that may fix some minor issues in your code that you hadn't realised could be fixed so easily.
Friday, 16 April 2021
Top 10 Python Libraries Data Scientists should know in 2021 by Terence Shin via @kdnuggets
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
I found this useful and it gave me a great reminder of what I can/should use for libraries now.
Wednesday, 14 April 2021
Pandas Basics Cheat Sheet (2021) by Christopher Zita in @TDataScience
The absolute basics for beginners learning Pandas in 2021.
This is SO useful and a great reminder for you even if you know how to do the basics included in this cheat sheet - definitely one to save and refer back to.
Monday, 15 March 2021
Are You Still Using Pandas to Process Big Data in 2021? Here are two better options by Roman Orec via @kdnuggets
When it's time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas does not handle really Big Data very well, but two other libraries do. So, which one is better and faster?
These are some great suggestions and well worth an experiment: if you benchmark all of them (including Pandas), you may find something much better, which will be to your advantage.
Wednesday, 3 February 2021
Use Python to Value a Stock Automatically by Bohmian via @dd_invest
Is Apple Stock Overvalued? Just Enter the Ticker and Let Python decide Automatically!
This might be useful, but at the very least is a good exercise in coding and processing financial data. Contains code and examples.