This post updates a very popular earlier post, 50+ Data Science, Machine Learning Cheat Sheets, by Bhavya Geethika. If we missed some popular cheat sheets, add them in the comments below.
Cheatsheets on Python, R and NumPy, SciPy, Pandas
SciPy Cheat Sheet: SciPy is another package that is essential for scientific computing and a great one to pick up once you master NumPy. It provides mathematical algorithms and convenience functions that are built on NumPy. Summary: In this tutorial, you discovered the key functions for linear algebra that you may find useful as a machine learning practitioner. Are there other key linear algebra functions that you use or know of? Let me know in the comments below. Do you have any questions? Contribute to vatsal30/Data-Science-Cheat-Sheet development by creating an account on GitHub. Data Science: NumPy Basics Cheat Sheet. Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
Data science is a multi-disciplinary field. Thus, there are thousands of packages and hundreds of programming functions out there in the data science world! An aspiring data enthusiast need not know them all. A cheat sheet or reference card is a compilation of the most commonly used commands, meant to help you learn a language’s syntax faster. Here are the most important ones, each captured in a few compact pages.
Mastering data science involves an understanding of statistics and mathematics, programming knowledge (especially in R, Python and SQL), and the ability to combine all of these with business understanding and human instinct to derive the insights that drive decisions.
Here are the cheat sheets by category:
Cheat sheets for Python:
Python is a popular choice for beginners, yet still powerful enough to back some of the world’s most popular products and applications. Its design makes the programming experience feel almost as natural as writing in English. The Python basics and Python Debugger cheat sheets for beginners cover the important syntax needed to get started. Community-provided libraries such as NumPy, SciPy, scikit-learn and pandas are heavily relied on, and the NumPy/SciPy/Pandas Cheat Sheet provides a quick refresher on these.
Cheat sheets for R:
The R ecosystem has been expanding so much that a lot of referencing is needed. The R Reference Card covers most of the R world in a few pages. RStudio has also published a series of cheat sheets to make things easier for the R community. The data visualization with ggplot2 cheat sheet seems to be a favorite, as it helps when you are creating graphs of your results.
At cran.r-project.org:
At Rstudio.com:
Others:
Cheat sheets for MySQL & SQL:
For a data scientist, the basics of SQL are as important as those of any other language. Both Pig and Hive Query Language are closely associated with SQL, the original Structured Query Language. SQL cheatsheets provide a five-minute quick guide to learning it, after which you may explore Hive and MySQL!
Cheat sheets for Spark, Scala, Java:
Apache Spark is an engine for large-scale data processing. For certain applications, such as iterative machine learning, Spark can be up to 100x faster than Hadoop (using MapReduce). The essentials of Apache Spark cheatsheet explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.
Cheat sheets for Hadoop & Hive:
Hadoop emerged as an unconventional tool for solving what was thought to be unsolvable, by providing an open source software framework for the parallel processing of massive amounts of data. Explore the Hadoop cheatsheets to find useful commands for working with Hadoop on the command line. A combination of SQL and Hive functions is another one to check out.
Cheat sheets for web application framework Django:
Django is a free and open source web application framework, written in Python. If you are new to Django, you can go over these cheatsheets to pick up the key concepts quickly and then dive into each one at a deeper level.
Cheat sheets for Machine learning:
We often find ourselves spending time wondering which algorithm is best, and then going back to our big books for reference! These cheat sheets ask about both the nature of your data and the problem you're working to address, and then suggest an algorithm for you to try.
Cheat sheets for Matlab/Octave
MATLAB (MATrix LABoratory) was developed by MathWorks in 1984. MATLAB has been the most popular language for numeric computation in academia. With its many highly optimized toolboxes, it is suitable for tackling almost any science and engineering task. MATLAB is not open source; however, there is a free alternative, GNU Octave, a re-implementation that follows the same syntactic rules, so most code is compatible with MATLAB.
Cheat sheets for Cross Reference between languages
Related:
Little time to learn NumPy? This article shows you the ten most amazing NumPy cheat sheets. Download them, print them, and pin them to your wall, and watch your data science skills grow!
All NumPy cheat sheets in this article are 100% free. All links open in a new tab (so feel free to click all links without worrying about losing this page).
Here’s a quick summary if you don’t have time to read all the cheat sheets:
Here’s a quick download for you: I created this cheat sheet to explain some important NumPy concepts to my coding students.
NumPy is a widely used Python scientific computing package. It simplifies linear algebra and matrix computations, and it speeds up data analysis. Knowing NumPy is a prerequisite for other Python packages like pandas or Scikit-Learn.
This article should serve as the ultimate NumPy reference. The cheat sheets are diverse and range from one page to multiple pages. They also involve cross-language comparison cheat sheets. While some resources are great beginner’s references, others are involved and require high-level expertise.
DataCamp is an online platform that offers data science training through videos and coding exercises. This cheat sheet is one of the most comprehensive one-page cheat sheets available. In a way, it adds to the previous cheat sheet with more examples and more functions. It is a good summary of creating arrays and basic array design. This cheat sheet provides functions for the specific datatypes. At the end of the sheet is more advanced stuff like slicing and indexing. There are also some introductory tools for data analysis and array manipulation. Though overall, this is a fantastic resource, the one drawback is the color palette. The bright orange is a distraction from the content. If you like the color palette, this could be your comprehensive go-to list of the NumPy basics.
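The datatype, slicing and indexing topics mentioned above can be sketched in a few lines. This is a minimal illustration, not taken from the cheat sheet itself; the variable names are made up for the example.

```python
import numpy as np

# Datatypes: create an integer array, then convert it to 32-bit floats.
ints = np.array([1, 2, 3], dtype=np.int64)
floats = ints.astype(np.float32)

# Indexing and slicing on a 3x4 matrix holding the values 0..11.
matrix = np.arange(12).reshape(3, 4)
second_row = matrix[1]       # one full row
block = matrix[:2, 1:3]      # first two rows, columns 1 and 2
last = matrix[-1, -1]        # negative indices count from the end

print(floats.dtype, second_row, block, last)
```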
This is a useful resource for the NumPy basics. It provides a summary of creating arrays and some basic operations. It is minimalistic, with a good overview of many basic functions. The sheet is divided into sections with headers for easier orientation. On the left-hand side of the sheet, the NumPy import convention is mentioned: import numpy as np. A one-line explanation follows each function. The biggest advantage of this list is good readability. This enables a quick search for the right function.
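As a rough sketch of the kind of basics such a sheet covers, here is the import convention followed by a few array-creation calls and simple operations. The selection of functions is my own and not copied from the sheet.

```python
import numpy as np  # the conventional alias

zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)        # 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced points, endpoints included

doubled = evens * 2                # arithmetic is applied element-wise
total = evens.sum()                # sum of all elements

print(zeros.shape, doubled, total, grid)
```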
The cheat sheet is divided into four parts. The first part goes into detail about NumPy arrays and some useful functions, such as finding the number of dimensions. The second part focuses on slicing and indexing, and it provides some delightful examples of Boolean indexing. The last two columns are a little disconnected. They provide a wide range of functions, from matrix operations like transpose to sorting an array, but they are not grouped very conveniently. The advantage of this sheet is that it also covers the Boolean type and not only the numerical types.
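For readers who have not seen Boolean indexing before, a short sketch may help. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

n_dims = a.ndim        # number of dimensions of the array
mask = a > 2           # Boolean array with the same shape as a
selected = a[mask]     # 1-D array of the elements where mask is True
transposed = a.T       # rows and columns swapped

print(n_dims, mask, selected, transposed.shape)
```

The mask is itself an ordinary array of dtype bool, which is why it can be stored, combined with `&` and `|`, and reused.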
Dataquest is a similar online platform to DataCamp. It offers a variety of data science tracks and lessons, followed by coding exercises. This is another good resource of the most important NumPy functions and properties. The cheat sheet is readable with distinct sections, and each section has a clear title. Besides the sheet organization and excellent readability, it provides a range of functions and operations. Also, compared to the previous two cheat sheets, there is a math and statistics section. It divides the math sections into scalar and vector math, and there is a statistics section at the bottom.
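The split between scalar math, vector math and statistics can be illustrated as follows. This is a sketch with made-up data, not an excerpt from the Dataquest sheet.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Scalar math: one number is combined with every element.
shifted = x + 1

# Vector math: two arrays of the same shape are combined element by element.
summed = x + y

# Statistics over the whole array.
mean = x.mean()
std = x.std()            # population standard deviation (ddof=0 by default)
median = np.median(x)

print(shifted, summed, mean, std, median)
```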
If you are a Matlab user and need a quick introduction to Python and NumPy, this could be your go-to. The sheet contains three columns: the first column is Matlab/Octave, the second contains the Python and NumPy equivalents, and the third is a description column. The sheet’s focus is not solely on NumPy; many Python basics are listed as well. Since it is not a single sheet, the content is organized into separate sections. It covers math, logical and Boolean operators, roots and round-offs, complex numbers, extensive linear algebra, reshaping and indexing, some basic plots, calculus, and statistics.
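To give a flavor of such a side-by-side comparison, here is a small sketch with the MATLAB/Octave equivalent noted in a comment next to each NumPy line. The examples are my own selection and are not taken from the sheet.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # MATLAB: A = [1 2; 3 4]
B = np.array([[5, 6], [7, 8]])   # MATLAB: B = [5 6; 7 8]

elementwise = A * B              # MATLAB: A .* B
matmul = A @ B                   # MATLAB: A * B
transposed = A.T                 # MATLAB: A.'
first = A[0, 0]                  # MATLAB: A(1,1), since MATLAB indexing starts at 1

print(elementwise, matmul, transposed, first)
```

The two points that most often trip up people switching over are visible here: `*` is element-wise in NumPy, and indexing is zero-based.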
This cheat sheet provides the equivalents for four different languages: MATLAB/Octave, Python and NumPy, R, and Julia. The list is not a single PDF sheet but a scrollable document. Task descriptions appear on both the far left-hand and far right-hand sides of the document. This is an extensive sheet, and it is extra useful because the output of each task is given. The sheet covers creating and designing matrices, matrix shape manipulation, and some basic and more advanced matrix operations. The advanced section is particularly interesting because it lists many functions that are useful in data analysis, like finding covariances and eigenvalues and creating normally distributed random variables.
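The advanced operations named above (normally distributed random variables, covariance, eigenvalues) fit together naturally in NumPy. The following is a minimal sketch; the seed, sample size and variable names are assumptions made for the example.

```python
import numpy as np

# Draw 1000 two-dimensional samples from a standard normal distribution.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

# Covariance matrix of the two columns (rowvar=False: columns are variables).
cov = np.cov(samples, rowvar=False)

# Eigenvalues of the symmetric covariance matrix, returned in ascending order.
eigenvalues = np.linalg.eigvalsh(cov)

print(cov)
print(eigenvalues)
```

`eigvalsh` is used instead of `eigvals` because a covariance matrix is symmetric, so the specialized routine applies and returns real values.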
This is the most comprehensive sheet on the list. Not only does it include side-by-side equivalents between MATLAB, R, NumPy, and Julia, it also covers everything from functions and syntax to loops and I/O. The most interesting and useful aspect is that certain lines, like a function definition, are given for MATLAB, R, and Julia, but not for NumPy because of the lack of that functionality. That makes it easy to compare and contrast and to find the best fit for a project.
Although there are other comparison cheat sheets in this collection, this one lists some advanced features. As the title says, it is a comparison between R (and S-plus) and NumPy. It is very detailed for each family of operations. For example, the sorting section provides eight ways to sort an array. Some operations are not available in both languages, and the sheet makes this visible, so it is easy to find the right function. This is the only cheat sheet in the collection that provides detailed plots and graphs. Moreover, some advanced math and statistics are covered, like differential equations and Fourier analysis.
This is not a NumPy-specific sheet. It covers many Python data science topics, but also some Python basics. It is easy to navigate because of the table of contents given at the beginning. The NumPy section is comprehensive. It covers NumPy basics like array properties and operations. It also contains an extensive list of math functions and linear algebra functions. Some of the useful linear algebra functions are those for finding inner and outer products and eigenvalues. Others are functions for rounding off and generating random variables.
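Two of the operation families mentioned above, sorting and Fourier analysis, look like this on the NumPy side. The sketch is illustrative only: the test signal (a 5 Hz sine sampled at 100 Hz for one second) is an assumption chosen so that the result is easy to check.

```python
import numpy as np

# Sorting: sorted copy, the permutation that sorts, and descending order.
values = np.array([3, 1, 2])
ascending = np.sort(values)
order = np.argsort(values)        # indices that would sort the array
descending = ascending[::-1]

# Fourier analysis: find the dominant frequency of a sampled sine wave.
fs = 100                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)         # 5 Hz sine
spectrum = np.fft.rfft(signal)             # FFT for real-valued input
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
peak = freqs[np.argmax(np.abs(spectrum))]  # frequency with the largest magnitude

print(ascending, order, descending, peak)
```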
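The inner product, outer product and rounding functions mentioned above can be sketched briefly. The vectors are arbitrary examples.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

inner = np.inner(u, v)    # sum of element-wise products: 1*4 + 2*5 + 3*6
outer = np.outer(u, v)    # 3x3 matrix with outer[i, j] = u[i] * v[j]

# Round each element to one decimal place.
rounded = np.round(np.array([1.234, 5.678]), 1)

print(inner, outer, rounded)
```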
The Finxter cheat sheet is different from all the previously mentioned sheets because it’s visually the clearest. It gives a detailed description of each function and lists the examples along with the outcome. The good thing about the visible outcome is that looking at it can help if you’re unsure about the name of the function. Along with the cheat sheet, there is an accompanying video with further detailed examples and explanations.
xtensor is a C++ library, similar to NumPy, made for numerical analysis. The cheat sheet provides a two-column view, where the first column is NumPy, and the second column contains the xtensor equivalents. The sheet focuses on array initialization, reshaping, and slicing functions. Further, it continues with array manipulation like transpose or rotation functions. There are a lot of tensor operations, but the sheet is missing the descriptions. So, it’s not always easily deducible what a certain function does.
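For orientation, here is what the NumPy column of such a comparison might contain for initialization, reshaping, slicing, transpose and rotation. Only the NumPy side is sketched; the xtensor equivalents are not shown because they are not quoted in this article.

```python
import numpy as np

a = np.arange(6)             # initialization: 0, 1, 2, 3, 4, 5
m = a.reshape(2, 3)          # reshaping into 2 rows and 3 columns
first_col = m[:, 0]          # slicing: every row, first column
transposed = m.T             # transpose: shape becomes (3, 2)
rotated = np.rot90(m)        # rotate by 90 degrees counterclockwise

print(m, first_col, transposed, rotated)
```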
This article is contributed by Finxter user Milica Cvetkovic. Milica is also a writer on Medium — check out her profile.
A thorough understanding of the NumPy basics is an important part of any data scientist’s education. NumPy is at the heart of many advanced machine learning and data science libraries such as Pandas, TensorFlow, and Scikit-learn.
If you struggle with the NumPy library — fear not! Become a NumPy professional in no time with our new coding textbook “Coffee Break NumPy”. It’s not only a thorough introduction into the NumPy library that will increase your value to the marketplace. It’s also fun to go through the large collection of code puzzles in the book.
Related Articles: