A programming primer for data science

This section aims to:

  1. Present the core programming concepts used in various statistics, econometrics and data science tasks.
  2. Provide examples in both R and Python.
  3. Provide a convenient interface to toggle between R and Python to see the similarities and differences between the two programming languages.
  4. Highlight some language-specific concepts.

Packages

In general, packages are collections of functions, classes, sample datasets and their documentation. Packages usually focus on a specific topic: for example, one package may focus on plot creation, while another may be created for estimating various (or only a specific subset of) statistical models. In other cases, packages may contain executables, such as shiny for R (and Python), or streamlit for Python.

Some packages are available with the base installation of the programming language, while others need to be explicitly downloaded and installed (see install.packages in R and pip in Python for more).

Packages need to be loaded only once per session (for example, once per script or notebook run). It is recommended to load all of the required packages at the beginning of your script/notebook file.

An example of loading some installed packages is provided below.

The library() function loads the specified package, if it is installed. If the package is not found, an error is raised.

Whenever we want to refer to an object from a specific library we can either:

  • Load the whole package and call a specific function:
library(ggplot2)
# A function from the `ggplot2` package
ggplot()
  • Without loading the package, we can use the :: notation (package_name::function_name) to call a specific function:
# A function from the `ggplot2` package
ggplot2::ggplot()

As noted, the second example works when we don’t want to load the package (but we still need to have it installed).

An important caveat: if multiple packages contain functions with the same name, the functions from the package loaded last will mask the earlier ones. If you only need a few functions from a specific package, it may be best to use the :: notation for those functions, instead of loading the whole package.

import pandas as pd
import plotnine as plt
import statsmodels.api as sm

The import statement searches for the specified module (or package) and then it binds the results of that search to a name. If no name is specified, then the assigned name is the same as the module name. If no such module/package is found, then an error is raised. (Note: see package and module definitions for more specifics.)

  • If the package is loaded, we can call specific functions/classes using the dot (.) notation:
import plotnine as plt
# A class from the `plotnine` package
plt.ggplot()
  • Alternatively, we can choose to import only specific names (functions, classes or submodules) from a package:
from plotnine import ggplot
# A class from the `plotnine` package
ggplot()
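
The same import forms work for any module. A minimal sketch using only Python's standard library, so that it runs without installing anything (the alias st is an arbitrary choice made for this illustration):

```python
# Three ways of reaching the same function from the standard library
import statistics                  # bind the module to its own name
import statistics as st            # bind the same module to a shorter alias
from statistics import mean        # bind only one name from the module

values = [1, 2, 3, 4]

print(statistics.mean(values))     # dot notation with the full module name
print(st.mean(values))             # dot notation with the alias
print(mean(values))                # the imported name, no prefix needed
```

All three calls reach the same function, so the choice is mostly about readability and about avoiding name clashes.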

Note that in Python, functions are independent blocks of code that can be called from anywhere, while methods are tied to objects or classes and need an object or class instance to be invoked. For example: the array() function in numpy and the fit() method of the OLS class in statsmodels.
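
The distinction can be illustrated with built-in objects alone. The sketch below uses the built-in len() function and the upper() method of the str class; the variable names are chosen only for this example:

```python
# A function is called on its own and receives the object as an argument
word = "cat"
n_chars = len(word)            # len() is a built-in function

# A method is looked up on the object and called with the dot notation
shouted = word.upper()         # upper() is a method of the str class

# A method can also be reached through the class, passing the instance explicitly
also_shouted = str.upper(word)

print(n_chars, shouted, also_shouted)
```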

Additional packages can be found at the following repositories:

Operators

There are a number of operators available in R and Python, which are used in mathematical calculations, value comparisons and value assignments.

Operator                                                    Type                  R                                    Python
addition                                                    arithmetic            x + y                                x + y
subtraction                                                 arithmetic            x - y                                x - y
multiplication                                              arithmetic            x * y                                x * y
division                                                    arithmetic            x / y                                x / y
exponentiation (\(x^y\))                                    arithmetic            x^y (recommended) or x**y            x**y
modulus (x mod y)                                           arithmetic            x %% y                               x % y
integer division                                            arithmetic            x %/% y                              x // y
matrix multiplication                                       arithmetic            x %*% y                              x @ y
equal                                                       logical (comparison)  x == y                               x == y
not equal                                                   logical (comparison)  x != y                               x != y
x is less than y                                            logical (comparison)  x < y                                x < y
x is more than y                                            logical (comparison)  x > y                                x > y
x is less than or equal to y                                logical (comparison)  x <= y                               x <= y
x is more than or equal to y                                logical (comparison)  x >= y                               x >= y
x and y                                                     logical (comparison)  x & y                                x and y or (elementwise) numpy.logical_and(x, y)
x or y                                                      logical (comparison)  x | y                                x or y or (elementwise) numpy.logical_or(x, y)
not x                                                       logical (comparison)  !x                                   not x
containment test (which x values are in a set of y values)  other                 x %in% y                             numpy.isin(x, y)
assign value                                                assignment            x <- 2 or x <<- 2 (global) or x = 2  x = 2
add y to x and assign to x                                  assignment            x <- x + y                           x += y or x = x + y
subtract y from x and assign to x                           assignment            x <- x - y                           x -= y or x = x - y
multiply x by y and assign to x                             assignment            x <- x * y                           x *= y or x = x * y
divide x by y and assign to x                               assignment            x <- x / y                           x /= y or x = x / y
exponentiate \(x^y\) and assign to x                        assignment            x <- x^y                             x **= y or x = x ** y

Note: we omit bitwise operators as they are less common in general data analysis and modelling.

A comprehensive list of operators is available in the Python documentation.
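
As a quick check of the Python operators listed above, the sketch below relies only on built-in numbers. The values of x, y and total are arbitrary and chosen for illustration:

```python
x, y = 7, 2

print(x / y)     # true division always returns a float
print(x // y)    # integer (floor) division
print(x % y)     # remainder of the division
print(x ** y)    # exponentiation

# Integer division and modulus are linked by: x == (x // y) * y + (x % y)
assert x == (x // y) * y + (x % y)

# Augmented assignment is shorthand for "operate, then assign"
total = 10
total += y       # same as total = total + y
total **= 2      # same as total = total ** 2
print(total)

# `and` / `or` work on single values
print((x > 5) and (y > 5))
print((x > 5) or (y > 5))
```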

Data Types

There are a number of built-in (and library-specific) data types available in R and Python. Data types are used to represent specific values (or collections of values or objects) and have a pre-defined functionality for various operators.

Numbers

Numeric values can be values from \(\mathbb{Z}\) (integer), \(\mathbb{R}\) (real number) or \(\mathbb{C}\) (complex number) sets.

We use the assignment operator to assign values to variables:

x1 <- as.integer(1)
x2 <- 2
x3 <- complex(real = 3, imaginary = 1)
x4 <- 4 + 2i
x1 = 1
x2 = 2.0
x3 = complex(real = 3, imag = 1)
x4 = 4 + 2j

We can print these values using the print() function:

print(x1)
[1] 1
print(x2)
[1] 2
print(x3)
[1] 3+1i
print(x4)
[1] 4+2i
print(x1)
1
print(x2)
2.0
print(x3)
(3+1j)
print(x4)
(4+2j)

As well as check the types of our values:

typeof(x1)
[1] "integer"
typeof(x2)
[1] "double"
typeof(x3)
[1] "complex"
typeof(x4)
[1] "complex"
type(x1)
<class 'int'>
type(x2)
<class 'float'>
type(x3)
<class 'complex'>
type(x4)
<class 'complex'>

We can add, subtract, multiply and divide multiple values together:

x5 <- x1 + x2 + 3
x6 <- x3 + x4
x7 <- x1 - x2 - 5
x8 <- x3 - x4
x9 <- x1 * x2 * (-1)
x10 <- x3 * x4
x11 <- x1 / 2
x12 <- x3 / x4
x13 <- x2^2
x14 <- x4^3
x5 = x1 + x2 + 3
x6 = x3 + x4
x7 = x1 - x2 - 5
x8 = x3 - x4
x9 = x1 * x2 * (-1)
x10 = x3 * x4
x11 = x1 / 2
x12 = x3 / x4
x13 = x2**2
x14 = x4**3
print(x5)
[1] 6
print(x6)
[1] 7+3i
print(x7)
[1] -6
print(x8)
[1] -1-1i
print(x5)
6.0
print(x6)
(7+3j)
print(x7)
-6.0
print(x8)
(-1-1j)
print(x9)
[1] -2
print(x10)
[1] 10+10i
print(x11)
[1] 0.5
print(x12)
[1] 0.7-0.1i
print(x9)
-2.0
print(x10)
(10+10j)
print(x11)
0.5
print(x12)
(0.7-0.1j)
print(x13)
[1] 4
print(x14)
[1] 16+88i
print(x13)
4.0
print(x14)
(16+88j)

div and mod operations can also be carried out as follows:

x15 <- 5 %% 3
x16 <- 5 %/% 3
x15 = 5 % 3
x16 = 5 // 3
print(paste0("Remainder of a division (mod): ", x15))
[1] "Remainder of a division (mod): 2"
sprintf("Integer division (div): %02d", x16)
[1] "Integer division (div): 01"
print("Remainder of a division (mod): ", x15)
Remainder of a division (mod):  2
print(f"Integer division (div): {x16:02d}")
Integer division (div): 01

Here we use the 02d format notation to print an integer padded with leading zeros to a width of two digits.
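
A few related format specifications are sketched below, assuming Python's f-string syntax. The variable ratio and the chosen widths are introduced only for this illustration:

```python
x16 = 5 // 3
ratio = 5 / 3

print(f"{x16:02d}")      # integer, zero-padded to width 2
print(f"{x16:4d}")       # integer, space-padded to width 4
print(f"{ratio:.3f}")    # float with 3 digits after the decimal point
print(f"{ratio:08.3f}")  # float, zero-padded to a total width of 8

# The same specifications work with the built-in format() function
padded = format(x16, "02d")
print(padded)
```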

Text/Strings/Characters

Strings are sequences of characters and can contain any combination of letters, digits and other symbols.

s1 <- "This is a sentence"
s2 <- "cat"
s3 <- "1"
s1 = "This is a sentence"
s2 = "cat"
s3 = "1"
print(s1)
[1] "This is a sentence"
print(s2)
[1] "cat"
print(s3)
[1] "1"
print(s1)
This is a sentence
print(s2)
cat
print(s3)
1

Unlike numbers, strings do not have a clear definition for mathematical operations (see footnote 1):

s3 + 1
Error in s3 + 1: non-numeric argument to binary operator
s3 + 1
can only concatenate str (not "int") to str
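
If a string happens to hold a number, one way around this error is to convert it explicitly before doing arithmetic. A minimal sketch using Python's built-in int(), float() and str() conversions:

```python
s3 = "1"

# Convert explicitly before doing arithmetic
as_int = int(s3) + 1        # integer arithmetic
as_float = float(s3) + 1    # floating point arithmetic

# Going the other way: convert the number to a string before concatenating
as_text = s3 + str(1)

print(as_int, as_float, as_text)

# A string that does not represent a number cannot be converted
try:
    int("cat")
except ValueError as err:
    print("Conversion failed:", err)
```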

Nevertheless, we may need to modify various strings of characters in our data. To make this process easier, a number of functions are available in R and Python.

String transformations

Firstly, we may be interested in concatenating multiple strings together. We can do so as follows:

s4 <- paste(s1, s2, sep = ". ")
s5 <- paste(s3, s2)
s6 <- paste0(s3, s2)
s7 <- paste0(c(s1, s2, s3), collapse = "; ")
s4 = s1 + ". " + s2
s5 = s3 + " " + s2
s6 = s3 + s2
s7 = "; ".join([s1, s2, s3])
print(s4)
[1] "This is a sentence. cat"
print(s5)
[1] "1 cat"
print(s6)
[1] "1cat"
print(s7)
[1] "This is a sentence; cat; 1"
print(s4)
This is a sentence. cat
print(s5)
1 cat
print(s6)
1cat
print(s7)
This is a sentence; cat; 1

We may also want to change the capitalization of our text:

print(toupper(s1))
[1] "THIS IS A SENTENCE"
print(tolower(s1))
[1] "this is a sentence"
print(stringr::str_to_sentence(s2))
[1] "Cat"
print(stringr::str_to_title(s1))
[1] "This Is A Sentence"
print(s1.upper())
THIS IS A SENTENCE
print(s2.lower())
cat
print(s2.capitalize())
Cat
print(s1.title())
This Is A Sentence

We can also calculate the number of characters in our string:

print(nchar(s1))
[1] 18
print(nchar(s2))
[1] 3
print(len(s1))
18
print(len(s2))
3

We might be interested in extracting part of a string as follows:

print(substr(s1, start = 1, stop = 2))
[1] "Th"
print(substring(s1, first = 1, last = 4))
[1] "This"
print(substring(s1, first = 3, last = 4))
[1] "is"
print(substring(s1, first = 8))
[1] " a sentence"
print(s1[0:2])
Th
print(s1[:4])
This
print(s1[2:4])
is
print(s1[7:])
 a sentence
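
Note that R counts characters from 1 and includes the stop position, while Python counts from 0 and excludes the stop position. The helper r_substring() below is a hypothetical name introduced only for this illustration. It is a sketch of the index conversion under the assumption that first is at least 1, not a function from any library:

```python
def r_substring(text, first, last=None):
    """Mimic R's substring(text, first, last) using Python slicing.

    R positions are 1-based and `last` is inclusive, so R's (first, last)
    corresponds to the Python slice [first - 1:last].
    """
    if last is None:
        return text[first - 1:]
    return text[first - 1:last]


s1 = "This is a sentence"

print(r_substring(s1, 1, 2))   # same characters as s1[0:2]
print(r_substring(s1, 3, 4))   # same characters as s1[2:4]
print(r_substring(s1, 8))      # same characters as s1[7:]
```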

We may also wish to split a string into separate segments:

print(strsplit(s1, split = " "))
[[1]]
[1] "This"     "is"       "a"        "sentence"
print(strsplit(s1, split = "a"))
[[1]]
[1] "This is "  " sentence"
print(strsplit(s1, split = "is"))
[[1]]
[1] "Th"          " "           " a sentence"
print(s1.split(" "))
['This', 'is', 'a', 'sentence']
print(s1.split("a"))
['This is ', ' sentence']
print(s1.split("is"))
['Th', ' ', ' a sentence']

Regular expressions

A regular expression (regex) is a sequence of characters that specifies a pattern in text. We can use regular expressions to:

  • Check if a specific sequence exists in a string;
  • Replace a sequence with another one;
  • Capture portions of the match and reuse them, e.g. in a replacement.

Additional regex syntax options can be found at Python’s regex syntax docs, Python’s Regular Expression HOWTO docs, as well as R’s regex docs.

r1 <- "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"
r1 = "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"

We can check whether a specific sequence of symbols exists in our string as follows:

print(grepl("with letters and", r1))
[1] FALSE
print(grepl("with letters And", r1))
[1] TRUE
import re
#
print(re.search('with letters and', r1))
None
print(re.search('with letters And', r1))
<re.Match object; span=(19, 35), match='with letters And'>
print(bool(re.search('with letters and', r1)))
False
print(bool(re.search('with letters And', r1)))
True

We can write more general expressions using various special characters:

  • . (dot) - matches any character except a newline;
  • ^ (caret) - matches the start of the string;
  • $ (dollar sign) - matches the end of the string;
  • * (asterisk) - causes the resulting RE (regular expression) to match 0 or more repetitions of the preceding RE. For example ab* will match a followed by any zero or more repetitions of b, while .* will search for zero or more repetitions of any character;
  • + - causes the resulting RE to match 1 or more repetitions of the preceding RE. For example, ab+ will match a followed by any non-zero number of bs; it will not match just a;
  • ? - causes the resulting RE to match 0 or 1 repetitions of the preceding RE. For example, ab? will match either a or ab.
  • {m} - specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six a characters, but not five.
  • {m,n} - causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 a characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand a characters followed by a b, but not aaab;
  • [] - used to indicate a set of characters. For example, a set of characters [amk] will match a, m, or k. Ranges of characters can be indicated by giving two characters and separating them by a ‘-’, for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59;
  • () - matches whatever regular expression is inside the parentheses. For example, (abc) will match abc.
  • | - matches either one of two REs. For example, A|B (where A and B can be arbitrary REs), creates a regular expression that will match either A or B. Can also be used inside () to match part of an output. For example, a(bc|d)e will match either abce or ade.
  • \ (in Python) or \\ (in R) - escapes special characters. For example, \- matches the literal symbol -; the same goes for \?, \+, \., \(, \[, etc.
  • \number - refers back to the contents of the group in () with the same number. In Python we use \number (e.g. \1, \2, etc.), while in R we use \\number (e.g. \\1, \\2, etc.). See the example at the end of this section.

More special characters are described in Python’s re docs and R’s regex docs.
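
The patterns used as examples in the list above can be checked directly. The sketch below uses re.fullmatch(), which succeeds only if the whole string matches the pattern (unlike re.search(), which looks for a match anywhere in the string). The test strings are chosen for illustration:

```python
import re

# Each entry: (pattern, strings that should match fully, strings that should not)
cases = [
    (r"ab*",        ["a", "ab", "abbb"],          ["b", "ba"]),
    (r"ab+",        ["ab", "abbb"],               ["a"]),
    (r"ab?",        ["a", "ab"],                  ["abb"]),
    (r"a{6}",       ["aaaaaa"],                   ["aaaaa"]),
    (r"a{4,}b",     ["aaaab", "a" * 1000 + "b"],  ["aaab"]),
    (r"[0-5][0-9]", ["00", "59"],                 ["60", "5"]),
    (r"a(bc|d)e",   ["abce", "ade"],              ["abe"]),
    (r"\?\+",       ["?+"],                       ["a"]),
]

for pattern, should_match, should_not_match in cases:
    for text in should_match:
        assert re.fullmatch(pattern, text) is not None
    for text in should_not_match:
        assert re.fullmatch(pattern, text) is None
    print(f"{pattern!r}: ok")
```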

print(grepl("this", r1))
[1] TRUE
print(grepl("^this", r1))
[1] FALSE
print(grepl("^This", r1))
[1] TRUE
print(grepl("2$", r1))
[1] FALSE
print(grepl("[0-9]$", r1))
[1] TRUE
print(grepl("^This.*3$", r1))
[1] TRUE
print(bool(re.search('this', r1)))
True
print(bool(re.search('^this', r1)))
False
print(bool(re.search('^This', r1)))
True
print(bool(re.search("2$", r1)))
False
print(bool(re.search("[0-9]$", r1)))
True
print(bool(re.search("^This.*3$", r1)))
True
# POSIX classes such as [:digit:] must be placed inside a bracket expression
print(grepl("[[:digit:]]", r1))
[1] TRUE
print(grepl("numbers [0-9],", r1))
[1] FALSE
print(grepl("numbers [0-9]+,", r1))
[1] TRUE
print(grepl("[0-9]+.*[0-9].*[0-9]", r1))
[1] TRUE
print(grepl("^this", r1))
[1] FALSE
print(grepl("^This", r1))
[1] TRUE
# Python's re module does not support POSIX classes such as [:digit:]; use \d or [0-9] instead
print(bool(re.search(r'\d', r1)))
True
print(bool(re.search('numbers [0-9],', r1)))
False
print(bool(re.search('numbers [0-9]+,', r1)))
True
print(bool(re.search('[0-9]+.*[0-9].*[0-9]', r1)))
True
print(bool(re.search('^this', r1)))
False
print(bool(re.search('^This', r1)))
True

We can replace characters by substituting them with another set of characters:

print(gsub("[0-9]", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 000, 00,0, 0"
print(gsub("[0-9]+", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,0, 0"
print(gsub("[0-9]*", "0", r1))
[1] "0T0h0i0s0 0i0s0 0a0 0s0e0n0t0e0n0c0e0 0w0i0t0h0 0l0e0t0t0e0r0s0 0A0n0d0 0t0h0i0s0 0i0s0 0a0 0b0u0n0c0h0 0o0f0 0$0y0m0b0o0l0s0 0A0N0d0 0n0u0m0b0e0r0s0 0,0 0,0,0 0"
print(gsub("[0-9]$", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 0"
print(re.sub('[0-9]', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 000, 00,0, 0
print(re.sub('[0-9]+', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,0, 0
print(re.sub('[0-9]*', '0', r1))
0T0h0i0s0 0i0s0 0a0 0s0e0n0t0e0n0c0e0 0w0i0t0h0 0l0e0t0t0e0r0s0 0A0n0d0 0t0h0i0s0 0i0s0 0a0 0b0u0n0c0h0 0o0f0 0$0y0m0b0o0l0s0 0A0N0d0 0n0u0m0b0e0r0s0 00,0 00,00,0 00
print(re.sub('[0-9]$', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 0
print(gsub("\\$", "s", r1))
[1] "This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3"
print(re.sub(r'\$', 's', r1))
This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3
print(re.sub('\\$', 's', r1))
This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3

We can also chain multiple substitutions:

print(gsub("[aA][nN][dD]", "and", gsub("\\$", "s", r1)))
[1] "This is a sentence with letters and this is a bunch of symbols and numbers 321, 11,2, 3"
print(re.sub("[aA][nN][dD]", "and", re.sub('\\$', 's', r1)))
This is a sentence with letters and this is a bunch of symbols and numbers 321, 11,2, 3

As well as search and replace specific repetitions of patterns:

print(gsub("[0-9]{1,2}", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 00, 0,0, 0"
print(gsub("[0-9]{2,3}", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,2, 3"
print(gsub("[0-9]{2}", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 01, 0,2, 3"
print(gsub("[0-9]{3}", "0", r1))
[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 11,2, 3"
print(re.sub('[0-9]{1,2}', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 00, 0,0, 0
print(re.sub('[0-9]{2,3}', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,2, 3
print(re.sub('[0-9]{2}', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 01, 0,2, 3
print(re.sub('[0-9]{3}', '0', r1))
This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 11,2, 3

Finally, we can capture portions of text and re-use them. For example, we might want to modify everything except a specific portion of the text:

print(gsub("^(This).*", "\\1 !", r1))
[1] "This !"
print(re.sub('^(This).*', r'\1 !', r1))
This !
print(gsub("^([a-zA-Z]+) .*([0-9])$", "\\1 -> \\2", r1))
[1] "This -> 3"
print(re.sub('^([a-zA-Z]+) .*([0-9])$', r'\1 -> \2', r1))
This -> 3

A tip on using regular expressions

Regular expressions can be confusing at times (e.g. you might write a complex regular expression and later, after a couple of weeks, forget how it worked). Fortunately, there are various online resources (such as regexr.com) that provide helpful highlighting for specific parts of regular expressions. Furthermore, you can try to split a single larger regular expression into multiple smaller ones and carry out text cleaning/replacement in multiple lines of code, instead of one single (and complex) expression.
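
As a sketch of the stepwise approach, the cleaning of r1 can be written as a sequence of small substitutions, each with its own comment. The first two steps repeat substitutions shown earlier; the third step (normalising the spacing after commas) is an extra step added here only for illustration:

```python
import re

r1 = "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"

# Step 1: replace the literal dollar sign with the letter s
step1 = re.sub(r"\$", "s", r1)
# Step 2: normalise every spelling of "and" to lowercase
step2 = re.sub(r"[aA][nN][dD]", "and", step1)
# Step 3: make sure every comma is followed by exactly one space
step3 = re.sub(r",\s*", ", ", step2)

print(step3)
```

Keeping the intermediate results in separate variables also makes it easier to inspect which step produced an unexpected change.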

Boolean values

Boolean variables can take only two values: true (sometimes also represented by 1) or false (sometimes also represented by 0).

b1 <- TRUE
b2 <- FALSE
x1 <- 4
x2 <- 1
b1 = True
b2 = False
x1 = 4
x2 = 1

We can print the values and their logical negations:

print(b1)
[1] TRUE
print(!b1)
[1] FALSE
print(b2)
[1] FALSE
print(!b2)
[1] TRUE
print(b1)
True
print(not b1)
False
print(b2)
False
print(not b2)
True

If we perform numeric operations, then the true/false values are treated as numeric 1/0:

print(b1 + b1)
[1] 2
print(b2 + b2)
[1] 0
print(b1 + b2)
[1] 1
print(b2 - b1)
[1] -1
print(b1 + b1)
2
print(b2 + b2)
0
print(b1 + b2)
1
print(b2 - b1)
-1
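
A common use of this property is counting how many values satisfy a condition. A minimal sketch with an arbitrary list of numbers (the names values, flags, n_above and share_above are chosen for this example):

```python
values = [3, 8, 1, 9, 4]

# Each comparison yields True or False
flags = [v > 3 for v in values]
print(flags)

# Summing booleans counts the True values, since True acts as 1 and False as 0
n_above = sum(flags)
# Dividing by the number of values gives the share of True values
share_above = sum(flags) / len(flags)

print(n_above, share_above)
```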

We can also use logical operators:

print(b1 & b2)
[1] FALSE
print(b1 | b2)
[1] TRUE
print(x1 < x2)
[1] FALSE
print(x1 > x2)
[1] TRUE
print(b1 and b2)
False
print(b1 or b2)
True
print(x1 < x2)
False
print(x1 > x2)
True
print(x1 == x2)
[1] FALSE
print(1 == TRUE)
[1] TRUE
print(0 == FALSE)
[1] TRUE
print(x1 == x2)
False
print(1 == True)
True
print(0 == False)
True

We can also chain multiple logical operators:

print((x1 > 8) & (x2 <= 1))
[1] FALSE
print((x1 > 8) | (x2 <= 1))
[1] TRUE
print((x1 > 8) and (x2 <= 1))
False
print((x1 > 8) or (x2 <= 1))
True

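Finally, we can cast numeric values to boolean ones and vice versa. Note that every non-zero number, including negative ones, is cast to true:

print(as.logical(1))
[1] TRUE
print(as.logical(0))
[1] FALSE
print(as.logical(100))
[1] TRUE
print(as.logical(-100))
[1] TRUE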
print(bool(1))
True
print(bool(0))
False
print(bool(100))
True
print(bool(-100))
True
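
The reverse direction, from boolean to numeric, can be sketched with the built-in int() and float() conversions:

```python
b1 = True
b2 = False

print(int(b1), int(b2))       # booleans to integers
print(float(b1), float(b2))   # booleans to floats

# Casting is not a round trip for numbers other than 0 and 1
print(int(bool(100)))         # 100 -> True -> 1
print(int(bool(-100)))        # -100 -> True -> 1
print(bool(0.0))              # a floating point zero is also false
```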

Special values

Special values are usually reserved to represent missing or undefined values.

In Python, some of these special values (such as NA) are not available in the base installation, but are defined in the numpy and pandas packages:

import numpy as np
import pandas as pd

The NA (not available)

Used to define missing values.

Arithmetic operations are undefined for NA values, so their result is again NA:

print(NA)
[1] NA
print(NA + NA)
[1] NA
print(NA - NA)
[1] NA
print(NA + 1)
[1] NA
print(NA - 1)
[1] NA
print(NA * 0)
[1] NA
print(pd.NA)
<NA>
print(pd.NA + pd.NA)
<NA>
print(pd.NA - pd.NA)
<NA>
print(pd.NA + 1)
<NA>
print(pd.NA - 1)
<NA>
print(pd.NA * 0)
<NA>

On the other hand, some logical operators return a definite result when the outcome does not depend on the missing value (for example, NA | TRUE is always true), and NA otherwise.

In Python, the and operator is not the same as the & operator, which matters when dealing with missing values. The and operator cannot be overridden, whereas the & operator (the __and__ method) can. Hence the choice to use & in the numpy and pandas packages.

print(pd.NA or pd.NA)
boolean value of NA is ambiguous
print(pd.NA and True)
boolean value of NA is ambiguous

x and y first evaluates bool(x): if it is false, then x itself is returned, otherwise y is returned. If x is a vector (i.e. contains multiple values) or NA, then its true/false value cannot be determined and an error is raised.
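
The difference between the two operators can be reproduced without pandas. The Missing class below is a hypothetical stand-in written for this illustration; it is not how pandas implements pd.NA, only a sketch of the mechanism:

```python
# `and` returns one of its operands, after testing the truth value of the first
print(0 and 5)     # first operand is falsy, so it is returned
print(3 and 5)     # first operand is truthy, so the second is returned


class Missing:
    """Hypothetical stand-in for a missing value (not the pandas implementation)."""

    def __bool__(self):
        # `and`, `or` and `not` call this, and cannot be redirected elsewhere
        raise TypeError("boolean value of Missing is ambiguous")

    def __and__(self, other):
        # `&` calls this method, so the class decides what the result is
        return False if other is False else self

    __rand__ = __and__   # also handle `False & Missing()`

    def __repr__(self):
        return "<Missing>"


m = Missing()
print(m & True)      # stays missing
print(m & False)     # definitely False

try:
    m and True
except TypeError as err:
    print("Error:", err)
```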

print(NA == NA)
[1] NA
print(NA | NA)
[1] NA
print(NA & NA)
[1] NA
print(pd.NA == pd.NA)
<NA>
print(pd.NA | pd.NA)
<NA>
print(pd.NA & pd.NA)
<NA>
print(NA | TRUE)
[1] TRUE
print(NA | FALSE)
[1] NA
print(pd.NA | True)
True
print(pd.NA | False)
<NA>
print(NA & TRUE)
[1] NA
print(NA & FALSE)
[1] FALSE
print(pd.NA & True)
<NA>
print(pd.NA & False)
False
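
The results above follow what is often called three-valued (Kleene) logic: the result is known whenever one operand alone settles it. A minimal sketch in plain Python, where None stands in for a missing value and kleene_or() and kleene_and() are hypothetical helper names used only here:

```python
def kleene_or(a, b):
    """Three-valued OR, with None standing in for a missing value."""
    if a is True or b is True:
        return True          # one known True settles the result
    if a is None or b is None:
        return None          # otherwise a missing operand leaves it unknown
    return False


def kleene_and(a, b):
    """Three-valued AND, with None standing in for a missing value."""
    if a is False or b is False:
        return False         # one known False settles the result
    if a is None or b is None:
        return None          # otherwise a missing operand leaves it unknown
    return True


# Print the full truth table: a, b, a OR b, a AND b
for a in (True, False, None):
    for b in (True, False, None):
        print(a, b, kleene_or(a, b), kleene_and(a, b))
```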

A number of functions are available to check whether a value is one of the special values:

print(is.null(NA))
[1] FALSE
print(is.nan(NA))
[1] FALSE
print(is.na(NA))
[1] TRUE
print(is.infinite(NA))
[1] FALSE
print(pd.isnull(pd.NA))
True
print(np.isnan(pd.NA))
<NA>
print(pd.isna(pd.NA))
True
print(np.isinf(pd.NA))
<NA>

The NaN (not a number)

Used to represent the result of a numeric calculation that is undefined. In general, a division by zero is undefined, however this ambiguity is presented differently in R and Python:

print(1 / 0)
[1] Inf
print(0 / 0)
[1] NaN
print(1 / 0)
division by zero
print(0 / 0)
division by zero
print(1 / np.float64(0))
inf

<string>:1: RuntimeWarning: divide by zero encountered in scalar divide
print(np.float64(0) / np.float64(0))
nan

<string>:1: RuntimeWarning: invalid value encountered in scalar divide
print(is.null(NaN))
[1] FALSE
print(is.nan(NaN))
[1] TRUE
print(is.na(NaN))
[1] TRUE
print(is.infinite(NaN))
[1] FALSE
print(pd.isnull(np.nan))
True
print(np.isnan(np.nan))
True
print(pd.isna(np.nan))
True
print(np.isinf(np.nan))
False
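
A NaN value can also be created and tested with Python's standard library alone. The sketch below assumes only the built-in float type and the math module; the list values is made up for illustration:

```python
import math

nan = float("nan")            # a NaN value without any third-party package

print(nan == nan)             # NaN is not equal to anything, including itself
print(math.isnan(nan))        # the reliable way to test for NaN
print(math.isnan(nan + 1))    # arithmetic with NaN yields NaN again

# Consequence: an equality test cannot be used to find NaN values
values = [1.0, nan, 3.0]
n_missing = sum(math.isnan(v) for v in values)
print(n_missing)
```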

The Inf (infinite)

Infinite values are also represented as special values:

print(1e500)
[1] Inf
print(Inf + Inf)
[1] Inf
print(Inf - Inf)
[1] NaN
print(Inf / Inf)
[1] NaN
print(1e500)
inf
print(np.inf + np.inf)
inf
print(np.inf - np.inf)
nan
print(np.inf / np.inf)
nan
print(is.null(Inf))
[1] FALSE
print(is.nan(Inf))
[1] FALSE
print(is.na(Inf))
[1] FALSE
print(is.infinite(Inf))
[1] TRUE
print(is.infinite(1e500))
[1] TRUE
print(pd.isnull(np.inf))
False
print(np.isnan(np.inf))
False
print(pd.isna(np.inf))
False
print(np.isinf(np.inf))
True
print(np.isinf(1e500))
True
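
Infinite values are likewise available without numpy. A minimal sketch using the built-in float type and the math module:

```python
import math

inf = float("inf")

print(inf == math.inf)            # both spellings give the same value
print(1e500 == inf)               # a literal too large for a float becomes inf
print(math.isinf(-inf))           # negative infinity is infinite as well
print(math.isnan(inf - inf))      # inf - inf is undefined, hence NaN
print(inf > 1e308)                # inf compares greater than any finite float
```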

The NULL/None (undefined)

Note that undefined values are treated differently in R and Python:

print(is.null(NULL))
[1] TRUE
print(is.nan(NULL))
logical(0)
print(is.na(NULL))
logical(0)
print(is.infinite(NULL))
logical(0)
print(pd.isnull(None))
True
print(np.isnan(None))
ufunc 'isnan' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''
print(pd.isna(None))
True
print(np.isinf(None))
ufunc 'isinf' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''
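
In plain Python, the usual way to test for None is an identity comparison with the is operator. The sketch below illustrates this; first_or_default() is a hypothetical helper written only for this example:

```python
value = None

# Identity comparison is the idiomatic test for None
print(value is None)
print(value is not None)

# None is falsy, but so are 0 and the empty string, so `not value`
# cannot tell them apart
for candidate in (None, 0, ""):
    print(repr(candidate), not candidate, candidate is None)


def first_or_default(items, default=None):
    """Hypothetical helper: return the first element, or `default` if empty."""
    return items[0] if items else default


print(first_or_default([]))
print(first_or_default([4, 5]))
```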

  1. With the exception of the + operator for two strings in Python.