A programming primer for data science

This section aims to:

Present the core programming concepts used in various statistics, econometrics and data science tasks.
Provide examples in both R and Python.
Provide a convenient interface to toggle between R and Python to see the similarities and differences between the two programming languages.
Highlight some language-specific concepts.

Packages

In general packages are collections of various functions, classes, sample datasets and their documentation. Packages usually focus on a specific topic, for example, a package may focus on plot creation, while another may be created for estimating various (or only a specific subset of) statistical models. In other cases, packages may contain executables, such as shiny for R (and Python), or streamlit for Python.

Some packages are available with the base installation of the programming language, while others need to be explicitly downloaded and installed (see install.packages in R and pip in Python for more).

Packages need to be loaded only once per session (i.e. each time R or Python is restarted). It is recommended to load all of the required packages at the beginning of your script/notebook file.

An example of loading some installed packages is provided below.

R
Python

suppressPackageStartupMessages({
  suppressWarnings({
    library(magrittr)
    library(data.table)
    library(ggplot2)
  })
})

The library() function loads the specified library, if it exists. If such a library isn’t found - an error is raised.

Whenever we want to refer to an object from a specific library we can either:

Load the whole package and call a specific function:

library(ggplot2)
# A function from the `ggplot2` package
ggplot()

Without loading the package, we can use the :: notation (package_name::function_name) to call a specific function:

# A function from the `ggplot2` package
ggplot2::ggplot()

As noted, the second example works when we don’t want to load the package (but we still need to have it installed).

An important caveat: if multiple libraries have the same function names, the last library loaded will override any functions with the same names. If you only need a few functions from a specific library, it may be best to use the :: notation for those functions, instead of loading the whole library.

import pandas as pd
import plotnine as plt
import statsmodels.api as sm

The import statement searches for the specified module (or package) and then it binds the results of that search to a name. If no name is specified, then the assigned name is the same as the module name. If no such module/package is found, then an error is raised. (Note: see package and module definitions for more specifics.)

If the package is loaded, we can call specific functions/classes using the dot (.) notation:

import plotnine as plt
# A class from the `plotnine` package
plt.ggplot()

Alternatively, we can choose to import only specific names (functions or classes) from a package:

from plotnine import ggplot
# A class from the `plotnine` package
ggplot()
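As a minimal sketch of the import forms described above, the snippet below uses the standard-library math module. This module is our own choice for illustration (it needs no installation) and is not part of the original example.

```python
# Three ways to reach the same function from the standard-library `math` module.
import math              # bind the module to its own name
import math as m         # bind the same module to an alias
from math import sqrt    # bind a single name from the module

root_full = math.sqrt(16)    # module_name.function_name
root_alias = m.sqrt(16)      # alias.function_name
root_direct = sqrt(16)       # imported name used directly

print(root_full, root_alias, root_direct)
```

All three calls reach the same function; the forms differ only in which name ends up bound in our script.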

Note that in Python functions are independent blocks of code that can be called from anywhere, while methods are tied to objects or classes and need an object or class instance to be invoked. For example: the array() function in numpy and the fit() method of the OLS class in statsmodels.
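The difference between a function and a method can be sketched with built-in objects only. The variable names below are illustrative assumptions, and we use len() and str.upper() instead of the numpy and statsmodels examples so that the snippet runs without extra packages.

```python
import math

word = "cat"

# A function is called on its own and receives the object as an argument.
n_chars = len(word)
root = math.sqrt(9)

# A method is looked up on an object and called through it.
shouted = word.upper()

# A method is still a function stored on the class, so calling it through
# the class with an explicit instance gives the same result.
shouted_via_class = str.upper(word)

print(n_chars, root, shouted, shouted_via_class)
```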

Additional packages can be found at the following repositories:

Comprehensive R Archive Network (CRAN) for R (see also the full list of packages);
Python Package Index (PyPI) for Python.

Operators

There are a number of operators available in R and Python, which are used in mathematical calculations, value comparisons and value assignments.

Operator	type	R	Python
addition	arithmetic	`x + y`	`x + y`
subtraction	arithmetic	`x - y`	`x - y`
multiplication	arithmetic	`x * y`	`x * y`
division	arithmetic	`x / y`	`x / y`
exponentiation ($x^y$)	arithmetic	`x^y` (recommended) or `x**y`	`x**y`
modulus (x mod y)	arithmetic	`x %% y`	`x % y`
integer division	arithmetic	`x %/% y`	`x // y`
matrix multiplication	arithmetic	`x %*% y`	`x @ y`
equal	logical (comparison)	`x == y`	`x == y`
not equal	logical (comparison)	`x != y`	`x != y`
(x) is less than (y)	logical (comparison)	`x < y`	`x < y`
(x) is more than (y)	logical (comparison)	`x > y`	`x > y`
(x) is less than or equal to (y)	logical (comparison)	`x <= y`	`x <= y`
(x) is more than or equal to (y)	logical (comparison)	`x >= y`	`x >= y`
(x) and (y)	logical (comparison)	`x & y`	`x and y` or (elementwise) `numpy.logical_and(x, y)`
(x) or (y)	logical (comparison)	`x | y`	`x or y` or (elementwise) `numpy.logical_or(x, y)`
not (x)	logical (comparison)	`!x`	`not x`
containment test (which x values are in a set of y values)	other	`x %in% y`	`numpy.isin(x, y)`
assign value	assignment	`x <- 2` or `x <<- 2` (global) or `x = 2`	`x = 2`
add `y` to `x` and assign to `x`	assignment	`x <- x + y`	`x += y` or `x = x + y`
subtract `y` from `x` and assign to `x`	assignment	`x <- x - y`	`x -= y` or `x = x - y`
multiply `x` by `y` and assign to `x`	assignment	`x <- x * y`	`x *= y` or `x = x * y`
divide `x` by `y` and assign to `x`	assignment	`x <- x / y`	`x /= y` or `x = x / y`
exponentiate $x^y$ and assign to `x`	assignment	`x <- x^y`	`x **= y` or `x = x ** y`

Note: we omit bitwise operators as they are less common in general data analysis and modelling.

A comprehensive list of operators is available in the Python documentation.
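To make the arithmetic and assignment rows of the table concrete, here is a small Python sketch. The values of x, y and total are arbitrary assumptions chosen so that the results are easy to check by hand.

```python
x, y = 7, 2

quotient = x / y      # true division    -> 3.5
floor_div = x // y    # integer division -> 3
remainder = x % y     # modulus          -> 1
power = x ** y        # exponentiation   -> 49

# Augmented assignment updates the variable using its current value.
total = 10
total += y     # 12
total -= 2     # 10
total *= y     # same as total = total * y   -> 20
total /= 4     # same as total = total / 4   -> 5.0
total **= y    # same as total = total ** y  -> 25.0

print(quotient, floor_div, remainder, power, total)
```

Integer division and modulus fit together: x equals floor_div * y + remainder.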

Data Types

There are a number of built-in (and library-specific) data types available in R and Python. Data types are used to represent specific values (or collections of values or objects) and have a pre-defined functionality for various operators.

Numbers

Numeric values can be values from $\mathbb{Z}$ (integer), $\mathbb{R}$ (real number) or $\mathbb{C}$ (complex number) sets.

We use the assignment operator to assign values to variables:

R
Python

x1 <- as.integer(1)
x2 <- 2
x3 <- complex(real = 3, imaginary = 1)
x4 <- 4 + 2i

x1 = 1
x2 = 2.0
x3 = complex(real = 3, imag = 1)
x4 = 4 + 2j

We can print these values using the print() function:

R
Python

print(x1)

[1] 1

print(x2)

[1] 2

print(x3)

[1] 3+1i

print(x4)

[1] 4+2i

print(x1)

1

print(x2)

2.0

print(x3)

(3+1j)

print(x4)

(4+2j)

As well as check the types of our values:

R
Python

typeof(x1)

[1] "integer"

typeof(x2)

[1] "double"

typeof(x3)

[1] "complex"

typeof(x4)

[1] "complex"

type(x1)

<class 'int'>

type(x2)

<class 'float'>

type(x3)

<class 'complex'>

type(x4)

<class 'complex'>

We can add, subtract, multiply and divide multiple values together:

R
Python

x5 <- x1 + x2 + 3
x6 <- x3 + x4
x7 <- x1 - x2 - 5
x8 <- x3 - x4
x9 <- x1 * x2 * (-1)
x10 <- x3 * x4
x11 <- x1 / 2
x12 <- x3 / x4
x13 <- x2^2
x14 <- x4^3

x5 = x1 + x2 + 3
x6 = x3 + x4
x7 = x1 - x2 - 5
x8 = x3 - x4
x9 = x1 * x2 * (-1)
x10 = x3 * x4
x11 = x1 / 2
x12 = x3 / x4
x13 = x2**2
x14 = x4**3

R
Python

print(x5)

[1] 6

print(x6)

[1] 7+3i

print(x7)

[1] -6

print(x8)

[1] -1-1i

print(x5)

6.0

print(x6)

(7+3j)

print(x7)

-6.0

print(x8)

(-1-1j)

R
Python

print(x9)

[1] -2

print(x10)

[1] 10+10i

print(x11)

[1] 0.5

print(x12)

[1] 0.7-0.1i

print(x9)

-2.0

print(x10)

(10+10j)

print(x11)

0.5

print(x12)

(0.7-0.1j)

R
Python

print(x13)

[1] 4

print(x14)

[1] 16+88i

print(x13)

4.0

print(x14)

(16+88j)

Integer division (div) and modulus (mod) operations can also be carried out as follows:

R
Python

x15 <- 5 %% 3
x16 <- 5 %/% 3

x15 = 5 % 3
x16 = 5 // 3

R
Python

print(paste0("Remainder of a division (mod): ", x15))

[1] "Remainder of a division (mod): 2"

sprintf("Integer division (div): %02d", x16)

[1] "Integer division (div): 01"

print("Remainder of a division (mod): ", x15)

Remainder of a division (mod):  2

print(f"Integer division (div): {x16:02d}")

Integer division (div): 01

Here we use the 02d format notation to print an integer (d) with a minimum width of two characters, padded with leading zeros (0).
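A few more format specifications, as a hedged illustration of how the width and padding parts work in Python. The variable names and values below are assumptions made for this sketch only.

```python
count = 1
ratio = 2 / 3

padded = f"{count:02d}"       # width 2, zero-padded integer  -> '01'
wider = f"{count:05d}"        # width 5, zero-padded integer  -> '00001'
rounded = f"{ratio:.3f}"      # three digits after the point  -> '0.667'
old_style = "%02d" % count    # printf-style equivalent       -> '01'

print(padded, wider, rounded, old_style)
```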

Text/Strings/Characters

A string is a sequence of characters: letters, digits, punctuation or any other symbols.

R
Python

s1 <- "This is a sentence"
s2 <- "cat"
s3 <- "1"

s1 = "This is a sentence"
s2 = "cat"
s3 = "1"

R
Python

print(s1)

[1] "This is a sentence"

print(s2)

[1] "cat"

print(s3)

[1] "1"

print(s1)

This is a sentence

print(s2)

cat

print(s3)

1

Unlike numbers, strings do not have a clear definition for mathematical operations¹:

R
Python

s3 + 1

Error in s3 + 1: non-numeric argument to binary operator

s3 + 1

can only concatenate str (not "int") to str
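When a string holds a number, we have to state which interpretation we want. The sketch below (in Python, reusing s3 from above; the other names are illustrative) converts explicitly in either direction and captures the error raised when no conversion is given.

```python
s3 = "1"

as_number = int(s3) + 1    # convert the text to an integer first -> 2
as_text = s3 + str(1)      # convert the number to text first     -> '11'

# Without a conversion Python refuses to guess which one is meant.
try:
    s3 + 1
except TypeError as err:
    message = str(err)

print(as_number, as_text, message)
```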

Nevertheless, we may need to modify various strings of characters in our data. To make this process easier, a number of functions are available in R and Python.

String transformations

Firstly, we may be interested in concatenating multiple strings together. We can do so as follows:

R
Python

s4 <- paste(s1, s2, sep = ". ")
s5 <- paste(s3, s2)
s6 <- paste0(s3, s2)
s7 <- paste0(c(s1, s2, s3), collapse = "; ")

s4 = s1 + ". " + s2
s5 = s3 + " " + s2
s6 = s3 + s2
s7 = "; ".join([s1, s2, s3])

R
Python

print(s4)

[1] "This is a sentence. cat"

print(s5)

[1] "1 cat"

print(s6)

[1] "1cat"

print(s7)

[1] "This is a sentence; cat; 1"

print(s4)

This is a sentence. cat

print(s5)

1 cat

print(s6)

1cat

print(s7)

This is a sentence; cat; 1

We may also want to change the capitalization of our text:

R
Python

print(toupper(s1))

[1] "THIS IS A SENTENCE"

print(tolower(s1))

[1] "this is a sentence"

print(stringr::str_to_sentence(s2))

[1] "Cat"

print(stringr::str_to_title(s1))

[1] "This Is A Sentence"

print(s1.upper())

THIS IS A SENTENCE

print(s2.lower())

cat

print(s2.capitalize())

Cat

print(s1.title())

This Is A Sentence

We can also calculate the number of characters in our string:

R
Python

print(nchar(s1))

[1] 18

print(nchar(s2))

[1] 3

print(len(s1))

18

print(len(s2))

3

We might be interested in extracting part of a string as follows:

R
Python

print(substr(s1, start = 1, stop = 2))

[1] "Th"

print(substring(s1, first = 1, last = 4))

[1] "This"

print(substring(s1, first = 3, last = 4))

[1] "is"

print(substring(s1, first = 8))

[1] " a sentence"

print(s1[0:2])

Th

print(s1[:4])

This

print(s1[2:4])

is

print(s1[7:])

 a sentence

We may also wish to split a string into separate segments:

R
Python

print(strsplit(s1, split = " "))

[[1]]
[1] "This"     "is"       "a"        "sentence"

print(strsplit(s1, split = "a"))

[[1]]
[1] "This is "  " sentence"

print(strsplit(s1, split = "is"))

[[1]]
[1] "Th"          " "           " a sentence"

print(s1.split(" "))

['This', 'is', 'a', 'sentence']

print(s1.split("a"))

['This is ', ' sentence']

print(s1.split("is"))

['Th', ' ', ' a sentence']

Regular expressions

A regular expression (regex) is a sequence of characters that specifies a pattern in text. We can use regular expressions to:

Check if a specific sequence exists in a string;
Replace a sequence with another one;
Capture portions of the match as placeholders and use them.

Additional regex syntax options can be found at Python’s regex syntax docs, Python’s Regular Expression HOWTO docs, as well as R’s regex docs.

R
Python

r1 <- "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"

r1 = "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"

We can check whether a specific sequence of symbols exists in our string as follows:

R
Python

print(grepl("with letters and", r1))

[1] FALSE

print(grepl("with letters And", r1))

[1] TRUE

import re
#
print(re.search('with letters and', r1))

None

print(re.search('with letters And', r1))

<re.Match object; span=(19, 35), match='with letters And'>

print(bool(re.search('with letters and', r1)))

False

print(bool(re.search('with letters And', r1)))

True

We can write more general expressions using various special characters:

. (dot) - matches any character except a newline;
^ (caret) - matches the start of the string;
$ (dollar sign) - matches the end of the string;
* (asterisk) - causes the resulting RE (regular expression) to match 0 or more repetitions of the preceding RE. For example ab* will match a followed by any zero or more repetitions of b, while .* will search for zero or more repetitions of any character;
+ - causes the resulting RE to match 1 or more repetitions of the preceding RE. For example, ab+ will match a followed by any non-zero number of bs; it will not match just a;
? - causes the resulting RE to match 0 or 1 repetitions of the preceding RE. For example, ab? will match either a or ab.
{m} - specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six a characters, but not five.
{m,n} - causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 a characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand a characters followed by a b, but not aaab;
[] - used to indicate a set of characters. For example, a set of characters [amk] will match a, m, or k. Ranges of characters can be indicated by giving two characters and separating them by a ‘-’, for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59;
() - matches whatever regular expression is inside the parentheses. For example, (abc) will match abc.
| - matches either one of two REs. For example, A|B (where A and B can be arbitrary REs), creates a regular expression that will match either A or B. Can also be used inside () to match part of an output. For example, a(bc|d)e will match either abce or ade.
\ (in Python) or \\ (in R) - escapes special characters. For example, \- matches a literal -; the same goes for \?, \+, \., \(, \[, etc.
We might want to capture the contents of one or more groups in () and refer to them by their order of appearance. In Python we can use \number (e.g. \1, \2, etc.), while in R we would use \\number (e.g. \\1, \\2, etc.). See the example at the end of this section.

More special characters are described in Python’s re docs and R’s regex docs.
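The quantifiers and sets listed above can be checked one at a time. The following is a minimal Python sketch; fully_matches is a hypothetical helper introduced here, built on re.fullmatch so that a pattern has to cover the entire string.

```python
import re

def fully_matches(pattern, text):
    """Hypothetical helper: True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

star_zero = fully_matches(r"ab*", "a")             # zero b's are allowed   -> True
plus_zero = fully_matches(r"ab+", "a")             # needs at least one b   -> False
optional_two = fully_matches(r"ab?", "abb")        # allows at most one b   -> False
exact_count = fully_matches(r"a{6}", "aaaaa")      # five a's, six required -> False
lower_bound = fully_matches(r"a{4,}b", "aaab")     # only three a's         -> False
minute_range = fully_matches(r"[0-5][0-9]", "59")  # two digits, 00 to 59   -> True
alternation = fully_matches(r"a(bc|d)e", "ade")    # either branch          -> True
escaped = fully_matches(r"\?", "?")                # literal question mark  -> True
```

Testing small patterns in isolation like this is often easier than debugging one long expression.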

R
Python

print(grepl("this", r1))

[1] TRUE

print(grepl("^this", r1))

[1] FALSE

print(grepl("^This", r1))

[1] TRUE

print(grepl("2$", r1))

[1] FALSE

print(grepl("[0-9]$", r1))

[1] TRUE

print(grepl("^This.*3$", r1))

[1] TRUE

print(bool(re.search('this', r1)))

True

print(bool(re.search('^this', r1)))

False

print(bool(re.search('^This', r1)))

True

print(bool(re.search("2$", r1)))

False

print(bool(re.search("[0-9]$", r1)))

True

print(bool(re.search("^This.*3$", r1)))

True

R
Python

print(grepl("[[:digit:]]", r1))

[1] TRUE

print(grepl("numbers [0-9],", r1))

[1] FALSE

print(grepl("numbers [0-9]+,", r1))

[1] TRUE

print(grepl("[0-9]+.*[0-9].*[0-9]", r1))

[1] TRUE

print(grepl("^this", r1))

[1] FALSE

print(grepl("^This", r1))

[1] TRUE

# Python's re module has no POSIX classes such as [[:digit:]]; \d is the equivalent
print(bool(re.search(r'\d', r1)))

True

print(bool(re.search('numbers [0-9],', r1)))

False

print(bool(re.search('numbers [0-9]+,', r1)))

True

print(bool(re.search('[0-9]+.*[0-9].*[0-9]', r1)))

True

print(bool(re.search('^this', r1)))

False

print(bool(re.search('^This', r1)))

True

We can replace characters by substituting them with another set of characters:

R
Python

print(gsub("[0-9]", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 000, 00,0, 0"

print(gsub("[0-9]+", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,0, 0"

print(gsub("[0-9]*", "0", r1))

[1] "0T0h0i0s0 0i0s0 0a0 0s0e0n0t0e0n0c0e0 0w0i0t0h0 0l0e0t0t0e0r0s0 0A0n0d0 0t0h0i0s0 0i0s0 0a0 0b0u0n0c0h0 0o0f0 0$0y0m0b0o0l0s0 0A0N0d0 0n0u0m0b0e0r0s0 0,0 0,0,0 0"

print(gsub("[0-9]$", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 0"

print(re.sub('[0-9]', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 000, 00,0, 0

print(re.sub('[0-9]+', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,0, 0

print(re.sub('[0-9]*', '0', r1))

0T0h0i0s0 0i0s0 0a0 0s0e0n0t0e0n0c0e0 0w0i0t0h0 0l0e0t0t0e0r0s0 0A0n0d0 0t0h0i0s0 0i0s0 0a0 0b0u0n0c0h0 0o0f0 0$0y0m0b0o0l0s0 0A0N0d0 0n0u0m0b0e0r0s0 00,0 00,00,0 00

print(re.sub('[0-9]$', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 0

R
Python

print(gsub("\\$", "s", r1))

[1] "This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3"

print(re.sub(r'\$', 's', r1))

This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3

print(re.sub('\\$', 's', r1))

This is a sentence with letters And this is a bunch of symbols ANd numbers 321, 11,2, 3

We can also chain multiple substitutions:

R
Python

print(gsub("[aA][nN][dD]", "and", gsub("\\$", "s", r1)))

[1] "This is a sentence with letters and this is a bunch of symbols and numbers 321, 11,2, 3"

print(re.sub("[aA][nN][dD]", "and", re.sub('\\$', 's', r1)))

This is a sentence with letters and this is a bunch of symbols and numbers 321, 11,2, 3

As well as search and replace specific repetitions of patterns:

R
Python

print(gsub("[0-9]{1,2}", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 00, 0,0, 0"

print(gsub("[0-9]{2,3}", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,2, 3"

print(gsub("[0-9]{2}", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 01, 0,2, 3"

print(gsub("[0-9]{3}", "0", r1))

[1] "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 11,2, 3"

print(re.sub('[0-9]{1,2}', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 00, 0,0, 0

print(re.sub('[0-9]{2,3}', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 0,2, 3

print(re.sub('[0-9]{2}', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 01, 0,2, 3

print(re.sub('[0-9]{3}', '0', r1))

This is a sentence with letters And this is a bunch of $ymbols ANd numbers 0, 11,2, 3

Finally, we can capture portions of text and re-use them. For example, we might want to modify everything except a specific portion of text:

R
Python

print(gsub("^(This).*", "\\1 !", r1))

[1] "This !"

print(re.sub('^(This).*', r'\1 !', r1))

This !

R
Python

print(gsub("^([a-zA-Z]+) .*([0-9])$", "\\1 -> \\2", r1))

[1] "This -> 3"

print(re.sub('^([a-zA-Z]+) .*([0-9])$', r'\1 -> \2', r1))

This -> 3
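Captured groups can also be reordered or read out individually. The sketch below uses a hypothetical input string (entry) that is not part of the examples above.

```python
import re

entry = "Doe, Jane"    # hypothetical input

# Two groups: the text before the comma and the text after it.
pattern = r"^([A-Za-z]+), ([A-Za-z]+)$"

# Reuse the groups in the replacement, in reverse order.
swapped = re.sub(pattern, r"\2 \1", entry)

# The same groups can be extracted from a match object.
match = re.search(pattern, entry)
surname, given = match.group(1), match.group(2)

print(swapped, surname, given)
```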

A tip on using regular expressions

Regular expressions can be confusing at times (e.g. you might write a complex regular expression and later, after a couple of weeks, forget how it worked). Fortunately, there are various online resources (such as regexr.com) that provide helpful highlighting for specific parts of regular expressions. Furthermore, you can try to split a single larger regular expression into multiple smaller ones and carry out text cleaning/replacement in multiple lines of code, instead of one single (and complex) expression.
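As a sketch of the last suggestion, the cleaning of r1 can be written as a sequence of small, commented substitutions. The third step (spacing after commas) is our own addition for illustration.

```python
import re

r1 = "This is a sentence with letters And this is a bunch of $ymbols ANd numbers 321, 11,2, 3"

step1 = re.sub(r"\$", "s", r1)                   # 1. repair the stray symbol
step2 = re.sub(r"[aA][nN][dD]", "and", step1)    # 2. normalise the spelling of "and"
step3 = re.sub(r",\s*", ", ", step2)             # 3. exactly one space after every comma

print(step3)
```

Each intermediate result can be printed and checked on its own, which is harder to do with a single combined expression.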

Boolean values

Boolean values can only have two values: true (sometimes also represented by 1) or false (sometimes also represented by 0).

R
Python

b1 <- TRUE
b2 <- FALSE
x1 <- 4
x2 <- 1

b1 = True
b2 = False
x1 = 4
x2 = 1

We can print the values and their logical negations:

R
Python

print(b1)

[1] TRUE

print(!b1)

[1] FALSE

print(b2)

[1] FALSE

print(!b2)

[1] TRUE

print(b1)

True

print(not b1)

False

print(b2)

False

print(not b2)

True

If we perform numeric operations, then the true/false values are treated as numeric 1/0:

R
Python

print(b1 + b1)

[1] 2

print(b2 + b2)

[1] 0

print(b1 + b2)

[1] 1

print(b2 - b1)

[1] -1

print(b1 + b1)

2

print(b2 + b2)

0

print(b1 + b2)

1

print(b2 - b1)

-1

We can also use logical operators:

R
Python

print(b1 & b2)

[1] FALSE

print(b1 | b2)

[1] TRUE

print(x1 < x2)

[1] FALSE

print(x1 > x2)

[1] TRUE

print(b1 and b2)

False

print(b1 or b2)

True

print(x1 < x2)

False

print(x1 > x2)

True

R
Python

print(x1 == x2)

[1] FALSE

print(1 == TRUE)

[1] TRUE

print(0 == FALSE)

[1] TRUE

print(x1 == x2)

False

print(1 == True)

True

print(0 == False)

True

We can also chain multiple logical operators:

R
Python

print((x1 > 8) & (x2 <= 1))

[1] FALSE

print((x1 > 8) | (x2 <= 1))

[1] TRUE

print((x1 > 8) and (x2 <= 1))

False

print((x1 > 8) or (x2 <= 1))

True

Finally, we can cast numeric values to boolean ones and vice versa. When casting to boolean, zero becomes false and any other number becomes true:

R
Python

print(as.logical(1))

[1] TRUE

print(as.logical(0))

[1] FALSE

print(as.logical(100))

[1] TRUE

print(as.logical(-100))

[1] TRUE

print(bool(1))

True

print(bool(0))

False

print(bool(100))

True

print(bool(-100))

True
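The opposite direction, from boolean to numeric, can be sketched in Python as follows. The variable names and the list of test values are assumptions for illustration.

```python
as_int = int(True)                    # -> 1
as_float = float(False)               # -> 0.0
n_true = sum([True, False, True])     # booleans add up like 1/0 -> 2

# Only an exact zero is false; sign and magnitude do not matter.
truthiness = [bool(v) for v in (0, 0.0, 1, -100, 0.001)]

print(as_int, as_float, n_true, truthiness)
```

Summing booleans is a convenient way to count how many conditions hold.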

Special values

Special values are usually reserved to represent missing or undefined values.

Some special values (such as NA) are not available in Python’s base installation but are defined in the numpy and pandas packages:

import numpy as np
import pandas as pd

The `NA` (not available)

Used to define missing values.

Arithmetic operations involving NA values have no defined result, so they return NA:

R
Python

print(NA)

[1] NA

print(NA + NA)

[1] NA

print(NA - NA)

[1] NA

print(NA + 1)

[1] NA

print(NA - 1)

[1] NA

print(NA * 0)

[1] NA

print(pd.NA)

<NA>

print(pd.NA + pd.NA)

<NA>

print(pd.NA - pd.NA)

<NA>

print(pd.NA + 1)

<NA>

print(pd.NA - 1)

<NA>

print(pd.NA * 0)

<NA>

On the other hand, some logical operators can return a definite result when the outcome does not depend on the missing value:

R
Python

See the Python tab for a discussion on the differences between the and operator and the & operator when dealing with missing values.

In Python the and operator is not the same as the & operator. The and operator in Python cannot be overridden, whereas the & operator (implemented by the __and__ method) can. Hence the choice to use & in the numpy and pandas packages.

print(pd.NA or pd.NA)

boolean value of NA is ambiguous

print(pd.NA and True)

boolean value of NA is ambiguous

x and y first evaluates bool(x). If x evaluates to false, then x itself is returned and y is never evaluated; otherwise y is returned. If x is a vector (i.e. contains multiple values) or NA, then its true/false value cannot be determined, so an error is raised.
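The behaviour of and versus & can be reproduced without pandas. In the sketch below, Missing is a hypothetical stand-in class written for this illustration; it is not how pandas implements pd.NA, only a minimal object with the same two properties (no truth value, customised &).

```python
# `and` / `or` return one of their operands, not necessarily a bool.
first_falsy = 0 and 5         # left side is falsy, so it is returned       -> 0
last_value = 3 and 5          # left side is truthy, so the right one is    -> 5
fallback = "" or "default"    # left side is falsy, so the right one is     -> 'default'

class Missing:
    """Hypothetical stand-in for a missing value (not the pandas implementation)."""

    def __bool__(self):
        # `and`, `or`, `not` and `if` all ask for a truth value here.
        raise TypeError("boolean value of Missing is ambiguous")

    def __and__(self, other):
        # `&` can be customised: a known False decides the result.
        return False if other is False else self

    __rand__ = __and__

na = Missing()
and_false = na & False    # -> False
and_true = na & True      # -> the Missing object itself

try:
    na and True
except TypeError as err:
    message = str(err)
```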

R
Python

print(NA == NA)

[1] NA

print(NA | NA)

[1] NA

print(NA & NA)

[1] NA

print(pd.NA == pd.NA)

<NA>

print(pd.NA | pd.NA)

<NA>

print(pd.NA & pd.NA)

<NA>

R
Python

print(NA | TRUE)

[1] TRUE

print(NA | FALSE)

[1] NA

print(pd.NA | True)

True

print(pd.NA | False)

<NA>

R
Python

print(NA & TRUE)

[1] NA

print(NA & FALSE)

[1] FALSE

print(pd.NA & True)

<NA>

print(pd.NA & False)

False

A number of functions are available to check whether a value is a special value:

R
Python

print(is.null(NA))

[1] FALSE

print(is.nan(NA))

[1] FALSE

print(is.na(NA))

[1] TRUE

print(is.infinite(NA))

[1] FALSE

print(pd.isnull(pd.NA))

True

print(np.isnan(pd.NA))

<NA>

print(pd.isna(pd.NA))

True

print(np.isinf(pd.NA))

<NA>

The `NaN` (not a number)

NaN is the result of a numeric calculation whose value is undefined. In general, division by zero is undefined; however, R and Python handle it differently:

R
Python

print(1 / 0)

[1] Inf

print(0 / 0)

[1] NaN

print(1 / 0)

division by zero

print(0 / 0)

division by zero

print(1 / np.float64(0))

inf

<string>:1: RuntimeWarning: divide by zero encountered in scalar divide

print(np.float64(0) / np.float64(0))

nan

<string>:1: RuntimeWarning: invalid value encountered in scalar divide
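If we want plain Python division to return special values instead of raising an error, one option is a small wrapper. The function below, safe_divide, is a hypothetical helper written for this sketch (it is not part of any package) and only mimics the results shown above for a zero denominator.

```python
import math

def safe_divide(numerator, denominator):
    """Hypothetical helper: return inf/nan instead of raising on division by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        if numerator == 0:
            return math.nan                          # 0 / 0 is undefined
        return math.copysign(math.inf, numerator)    # sign follows the numerator

print(safe_divide(1, 0), safe_divide(-1, 0), safe_divide(0, 0), safe_divide(1, 2))
```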

R
Python

print(is.null(NaN))

[1] FALSE

print(is.nan(NaN))

[1] TRUE

print(is.na(NaN))

[1] TRUE

print(is.infinite(NaN))

[1] FALSE

print(pd.isnull(np.nan))

True

print(np.isnan(np.nan))

True

print(pd.isna(np.nan))

True

print(np.isinf(np.nan))

False
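One consequence of NaN being undefined is that it does not compare equal to anything, including itself, so it has to be detected with a dedicated function. A minimal sketch using only the standard library (the list of values is an assumption for illustration):

```python
import math

nan = float("nan")    # also available as math.nan

equal_to_itself = nan == nan          # -> False
detected = math.isnan(nan)            # -> True
propagates = math.isnan(nan + 1)      # arithmetic with NaN stays NaN -> True

# Filtering NaN values out of a list of floats.
values = [1.0, nan, 3.0]
clean = [v for v in values if not math.isnan(v)]    # -> [1.0, 3.0]

print(equal_to_itself, detected, propagates, clean)
```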

The `Inf` (infinite)

Infinite values are also represented as special values:

R
Python

print(1e500)

[1] Inf

print(Inf + Inf)

[1] Inf

print(Inf - Inf)

[1] NaN

print(Inf / Inf)

[1] NaN

print(1e500)

inf

print(np.inf + np.inf)

inf

print(np.inf - np.inf)

nan

print(np.inf / np.inf)

nan

R
Python

print(is.null(Inf))

[1] FALSE

print(is.nan(Inf))

[1] FALSE

print(is.na(Inf))

[1] FALSE

print(is.infinite(Inf))

[1] TRUE

print(is.infinite(1e500))

[1] TRUE

print(pd.isnull(np.inf))

False

print(np.isnan(np.inf))

False

print(pd.isna(np.inf))

False

print(np.isinf(np.inf))

True

print(np.isinf(1e500))

True
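Infinite values are also available in base Python without numpy. The sketch below is an illustration under that assumption; the variable names are our own.

```python
import math

inf = float("inf")    # also available as math.inf

overflow = 1e308 * 10                  # exceeds the largest double -> inf
is_overflow_inf = math.isinf(overflow)

negative = -inf
ordering = negative < 0 < inf          # infinities bound every finite number

undefined = inf - inf                  # no meaningful value -> nan
is_undefined_nan = math.isnan(undefined)

print(overflow, ordering, undefined)
```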

The `NULL/None` (undefined)

Note that undefined values are treated differently in R and Python:

R
Python

print(is.null(NULL))

[1] TRUE

print(is.nan(NULL))

logical(0)

print(is.na(NULL))

logical(0)

print(is.infinite(NULL))

logical(0)

print(pd.isnull(None))

True

print(np.isnan(None))

ufunc 'isnan' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''

print(pd.isna(None))

True

print(np.isinf(None))

ufunc 'isinf' not supported for the input types, and the inputs could not be
safely coerced to any supported types according to the casting rule ''safe''
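Because numpy functions such as isnan do not accept None, it is usually checked separately with an identity test. A minimal sketch (describe is a hypothetical helper written for this illustration):

```python
value = None

is_missing = value is None    # identity test, the usual idiom -> True

def describe(x):
    """Hypothetical helper: handle None before any numeric work."""
    return "undefined" if x is None else f"value: {x}"

# Arithmetic with None is an error rather than a special value.
try:
    value + 1
except TypeError as err:
    message = str(err)

print(is_missing, describe(value), describe(2.5))
```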

¹ With the exception of the + operator for two strings in Python.