2.4 Using Python
The base functionality of Python is provided in this section. Additional functions and explanations relating to specific methods or algorithms are provided in their respective chapters in this book. We note that Python 3.6 or higher should be used (Python 2.7 is an older legacy version with which some of the code from this book will not work).
2.4.1 Python setup
There are a number of ways to set up Python on your machine. We will outline the three most common methods below:
- The standard Python installation;
- The Miniconda distribution of Python;
- The Anaconda distribution of Python.
Both the Miniconda and Anaconda distributions use the conda package manager in their Python installations, which allows you to download and install additional Python packages. The standard Python installation uses the pip package manager to download and install additional Python packages.
2.4.1.1 What’s the difference between pip and conda?
In short, pip only installs Python packages, so with it we would not be able to easily install additional non-Python libraries. In contrast, conda is a packaging tool and installer, which handles library dependencies outside of Python-only packages, as well as the Python packages themselves.
You can use conda and pip side by side; however, you cannot use them interchangeably, because pip cannot install conda-format packages.
2.4.1.2 What’s the difference between Miniconda and Anaconda?
The differences are as follows:
- Miniconda = Python + conda
- Anaconda = Miniconda + conda install anaconda
In other words, Anaconda contains roughly 160 more Python packages than the Miniconda distribution.
Note that these additional packages may result in a total installation time of ~40-60 minutes for Anaconda. This also means that if you need to reinstall Anaconda, you will need to wait ~20-30 minutes for the uninstall process to complete, and then a further ~40-60 minutes for the installation to complete.
2.4.1.3 Which version should I choose?
For classes, it is recommended to choose the Anaconda distribution, as it contains most of the packages needed.
Alternatively, you can install Miniconda and the appropriate packages (see, e.g., the beginning of Ch. 3.11 or Ch. 4.11). Installing Miniconda should take less time than installing Anaconda, which also helps in case you need to reinstall it later.
Finally, only choose the standard Python installation if you have some programming experience and are not afraid of dealing with package installation yourself, which may require configuring additional library dependencies manually.
Only use one method to set up your Python environment, as having more than one installation may cause software conflicts!
2.4.1.4 Installing Python via the Anaconda distribution
Note: the website design for Anaconda has changed, as well as the website itself - www.anaconda.com.
We can install the Anaconda distribution of Python as follows:
Download the appropriate version depending on your operating system:
Make sure you download Anaconda for the latest version of Python:
Again, do not use Python 2.7 as the code syntax and package compatibility will break.
When installing Anaconda, make sure that the following boxes are checked (unless you already have an existing non-Anaconda Python distribution installed):
Finally, after installing Anaconda, launch the Anaconda Navigator and go to the packages:
There, check for any updates:
Then, navigate back and update JupyterLab:
After updating JupyterLab, you can update the remaining packages by opening the terminal:
and inputting:
2.4.2 A quick way to launch JupyterLab
On some systems, launching the Anaconda Navigator may take some time. Since we are only interested in JupyterLab, we will make it easier for ourselves by creating an executable for JupyterLab with a custom home directory. Doing so starts with creating a folder called PrEcon on your desktop:
Then, open up Notepad and input:
Replace YOUR_PC_USER with your PC user and save the file on your desktop as JupyterLab.bat. In my case this is:
Make sure that you have selected ‘All Files’ for the file type.
Once you double-click on the .bat file, a window will open in your browser. Do not close the terminal window, as this will close JupyterLab:
2.4.3 Introductory Python tutorial
Note that most of the functions and methods used in this book will be provided in each chapter. This section serves only as a quick introduction to the basic functionality of Python.
In general, it is recommended to do either the Introduction to Python tutorials or The Python language from the Scipy Lecture Notes for a quick introduction without any additional software requirements.
On the other hand, similarly to R’s swirl package, we can install PyCharm Edu and get an interactive tutorial (unlike in R, here we need to use a different application instead of an additional package).
2.4.3.1 Python language tutorial using PyCharm
Download PyCharm Edu and install it. Make sure that you already have Anaconda (or alternatively, the base Python, but not both, as it may cause errors) installed before installing PyCharm Edu.
Once installed, start PyCharm Edu:
If you are certain that everything installed correctly, click Learn to browse courses and select Introduction to Python. Alternatively, to verify that everything works correctly, you can click Create New Project.
You can name your project anything you want and click Create.
In case you get a Python error saying that python_d.exe is not found when PyCharm creates the Project, see this question on Stack Overflow.
Inside the Project select File -> Learn -> Browse Courses:
In the new dialog window select Introduction to Python:
and click Join.
Note:
- The Interpreter:... may be different (for example, conda if you are using Anaconda) - don’t change it unless you know what you are doing - PyCharm Edu selects an available interpreter automatically.
- In case you get a message that the PyCharm interpreter is not configured (even though you have selected a Python version/interpreter) - wait for the Indexing... to finalize and restart PyCharm.
Finally, the selected course will be loaded:
- The left window is the available lessons.
- The middle window is your code and input window - note the highlighted text type your name, where you need to input your name in the first task.
- The right window contains the description of the task and lets you look at the hints if you get stuck.
After inputting the required fields, you can click the green arrow to run your code in the script file:
The bottom window will automatically open and show the output of the script.
After examining the output and feeling confident about your answer, click the Check button. You can examine the output of the script by clicking on Run in the bottom-left:
If you want to try some other commands and examine their output, you can click on Python console and type some commands in the console at the bottom to execute them one by one (as opposed to the script file in the middle window, which executes all of the commands if you press Check or click the previously mentioned green arrow to execute the code).
Finally, click Next to go to the next lesson.
If you accidentally opened more than one tutorial, you can manage your existing projects (open previously saved projects or delete existing ones) via File -> Open Recent -> Manage Projects:
This interactive tutorial will help you familiarize yourself with the basic functionality and syntax of the Python programming language.
After getting familiar with Python itself, we can move on to JupyterLab, where we will examine how we can blend together Python code, its output, comments, text formatting and mathematical formulas in one document. This makes it easier to have templates/examples of data analysis tasks with model estimation code and result interpretation, without having to spend extra time copy-pasting them into some other document.
2.4.3.2 Introductory JupyterLab notebook tutorial
Launch JupyterLab and create a new notebook file:
and rename it to python_intro:
There are three different cell types to choose from:
- Code - this type of cell treats the input as Python code (because we created a Python notebook);
- Markdown - this type of cell treats the input as Markdown code;
- Raw - the input is treated as raw text.
You also have a menu to:
- save changes to your notebook;
- add a new cell of the selected type to your notebook;
- cut a selected cell;
- copy a selected cell;
- paste the copied cells;
- run a selected cell;
- stop a selected running cell;
- restart the notebook kernel - this clears the current workspace of any variables and loaded packages and is somewhat equivalent to restarting RStudio.
Next, create three different blocks with the following:
Code cell with:
Markdown cell with:
# This is a title
This is some sample text. **This is some bold text**.
*This is some text in italics*.
$X = 3$, $Y = 4$
## This is a subtitle
$$
\sum_{k = 0}^{\infty} ar^k = \dfrac{a}{1-r}, \text{ for } |r| < 1
$$
- This is a list item
- This is another list item
    - This is a nested list item
    - This is another nested list item
### This is a subsubtitle
1. Some sample text and a formula: $X = Y + Z$
2. Some sample text and a formula:
$$ 1 + r + r^2 + r^3 + ... = \lim_{n \rightarrow \infty} \left( 1 + r + r^2 + ... + r^n \right)$$
1. Some sample text
2. Some sample text
3. A matrix: $\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} =
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}
$
Another, centered matrix:
$$
\begin{align}
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix} &=
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}
+
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}\\
&-
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}
\end{align}
$$
Raw cell with:
print("Hello, world!")
- This is a list item
- This is another list item
    - This is a nested list item
    - This is another nested list item
x = 3
print(x)
$X = Y + Z$
You can either compile a selected cell by pressing CTRL + ENTER, or all the cells with:
Notice that the Raw cell doesn’t produce any output and doesn’t compile any LaTeX / Markdown code.
2.4.3.3 Python programming language at a glance
Below we present some code examples of Python’s syntax. Explanations are minimal - the idea is to have quick examples with output to verify how Python works. For more in-depth examples, see the previous subsection.
2.4.3.3.1 Strings
Run the following code and verify that you understand what happened to the output:
Assign, print and transform strings:
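A minimal sketch that produces the four lines of output shown next. The variable name sentence is an illustrative assumption; the string methods are standard:

```python
# Assumed variable name; the methods return new strings and leave `sentence` unchanged
sentence = "this is a sentence"
print(sentence)               # original string
print(sentence.capitalize())  # first character upper-cased
print(sentence.title())       # first character of every word upper-cased
print(sentence.upper())       # all characters upper-cased
```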
## this is a sentence
## This is a sentence
## This Is A Sentence
## THIS IS A SENTENCE
Split a string into a list of words and select different elements from the list:
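One way to obtain the output shown next is with split() and list slicing. The variable name words and the particular slices are assumptions chosen to match that output:

```python
# Assumed example: split on whitespace, then index and slice the resulting list
words = "this is a sentence".split()
print(words)
print(words[0])     # a single element (a string)
print(words[0:1])   # a slice is always a list, even with one element
print(words[0:2])
print(words[0:3])
print(words[-1])    # negative indices count from the end
print(words[-2:])   # the last two elements
print(words[:2])    # the first two elements
print(words[:-1])   # everything except the last element
print(words[:-2])   # everything except the last two elements
```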
## ['this', 'is', 'a', 'sentence']
## this
## ['this']
## ['this', 'is']
## ['this', 'is', 'a']
## sentence
## ['a', 'sentence']
## ['this', 'is']
## ['this', 'is', 'a']
## ['this', 'is']
Combine different strings:
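A sketch, under assumed variable names, that yields the five lines shown next using join(), the + operator, replace() and capitalize():

```python
# Assumed names; join() glues list elements together with the given separator
words = ["this", "is", "a"]
print(" ".join(words))
print("A" + "B")                       # + concatenates strings
phrase = " ".join(words) + " " + "dog"
print(phrase)
print(phrase.replace("dog", "DOG"))    # returns a new string
print(phrase.capitalize() + ".")
```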
## this is a
## AB
## this is a dog
## this is a DOG
## This is a dog.
Trim white-space, add line breaks and tab spacing:
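A minimal sketch of the string methods involved. The first four calls correspond to the quoted values in the output that follows; the whitespace in the remaining output lines is not recoverable, so the last two calls are only illustrative assumptions:

```python
# Assumed example string with one leading and one trailing space
padded = " a "
print(repr(padded))            # repr() keeps the quotes so the spaces stay visible
print(repr(padded.lstrip()))   # strip whitespace on the left
print(repr(padded.rstrip()))   # strip whitespace on the right
print(repr(padded.strip()))    # strip whitespace on both sides
print("this is\na sentence")   # \n inserts a line break
print("\tthis is a sentence")  # \t inserts a tab
```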
## ' a '
## 'a '
## ' a'
## 'a'
## this is a sentence
##
## this is a sentence
## ----
## this is a sentence
## ----
2.4.3.3.2 Numbers
Run the following code and verify that you understand what happened to the output:
Assign values to variables, print the values with a string text and perform basic math operations:
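The output shown next is consistent with the following sketch. The values x = 2 and y = 3 come from the first output line; the last two expressions are an assumption that matches 11 and 15 and illustrates operator precedence:

```python
x = 2
y = 3
print("x =", x, "y =", y)  # print() separates its arguments with spaces
print(x + y)
print(x - y)
print(x * y)
print(x + y * 3)           # multiplication binds tighter than addition
print((x + y) * 3)         # parentheses change the order of evaluation
```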
## x = 2 y = 3
## 5
## -1
## 6
## 11
## 15
Raise a value to different powers:
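A sketch using the ** operator. The operands are assumptions chosen to match the six output values that follow:

```python
x = 2
y = 3
print(x ** 0)     # anything to the power 0 is 1
print(x ** 1)
print(x ** 2)
print(y ** 2)
print(x ** y)
print(25 ** 0.5)  # a fractional power gives a float (here, the square root)
```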
## 1
## 2
## 4
## 9
## 8
## 5.0
2.4.3.3.3 Lists
A list can store multiple variables. The variables need not be of the same type.
Print different items in a list, combine different lists, etc.:
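A minimal sketch that reproduces the output shown next. The list contents are taken from that output; the variable name item_list and the exact expressions are assumptions:

```python
item_list = ["dog", 11, "cat", 13.5]
print(item_list)
print(item_list[0])
print(item_list[1])
print(item_list[4:])                             # slicing past the end gives an empty list
print(item_list * 2)                             # repetition
print(item_list + ["car", 5.0, "pencil", 3.5])   # concatenation creates a new list
print(item_list[0].upper())                      # string methods work on string elements
```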
## ['dog', 11, 'cat', 13.5]
## dog
## 11
## []
## ['dog', 11, 'cat', 13.5, 'dog', 11, 'cat', 13.5]
## ['dog', 11, 'cat', 13.5, 'car', 5.0, 'pencil', 3.5]
## DOG
Note the different data types:
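The type() function reports the class of an object. The calls below are an assumption that matches the five lines shown next:

```python
item_list = ["dog", 11, "cat", 13.5]
print(type(item_list))          # the list itself
print(type(item_list[0:1]))     # a slice of a list is still a list
print(type(item_list[0]))       # a string element
print(type(item_list[1]))       # an integer element
print(type(str(item_list[1])))  # the integer converted to a string
```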
## <class 'list'>
## <class 'list'>
## <class 'str'>
## <class 'int'>
## <class 'str'>
Change items in the list:
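List elements can be overwritten by index. A sketch, with assumed assignments, that turns the first printed list into the second:

```python
item_list = ["dog", 11, "cat", 13.5]
print(item_list)
item_list[0] = "car"              # replace an element
item_list[1] = str(item_list[1])  # convert the integer to a string
item_list[3] = str(item_list[3])  # convert the float to a string
print(item_list)
```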
## ['dog', 11, 'cat', 13.5]
## ['car', '11', 'cat', '13.5']
Add or remove items from a list:
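A sketch of the list methods that would give the sequence of outputs shown next. The variable name animals and the order of the calls are assumptions:

```python
animals = []
print(animals)
animals.append("dog")          # append() adds to the end
animals.append("cat")
animals.append("dog & cat")
print(animals)
animals.insert(1, "cow")       # insert() adds at a given position
print(animals)
animals.remove("dog")          # remove() deletes the first matching value
print(animals)
del animals[-1]                # del removes by index
print(animals)
last = animals.pop()           # pop() removes and returns the last element
print(last)
print(animals)
animals.insert(0, "car")
print(animals)
print(repr(animals.pop(0)))    # pop(0) removes and returns the first element
print(animals)
```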
## []
## ['dog', 'cat', 'dog & cat']
## ['dog', 'cow', 'cat', 'dog & cat']
## ['cow', 'cat', 'dog & cat']
## ['cow', 'cat']
## cat
## ['cow']
## ['car', 'cow']
## 'car'
## ['cow']
Finding elements in a list:
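The index() method returns the position of the first matching element. A minimal sketch matching the output shown next, with an assumed variable name:

```python
animals = ["cow", "cat"]
print(animals)
print(animals.index("cow"))  # positions start at 0
print(animals.index("cat"))
```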
## ['cow', 'cat']
## 0
## 1
We get an error if we try to print an index of an item which is not in the list:
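A sketch of the failing lookup, under the assumption that the list is the same two-element list as before. Unlike the error shown next, this version catches the exception so that the script keeps running:

```python
animals = ["cow", "cat"]
try:
    position = animals.index("dog")
except ValueError as err:
    # list.index() raises ValueError when the item is absent
    position = None
    print("ValueError:", err)
```

Checking with the in operator first ("dog" in animals) is another way to avoid the error.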
## ValueError: 'dog' is not in list
List with numeric values:
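The built-in min(), max() and sum() functions work on lists of numbers. The list values below are hypothetical; they were chosen only so that the results agree with the output shown next:

```python
numbers = [10, -2, 8, 7, 6, 4]  # assumed values, not taken from the original code
print("Minimum:", min(numbers))
print("Maximum:", max(numbers))
print("Total:", sum(numbers))
```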
## Minimum: -2
## Maximum: 10
## Total: 33
Sort a list and print its length:
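A sketch that matches the output shown next. The distinction it illustrates is that sorted() returns a new list, while sort() changes the list in place; the variable name is an assumption:

```python
animals = ["dog", "cow", "cat", "dog & cat"]
print(sorted(animals))      # returns a new, sorted list
print(animals)              # the original list is unchanged
animals.sort()              # sorts the list in place
print(animals)
animals.sort(reverse=True)  # in-place sort in descending order
print(animals)
print("This list contains " + str(len(animals)) + " elements")
```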
## ['cat', 'cow', 'dog', 'dog & cat']
## ['dog', 'cow', 'cat', 'dog & cat']
## ['cat', 'cow', 'dog', 'dog & cat']
## ['dog & cat', 'dog', 'cow', 'cat']
## This list contains 4 elements
Split strings:
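A minimal sketch, with an assumed variable name, that gives the output shown next. It also shows that a string can be indexed and sliced like a list of characters:

```python
sentence = "This is a sentence."
print(sentence)
print(sentence.split())     # split on whitespace
print(sentence.split("a"))  # split on a chosen separator
print(type(sentence))
print(len(sentence))        # number of characters
print(type(sentence[0]))    # a single character is also a string
print(sentence[0])
print(sentence[0:6])        # the first six characters
```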
## This is a sentence.
## ['This', 'is', 'a', 'sentence.']
## ['This is ', ' sentence.']
## <class 'str'>
## 19
## <class 'str'>
## T
## This i
Note that some of the functions, like insert(), remove(), sort(), pop(), etc., change the original elements in x. This is because lists are so-called mutable objects. Mutable objects can be changed after they are created. Mutable objects are passed by object reference, instead of by value.
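A sketch of what passing by reference means in practice. The variable names and the modifications are assumptions that match the output shown next: y = x does not copy the list, so both names refer to the same object.

```python
x = [1, 2, 3]
y = x               # y is a second name for the same list, not a copy
print(x)
print(y)
x.append(-1)        # changing the list through x ...
print(x)
print(y)            # ... is visible through y
y.append(10)        # and the other way around
print(x)
print(y)
y[0] = 11
print(x)
print(y)
```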
## [1, 2, 3]
## [1, 2, 3]
## [1, 2, 3, -1]
## [1, 2, 3, -1]
## [1, 2, 3, -1, 10]
## [1, 2, 3, -1, 10]
## [11, 2, 3, -1, 10]
## [11, 2, 3, -1, 10]
A workaround is to explicitly create a new variable, instead of a reference:
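A sketch with three independent lists that matches the output shown next. Using copy() and list() here is an assumption; slicing with x[:] would work as well:

```python
x = [1, 2, 3]
y = x.copy()   # a new list with the same elements
z = list(x)    # another way to create a new list
print(x, y, z, sep="\n")
x.append(-1)   # only x changes
print(x, y, z, sep="\n")
y.append(10)   # only y changes
print(x, y, z, sep="\n")
z[0] = 11      # only z changes
print(x, y, z, sep="\n")
```

Note that these are shallow copies: nested lists inside x would still be shared.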
## [1, 2, 3]
## [1, 2, 3]
## [1, 2, 3]
## [1, 2, 3, -1]
## [1, 2, 3]
## [1, 2, 3]
## [1, 2, 3, -1]
## [1, 2, 3, 10]
## [1, 2, 3]
## [1, 2, 3, -1]
## [1, 2, 3, 10]
## [11, 2, 3]
2.4.3.3.4 If Statements
We use if statements to test for some kind of condition.
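A minimal sketch that matches the first six lines of output shown next. The contents of things and the structure of the loop are assumptions; only the printed lines and the list length of three come from the text:

```python
things = ["milk", "eggs", "bread"]  # assumed contents
print("milk" in things)             # membership test
print("MILK" in things)             # comparisons are case-sensitive
print("MILK" not in things)
for i in range(len(things)):
    if things[i] == "milk":
        print("Changed: " + things[i] + " to: " + things[i].upper())
        things[i] = things[i].upper()
    else:
        print("No MILK in list")
```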
## True
## False
## True
## Changed: milk to: MILK
## No MILK in list
## No MILK in list
if len(things) >= 4:
    print("There are at least 4 elements in the list")
elif len(things) == 3:
    print("There are 3 elements in the list")
else:
    print("There are less than 3 elements in the list")
## There are 3 elements in the list
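The ten True/False lines that follow are consistent with a series of comparison and logical operators. The expressions below are an assumption, chosen with x = 2 and y = 3 so that they evaluate to the same values in the same order:

```python
x = 2
y = 3
print(x == 2)            # equal to
print(x == 3)
print(x != 2)            # not equal to
print(x != 3)
print(x > y)             # greater than
print(x < y)             # less than
print(x <= y)            # less than or equal to
print(x >= y)            # greater than or equal to
print(x < y and x == 2)  # both conditions must hold
print(x > y or x == 2)   # at least one condition must hold
```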
## True
## False
## False
## True
## False
## True
## True
## False
## True
## True
2.4.3.3.5 Loops
We can loop through each item in a list. Note that we need to transform any non-strings to strings if we want to print and concatenate the value into a string:
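A sketch of two loops that give the output shown next. The list contents are taken from that output; the variable names are assumptions:

```python
item_list = ["car", 11, "pencil", 13.5]
# Loop directly over the elements
for item in item_list:
    print("Item in the list: " + str(item))  # str() is needed for 11 and 13.5
# Loop over the positions
for i in range(len(item_list)):
    print("Item " + str(i) + " is: " + str(item_list[i]))
```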
## Item in the list: car
## Item in the list: 11
## Item in the list: pencil
## Item in the list: 13.5
## Item 0 is: car
## Item 1 is: 11
## Item 2 is: pencil
## Item 3 is: 13.5
Format a list as a numbered list via enumerate():
list_format_1 = ")"
list_format_2 = " ;"
for index, value in enumerate(item_list, 1):
    print(("{}" + list_format_1 + " " + "{}" + list_format_2).format(index, value))
## 1) car ;
## 2) 11 ;
## 3) pencil ;
## 4) 13.5 ;
In the above example our numbered list started at 1. The list index numbers and the list values are printed in place of the {} symbols. Each list number is formatted as i), followed by the list element value, with the ; symbol appended to the end. If we wanted, we could change or remove these extra formatting options.
We can also create the formatting in a different way:
item_list = ["This", "is", "a", "list"]
for index, item_i in enumerate(item_list):
    print("Item index is: " + str(index) + ", item #" + str(index + 1) +
          ", item: " + item_i.title())
## Item index is: 0, item #1, item: This
## Item index is: 1, item #2, item: Is
## Item index is: 2, item #3, item: A
## Item index is: 3, item #4, item: List
The range() function:
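A minimal sketch matching the output shown next. range(1, 5) generates the integers from 1 up to, but not including, 5; the bounds are an assumption inferred from that output:

```python
for i in range(1, 5):
    print(i)
# range() is lazy; list() materialises its values
print(list(range(1, 5)))
```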
## 1
## 2
## 3
## 4
## [1, 2, 3, 4]
2.4.3.3.6 Tuples
Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas lists use square brackets. Tuples are immutable, which means you cannot update or change the values of tuple elements. You can, however, take portions of existing tuple variables and create new tuple variables.
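A short sketch of what immutability means in practice. The tuple colors and its values are hypothetical and are used only for illustration:

```python
colors = ("red", "green", "blue")  # hypothetical example tuple
try:
    colors[0] = "yellow"           # tuples do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)
first_two = colors[:2]             # slicing creates a new tuple
extended = colors + ("black",)     # so does concatenation
print(first_two)
print(extended)
print(colors)                      # the original tuple is unchanged
```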
item_t = ("This", "is", "a", "tuple")
for index, item_i in enumerate(item_t):
    print("Item index is: " + str(index) + ", item #" + str(index + 1) +
          ", item: " + item_i.title())
## Item index is: 0, item #1, item: This
## Item index is: 1, item #2, item: Is
## Item index is: 2, item #3, item: A
## Item index is: 3, item #4, item: Tuple
## Value saved (equal to 1).
## Value saved (equal to 1).
## The color is green.
## My #1 color is green
## My #1 color is green
numbers = [7, 23, 42]
print("The numbers are %d, %d, and %d." % (numbers[0], numbers[1], numbers[2]))
## The numbers are 7, 23, and 42.
2.4.3.3.7 Dictionaries
Dictionaries allow storing data in key-value pairs.
import numpy as np
my_dictionary = {"name": "Python",
                 "is_active": True,
                 "test_scores": [9, 8, 10]}
if my_dictionary["is_active"]:
    print("Active language is " + my_dictionary["name"] +
          " with average score of %f" % np.mean(my_dictionary["test_scores"]))
## Active language is Python with average score of 9.000000
my_dict = {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'}
for key in my_dict:
    print('Key: %s' % key)
## Key: key_1
## Key: key_2
## Key: key_3
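The key-and-value lines that follow can be produced by looping over items(), which yields each key together with its value. A sketch, with the loop itself being an assumption:

```python
my_dict = {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'}
for key, value in my_dict.items():
    # %s is the format code for a string
    print('Key: %s and value: %s' % (key, value))
```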
## Key: key_1 and value: value_1
## Key: key_2 and value: value_2
## Key: key_3 and value: value_3
%d is the format code for an integer, %f is the format code for a float.
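A sketch of both format codes that matches the output shown next. The list scores and the change to the last test score are assumptions inferred from that output; plain sum() and len() are used here instead of numpy to keep the example self-contained:

```python
scores = [7, 8, 7]  # assumed values with mean 7.333...
average = sum(scores) / len(scores)
print("%f" % average)  # float with six decimal places
print("%d" % average)  # integer part only

my_dictionary = {"name": "Python",
                 "is_active": True,
                 "test_scores": [9, 8, 10]}
my_dictionary["test_scores"][2] = 9  # assumed update of the last score
average = sum(my_dictionary["test_scores"]) / len(my_dictionary["test_scores"])
print("%f" % average)
print("%d" % average)
print(my_dictionary.keys())    # all keys
print(my_dictionary.values())  # all values
```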
## 7.333333
## 7
## 8.666667
## 8
## dict_keys(['name', 'is_active', 'test_scores'])
## dict_values(['Python', True, [9, 8, 9]])
2.4.3.3.8 Functions
# Define a function
def my_sum_function(x1, x2):
    # Add function code
    print("Adding values ...")
    return x1 + x2
Use the function:
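A minimal sketch of a call that gives the output shown next. The function is repeated so that the example is self-contained; the arguments 1 and 2 are an assumption, since any pair summing to 3 would print the same result:

```python
def my_sum_function(x1, x2):
    print("Adding values ...")
    return x1 + x2

result = my_sum_function(1, 2)  # assumed arguments
print(result)
```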
## Adding values ...
## 3
Define another function:
def number_to_word(number):
    if number == 0:
        return "zero"
    elif number == 1:
        return "one"
    elif number == 2:
        return "two"
    elif number < 0:
        return "negative"
    else:
        return "greater than two"
Use the function in a loop:
numbers = [0, 1, -1, 3]
for i in range(0, len(numbers)):
    print("The number is " + number_to_word(numbers[i]))
## The number is zero
## The number is one
## The number is negative
## The number is greater than two
2.4.3.3.9 Classes
Classes allow combining information and behaviour. For an example, see 2.7.7.
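As a rough illustration of that idea, the sketch below defines a hypothetical Student class (not taken from this book) that stores data in attributes and exposes behaviour through a method:

```python
class Student:
    """Hypothetical example class, used only for illustration."""

    def __init__(self, name, scores):
        # Information: stored as attributes of the object
        self.name = name
        self.scores = scores

    def average(self):
        # Behaviour: a method that works with the stored information
        return sum(self.scores) / len(self.scores)


student = Student("Ada", [9, 8, 10])
print(student.name + " has an average score of " + str(student.average()))
```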