Installation via containers

Unlike the direct installation method, this approach requires only one program to be installed directly on your computer: Docker. All other software will be installed inside the image that we create with it. This image will then be used as a read-only (i.e. immutable) template for quick container creation.

You can also install the following software:

Install the above software (or, at the very least, just the docker software) and proceed to the next section.

Important

The resulting docker image is designed for learning purposes on your own machine only. The created image is not meant to be deployed on cloud or production servers (nor is it safe to do so), as it doesn’t contain any security measures at all.

Tip

You don’t need an account to use docker. Furthermore, to prevent accidentally pushing an image to a public docker registry it is recommended to avoid creating an account or logging in, at least until you are more familiar with docker.

Note

If you are unfamiliar with docker then, after installing the appropriate software, see the following sections:

  • See the section on the linux command line - our docker image is based on Ubuntu, so we will need to be familiar with some of the most common terminal commands.
  • See the section on docker basics, which contains a number of examples on how to build simple images, create and access containers, as well as remove older ones.

Full docker image and configuration files

Our main goal is to create a docker image with the following software, which we will access through our web browser:

  • RStudio server.
  • JupyterLab.
  • code-server.

In order to manage these applications more easily, we will utilize nginx as a reverse proxy and create a static .html webpage with links to the apps inside the container. We will also use supervisor, which will launch and monitor all of the above software when we start the container.

On your desktop create a folder named dockerDS and then create the following folder/file structure:

dockerDS
├── config
│   ├── rstudio
│   │   ├── rserver.conf
│   │   ├── rsession.conf
│   │   ├── logging.conf
│   │   ├── rstudio-prefs.json
│   │   ├── rstudio_bindings.json
│   │   ├── editor_bindings.json
│   │   ├── addins.json
│   │   ├── environment.txt
│   │   └── install_libraries.R
│   ├── python
│   │   ├── ipython_config.py
│   │   ├── jupyter_lab_config.py
│   │   ├── jupyter_server_config.py
│   │   └── requirements.txt
│   ├── code-server
│   │   └── config.yaml
│   ├── nginx
│   │   └── nginx.conf
│   ├── www
│   │   └── index.html
│   └── supervisor
│       ├── supervisord.conf
│       ├── nginxgo.conf
│       ├── rserver.conf
│       ├── jupyterlab.conf
│       └── codeserver.conf
└── datascience.dockerfile

To summarize the above:

  • Inside the dockerDS folder create a folder named config.
  • Inside the config folder create the following folders: rstudio, python, code-server, nginx, www and supervisor.
  • Populate the folders with specific files that are outlined below.
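
The steps above can also be scripted. The following is a minimal Python sketch that creates the folders and empty placeholder files; Python is not required for this step and is used here only for illustration. The names STRUCTURE and create_skeleton are hypothetical, chosen for this example, and the file contents still have to be filled in as described in the sections below.

```python
from pathlib import Path

# Folder -> files, mirroring the tree shown above.
STRUCTURE = {
    "config/rstudio": [
        "rserver.conf", "rsession.conf", "logging.conf", "rstudio-prefs.json",
        "rstudio_bindings.json", "editor_bindings.json", "addins.json",
        "environment.txt", "install_libraries.R",
    ],
    "config/python": [
        "ipython_config.py", "jupyter_lab_config.py",
        "jupyter_server_config.py", "requirements.txt",
    ],
    "config/code-server": ["config.yaml"],
    "config/nginx": ["nginx.conf"],
    "config/www": ["index.html"],
    "config/supervisor": [
        "supervisord.conf", "nginxgo.conf", "rserver.conf",
        "jupyterlab.conf", "codeserver.conf",
    ],
}


def create_skeleton(root):
    """Create the dockerDS folders and empty placeholder files under root."""
    root = Path(root)
    for folder, files in STRUCTURE.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            # touch() creates the file if missing and keeps existing content
            (directory / name).touch()
    (root / "datascience.dockerfile").touch()
    return root
```

Calling create_skeleton(Path.home() / "Desktop" / "dockerDS") would create the structure on the desktop, assuming the desktop folder is located there on your system.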

Configuration files for rstudio

Inside the /dockerDS/config/rstudio/ directory, add the following files:

Create /dockerDS/config/rstudio/rserver.conf with the following content:

# /etc/rstudio/rserver.conf
# RStudio Server Configuration File
## https://docs.posit.co/ide/server-pro/reference/rserver_conf.html#server-settings

www-address=127.0.0.1
www-port=8787

auth-timeout-minutes=0
auth-stay-signed-in-days=365

See the official rserver.conf documentation for a list of available options. We are interested in the following options:

  • www-address - the network address that RStudio server will listen on for incoming connections. We want to use a reverse proxy, so we should set it to 127.0.0.1. Otherwise, we would have to specify 0.0.0.0.

  • www-port - the port that RStudio server will bind to while listening for incoming connections. See this answer on stackoverflow for more info on ports.

  • auth-timeout-minutes - the number of minutes a user will stay logged in while idle before being required to sign in again. We set this to 0 to disable it.

  • auth-stay-signed-in-days - the number of days to keep a user signed in when using the “Stay Signed In” option. We set this to 365 days in order to prevent disconnections due to inactivity.

If you find some other options that you may need to tweak, feel free to customize this file.
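
As a quick sanity check of the file above, the following Python sketch parses the key=value lines and confirms that the listening address is a loopback address. It is only an illustration: parse_conf is a hypothetical helper written for this example, not part of RStudio server, and the sample text repeats the settings shown above.

```python
import ipaddress


def parse_conf(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


RSERVER_CONF = """\
# /etc/rstudio/rserver.conf
www-address=127.0.0.1
www-port=8787
auth-timeout-minutes=0
auth-stay-signed-in-days=365
"""

settings = parse_conf(RSERVER_CONF)
# A loopback address means that only processes inside the container
# (such as nginx) can reach RStudio server directly.
is_loopback = ipaddress.ip_address(settings["www-address"]).is_loopback
print(settings["www-port"], is_loopback)  # prints: 8787 True
```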

Create /dockerDS/config/rstudio/rsession.conf with the following content:

# /etc/rstudio/rsession.conf
# R Session Configuration File
## https://docs.posit.co/ide/server-pro/reference/rsession_conf.html#session-settings
## https://docs.posit.co/ide/server-pro/rstudio_pro_sessions/directory_management.html

session-timeout-minutes=0

session-default-working-dir=/media/container_shared/
session-default-new-project-dir=/media/container_shared/

restrict-directory-view=1
directory-view-allow-list=/media/container_shared/

See the official rsession.conf documentation for a list of available options. We are interested in the following options:

  • session-timeout-minutes - the number of minutes before a session times out, at which point the session will either suspend or exit. We set it to 0 to disable the timeout.

  • session-default-working-dir - specifies the default working directory to use for new sessions. We will use persistent volumes to save our projects and user settings, so we specify that this directory will be inside the /media/container_shared/ directory in the docker image/container.

  • session-default-new-project-dir - specifies the default directory to use for new projects. A similar option existed previously but is now deprecated. We set it to the same directory as session-default-working-dir just in case.

  • restrict-directory-view - indicates whether or not to restrict the directories that can be viewed within the IDE. Files saved inside the container itself (outside the shared directory) are lost when the container is removed, so we don’t want to accidentally save important files there. That is why we set it to 1. Note that this does not prevent us from seeing the directories themselves inside the container, but it prevents opening/saving files in the container itself.

  • directory-view-allow-list - specifies a list of directories exempt from directory view restrictions, separated by a colon character. Along with the restrict-directory-view setting, we set /media/container_shared/ as the only directory for loading/saving files in our container.
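
To make the colon-separated format of directory-view-allow-list concrete, here is a small Python sketch that splits such a value and checks whether a path falls inside one of the listed directories. This only illustrates the idea of an allow list; parse_allow_list and is_allowed are hypothetical helpers, and how RStudio enforces the restriction internally may differ.

```python
from pathlib import PurePosixPath


def parse_allow_list(value):
    """Split a colon-separated directory list into paths."""
    return [PurePosixPath(p) for p in value.split(":") if p]


def is_allowed(path, allow_list):
    """Return True if path is one of the allowed directories or inside one."""
    path = PurePosixPath(path)
    return any(path == d or d in path.parents for d in allow_list)


allow_list = parse_allow_list("/media/container_shared/")
print(is_allowed("/media/container_shared/project/analysis.R", allow_list))  # True
print(is_allowed("/root/analysis.R", allow_list))                            # False
```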

Create /dockerDS/config/rstudio/logging.conf with the following content:

# /etc/rstudio/logging.conf
# Logging Configuration file

## https://docs.posit.co/ide/server-pro/server_management/logging.html#configuration-file

[*]
log-level=info
logger-type=stderr

# Prevent INFO messages being printed to stdout in RStudio Server
[@rsession-rstudio]
log-level=warn
logger-type=syslog

The above logging configuration file directs RStudio server messages to stderr, where they will be captured by our supervisor process. Note that we switch the logger to syslog for the @rsession-rstudio process, since otherwise its warning and error output would be printed directly to the console window of our RStudio session. More configuration options can be found in the official documentation on logging configuration.

Create /dockerDS/config/rstudio/rstudio-prefs.json with the following content:

{
    "save_workspace": "never",
    "always_save_history": false,
    "reuse_sessions_for_project_links": true,
    "posix_terminal_shell": "bash",
    "rainbow_parentheses": true
}

Two important links relating to customizing RStudio preferences:

  • Session user settings - a list of settings supported in the rstudio-prefs.json file, along with their type, allowable values, and defaults.

  • Customizing session settings - contains general documentation on additional customization options, if needed.

The settings that we have changed are:

  • save_workspace - whether to save the workspace to an .Rdata file after the R session ends. We usually want to have a fresh start with a clean environment in order to guarantee replicable results.

  • always_save_history - we’ll use .R scripts and do not need to save every single command that we’ve run throughout our sessions.

  • reuse_sessions_for_project_links - whether to reuse sessions when opening projects in RStudio. If we have multiple projects that we will access throughout our session, there is no need to restart the session. The idea is that our projects should have code that clears the working environment.

  • posix_terminal_shell - the terminal shell to use on POSIX operating systems (Linux/MacOS). We will specify a bash terminal shell to have some file coloring.

  • rainbow_parentheses - whether to highlight parentheses in a variety of colors. Useful if we have multiple nested parentheses in our code and want an easier way to match them.

Feel free to customize this file to suit your needs. Note that you will also be able to change and store persistent settings in the following directory in our shared folder: /media/container_shared/.rstudio. This file simply defines the “default” settings.

Create /dockerDS/config/rstudio/rstudio_bindings.json with the following content:

{
    "closeSourceDoc": "Ctrl+Alt+W"
}

RStudio server opens in the browser, so pressing Ctrl+W will close the browser tab instead of closing the .R code tab inside the RStudio server editor. To prevent this, we define a new shortcut for closing files inside the RStudio editor as Ctrl+Alt+W.

Create /dockerDS/config/rstudio/editor_bindings.json with the following content:

{}

This file contains no custom bindings but is required to prevent warning messages from RStudio server.

Create /dockerDS/config/rstudio/addins.json with the following content:

{}

This file contains no custom bindings but is required to prevent warning messages from RStudio server.

Create /dockerDS/config/rstudio/environment.txt with the following content:

AER
DAAG
DT
GGally
IRdisplay
IRkernel
MASS
Matrix
OECD
RCurl
ROCR
Rcpp
Rcrawler
arrow
astsa
bookdown
caTools
car
caret
cowplot
crayon
data.table
devtools
doParallel
doSNOW
dplyr
dyn
dynlm
e1071
eurostat
fGarch
fUnitRoots
duckdb
fansi
feather
fma
fontawesome
foreach
forecast
fpp
fpp2
fpp3
gdata
ggiraph
ggplot2
ggvis
glmnet
gplots
gt
htmltools
htmlwidgets
imputeTS
kableExtra
knitr
languageserver
fst
lars
latex2exp
lattice
leaps
lmtest
lrmest
lubridate
mFilter
markdown
mfx
mice
microbenchmark
mlr3
mlr3verse
multcomp
nnet
nortest
orcutt
pROC
pak
patchwork
pbdZMQ
plm
plotly
polynom
posterior
prophet
quantmod
randomForest
readxl
renv
repr
reshape2
purrr
qs
reticulate
rmarkdown
rpart
rpart.plot
rstudioapi
rugarch
rzmq
sandwich
seasonal
shiny
shinydashboard
shinythemes
rvest
spdep
stargazer
stringr
swirl
tempdisagg
tidymodels
tidyverse
tree
tsDyn
tseries
txtplot
urca
vars
viridisLite
waveslim
writexl
yaml
zoo
skimr
vroom

The above file contains a list of all the libraries that we want to install in our docker image, one per line. If you need additional libraries, make sure to add them to the list.
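
Since the list is long and not strictly sorted, it is easy to add a package twice. The following Python sketch reports duplicated names; find_duplicates is a hypothetical helper written for this example, and the sample input is made up rather than taken from the file above.

```python
from collections import Counter


def find_duplicates(text):
    """Return package names that appear more than once, in sorted order."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


sample = "dplyr\nggplot2\nzoo\ndplyr\n"
print(find_duplicates(sample))  # prints: ['dplyr']
```

To check the real file, one could pass the result of Path("environment.txt").read_text() (from pathlib) to the same function.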

Create /dockerDS/config/rstudio/install_libraries.R with the following content:

# Pass argument to script to change working directory to the file location
# e.g., Rscript /tmp/install_libraries.R /tmp/
args <- commandArgs(trailingOnly = TRUE)
if(length(args) > 0){
  setwd(args[1])
}

## Read the file with the list of libraries:
lib_list <- unname(unlist(na.omit(read.table("./environment.txt"))))

## Drop any packages that are already installed as base or recommended packages:
## (drop = FALSE keeps the matrix structure even if only one package matches)
installed <- installed.packages()
new_packages <- installed[installed[, "Package"] %in% lib_list, c("Package", "Version", "Priority"), drop = FALSE]
new_packages <- data.frame(new_packages)
new_packages <- new_packages[(new_packages$Priority %in% c("base", "recommended")), ]
if(nrow(new_packages) > 0){
  lib_list <- setdiff(lib_list, unique(new_packages$Package))
}

## Install the remaining libraries (uses the default CRAN, which is set by the dockerfile)
install.packages(lib_list)

## Other libraries, that are outside of CRAN (only latest versions):
install.packages("polars", repos = "https://rpolars.r-universe.dev/bin/linux/jammy/4.3") 
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# https://pkg.yangzhuoranyang.com/tsdl/articles/tsdl.html
devtools::install_github("FinYang/tsdl")

The above code will read the environment.txt file and install the libraries listed in it, along with a couple of libraries from external (i.e. non-CRAN) sources. We will override the default CRAN URL with one from Posit Package Manager globally inside the docker image, so no other settings are necessary.

Configuration files for python and JupyterLab

Inside /dockerDS/config/python/ directory add the following files:

Create /dockerDS/config/python/ipython_config.py with the following content:

# Configuration file for ipython.

c = get_config()  #noqa

c.Completer.use_jedi = False

The above config file with the commented-out default settings can be generated by running the following terminal commands inside the docker container:

ipython profile create
ipython --debug -c 'exit()'

Note that c = get_config() is always included, as it retrieves the configuration object on which the options below are set. The other option that we need is:

  • c.Completer.use_jedi - we disable Jedi, an autocompletion and code analysis library for Python, because it is sometimes quite slow.

Create /dockerDS/config/python/jupyter_lab_config.py with the following content:

# Configuration file for lab.

c = get_config()  #noqa

c.ServerApp.ip = '127.0.0.1'
c.ServerApp.port = 8888
c.IdentityProvider.token = ''
c.ServerApp.base_url = '/jupyterlab/'
c.ServerApp.open_browser = False
c.ServerApp.disable_check_xsrf = False
c.ServerApp.root_dir = '/media/container_shared/'
c.ServerApp.terminado_settings = {'shell_command': ['/bin/bash']}
c.LabApp.user_settings_dir = '/media/container_shared/.jupyter/lab/user-settings/'

The config file with the commented-out default settings can be generated by running the following terminal command inside the docker container:

jupyter lab --generate-config

Note that c = get_config() is always included, as it retrieves the configuration object on which the options below are set. The other options that we need are:

  • c.ServerApp.ip - the IP address the Jupyter server will listen on.
  • c.ServerApp.port - the port that the Jupyter server will listen on.
  • c.IdentityProvider.token - by default a random token will be generated as part of the server URL. We disable token authentication by setting it to an empty string.
  • c.ServerApp.base_url - base URL for the Jupyter server.
  • c.ServerApp.open_browser - whether to open in a browser after starting. Since we are connecting to the Jupyter server on a docker container, we disable this option as it only works on a local Jupyter installation.
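  • c.ServerApp.disable_check_xsrf - Jupyter server includes protection from cross-site request forgeries (XSRF). Setting this option to True would disable that check; we leave it at False, which keeps the check enabled. Authentication itself is disabled through the empty token above, under the assumption that we will always run this container on our own machine.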
  • c.ServerApp.root_dir - the directory to use for notebooks and kernels. We set it to our shared directory - /media/container_shared/.
  • c.ServerApp.terminado_settings - we set the terminal window inside Jupyter server to be a bash terminal.
  • c.LabApp.user_settings_dir - we want the user settings to persist, so we set them to /media/container_shared/.jupyter/lab/user-settings/.
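
The ip, port and base_url options together determine the address on which the Jupyter server answers inside the container, which is also the address that nginx will forward requests to. A minimal Python sketch of how these three values combine is shown below; server_url is a hypothetical helper written for this example and is not part of Jupyter.

```python
from urllib.parse import urlunsplit


def server_url(ip, port, base_url="/"):
    """Build the local URL that the Jupyter server answers on."""
    prefix = base_url.strip("/")
    path = f"/{prefix}/" if prefix else "/"
    return urlunsplit(("http", f"{ip}:{port}", path, "", ""))


print(server_url("127.0.0.1", 8888, "/jupyterlab/"))
# prints: http://127.0.0.1:8888/jupyterlab/
```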

Create /dockerDS/config/python/jupyter_server_config.py with the following content:

# Configuration file for jupyter-server.

c = get_config()  #noqa

c.ContentsManager.allow_hidden = True

The config file with the commented-out default settings can be generated by running the following terminal command inside the docker container:

jupyter server --generate-config

Note that c = get_config() is always included, as it retrieves the configuration object on which the options below are set. The other option that we need is:

  • c.ContentsManager.allow_hidden - allow access to hidden files.

Create /dockerDS/config/python/requirements.txt with the following content:

--find-links https://download.pytorch.org/whl/torch_stable.html
absl-py==2.1.0
aiohttp==3.9.3
aiohttp-cors==0.7.0
aiosignal==1.3.1
altair==5.2.0
anyio==4.2.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
arviz==0.17.0
asciitree==0.3.3
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
bambi==0.13.0
beartype==0.17.0
beautifulsoup4==4.12.3
bleach==6.1.0
blessed==1.20.0
blinker==1.7.0
bokeh==2.4.3
cachetools==5.3.2
certifi==2024.2.2
cffi==1.16.0
cftime==1.6.3
charset-normalizer==3.3.2
click==8.1.7
click-default-group==1.2.4
cloudpickle==3.0.0
cloup==2.1.2
cmdstanpy==1.2.0
colorama==0.4.6
colorful==0.5.6
comm==0.2.1
commonmark==0.9.1
cons==0.4.6
contourpy==1.2.0
cycler==0.12.1
dask==2024.1.1
datar==0.15.4
datatable==1.1.0
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
diot==0.2.3
distlib==0.3.8
distributed==2024.1.1
duckdb==0.9.2
etuples==0.3.9
executing==2.0.1
fasteners==0.19
fastjsonschema==2.19.1
fastprogress==1.0.3
filelock==3.13.1
flatbuffers==23.5.26
fonttools==4.47.2
formulae==0.5.1
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2023.12.2
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.41
glcontext==2.5.0
google-api-core==2.16.2
google-auth==2.27.0
google-auth-oauthlib==1.2.0
google-pasta==0.2.0
googleapis-common-protos==1.62.0
gpustat==1.1.1
graphviz==0.20.1
great-tables==0.2.0
greenlet==3.0.3
griffe==0.40.0
grpcio==1.60.1
h5netcdf==1.3.0
h5py==3.10.0
htmltools==0.5.1
idna==3.6
importlib-metadata==7.0.1
importlib-resources==6.1.1
inflection==0.5.1
ipykernel==6.29.0
ipython==8.21.0
ipywidgets==8.1.1
isoduration==20.11.0
isosurfaces==0.1.0
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-cache==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.0.12
jupyterlab-lsp==5.0.2
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
keras==2.15.0
kiwisolver==1.4.5
lckr_jupyterlab_variableinspector==3.2.1
libclang==16.0.6
llvmlite==0.42.0
locket==1.0.0
logical-unification==0.4.6
manim==0.18.0
ManimPango==0.5.0
mapbox-earcut==1.0.1
Markdown==3.5.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.2
matplotlib-inline==0.1.6
mdurl==0.1.2
miniKanren==1.0.3
mistune==3.0.2
mizani==0.9.3
ml-dtypes==0.2.0
moderngl==5.10.0
moderngl-window==2.4.4
modin==0.26.1
modin-spreadsheet==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
multipledispatch==1.0.0
nbclient==0.9.0
nbconvert==7.14.2
nbformat==5.9.2
nest-asyncio==1.6.0
netCDF4==1.6.5
networkx==3.2.1
notebook==7.0.7
notebook_shim==0.2.3
numba==0.59.0
numcodecs==0.12.1
numexpr==2.9.0
numpy==1.26.3
nvidia-ml-py==12.535.133
oauthlib==3.2.2
opencensus==0.11.4
opencensus-context==0.1.3
opt-einsum==3.3.0
overrides==7.7.0
packaging==23.2
pandas==2.1.4
pandocfilters==1.5.1
parso==0.8.3
partd==1.4.1
patsy==0.5.6
pexpect==4.9.0
Pillow==9.5.0
pipda==0.13.1
platformdirs==4.2.0
plotnine==0.12.4
plum-dispatch==2.3.2
polars==0.20.6
prometheus-client==0.19.0
prompt-toolkit==3.0.43
protobuf==4.23.4
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
py-spy==0.3.14
pyarrow==15.0.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycairo==1.25.1
pycparser==2.21
pydantic==1.10.14
pydeck==0.8.1b0
pydub==0.25.1
pyglet==2.0.10
Pygments==2.17.2
pymc==5.10.3
pyparsing==3.1.1
pyrr==0.10.3
pytensor==2.18.6
python-dateutil==2.8.2
python-json-logger==2.0.7
python-simpleconf==0.6.0
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
quartodoc==0.7.2
ray==2.9.1
referencing==0.33.0
requests==2.31.0
requests-oauthlib==1.3.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rpds-py==0.17.1
rsa==4.9
scikit-learn==1.4.0
scipy==1.12.0
screeninfo==0.8.1
seaborn==0.13.2
Send2Trash==1.8.2
simplug==0.3.2
siuba==0.4.2
six==1.16.0
skia-pathops==0.7.4
skimpy==0.0.14
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.5
sphobjinv==2.3.1
SQLAlchemy==2.0.25
srt==3.5.3
stack-data==0.6.3
stanio==0.3.0
statsmodels==0.14.1
streamlit==1.31.0
svgelements==1.9.6
sympy==1.12
tabulate==0.9.0
tblib==3.0.0
tenacity==8.2.3
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorflow==2.15.0.post1
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.35.0
termcolor==2.4.0
terminado==0.18.0
threadpoolctl==3.2.0
tinycss2==1.2.1
toml==0.10.2
toolz==0.12.1
torch==2.2.0+cpu
torchaudio==2.2.0+cpu
torchvision==0.17.0+cpu
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
typeguard==4.1.5
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2023.4
tzlocal==5.2
ujson==5.9.0
uri-template==1.3.0
urllib3==2.2.0
validators==0.22.0
virtualenv==20.25.0
watchdog==3.0.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wrapt==1.14.1
xarray==2024.1.1
xarray-datatree==0.0.13
xarray-einstats==0.7.0
yarl==1.9.4
zarr==2.16.1
zict==3.0.0
zipp==3.17.0

The above file pins the names and exact versions of the Python libraries to install (the main libraries along with their dependencies).
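
When adding libraries to this file later, it is easy to forget the version. The following Python sketch lists the lines that are not pinned with ==; unpinned_lines and the PINNED pattern are hypothetical names written for this example (pip itself accepts many more requirement formats), and the sample input is shortened and partly made up.

```python
import re

# name==version, e.g. numpy==1.26.3 or torch==2.2.0+cpu
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==\S+$")


def unpinned_lines(text):
    """Return requirement lines that do not pin an exact version."""
    bad = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and pip options such as --find-links
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not PINNED.match(line):
            bad.append(line)
    return bad


sample = """\
--find-links https://download.pytorch.org/whl/torch_stable.html
numpy==1.26.3
torch==2.2.0+cpu
pandas
"""
print(unpinned_lines(sample))  # prints: ['pandas']
```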

Configuration files for code-server (vscode)

Inside /dockerDS/config/code-server/ directory, add the following file:

Create /dockerDS/config/code-server/config.yaml with the following content:

# /root/.config/code-server/config.yaml
# Code Server Configuration file
# Each key in the file maps directly to a code-server flag 
# (run code-server --help to see a listing of all the flags). 
# Any flags passed to code-server will take priority over the config file.

bind-addr: 127.0.0.1:8080
auth: none
cert: false
user-data-dir: /media/container_shared/.vscode/
extensions-dir: /root/.local/share/code-server/extensions/
disable-telemetry: true
disable-update-check: true
disable-workspace-trust: true

Additional settings can be found by running code-server --help inside the container.

The settings that we have changed are:

  • bind-addr - address to bind to in host:port.

  • auth - type of authentication to use. We don’t want to have any authentication, so we set it to none.

  • cert - the ability to secure your connection between client and server using SSL/TLS certificates. Since we are connecting to this container locally, we won’t set up any certificates.

  • user-data-dir - we make sure that the user settings of code-server are saved to /media/container_shared/.vscode/. This ensures that our settings persist even when we remove or create new containers.

  • extensions-dir - we will install code-server extensions under the root user in /root/.local/share/code-server/extensions/. In order for us to access these extensions, we have to change the extensions directory. Note that this will prevent us from installing more extensions.

  • disable-telemetry - we disable telemetry data collection.

  • disable-update-check - we don’t want to check if a newer version is available, since docker images are immutable.

  • disable-workspace-trust - workspace trust is a feature driven by the security risks associated with unintended code execution when a user opens a workspace in vscode. We disable this feature to prevent continuous popups about workspace trust, under the assumption that the contents of any loaded workspace are not malicious. In production environments, this safety feature is very important.

Configuration files for nginx

nginx is a web server that can also be used as a reverse proxy (video example).

Inside /dockerDS/config/nginx/ directory, add the following file:

Create /dockerDS/config/nginx/nginx.conf with the following content:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

# Enable to run in foreground for supervisor; https://stackoverflow.com/a/28099946
# daemon off;

events {
    worker_connections 768;
    # multi_accept on;
}

http {

  # https://stackoverflow.com/a/23764479
  # https://github.com/jupyterlab/jupyterlab/issues/4214
  # http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size
  client_max_body_size 0;

  map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
  }

  server {
    listen 80;
    
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # Static webpage
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    root /www/static;
    gzip_static on;
    
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # Supervisor
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    location /supervisor/ {
      rewrite ^/supervisor(/.*) $1 break;
      proxy_pass http://localhost:9001/;
      proxy_redirect http://localhost:9001/ $scheme://$host/supervisor/;
      proxy_set_header Host $host/supervisor/;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;
    }
    
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # RStudio Server
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    
    # https://github.com/rstudio/rstudio/issues/2834#issuecomment-459502330
    rewrite ^/auth-sign-in(.*) "$scheme://$server_name/rstudio/auth-sign-in$1?appUri=%2Frstudio"; 
    rewrite ^/auth-sign-out(.*) "$scheme://$server_name/rstudio/auth-sign-out$1?appUri=%2Frstudio";
    location /rstudio/ {
      rewrite ^/rstudio/(.*)$ /$1 break;
      proxy_pass http://localhost:8787;
      proxy_redirect http://localhost:8787/ $scheme://$host/rstudio/;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;
      proxy_read_timeout 20d;
    }
    
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # JupyterLab
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    
    # https://jupyterhub.readthedocs.io/en/1.2.0/installation-guide-hard.html#using-nginx
    # https://blog.nathantsoi.com/article/run-jupyter-notebook-behind-a-nginx-reverse-proxy-subpath/
    location /jupyterlab/ {
      #rewrite /jupyterlab(.*) $1  break;
      proxy_pass http://localhost:8888;
      # pass some extra stuff to the backend
      proxy_set_header Host $host;
      proxy_set_header X-Real-Ip $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    # location ~* /jupyter/(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket|lsp/)/? {...}
    location ~ /jupyterlab/api/kernels/ {
      proxy_pass            http://localhost:8888;
      proxy_set_header      X-Real-IP $remote_addr;
      proxy_set_header      Host $host;
      proxy_set_header      X-Forwarded-For $proxy_add_x_forwarded_for;
      # websocket support
      proxy_http_version    1.1;
      proxy_set_header      Upgrade "websocket";
      proxy_set_header      Connection "Upgrade";
      proxy_read_timeout    86400;
    }
    location ~ /jupyterlab/terminals/ {
      proxy_pass            http://localhost:8888;
      proxy_set_header      X-Real-IP $remote_addr;
      proxy_set_header      Host $host;
      proxy_set_header      X-Forwarded-For $proxy_add_x_forwarded_for;
      # websocket support
      proxy_http_version    1.1;
      proxy_set_header      Upgrade "websocket";
      proxy_set_header      Connection "Upgrade";
      proxy_read_timeout    86400;
    }
    location ~ /jupyterlab/api/events/ {
      proxy_pass            http://localhost:8888;
      proxy_set_header      X-Real-IP $remote_addr;
      proxy_set_header      Host $host;
      proxy_set_header      X-Forwarded-For $proxy_add_x_forwarded_for;
      # websocket support
      proxy_http_version    1.1;
      proxy_set_header      Upgrade "websocket";
      proxy_set_header      Connection "Upgrade";
      proxy_read_timeout    86400;
    }
    location ~ /jupyterlab/lsp/ {
      proxy_pass            http://localhost:8888;
      proxy_set_header      X-Real-IP $remote_addr;
      proxy_set_header      Host $host;
      proxy_set_header      X-Forwarded-For $proxy_add_x_forwarded_for;
      # WebSocket support
      proxy_http_version    1.1;
      proxy_set_header      Upgrade "websocket";
      proxy_set_header      Connection "Upgrade";
      proxy_read_timeout    86400;
    }
    
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # Visual Studio code-server
    # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    
    # https://github.com/coder/code-server/blob/main/docs/guide.md#using-lets-encrypt-with-nginx
    location /vscode/ {
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    }   
  }
}
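
To summarize what the location blocks above do with an incoming request, here is a simplified Python sketch of the path-to-backend mapping. It is a reading aid only: ROUTES and upstream_url are hypothetical names, and the sketch ignores headers, WebSocket upgrades and the regular-expression locations (which forward to the same JupyterLab backend without changing the path).

```python
# (path prefix, backend, strip prefix before forwarding?)
# The supervisor and rstudio blocks rewrite the path to drop the prefix;
# the jupyterlab and vscode blocks forward the path unchanged.
ROUTES = [
    ("/supervisor/", "http://localhost:9001", True),
    ("/rstudio/", "http://localhost:8787", True),
    ("/jupyterlab/", "http://localhost:8888", False),
    ("/vscode/", "http://localhost:8080", False),
]


def upstream_url(path):
    """Return the backend URL a request path is forwarded to, or None."""
    for prefix, backend, strip_prefix in ROUTES:
        if path.startswith(prefix):
            if strip_prefix:
                path = "/" + path[len(prefix):]
            return backend + path
    return None  # not proxied: served from the static root /www/static


print(upstream_url("/rstudio/auth-sign-in"))  # http://localhost:8787/auth-sign-in
print(upstream_url("/jupyterlab/lab"))        # http://localhost:8888/jupyterlab/lab
print(upstream_url("/index.html"))            # None
```

JupyterLab can receive the unchanged path because its base_url was set to /jupyterlab/ in jupyter_lab_config.py, whereas RStudio server expects requests at its root and therefore needs the prefix removed.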

Configuration files for supervisor

Supervisor allows its users to monitor and control a number of processes on UNIX-like operating systems. We will use it to run:

  • supervisor web interface to monitor the various processes listed below.
  • RStudio server.
  • jupyterlab.
  • code-server.
  • nginx server as a reverse proxy for all of the above processes.

Inside /dockerDS/config/supervisor/ directory add the following files:

Create /dockerDS/config/supervisor/supervisord.conf with the following content:

; supervisor config file

[unix_http_server]
file=/var/run/supervisor.sock   ; (the path to the socket file)
chmod=0700                      ; socket file mode (default 0700)

[supervisord]
user=root
nodaemon=true
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
pidfile=/var/run/supervisord.pid            ; (supervisord pidfile;default supervisord.pid)
childlogdir=/var/log/supervisor             ; ('AUTO' child log dir, default $TEMP)

[inet_http_server]
port=*:9001

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]; http://supervisord.org/configuration.html#supervisorctl-section-settings
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL  for a unix socket
;serverurl=http://localhost:9001

; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.

;[eventlistener:process-monitor]
;command=bash -c "printf 'READY\n' && while read line; do kill -SIGQUIT $PPID; done < /dev/stdin"
;events=PROCESS_STATE_STOPPED,PROCESS_STATE_EXITED,PROCESS_STATE_FATAL
;stdout_logfile=/dev/stdout
;stdout_logfile_maxbytes=0
;stdout_logfile_backups=0
;stderr_logfile=/dev/stderr
;stderr_logfile_maxbytes=0
;stderr_logfile_backups=0

; beware possible race condition
; if one of these services exit before the process-monitor is up

[include]
files = /etc/supervisor/conf.d/*.conf

Create /dockerDS/config/supervisor/nginxgo.conf with the following content:

; Config for nginx

[program:nginxgo]
;user=root
command=/usr/sbin/nginx -g 'daemon off;'
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
autostart=true
autorestart=false
exitcodes=0
priority=100

Create /dockerDS/config/supervisor/rserver.conf with the following content:

; Config for RStudio Server

[program:rserver]
;user=rstudio         ; https://github.com/rstudio/rstudio/issues/1663#issuecomment-401476714
environment=RSTUDIO_CONFIG_HOME=/media/container_shared/.rstudio
command=/usr/lib/rstudio-server/bin/rserver --server-daemonize=0
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
startsecs=0
autorestart=false
exitcodes=0

According to the settings file locations documentation and the documentation on session settings, we need to make sure that the RStudio configuration location is in the shared folder. We can do this by setting the environment variable RSTUDIO_CONFIG_HOME to /media/container_shared/.rstudio.

Create /dockerDS/config/supervisor/jupyterlab.conf with the following content:

; Config for Jupyter Lab

[program:jupyter-lab]
user=root
command=jupyter lab --allow-root --config=/root/.jupyter/jupyter_lab_config.py
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
autorestart=false

Create /dockerDS/config/supervisor/codeserver.conf with the following content:

; Config for VS Code Server

[program:code-server] 
;user=rstudio
command=/usr/bin/code-server --config /root/.config/code-server/config.yaml /media/container_shared/
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
startsecs=0
autorestart=false
exitcodes=0
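
Before building the image, it can be handy to double-check what each supervisor program file will actually launch. The following is a minimal sketch (not part of the original setup) that reads a [program:x] section with Python's standard configparser module and shows the command and log file per program. The helper name summarize_programs and the embedded sample are our own; only the %(program_name)s expansion is mimicked, so any other supervisor expansions are left as written.

```python
import configparser

# A copy of the nginx program file from above, embedded so the sketch is self-contained.
SAMPLE_CONF = """\
; Config for nginx

[program:nginxgo]
;user=root
command=/usr/sbin/nginx -g 'daemon off;'
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
autostart=true
autorestart=false
exitcodes=0
priority=100
"""


def summarize_programs(conf_text):
    """Return {program name: settings} for every [program:x] section."""
    # interpolation=None keeps '%(program_name)s' literal instead of letting
    # configparser try (and fail) to expand it on its own.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    programs = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        name = section.split(":", 1)[1]
        # Mimic only the %(program_name)s expansion, for display purposes.
        programs[name] = {
            key: value.replace("%(program_name)s", name)
            for key, value in parser[section].items()
        }
    return programs


if __name__ == "__main__":
    for name, settings in summarize_programs(SAMPLE_CONF).items():
        print(name, "->", settings["command"])
        print("  stdout log:", settings["stdout_logfile"])
```

Lines starting with ; are treated as comments by configparser as well, so commented-out settings such as ;user=root do not show up in the summary.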

The static webpage

To make it easier to quickly open different processes, we can create a handy static .html webpage, which will be served by our nginx server and link to our software.

Create /dockerDS/config/www/index.html with the following content:

<!DOCTYPE html>
<html>
<head>
<style>
.button {
  display: inline-block;
  border-radius: 4px;
  background-color: #DBDBDB;
  border: none;
  color: #FFFFFF;
  text-align: center;
  font-size: 28px;
  padding: 20px;
  width: 200px;
  transition: all 0.5s;
  cursor: pointer;
  margin: 5px;
}

.button:active {
  transform: scale(.8);
}

.button span {
  cursor: pointer;
  display: inline-block;
  position: relative;
  transition: 0.5s;
}

.button span:after {
  content: '\00bb';
  position: absolute;
  opacity: 0;
  top: 0;
  right: -20px;
  transition: 0.5s;
}

.button:hover span {
  padding-right: 25px;
}

.button:hover span:after {
  opacity: 1;
  right: 0;
}

td {
  border: 1px solid black;
  width: 100px;
  height: 80px;
  overflow: hidden;
  text-align: center; 
  vertical-align: middle;  
}

img {
  height: 80px;
}

</style>
</head>
<body>

<table>
<tbody>
<tr>
<td colspan="3" style="width:100%">
    <a href="http://localhost/supervisor" target="_blank">
    <button class="button" style="vertical-align:middle; width:90%">
    <span style="color: #DE252A; font-weight: bold;"> <!-- supervisor -->
    Supervisor
    </span>
    </button>
    </a>
</td>
</tr>
<tr>
<td>
    <a href="http://localhost/jupyterlab" target="_blank">      
    <button class="button" style="vertical-align:middle">
    <span> <!-- jupyter -->
    <img src="" />
    </span>
    </button>
    </a>
</td>
<td>
    <a href="http://localhost/rstudio" target="_blank">
    <button class="button" style="vertical-align:middle">
    <span> <!-- rstudio -->
    <img style="display:block; object-fit: contain;" width="100%" height="100%" src="" />
    </span>
    </button>
    </a>
</td>
<td>
    <a href="http://localhost/vscode" target="_blank">      
    <button class="button" style="vertical-align:middle">
    <span> <!-- vscode -->
    <img src="" />
    </span>
    </button>
    </a>
</td>
</tr>
</tbody>
</table>

</body>
</html>

Then, once you have the container up and running, opening http://localhost:80 in your browser will load this .html page with the following buttons:

A screenshot of the static webpage. Each button opens a separate service running inside the container.
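
Each button is only useful if its link matches a location that nginx actually proxies. As a rough sanity check, the sketch below collects every href from the page with Python's standard html.parser module, so the list can be compared by eye against the nginx configuration. The names LinkCollector and extract_links are our own, and SAMPLE_HTML is a shortened stand-in for index.html, not the full file.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_text):
    """Return the list of link targets found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Shortened stand-in for /dockerDS/config/www/index.html.
SAMPLE_HTML = """
<table><tbody><tr>
<td><a href="http://localhost/jupyterlab" target="_blank"><button>JupyterLab</button></a></td>
<td><a href="http://localhost/rstudio" target="_blank"><button>RStudio</button></a></td>
<td><a href="http://localhost/vscode" target="_blank"><button>code-server</button></a></td>
</tr></tbody></table>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

To check the real page, read the file and pass its text to extract_links instead of SAMPLE_HTML.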

The dockerfile

Finally, inside the root folder (i.e. in dockerDS) you will need to create the following dockerfile:

Create /dockerDS/datascience.dockerfile with the following content:

FROM ubuntu:22.04 AS build

# https://stackoverflow.com/a/65054865 
ENV DEBIAN_FRONTEND=noninteractive

# ARG declarations - only available during building
## https://quarto.org/docs/download/
ARG QUARTO_VERSION=1.3.450
## https://cran.r-project.org/bin/windows/base/
ARG R_VERSION=4.3.2
## https://packagemanager.posit.co/client/#/repos/cran/setup?snapshot=latest
ARG CRAN=${1:-${CRAN:-"https://packagemanager.posit.co/cran/__linux__/jammy/2023-12-08"}}
## https://www.python.org/downloads/
ARG PYTHON_VERSION=3.11.7
## https://posit.co/download/rstudio-desktop/
ARG RSTUDIO_VERSION=2023.09.1-494
## https://github.com/rstudio/tinytex-releases (environmental variables are used in the "install-bin-unix.sh" script)
ARG TINYTEX_INSTALLER=TinyTeX
ARG TINYTEX_VERSION=2023.12
## https://github.com/stan-dev/cmdstan/releases/
ARG STAN_VERSION=2.33.1
## https://deb.nodesource.com/
ARG NODE_MAJOR=20
## https://github.com/coder/code-server/releases
ARG VSCODE_VERSION=4.19.1
## Used to setup a user for RStudio server
ARG DEFAULT_USER=${2:-${DEFAULT_USER:-"rstudio"}}

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Ubuntu apps and packages
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN apt-get update --fix-missing  \
    && apt-get upgrade -yq \
    && apt-get install -yq --no-install-recommends \
    sudo \
    curl ca-certificates gnupg \
    locales git zip unzip wget make perl \
    supervisor nginx

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configure default locale, see also: https://stackoverflow.com/a/38553499
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

#RUN apt-get update --fix-missing && apt-get install -y locales lsb-release
RUN localedef -i en_US -f UTF-8 en_US.UTF-8
ENV LANG=${LANG:-"en_US.UTF-8"}
ENV LANGUAGE=en_US:en
ENV LC_ALL=en_US.UTF-8
RUN /usr/sbin/locale-gen --lang "${LANG}"
RUN /usr/sbin/update-locale --reset LANG="${LANG}"

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configure non-root user for RStudio server (required)
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

## Note: user must be created BEFORE any software is installed, otherwise it might not be available for that user
RUN useradd -s /bin/bash -m "$DEFAULT_USER" \
    && echo "${DEFAULT_USER}:${DEFAULT_USER}" | chpasswd \
    && usermod -a -G staff "${DEFAULT_USER}" \
    && mkdir -p "/home/${DEFAULT_USER}/.config/rstudio/" \
    && chown -R "${DEFAULT_USER}:${DEFAULT_USER}" "/home/${DEFAULT_USER}"

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install NodeJS (https://deb.nodesource.com/)
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
    && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_${NODE_MAJOR}.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list \
    && apt-get update && apt-get install nodejs -y \
    && npm install -g npm@latest

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install Quarto ( https://docs.posit.co/resources/install-quarto/ )
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN curl -o quarto.tar.gz -L https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.tar.gz \
    && mkdir -p /opt/quarto/${QUARTO_VERSION} \
    && tar -zxvf quarto.tar.gz -C "/opt/quarto/${QUARTO_VERSION}" --strip-components=1 \
    && rm -f quarto.tar.gz \
    && ln -s /opt/quarto/${QUARTO_VERSION}/bin/quarto /usr/local/bin/quarto

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install R ( https://docs.posit.co/resources/install-r/ )
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Required libraries for R:
RUN apt-get update --fix-missing && apt-get install -yq gdebi-core
# Download and install R:
RUN curl -O https://cdn.rstudio.com/r/ubuntu-2204/pkgs/r-${R_VERSION}_1_amd64.deb \
    && apt-get install -yq --no-install-recommends ./r-${R_VERSION}_1_amd64.deb \
    && rm -f ./r-${R_VERSION}_1_amd64.deb \
    && ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R \
    && ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

# Add a default CRAN mirror
RUN echo "options(repos = c(CRAN = '${CRAN}'), download.file.method = 'libcurl')" >> "/opt/R/${R_VERSION}/lib/R/etc/Rprofile.site"
# Set HTTPUserAgent for Posit Package Manager:
RUN echo "options(HTTPUserAgent = sprintf('R/%s R (%s)', getRversion(), paste(getRversion(), R.version['platform'], R.version['arch'], R.version['os'])))" >> "/opt/R/${R_VERSION}/lib/R/etc/Rprofile.site"
# https://docs.rstudio.com/rspm/admin/serving-binaries/#binaries-r-configuration-linux

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install Python ( https://docs.posit.co/resources/install-python/ )
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN curl -O https://cdn.rstudio.com/python/ubuntu-2204/pkgs/python-${PYTHON_VERSION}_1_amd64.deb \
    && apt-get install -yq --no-install-recommends ./python-${PYTHON_VERSION}_1_amd64.deb \
    && rm -rf python-${PYTHON_VERSION}_1_amd64.deb \
    && /opt/python/${PYTHON_VERSION}/bin/python3 -m pip install --upgrade pip \
    && /opt/python/${PYTHON_VERSION}/bin/python3 -m pip install --upgrade setuptools

#RUN ln -s /opt/python/${PYTHON_VERSION}/bin/python3 /usr/bin/python
RUN echo "PATH=/opt/python/${PYTHON_VERSION}/bin:$PATH" >> ~/.bashrc
#RUN echo "export PATH=/opt/python/${PYTHON_VERSION}/bin:$PATH" >> ~/.bashrc

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install RStudio ( https://posit.co/download/rstudio-server/ )
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Required libraries for RStudio:
RUN apt-get update --fix-missing \
    && apt-get install -yq psmisc libssl-dev libclang-dev libpq5
# Download and install RStudio:
RUN curl -o rstudio-server.deb "https://download2.rstudio.org/server/jammy/amd64/rstudio-server-${RSTUDIO_VERSION}-amd64.deb" \
    && dpkg -i rstudio-server.deb  \
    && rm -rf rstudio-server.deb \
    && ln -fs /usr/lib/rstudio-server/bin/rstudio-server /usr/local/bin \
    && ln -fs /usr/lib/rstudio-server/bin/rserver /usr/local/bin

# https://github.com/rocker-org/rocker-versioned2/issues/137
RUN rm -f /var/lib/rstudio-server/secure-cookie-key
# https://docs.posit.co/ide/server-pro/load_balancing/configuration.html#lock-configuration
RUN echo "lock-type=advisory" >/etc/rstudio/file-locks

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configuration for RStudio server
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

COPY /config/rstudio/rstudio-prefs.json /etc/rstudio/rstudio-prefs.json
COPY /config/rstudio/rsession.conf /etc/rstudio/rsession.conf
COPY /config/rstudio/rserver.conf /etc/rstudio/rserver.conf
COPY /config/rstudio/logging.conf /etc/rstudio/logging.conf

# https://docs.posit.co/ide/server-pro/rstudio_pro_sessions/customizing_session_settings.html#keybindings
COPY /config/rstudio/addins.json /etc/rstudio/keybindings/addins.json
COPY /config/rstudio/editor_bindings.json /etc/rstudio/keybindings/editor_bindings.json
COPY /config/rstudio/rstudio_bindings.json /etc/rstudio/keybindings/rstudio_bindings.json

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install TinyTeX (https://yihui.org/tinytex/)
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Note: must be AFTER non-root user is created, otherwise quarto does not detect tinytex
RUN curl -sL "https://yihui.org/tinytex/install-bin-unix.sh" | sh \
    && /root/.TinyTeX/bin/*/tlmgr path remove \
    && mv /root/.TinyTeX/ /opt/TinyTeX \
    && /opt/TinyTeX/bin/*/tlmgr option sys_bin /usr/local/bin \
    && /opt/TinyTeX/bin/*/tlmgr path add \
    && tlmgr update --self --all \
    && fmtutil-sys --all

# tlmgr info schemes
# tlmgr info collections

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install Python libraries and JupyterLab
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Required libraries for some Python libraries (e.g. manim):
RUN apt-get update --fix-missing \
    && apt-get install -yq build-essential libcairo2-dev libpango1.0-dev ffmpeg
## Note: the following is only needed, if an incomplete bundle of packages is installed
# https://docs.manim.community/en/stable/installation/linux.html#optional-dependencies
RUN tlmgr install collection-basic amsmath babel-english cbfonts-fd cm-super ctex doublestroke \
                  dvisvgm everysel fontspec frcursive fundus-calligra gnu-freefont jknapltx \
                  latex-bin mathastext microtype ms physics preview ragged2e relsize rsfs \
                  setspace standalone tipa wasy wasysym xcolor xetex xkeyval

COPY /config/python/requirements.txt /tmp/requirements.txt

RUN /opt/python/${PYTHON_VERSION}/bin/python3 -m pip install --upgrade pip \
    && /opt/python/${PYTHON_VERSION}/bin/python3 -m pip install -r /tmp/requirements.txt
# Disable jupyter announcement notification: https://stackoverflow.com/a/75552789
RUN /opt/python/${PYTHON_VERSION}/bin/jupyter labextension disable "@jupyterlab/apputils-extension:announcements"

## Install lsp language server for bash and markdown:
RUN npm install -g --save-dev bash-language-server unified-language-server
RUN npm cache clean --force

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install R libraries
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# Required libraries for some R libraries (see the commented-out `pak::...` code in 'install_libraries.R'):
# Note: make sure to exclude 'python3', `git`, `make`, since we already have them installed
RUN apt-get update --fix-missing \
    && apt-get install -yq libcurl4-openssl-dev libssl-dev libgit2-dev zlib1g-dev pandoc \
    libfreetype6-dev libjpeg-dev libpng-dev libtiff-dev libicu-dev \
    libfontconfig1-dev libfribidi-dev libharfbuzz-dev libxml2-dev imagemagick \
    libmagick++-dev gsfonts cmake libgdal-dev gdal-bin \
    libgeos-dev libproj-dev libsqlite3-dev libudunits2-dev libzmq3-dev pandoc-citeproc

COPY /config/rstudio/environment.txt /tmp/environment.txt
COPY /config/rstudio/install_libraries.R /tmp/install_libraries.R

RUN Rscript /tmp/install_libraries.R /tmp/
RUN ln -s /opt/python/${PYTHON_VERSION}/bin/jupyter /usr/bin/jupyter
RUN Rscript -e "IRkernel::installspec(user=FALSE)"
# Make sure that Python can be found:
RUN echo "Sys.setenv(RETICULATE_PYTHON = '/opt/python/${PYTHON_VERSION}/bin/python')" >> "/opt/R/${R_VERSION}/lib/R/etc/Rprofile.site"
RUN echo "invisible(reticulate::py_available(initialize = TRUE))" >> "/opt/R/${R_VERSION}/lib/R/etc/Rprofile.site"

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configure CmdStan
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# https://github.com/stan-dev/cmdstan
# https://mc-stan.org/cmdstanr/reference/install_cmdstan.html (C++ toolchain with "apt-get install make")
# https://mc-stan.org/cmdstanpy/installation.html#pypi-install-package-cmdstanpy
#RUN /opt/python/${PYTHON_VERSION}/bin/python3 -c "import cmdstanpy as stan; import multiprocessing; stan.install_cmdstan(version = '${STAN_VERSION}', dir = '/opt/cmdstan', cores = multiprocessing.cpu_count())"
#RUN Rscript -e "cmdstanr::install_cmdstan(version = '${STAN_VERSION}', dir = file.path('/opt/cmdstan'), cores = parallel::detectCores())"
#RUN echo "Sys.setenv(CMDSTAN = '/opt/cmdstan/cmdstan-${STAN_VERSION}')" >> "opt/R/${R_VERSION}/lib/R/etc/Rprofile.site"
#RUN echo "CMDSTAN='/opt/cmdstan/cmdstan-${STAN_VERSION}'" >> /etc/environment
# IMPORTANT: env variable `CMDSTAN` is set in the supervisor config section! using 'ENV CMDSTAN=...' doesn't work for both rstudio and root users.

## Alternative: install cmdstan in both the `root` and `rstudio` users:
RUN /opt/python/${PYTHON_VERSION}/bin/python3 -c "import cmdstanpy as stan; import multiprocessing; stan.install_cmdstan(version = '${STAN_VERSION}', dir = '/root/.cmdstan', cores = multiprocessing.cpu_count())"
RUN /opt/python/${PYTHON_VERSION}/bin/python3 -c "import cmdstanpy as stan; import multiprocessing; stan.install_cmdstan(version = '${STAN_VERSION}', dir = '/home/rstudio/.cmdstan', cores = multiprocessing.cpu_count())"

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configure JupyterLab
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

## https://github.com/jupyter-lsp/jupyterlab-lsp#installation

COPY /config/python/jupyter_lab_config.py /root/.jupyter/jupyter_lab_config.py
COPY /config/python/jupyter_server_config.py /root/.jupyter/jupyter_server_config.py
COPY /config/python/ipython_config.py /opt/python/${PYTHON_VERSION}/etc/ipython/ipython_config.py

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Install VSCode Server (https://github.com/coder/code-server)
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

## https://github.com/linuxserver/docker-code-server/blob/master/Dockerfile
## https://github.com/coder/code-server
## https://coder.com/docs/code-server/latest/FAQ#how-do-i-install-an-extension

## Get a list of commands that install code-server:
## curl -fsSL https://code-server.dev/install.sh | sh -s -- --dry-run --version=${VSCODE_VERSION} --prefix=/usr/local
## Then run those commands:
RUN mkdir -p ~/.cache/code-server \
    && curl -#fL -o ~/.cache/code-server/code-server.deb.incomplete -C - https://github.com/coder/code-server/releases/download/v${VSCODE_VERSION}/code-server_${VSCODE_VERSION}_amd64.deb \
    && mv ~/.cache/code-server/code-server.deb.incomplete ~/.cache/code-server/code-server.deb \
    && dpkg -i ~/.cache/code-server/code-server.deb \
    && rm -rf ~/.cache/code-server
# which code-server

# Important: in order to connect without a proxy (like nginx), IP must be set to 0.0.0.0!
# https://github.com/coder/code-server/issues/1800#issuecomment-1341720112

## Install extensions: (only works if the 'extensions-dir' flag is set to /root/.local/share/code-server/extensions/)
## https://open-vsx.org
# https://open-vsx.org/extension/yzhang/markdown-all-in-one
RUN code-server --install-extension yzhang.markdown-all-in-one
# https://open-vsx.org/extension/ms-python/python
RUN code-server --install-extension ms-python.python
# https://open-vsx.org/extension/ms-pyright/pyright
RUN code-server --install-extension ms-pyright.pyright
# https://open-vsx.org/extension/REditorSupport/r
RUN code-server --install-extension REditorSupport.r
# https://open-vsx.org/extension/RDebugger/r-debugger
RUN code-server --install-extension RDebugger.r-debugger
# https://open-vsx.org/extension/quarto/quarto
RUN code-server --install-extension quarto.quarto
# https://open-vsx.org/extension/ms-toolsai/jupyter
RUN code-server --install-extension ms-toolsai.jupyter

COPY /config/code-server/config.yaml /root/.config/code-server/config.yaml

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Shared dir
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN mkdir /media/container_shared/ \
    && chmod 777 /media/container_shared/

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Configure nginx
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

COPY /config/nginx/nginx.conf /etc/nginx/nginx.conf
COPY /config/www/index.html /www/static/index.html

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Cleanup
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

RUN apt-get install -yqf --no-install-recommends \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /var/tmp/* \
    && rm -rf /config/* \
    && rm -rf /tmp/* \
    && rm -rf /opt/conda/pkgs/* \
    && rm -rf /root/.conda/pkgs/* \
    && rm -rf /root/.cache/pip/* \
    && strip /opt/R/${R_VERSION}/lib/R/library/*/libs/*.so

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Supervisor
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
## Override the config file: /etc/supervisor/supervisord.conf
COPY /config/supervisor/supervisord.conf /etc/supervisor/supervisord.conf
# Copy RStudio Server config to supervisor:
COPY /config/supervisor/rserver.conf /etc/supervisor/conf.d/
# Copy JupyterLab config to supervisor
COPY /config/supervisor/jupyterlab.conf /etc/supervisor/conf.d/
# Copy VSCode Server config to supervisor
COPY /config/supervisor/codeserver.conf /etc/supervisor/conf.d/
# Copy nginx config to supervisor: https://stackoverflow.com/a/28099946
COPY /config/supervisor/nginxgo.conf /etc/supervisor/conf.d/

# Set CMDSTAN environmental variable:
# Note: .conf files must have some initial "environment=CMDSTAN=..." value
#RUN sed -i "s/environment=CMDSTAN=.*,/environment=CMDSTAN=\/opt\/cmdstan\/cmdstan-${STAN_VERSION},/g" /etc/supervisor/conf.d/rserver.conf
#RUN sed -i "s/environment=CMDSTAN=.*/environment=CMDSTAN=\/opt\/cmdstan\/cmdstan-${STAN_VERSION}/g" /etc/supervisor/conf.d/jupyterlab.conf
#RUN sed -i "s/environment=CMDSTAN=.*/environment=CMDSTAN=\/opt\/cmdstan\/cmdstan-${STAN_VERSION}/g" /etc/supervisor/conf.d/codeserver.conf

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Multi-stage building in order to reduce the number of layers
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

# https://stackoverflow.com/a/56118557
FROM scratch
# https://specs.opencontainers.org/image-spec/annotations/?v=v1.0.1
LABEL org.opencontainers.image.title="Docker image for statistics, econometrics and data science" \
      org.opencontainers.image.description="Not for public registries" \
      org.opencontainers.image.authors="andrius.buteikis@mif.vu.lt" \
      org.opencontainers.image.url="" \
      org.opencontainers.image.version="1.2.5"

# Copy from existing docker image to reduce layers
COPY --from=build / /

# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# Finalize
# @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

#CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]
# https://github.com/jupyter-lsp/jupyterlab-lsp#installation
# https://unix.stackexchange.com/a/207296
# https://unix.stackexchange.com/a/187148
CMD ["/bin/bash", "-c", "ln -sf / /media/container_shared/.lsp_symlink ; /usr/bin/supervisord -c /etc/supervisor/supervisord.conf"]

The multiple sets of tabs below contain a number of explanations for the commands in the datascience.dockerfile, organized in the order that they appear in the dockerfile.

  • We will be using Ubuntu 22.04 as our base image.
  • We set DEBIAN_FRONTEND to noninteractive to force installations to use the default answers to any of their interactive questions.
  • We then define a number of variables that contain the application versions, the CRAN mirror and the default user that will be used when building the docker image.
  • Finally, we install a number of applications that will let us configure, download and install our software:
    • sudo - needed to run some applications with superuser privileges.
    • curl and wget - to download our software from the web.
    • ca-certificates and gnupg - needed to download NodeJS.
    • zip, unzip - to archive and extract our software.
    • make - utility to compile applications from source (needed for Stan)
    • perl - the perl programming language, used by some libraries in R.
    • locales - language and locale support, since R will need this information when installing packages (see here).
    • supervisor - allows us to launch and monitor multiple processes. This will let us launch RStudio, JupyterLab, code-server when starting the container.
    • nginx - a web server that can also be used as a reverse proxy. We will use it to consolidate the URLs of our web applications into a single home page.
  • We then set a number of environmental variables for the locale.
  • Finally, we configure a default user that will be used to log in to RStudio with the following credentials:
    • username: rstudio
    • password: rstudio
    • We create this user at the beginning so that any software installed to /opt and /usr/local is readily available.
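
Since the versions are declared with ARG, they can be overridden at build time without editing the dockerfile. The sketch below only assembles such a docker build command so that the mechanism is visible; --file, --tag and --build-arg are standard docker build flags, while the helper name docker_build_command and the image tag dockerds:example are hypothetical and not taken from this guide.

```python
import shlex


def docker_build_command(dockerfile, tag, build_args=None, context="."):
    """Assemble a `docker build` argument list with optional ARG overrides."""
    command = ["docker", "build", "--file", dockerfile, "--tag", tag]
    for name, value in (build_args or {}).items():
        # Each override is passed as its own --build-arg NAME=VALUE pair.
        command += ["--build-arg", f"{name}={value}"]
    command.append(context)  # the build context comes last
    return command


if __name__ == "__main__":
    cmd = docker_build_command(
        "datascience.dockerfile",
        "dockerds:example",  # hypothetical image tag
        build_args={"R_VERSION": "4.3.2", "NODE_MAJOR": "20"},
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

The command is printed rather than executed, so it can be reviewed first. Only variables that are declared with ARG in the dockerfile can be overridden this way.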

We install NodeJS following the official installation guide and upgrade npm to the latest version.

We install Quarto following the official installation guide.

We install R following the documentation at Posit. We also set the default CRAN mirror to a snapshot of a specific date that we’ve defined in the CRAN variable, as well as the HTTP user agent according to the documentation on serving binary packages.

We install Python following the documentation at Posit. We then upgrade pip and setuptools to the newest versions, as they are needed to download libraries and older versions may have expired URLs. We also make sure that this version of Python is available in our linux terminal by adding it to the PATH in our .bashrc bash shell script.

We install RStudio Server. An example with more customization options can be found on the rocker-project.org GitHub page.

We COPY our configuration files from our machine to our docker image.

We install TinyTeX following an example on rstudio/r-base image on docker.

We install a few Ubuntu libraries, which may be needed for manim, including the optional LaTeX libraries.

We then COPY the /dockerDS/config/python/requirements.txt file, install the libraries and disable the Jupyter announcement notification on startup. Finally, we install the LSP language servers for bash and markdown, in case we need them.

If needed, the newest library versions can be installed with:

python -m pip install --upgrade pip
python -m pip install -U virtualenv numpy scipy numexpr matplotlib pandas 'modin[all]' statsmodels 'datatable>1.0.0' pymc 'cmdstanpy[all]' 'arviz[all]' bambi jupyterlab jupyterlab-lsp jupyter-cache scikit-learn tensorflow keras pyarrow polars duckdb tzdata datar patsy plotnine seaborn siuba streamlit great-tables skimpy beautifulsoup4 manim lckr-jupyterlab-variableinspector
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip freeze > requirements.txt

Note:

  1. python-lsp-server[all] installs pycodestyle, which doesn’t work well with jupyter inside docker, so, for now, we only use it for R.
  2. The latest datatable dev version can be found here.
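
After regenerating requirements.txt, it may be useful to confirm that a container really has the pinned versions installed. The following is a minimal sketch using only the standard library (importlib.metadata). The function names are our own, and the parser only understands plain name==version lines as written by pip freeze; anything else (unpinned names, URLs, extras) is skipped.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" not in line:
            continue  # skip unpinned or URL-based requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Compare pinned versions with what is installed in this interpreter.

    Returns {package: (pinned, installed)}; installed is None if missing.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Assumes the script is started from the folder containing requirements.txt.
    with open("requirements.txt", encoding="utf-8") as handle:
        problems = find_mismatches(parse_pins(handle.read()))
    for name, (wanted, installed) in problems.items():
        print(f"{name}: pinned {wanted}, installed {installed}")
```

Run it with the same interpreter that the libraries were installed into (inside the container, /opt/python/<version>/bin/python3), otherwise every package will be reported as missing.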

We first install the Ubuntu libraries that are required by some of the R packages. You can check the required libraries via the pak library inside the container and get the list of apt-get install ... commands by running the following code in R:

pak::pkg_sysreqs(lib_list, upgrade = FALSE)$install_scripts

where lib_list is a vector containing the names of all the libraries listed in the /dockerDS/config/rstudio/environment.txt file. If you get an error about conflicts, chances are that a base/recommended library is in the list!

Then we create a symbolic link to make jupyterlab easier to start by linking it to /usr/bin/jupyter. After that we add the R kernel to jupyterlab and add our Python installation to R’s path so that the correct version of Python can be called from RStudio, if need be.

There are two ways to install Stan:

There is a very important difference between the two different cmdstan versions, with regards to the environmental variable CMDSTAN:

In R:

If the environment variable “CMDSTAN” exists at load time then its value will be automatically set as the default path to CmdStan for the R session. If the environment variable “CMDSTAN” is set, but a valid CmdStan is not found in the supplied path, the path is treated as a top folder that contains CmdStan installations. In that case, the CmdStan installation with the largest version number will be set as the path to CmdStan for the R session.

In Python:

CmdStanPy uses the environment variable CMDSTAN to register the CmdStan installation location.

For this reason, we would need to set CMDSTAN to a specific version, otherwise cmdstanpy will not work correctly. Since we want to access Stan from the rstudio user while in RStudio and from the root user while in JupyterLab, we install Stan for both root and rstudio. This takes up an additional 1 GB of disk space, but it is a more stable solution, since, in some cases, the environmental variable CMDSTAN wasn’t available in RStudio’s markdown or quarto documents, or in chunks with Python code.
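
To make the R-side rule quoted above more concrete, here is a small Python sketch of the same lookup logic. It is only an illustration of the described behaviour and not the actual cmdstanr or cmdstanpy implementation. As a simplifying assumption, a "valid" installation is any folder named cmdstan-X.Y.Z, and the function names are our own.

```python
import re
import tempfile
from pathlib import Path

VERSION_DIR = re.compile(r"^cmdstan-(\d+)\.(\d+)\.(\d+)$")


def looks_like_cmdstan(path):
    """Assumption for this sketch: an install is a folder named cmdstan-X.Y.Z."""
    return path.is_dir() and VERSION_DIR.match(path.name) is not None


def version_key(path):
    """Turn 'cmdstan-2.33.1' into (2, 33, 1) so versions compare numerically."""
    return tuple(int(part) for part in VERSION_DIR.match(path.name).groups())


def resolve_cmdstan(cmdstan_env):
    """Apply the quoted rule to a CMDSTAN value; return a Path or None."""
    path = Path(cmdstan_env)
    if looks_like_cmdstan(path):
        return path  # CMDSTAN already points at one specific installation
    if not path.is_dir():
        return None
    # Otherwise treat it as a top folder and pick the largest version number.
    candidates = [p for p in path.iterdir() if looks_like_cmdstan(p)]
    return max(candidates, key=version_key) if candidates else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        for version in ("2.9.0", "2.33.1", "2.10.0"):
            (Path(top) / f"cmdstan-{version}").mkdir()
        print(resolve_cmdstan(top).name)
        print(resolve_cmdstan(str(Path(top) / "cmdstan-2.9.0")).name)
```

Comparing the versions as tuples of integers matters here: 2.33.1 is correctly ranked above 2.9.0, whereas a plain string comparison would pick 2.9.0. A lookup like this depends on whatever happens to be in the top folder, which is one more reason to point CMDSTAN at a specific version.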

You can verify whether Stan works as expected via the terminal:

Rscript -e "cmdstanr::cmdstan_path()"
python -c "import cmdstanpy as stan; print(stan.cmdstan_path())"
[1] "/home/rstudio/.cmdstan/cmdstan-2.33.1"
/home/rstudio/.cmdstan/cmdstan-2.33.1

Or by running a specific example directly in R:

library(cmdstanr)
#
a <- cmdstan_model(file.path(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan'))
b <- a$sample(data = list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)),
              seed = 123, chains = 4, parallel_chains = 4, refresh = 500, 
              iter_warmup = 1000, iter_sampling = 1000, show_messages = FALSE)
b$summary()
This is cmdstanr version 0.7.1
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /home/rstudio/.cmdstan/cmdstan-2.33.1
- CmdStan version: 2.33.1

A newer version of CmdStan is available. See ?install_cmdstan() to install it.
To disable this check set option or environment variable CMDSTANR_NO_VER_CHECK=TRUE.
# A tibble: 2 × 10
  variable   mean median    sd   mad      q5    q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -7.26  -6.99  0.704 0.330 -8.76   -6.75   1.00    1952.    2046.
2 theta     0.250  0.234 0.119 0.121  0.0821  0.464  1.00    1552.    1621.

and Python:

import os
import pandas
pandas.set_option('display.max_columns', None)
pandas.set_option('display.width', 200)
import cmdstanpy as stan
#
a = stan.CmdStanModel(stan_file = os.path.join(stan.cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan'))
b = a.sample(data = {'N': 10, 'y': [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
             seed = 123, chains = 4, parallel_chains = 4, refresh = 500,
             iter_warmup = 1000, iter_sampling = 1000, show_progress = False)
b.summary()
21:43:21 - cmdstanpy - INFO - CmdStan start processing
21:43:21 - cmdstanpy - INFO - Chain [1] start processing
21:43:21 - cmdstanpy - INFO - Chain [2] start processing
21:43:21 - cmdstanpy - INFO - Chain [3] start processing
21:43:21 - cmdstanpy - INFO - Chain [4] start processing
21:43:21 - cmdstanpy - INFO - Chain [2] done processing
21:43:21 - cmdstanpy - INFO - Chain [4] done processing
21:43:21 - cmdstanpy - INFO - Chain [1] done processing
21:43:21 - cmdstanpy - INFO - Chain [3] done processing
          Mean      MCSE    StdDev        5%       50%      95%    N_Eff  N_Eff/s    R_hat
lp__  -7.26358  0.017219  0.704111 -8.764440 -6.988110 -6.75037  1672.04  23549.8  1.00184
theta  0.24995  0.002967  0.119325  0.082037  0.233825  0.46429  1617.52  22781.9  1.00221

We COPY our configuration files from our machine to our docker image.

Important

Depending on the versions of JupyterLab, R and Python, code completion sometimes doesn’t fully work. In some cases this is due to a bug in one of the libraries (which will usually be fixed in a future version), but in others changing a few settings can fix the problem. If you do not get autocompletion suggestions when pressing Tab in your code, you can try modifying the following settings:

  • See the sections below to fully build your docker image.
  • Launch the docker container and open JupyterLab.
  • On the top menu, go to Settings -> Editor Settings.
  • In the Settings menu you should see two Code Completion tabs. These tabs are for the @jupyter-lsp/jupyterlab-lsp:completion and @jupyterlab/completer-extension:manager plugins in JupyterLab.
  • In one of the Code Completion tabs change the following settings:
    • Continuous hinting - change this to be on.
    • Wait for kernel if busy - change this to be off.
    • Prioritize completion from kernel - change this to be on.
    • Case-sensitive filtering - change this to be off.
    • Include perfect matches - change this to be on.
    • Pre-filter matches - change this to be off.
  • In the other Code Completion tab change the following settings:
    • Show the documentation panel - change this to be on.
    • Enable autocompletion - change this to be on.

Experimenting with turning some of the above settings on or off should restore some of the autocompletion functionality.

We get the commands via a dry run:

curl -fsSL https://code-server.dev/install.sh | sh -s -- --dry-run --prefix=/usr/local

We then install extensions from open-vsx.org. Note that we install the pyright extension, since by default the Python extension does not contain pylance, as it is licensed for Microsoft products only. Finally, we COPY our configuration to the docker image.

We create a container_shared folder inside /media, which we will use as a shared volume between our machine and the container. We also give the directory 777 permissions, which grant read, write, and execute permission to every user in our container, ensuring that we will be able to read and modify files and folders inside our shared directory.
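To see what the 777 mode means in practice, here is a small sketch that creates a directory, applies the mode and reads it back in the familiar rwx notation. It only illustrates the permission bits on a Linux file system (such as the one inside the container); the helper name make_shared_dir is hypothetical, and in the image itself this step is done in the dockerfile rather than in Python.

```python
import os
import stat


def make_shared_dir(path: str) -> str:
    """Create path (if needed), set its mode to 777 and return it as text."""
    os.makedirs(path, exist_ok=True)
    # chmod is called explicitly, because the mode argument of makedirs
    # is filtered by the umask of the current process
    os.chmod(path, 0o777)
    # filemode() renders the file type and permission bits, e.g. 'drwxrwxrwx'
    return stat.filemode(os.stat(path).st_mode)
```

The three rwx groups in the returned string correspond to the owner, the group and all other users, so a result of drwxrwxrwx confirms that every user can read, write and enter the directory.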

We COPY the nginx configuration and our static .html webpage to the docker image.

We remove any temporary files to reduce the size of our docker image.

We COPY the configurations of each service, which will be controlled by supervisor.

We further reduce the size of our image via multi-stage building, which reduces the number of our docker layers. This reduces the final docker image by around 2 to 3 GB. We also add a number of labels following the OpenContainers Annotations Specification.

As a final step, we set supervisor to launch whenever we start our docker container. Note that we want to enable R documentation and code completion in JupyterLab, so we follow the documentation on github and create a symbolic link from our shared directory (i.e. the Jupyter root directory), which points to our system root /. We need to do this each time we start our container, in case we want to mount a different directory.
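The symbolic link step can be sketched as follows. The link name .lsp_symlink is an assumption on our part (it is the name we recall from the jupyterlab-lsp documentation), and the function ensure_root_symlink is hypothetical; the actual start-up command used by the image may differ.

```python
import os


def ensure_root_symlink(shared_dir: str, link_name: str = ".lsp_symlink") -> str:
    """Create (or refresh) a symbolic link inside shared_dir that points to /."""
    link_path = os.path.join(shared_dir, link_name)
    # Remove a link left over from a previous container start, so the
    # function can be run every time the container is launched
    if os.path.islink(link_path):
        os.unlink(link_path)
    os.symlink("/", link_path)
    return link_path
```

Running it at every start, for example as ensure_root_symlink("/media/container_shared"), covers the case where a different host directory (without the link) has been mounted.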

Building our docker image

We can now build our image by running the following command in our terminal:

run in your machine's terminal
docker build -f ./datascience.dockerfile -t datascience --pull=false .

It should take between 1 and 2 hours to build the docker image.

In multi-stage building, we can build up to a specific stage. In our case, we may want to stop after the build stage (the code between FROM ubuntu:22.04 AS build and COPY --from=build / /):

run in your machine's terminal
docker build --target build -f ./datascience.dockerfile -t datascience --pull=false .

We can inspect the number of layers in our docker build:

run in your machine's terminal
docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' datascience:latest
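If we would rather count the layers than read them off the screen, the JSON printed by docker inspect (without the --format option) can be parsed. The following is a minimal sketch; the helper name layer_digests is ours, and it relies only on the RootFS.Layers field that the format string above also uses.

```python
import json
from typing import List


def layer_digests(inspect_output: str) -> List[str]:
    """Extract RootFS.Layers from the JSON printed by `docker inspect <image>`."""
    # docker inspect prints a JSON array with one object per inspected item
    items = json.loads(inspect_output)
    return list(items[0]["RootFS"]["Layers"])


# Possible usage on a machine where docker and the image are available:
#
#   import subprocess
#   out = subprocess.run(["docker", "inspect", "datascience:latest"],
#                        capture_output=True, text=True, check=True).stdout
#   print(len(layer_digests(out)))
```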

Note that we wanted to reduce the number of layers to manage the build cache.

The built image history can be viewed with the following command:

run in your machine's terminal
docker image history datascience:latest

We can view our created images as follows:

run in your machine's terminal
docker image ls

And finally, we can view the disk usage (built images, active containers and build cache):

run in your machine's terminal
docker system df

Running the prepared image

First, change the directory (via cd) inside the terminal to the location from which you want to access your projects. Then, we can create and run a new (temporary) container from our built image via one of the following commands:

  • If we are using a Windows PowerShell terminal:
run in your machine's terminal
docker run --rm --name ds-core -v ${PWD}/projects:/media/container_shared/ -p 80:80 datascience
  • If we are using a Windows cmd terminal:
run in your machine's terminal
docker run --rm --name ds-core -v "%cd%":/media/container_shared/ -p 80:80 datascience

Note the following arguments:

  • --rm - automatically remove the container when it exits.
  • --name ds-core - we set our container name.
  • -v ${PWD}/projects:/media/container_shared/ - we bind mount a volume: a directory on our machine (the projects folder inside the current working directory ${PWD} in the PowerShell command, or the current working directory "%cd%" itself in the cmd command) is made available inside the docker container’s /media/container_shared/ directory.
  • -p 80:80 - we publish the container’s port 80 on port 80 of the host.
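The arguments above can also be assembled programmatically, which avoids the differences between the PowerShell and cmd variable syntax. This is only a sketch: the function build_run_command and its parameters are hypothetical, and it merely builds the argument list without starting docker.

```python
from pathlib import Path
from typing import List


def build_run_command(host_dir: str, image: str = "datascience",
                      name: str = "ds-core", port: int = 80) -> List[str]:
    """Assemble the docker run arguments described above as a list."""
    # An absolute path avoids the value being interpreted as a volume name
    host_path = Path(host_dir).resolve()
    return [
        "docker", "run",
        "--rm",                                         # remove the container on exit
        "--name", name,                                 # container name
        "-v", f"{host_path}:/media/container_shared/",  # bind mount
        "-p", f"{port}:{port}",                         # host port : container port
        image,
    ]
```

The resulting list could then be passed to subprocess.run(build_run_command("projects"), check=True) on a machine where docker is installed.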

If we want to limit the amount of hardware resources available to docker, we can modify our command with the following arguments (see here for more examples):

run in your machine's terminal
docker run --rm --cpus="0.5" --memory="2g" --memory-swap="20g" --name ds-core -v "%cd%":/media/container_shared/ -p 80:80 datascience

where:

  • --cpus - specify how much of the available CPU resources a container can use. If you have 1 CPU, setting --cpus="0.5" guarantees the container uses at most 50% of the CPU.
  • --memory - the maximum amount of memory the container can use, the minimum allowed value is "6m" (6 megabytes).
  • --memory-swap - the amount of memory this container is allowed to swap to disk. It only works if --memory is also set. Using swap allows the container to write excess memory requirements to disk when the container has exhausted all the RAM that’s available to it. There is a performance penalty for applications that swap memory to disk often. However, it is useful if your machine lacks the required amount of RAM. Note that:
    • --memory-swap represents the total amount of memory and swap that can be used, and --memory controls the amount used by non-swap memory. In other words the difference between --memory-swap and --memory is the amount of swap that can be used.
    • Setting it to -1 allows the container to use unlimited swap, up to the amount available on the host system.
    • If it is set to 0, then the value is treated as unset.
    • If the amount is set to the same value as --memory, then the container doesn’t have access to swap.
    • If --memory-swap is unset and --memory is set, the container can use as much swap as the --memory setting.
    • You need to configure swap memory on your Windows host machine.
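The relationship between --memory and --memory-swap is easy to get wrong, so the rules listed above are restated here as a small calculation. This is a sketch under our own assumptions: the helper names are hypothetical, the size suffixes are interpreted as powers of 1024, and docker itself remains the authority on how the limits are enforced.

```python
import math
from typing import Optional

_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a docker-style size such as '2g' or '512m' to bytes."""
    size = size.strip().lower()
    if size[-1] in _UNITS:
        return int(float(size[:-1]) * _UNITS[size[-1]])
    return int(size)  # a plain number is interpreted as bytes


def swap_available(memory: str, memory_swap: Optional[str] = None) -> float:
    """Swap (in bytes) a container may use, following the rules listed above."""
    mem = to_bytes(memory)
    if memory_swap is None or memory_swap.strip() == "0":
        return mem                      # unset (or 0): as much swap as --memory
    if memory_swap.strip() == "-1":
        return math.inf                 # unlimited, up to what the host provides
    total = to_bytes(memory_swap)
    if total < mem:
        raise ValueError("--memory-swap must not be smaller than --memory")
    return total - mem                  # the difference is the usable swap
```

For the command shown earlier, swap_available("2g", "20g") corresponds to 20 GB minus 2 GB, i.e. 18 GB of swap on top of 2 GB of RAM.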

Finally, if we want to explore our image via bash inside our terminal, then we can run the following command:

run in your machine's terminal
docker run --rm --name ds-core -v ${PWD}/projects:/media/container_shared/ -p 80:80 -it datascience bash

where -it instructs docker to allocate a pseudoterminal connected to the container’s stdin, creating an interactive bash shell in the container.

Create a file named start_container.bat. In the code below, replace "%cd%" with the directory on your machine that you want to access from the container. For example, you might have a projects folder in the following directory: C:\Users\Andrius\Desktop\projects. You would then replace "%cd%" with C:\Users\Andrius\Desktop\projects. If your directory path contains spaces, you need to surround it with quotes.

start_container.bat
@echo off

setlocal
call :setESC

cls

REM stop container if it is running
docker stop ds-core

REM start "" /b cmd /c "timeout /nobreak 5 >nul & start "" http://localhost:80"
ECHO.
ECHO %ESC%[93m+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ %ESC%[0m
ECHO (1) Wait until you see %ESC%[96m "INFO success: ... entered RUNNING state ..."%ESC%[0m
ECHO (2) Open the following URL in your browser:%ESC%[92m http://localhost:80 %ESC%[0m
ECHO (3) If you want to%ESC%[91m stop/close%ESC%[0m this docker container then press %ESC%[91m CTRL+C %ESC%[0m
ECHO %ESC%[93m+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ %ESC%[0m
ECHO.%ESC%[96m 

docker run --rm --name ds-core -v "%cd%":/media/container_shared/ -p 80:80 datascience

ECHO.%ESC%[0m

pause

REM https://gist.github.com/mlocati/fdabcaeb8071d5c75a2d51712db24011#file-win10colors-cmd
:setESC
for /F "tokens=1,2 delims=#" %%a in ('"prompt #$H#$E# & echo on & for %%b in (1) do rem"') do (
  set ESC=%%b
  exit /B 0
)
exit /B 0

Opening this file should launch a windows terminal window and start the docker container.

Monitoring active containers

In a separate terminal window, run the following commands to view how much of our machine’s resources are being used by any active containers:

  • Current resource use:
run in your machine's terminal
docker stats --no-stream
  • Continuous resource usage monitoring (press Ctrl+C to close):
run in your machine's terminal
docker stats

While inside our docker container (or by opening the Terminal tab in RStudio server, or JupyterLab), we can check the disk usage of each directory inside the container by running the following:

run in your docker container's terminal
du -h -d 1 / -t +100MB 2> >(grep -v '^du:') | sort -hr
22G     /
11G     /media
6.7G    /opt
3.6G    /usr
1.1G    /home

Note that the /media directory contains our shared directory with files from our machine, so we can ignore it and focus on the remaining directories:

run in your docker container's terminal
du -h -d 1 /opt | sort -hr
6.7G    /opt
4.5G    /opt/python
1.1G    /opt/R
836M    /opt/TinyTeX
306M    /opt/quarto
run in your docker container's terminal
du -h -d 2 /usr -t +100MB | sort -hr
3.6G    /usr
2.6G    /usr/lib
1.1G    /usr/lib/x86_64-linux-gnu
529M    /usr/lib/rstudio-server
455M    /usr/bin
364M    /usr/lib/code-server
310M    /usr/lib/llvm-14
291M    /usr/include
226M    /usr/share
162M    /usr/include/boost
131M    /usr/lib/gcc
run in your docker container's terminal
du -h -d 1 /home/rstudio | sort -hr
1.1G    /home/rstudio/.cmdstan
1.1G    /home/rstudio
520K    /home/rstudio/.local
488K    /home/rstudio/.cache
12K     /home/rstudio/R
12K     /home/rstudio/.config

We see that most of the space is occupied by R, RStudio, Python, Quarto, TinyTeX and Stan, while the Ubuntu OS-specific libraries take up 2 to 3 GB of data.

Backing up and restoring the created image and clearing the build cache

First, back up the docker image as a .tar file by running the following command, which saves the archive to the terminal’s current directory:

run in your machine's terminal
docker save -o ./datascience.tar datascience

The file should be approximately 12 GB in size. If you have 7-zip installed, you can reduce this size to only around 3.5 GB by creating a .tgz archive via the following command:

run in your machine's terminal
C:\"Program Files"\7-Zip\7z.exe a datascience.tgz datascience.tar -sdel
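If 7-Zip is not installed, a gzip-compressed archive of the same kind can be produced with Python's standard library, and docker load can read gzip-compressed tar archives directly. The sketch below is an alternative to the 7-Zip command above, not a reproduction of it: the function name gzip_tar is ours, the resulting file size may differ from the figure quoted above, and (unlike the -sdel switch) the original .tar file is not deleted.

```python
import gzip
import shutil


def gzip_tar(tar_path: str, tgz_path: str) -> None:
    """Compress an existing .tar file into a gzip-compressed .tgz file."""
    with open(tar_path, "rb") as src, gzip.open(tgz_path, "wb") as dst:
        # Stream in chunks, so the multi-gigabyte archive is never held in memory
        shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
```

It could be called as gzip_tar("datascience.tar", "datascience.tgz") from the directory that contains the saved image.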

Then, delete the image inside docker (not the archive that we created) with the following command:

run in your machine's (powershell) terminal
docker rmi $(docker images 'datascience' -a -q)

After removing the image, we can clear the build cache to free up space, since we won’t need to re-build our image:

run in your machine's terminal
docker builder prune --force

The build cache is used to make re-building the image quicker. In our case, the build cache could be 25 GB or more (especially if we ran into errors and had to re-build).

We can then further inspect the total size taken up by everything else (e.g. other images that we have):

run in your machine's terminal
docker system df

If this was the only image that we’ve built, we should see 0B in the Images/Containers/Local Volumes/Build Cache categories.

Next, after clearing up some space, we load our image back into docker:

run in your machine's terminal
docker load -i ./datascience.tgz

If you did not create the .tgz archive - replace datascience.tgz with datascience.tar. It should take around 15-30 minutes to load the archived image into docker. We will only have to do this once after building the image. Any future containers will be created from this loaded image.

Finally, we run:

run in your machine's terminal
docker system df

to make note of our total image sizes. You should keep the datascience.tgz (or datascience.tar) archive in case you accidentally remove the image from docker and don’t want to re-build it from scratch.

If you want to remove everything run the following commands in your terminal:

run in your machine's (powershell) terminal
docker stop $(docker ps -a -q)
docker system prune -a --volumes --force

The first command stops all docker containers, while the second removes all unused containers/volumes/networks/images. Since we stop all containers with the first command, the second command will remove everything.


  1. Files have extensions, like .yaml, .conf, .txt, .R, .py, .dockerfile, etc., while folders do not have extensions.↩︎

  2. Click on each tab to get the different configuration files as well as explanations for each configuration file.↩︎