Apptainer image configuration for the Costs and Benefits module

Apptainer

Apptainer is a container technology that simplifies the creation and execution of containers. Apptainer represents an alternative to Docker in scientific computing.

We are going to use Apptainer to create an image for the Costs and Benefits module.

The Apptainer installation steps can be found at the following URL:

Download and interact with pre-built images

We can download pre-built images from registries such as:

  • https://hub.docker.com
  • https://quay.io

Docker Hub download example: apptainer pull docker://alpine

Quay download example: apptainer pull docker://quay.io/jitesoft/alpine
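When no output name is given, apptainer pull names the file after the image and its tag, which is why the Ubuntu pull later in this guide produces ubuntu_22.04.sif. The function below is a hypothetical helper (default_sif_name is our name, not part of Apptainer) that mimics this naming for simple references, so a script can know which file to expect. It is a sketch under that assumption and does not handle digest references (@sha256:...).

```shell
# Hypothetical helper (not part of Apptainer): predict the file name that
# `apptainer pull <URI>` writes when no output name is given, following the
# <image>_<tag>.sif pattern (docker://ubuntu:22.04 -> ubuntu_22.04.sif).
# Digest references (@sha256:...) are not handled.
default_sif_name() {
  local ref="${1#docker://}"   # drop the transport prefix
  local name="${ref##*/}"      # last path component, e.g. alpine or ubuntu:22.04
  local image="${name%%:*}"    # part before the tag
  local tag="latest"           # Docker's default tag
  case "$name" in
    *:*) tag="${name#*:}" ;;   # explicit tag after the colon
  esac
  printf '%s_%s.sif\n' "$image" "$tag"
}
```

To avoid relying on the default at all, pass the output name explicitly, for example apptainer pull ubuntu_22.04.sif docker://ubuntu:22.04.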

Image configuration from sandbox

The steps to configure the image using the sandbox format are as follows:

    1. Build the image in sandbox format:
apptainer build --sandbox <DIR> <URI or image.sif>
    2. Request a shell in the sandbox directory:
apptainer shell --writable <DIR>
    3. Make the required configurations.
    4. Convert the sandbox to SIF format:
apptainer build image.sif <DIR>
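The same four steps can be scripted. The sketch below is one possible way to do it under two assumptions: the configuration commands run non-interactively through apptainer exec --fakeroot --writable instead of an interactive shell, and run and sandbox_to_sif are hypothetical helper names. With DRY_RUN=1 it only prints the commands it would execute.

```shell
# Hypothetical helpers: script the sandbox workflow.
# Assumption: configuration runs non-interactively with `apptainer exec
# --fakeroot --writable` instead of an interactive `apptainer shell`.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sandbox_to_sif() {
  local src="$1" dir="$2" sif="$3"
  run apptainer build --sandbox "$dir" "$src"                  # step 1: sandbox
  run apptainer exec --fakeroot --writable "$dir" apt update   # steps 2-3: example configuration command
  run apptainer build "$sif" "$dir"                            # step 4: sandbox -> SIF
}
```

For example, DRY_RUN=1 followed by sandbox_to_sif docker://ubuntu:22.04 cb_module cb_module.sif prints the three commands; further configuration commands would be added as extra run apptainer exec lines.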

Costs and Benefits module

The Costs and Benefits repository contains data for 26 LAC (Latin America and the Caribbean) countries.

The 26 LAC countries (ISO 3 codes) are: ARG, BHS, BRB, BLZ, BOL, BRA, CHL, COL, CRI, DOM, ECU, SLV, GTM, GUY, HTI, HND, JAM, MEX, NIC, PAN, PRY, PER, SUR, TTO, URY, VEN

To scale the analysis to more countries, it is necessary to add information to the following files:

  • AGRC_LVST_productivity_cost_gdp.csv
  • ENTC_REDUCE_LOSSES_cost_file.xlsx (sheet name Annual Loss Reduction Cost).
  • LNDU_soil_carbon_fractions.csv

We will use K-means clustering to group countries, with the objective of imputing the existing values to the rest of the countries.
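Before imputing, it can be useful to see which countries have no rows in a given CSV file. The function below is a hypothetical sketch (missing_countries is our name): it assumes a header row, a column holding ISO 3 codes whose name you pass in (iso_code3 in the example is a placeholder; check the real header of each file), and no quoted commas. It does not apply to the .xlsx file.

```shell
# Hypothetical check: print every ISO 3 code from the argument list that has
# no rows in a CSV file. Assumptions: header row, the code column is named by
# $2 (placeholder name; check the real header), no quoted commas.
# If the column is not found, every code is reported as missing.
missing_countries() {
  local csv="$1" column="$2"
  shift 2
  local present
  present="$(awk -F',' -v col="$column" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx { print $idx }' "$csv" | sort -u)"
  for code in "$@"; do
    printf '%s\n' "$present" | grep -qx "$code" || echo "$code"
  done
}
```

For example, missing_countries LNDU_soil_carbon_fractions.csv iso_code3 ARM would print ARM if Armenia has no rows under that (assumed) column name.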

Building the image for the Costs and Benefits module execution

1) Build the image in sandbox format

We will configure the image to run SISEPUEDE from a pre-built Ubuntu 22.04.4 LTS image from Docker Hub:

apptainer pull docker://ubuntu:22.04

We will use the sandbox format to make changes to the container image. Later, we will build the same image by defining the steps in a definition file.

The downloaded image is immutable. Since we need to make configuration changes, we will create a sandbox from the image:

apptainer build --sandbox cb_module ubuntu_22.04.sif

2) Request a shell in the sandbox directory

We request a shell in the container created from the sandbox:

apptainer shell -pfw --no-mount home cb_module

The flags used mean the following:

  • -w, --writable : by default all Apptainer containers are read-only. This option makes the file system accessible as read/write.
  • -f, --fakeroot : run the container with the appearance of running as root.
  • -p, --pid : run the container in a new PID namespace.
  • --no-mount home : do not mount the user's home directory inside the container.

Inside the sandbox, update the package index:

Apptainer> apt update

The Apptainer> prompt indicates that we are working in a new shell within the container, which we can interact with as though it were a virtual machine.

Install Python 3.11 in the sandbox:

Apptainer> apt install -y python3.11 python3.11-venv python3-venv python3-dev python3-pip

Create a requirements.txt file listing the Python packages to install:

Apptainer> echo "
pandas==2.2.1
scikit-learn
openpyxl
munch==2.5.0
PyYAML==6.0
geopy==2.1.0
SQLAlchemy==2.0.29
julia==0.6.2
" > requirements.txt

Apptainer> pip install -r requirements.txt
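The echo "..." form above writes a blank first and last line into requirements.txt. A heredoc is an alternative that writes exactly one requirement per line; the sketch below wraps it in a hypothetical function (write_requirements is our name) with the same package list.

```shell
# Hypothetical helper: write requirements.txt with a heredoc, one requirement
# per line, without the blank lines produced by echo "...".
# The quoted 'EOF' prevents any shell expansion inside the list.
write_requirements() {
  cat > "${1:-requirements.txt}" <<'EOF'
pandas==2.2.1
scikit-learn
openpyxl
munch==2.5.0
PyYAML==6.0
geopy==2.1.0
SQLAlchemy==2.0.29
julia==0.6.2
EOF
}
```

Inside the sandbox this would be used as write_requirements && pip install -r requirements.txt.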

Install git to download the Costs and Benefits repository:

Apptainer> apt install -y git
Apptainer> cd /opt
Apptainer> git clone https://github.com/nidiot/sisepuede_costs_benefits.git

Clone the SISEPUEDE repository:

Apptainer> git clone https://github.com/jcsyme/sisepuede.git

Create the directory from which the list of countries (ISO 3 codes) will be read, and download the file iso3_all_countries.csv into it:

Apptainer> mkdir -p /opt/sisepuede_data/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data
Apptainer> cd /opt/sisepuede_data/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data
Apptainer> apt install -y wget
Apptainer> wget https://raw.githubusercontent.com/milocortes/sisepuede_data/main/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data/iso3_all_countries.csv
Apptainer> cd /opt

Install R and the required packages:

Apptainer> apt install -y r-base r-base-dev
Apptainer> LC_ALL=C.UTF-8 R -e 'install.packages(c("purrr", "stringr", "tidyr", "data.table", "readxl", "dplyr", "reshape2", "lhs", "reshape"))'

Download and run the Python program that imputes data for the rest of the countries with K-means:

Apptainer> wget https://raw.githubusercontent.com/milocortes/sisepuede_data/main/utils/actualiza_datos_cb.py
Apptainer> python3 actualiza_datos_cb.py

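We overwrite some lines of the Costs and Benefits scripts, particularly those that define local paths. Each sed command replaces a whole line by its line number, and the last one deletes everything from line 195 to the end of the script: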

Apptainer> mkdir -p /opt/ssp
Apptainer> mkdir -p /opt/cb
Apptainer> FILE_CB="/opt/sisepuede_costs_benefits/Main/cb_calculate_costs_and_benefits_script.R"
Apptainer> sed -i '6s/.*/setwd("\/opt\/sisepuede_costs_benefits\/Main\/")/' $FILE_CB
Apptainer> sed -i '13s/.*/path_to_model_results<-"\/opt\/cb\/"/' $FILE_CB
Apptainer> sed -i '14s/.*/path_to_ssp_results<-"\/opt\/ssp\/"/' $FILE_CB
Apptainer> sed -i '15s/.*/data_filename<-paste0(path_to_ssp_results,/' $FILE_CB
Apptainer> sed -i '16s/.*/                       list.files(path=path_to_ssp_results, /' $FILE_CB
Apptainer> sed -i '17s/.*/                                 pattern = glob2rx("sisepuede_results_sisepuede_run_*"))) #path to model output runs/' $FILE_CB
Apptainer> sed -i '22s/.*/primary_filename<-paste0(path_to_ssp_results, "ATTRIBUTE_PRIMARY.csv") #path to model output primary filename/' $FILE_CB
Apptainer> sed -i '23s/.*/strategy_filename<-paste0(path_to_ssp_results, "ATTRIBUTE_STRATEGY.csv") #path to model output strategy filename/' $FILE_CB
Apptainer> CB_CONFIG_FILE="/opt/sisepuede_costs_benefits/Main/cb_config.R"
Apptainer> sed -i '19s/.*/sisepuede_data_git_path<-"\/opt\/sisepuede_data\/"/' $CB_CONFIG_FILE
Apptainer> sed -i '20s/.*/ssp_costs_benefits_git_path<-"\/opt\/sisepuede_costs_benefits\/"/' $CB_CONFIG_FILE
Apptainer> sed -i '195,$d' $FILE_CB
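Because these edits address lines by number, a change in the upstream repository would silently overwrite the wrong lines. The sketch below is a hypothetical guard (replace_line_if is our name): it replaces line N only if the line currently matches an expected regular expression, and it passes the replacement through ENVIRON, so slashes do not need escaping as they do in sed.

```shell
# Hypothetical guard for line-number-based edits: replace line $2 of file $1
# with $4, but only if that line currently matches the regex $3.
# Fails loudly (return 1) instead of overwriting an unexpected line.
replace_line_if() {
  local file="$1" n="$2" expected="$3" replacement="$4"
  if ! sed -n "${n}p" "$file" | grep -q -- "$expected"; then
    echo "line $n of $file does not match '$expected'" >&2
    return 1
  fi
  # ENVIRON passes the text verbatim (no escape processing, no slash escaping).
  CB_REPLACEMENT="$replacement" awk -v n="$n" \
    'NR == n { print ENVIRON["CB_REPLACEMENT"]; next } { print }' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, assuming line 6 of the script currently starts with setwd(, the first sed edit would become replace_line_if "$FILE_CB" 6 '^setwd(' 'setwd("/opt/sisepuede_costs_benefits/Main/")'.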

Run the append_newlines_to_csvs.sh scripts that fix some CSV files:

Apptainer> cd /opt/sisepuede_costs_benefits/cost_factors
Apptainer> bash append_newlines_to_csvs.sh

Apptainer> cd /opt/sisepuede_costs_benefits/strategy_specific_cb_files
Apptainer> bash append_newlines_to_csvs.sh

Apptainer> cd /opt
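We do not reproduce the repository's append_newlines_to_csvs.sh here. As an independent sketch of the idea its name suggests, assuming the goal is that every CSV ends with a newline, the hypothetical function below appends one only where the last byte is not already a newline.

```shell
# Independent sketch (NOT the repository's append_newlines_to_csvs.sh):
# append a newline to each CSV in a directory whose last byte is not one.
ensure_trailing_newlines() {
  local dir="${1:-.}"
  for f in "$dir"/*.csv; do
    [ -f "$f" ] || continue   # glob matched nothing
    [ -s "$f" ] || continue   # leave empty files alone
    # $(...) strips a trailing newline, so a non-empty result means the
    # last byte is something else.
    if [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n' >> "$f"
    fi
  done
}
```

Running it twice on the same directory changes nothing the second time.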

Execute Costs and Benefits module

To execute the Costs and Benefits module, we need SISEPUEDE model outputs to process.

Since the host operating system sees the sandbox as an ordinary directory, we can move files between the host system and the sandbox using commands such as cp, mv or rsync.

Suppose that we have the zip file ssp_armenia.zip that contains the outputs of the SISEPUEDE model:

unzip -l ssp_armenia.zip 
Archive:  ssp_armenia.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  6428502  2024-04-30 19:52   sisepuede_results_sisepuede_run_armenia.csv
      114  2024-04-30 19:46   ATTRIBUTE_PRIMARY.csv
      691  2024-04-30 19:41   ANALYSIS_METADATA.csv
  4012586  2024-04-30 19:41   MODEL_BASE_INPUT_DATABASE.csv
  3333377  2024-04-30 19:46   MODEL_OUTPUT.csv
  3223195  2024-04-30 19:46   MODEL_INPUT.csv
      460  2024-04-30 19:41   ATTRIBUTE_DESIGN.csv
    16745  2024-04-30 19:41   ATTRIBUTE_STRATEGY.csv
---------                     -------
 17015670                     8 files

The ssp_armenia.zip file can be downloaded from the following URL:

We must copy this file to the directory /opt/ssp inside the sandbox:

ls -l
total 33104
-rw-r--r--  1 milo milo     3081 May  1 14:34 cb_model.def
drwxr-xr-x 18 milo milo     4096 Apr 28 20:31 cb_module
-rw-r--r--  1 milo milo  4077591 May  1 14:34 ssp_armenia.zip
-rwxr-xr-x  1 milo milo 29810688 May  1 14:36 ubuntu_22.04.sif

cp ssp_armenia.zip cb_module/opt/ssp

Open a shell in the sandbox again:

apptainer shell -pfw --no-mount home cb_module

And you will see the ssp_armenia.zip file in the directory /opt/ssp:

Apptainer> ls /opt/ssp
ssp_armenia.zip

Unzip ssp_armenia.zip into /opt/ssp:

Apptainer> apt install -y unzip
Apptainer> unzip /opt/ssp/ssp_armenia.zip -d /opt/ssp
Archive:  /opt/ssp/ssp_armenia.zip
  inflating: /opt/ssp/sisepuede_results_sisepuede_run_armenia.csv  
  inflating: /opt/ssp/ATTRIBUTE_PRIMARY.csv  
  inflating: /opt/ssp/ANALYSIS_METADATA.csv  
  inflating: /opt/ssp/MODEL_BASE_INPUT_DATABASE.csv  
  inflating: /opt/ssp/MODEL_OUTPUT.csv  
  inflating: /opt/ssp/MODEL_INPUT.csv  
  inflating: /opt/ssp/ATTRIBUTE_DESIGN.csv  
  inflating: /opt/ssp/ATTRIBUTE_STRATEGY.csv 
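Since the modified R script reads a sisepuede_results_sisepuede_run_* file, ATTRIBUTE_PRIMARY.csv and ATTRIBUTE_STRATEGY.csv from /opt/ssp (see the sed edits above), a quick pre-flight check can catch an incomplete zip before running R. The function below is a hypothetical sketch (check_ssp_inputs is our name) that only tests for the presence of those files.

```shell
# Hypothetical pre-flight check: does the directory hold the inputs the
# modified R script reads? Returns 1 and reports each missing item.
check_ssp_inputs() {
  local dir="${1:-/opt/ssp}" ok=0
  set -- "$dir"/sisepuede_results_sisepuede_run_*
  if [ ! -e "$1" ]; then   # glob left unexpanded: no run file
    echo "missing: sisepuede_results_sisepuede_run_* in $dir" >&2
    ok=1
  fi
  for f in ATTRIBUTE_PRIMARY.csv ATTRIBUTE_STRATEGY.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

Inside the sandbox, check_ssp_inputs /opt/ssp should succeed after the unzip shown above.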

At this point, the environment for the Costs and Benefits module is ready. Run the following command to execute the Costs and Benefits main program:

Apptainer> FILE_CB="/opt/sisepuede_costs_benefits/Main/cb_calculate_costs_and_benefits_script.R"
Apptainer> LC_ALL=C.UTF-8 Rscript $FILE_CB

The outputs of the execution are in the directory /opt/cb:

Apptainer> apt install -y tree
Apptainer> tree /opt/cb
/opt/cb
|-- cost_benefit_results.csv
|-- economy_wide_cost_benefit_results.csv
|-- net_benefit_net_ghg.csv
`-- sisepuede_results_TRIMMED_LONG.csv

0 directories, 4 files
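A run that fails partway may leave some of these files missing or empty. The hypothetical function below (check_cb_outputs is our name) is a minimal sketch that verifies the four files listed above exist and are non-empty; it does not validate their contents.

```shell
# Hypothetical post-run check: the four output files should exist in the
# output directory and be non-empty. Contents are not validated.
check_cb_outputs() {
  local dir="${1:-/opt/cb}" status=0
  for f in cost_benefit_results.csv economy_wide_cost_benefit_results.csv \
           net_benefit_net_ghg.csv sisepuede_results_TRIMMED_LONG.csv; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

It can be chained after the main program, e.g. LC_ALL=C.UTF-8 Rscript $FILE_CB && check_cb_outputs /opt/cb.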

Creating Apptainer image from Definition File

For a reproducible, verifiable and production-quality container, the Apptainer documentation recommends that you build a SIF file using an Apptainer definition file. An Apptainer definition file can be thought of as the equivalent of a Dockerfile: it contains a script of build instructions.

Create the definition file cb_model.def with the content:

Bootstrap: docker
From: ubuntu:22.04

%post
    apt update
    DEBIAN_FRONTEND=noninteractive TZ=America/Mexico_City apt -y install tzdata
    apt install -y python3.11 python3.11-venv python3-venv python3-dev python3-pip
    apt install -y git
    cd /opt
    git clone https://github.com/nidiot/sisepuede_costs_benefits.git
    git clone https://github.com/jcsyme/sisepuede.git
    mkdir -p /opt/sisepuede_data/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data
    cd /opt/sisepuede_data/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data
    apt install -y wget
    wget https://raw.githubusercontent.com/milocortes/sisepuede_data/main/Energy/nemomod_entc_residual_capacity_pp_gas_gw/raw_data/iso3_all_countries.csv
    cd /opt

    apt install -y r-base r-base-dev
    LC_ALL=C.UTF-8 R -e 'install.packages(c("purrr", "stringr", "tidyr", "data.table", "readxl", "dplyr", "reshape2", "lhs", "reshape"))'


    wget https://raw.githubusercontent.com/milocortes/sisepuede_data/main/utils/actualiza_datos_cb.py

    echo "
    pandas==2.2.1
    scikit-learn
    openpyxl
    munch==2.5.0
    PyYAML==6.0
    geopy==2.1.0
    SQLAlchemy==2.0.29
    julia==0.6.2
    " > requirements.txt

    pip3 install -r requirements.txt

    python3 actualiza_datos_cb.py


    mkdir -p /opt/ssp
    mkdir -p /opt/cb

    FILE_CB="/opt/sisepuede_costs_benefits/Main/cb_calculate_costs_and_benefits_script.R"
    sed -i '6s/.*/setwd("\/opt\/sisepuede_costs_benefits\/Main\/")/' $FILE_CB

    sed -i '13s/.*/path_to_model_results<-"\/opt\/cb\/"/' $FILE_CB

    sed -i '14s/.*/path_to_ssp_results<-"\/opt\/ssp\/"/' $FILE_CB

    sed -i '15s/.*/data_filename<-paste0(path_to_ssp_results,/' $FILE_CB

    sed -i '16s/.*/                       list.files(path=path_to_ssp_results, /' $FILE_CB

    sed -i '17s/.*/                                 pattern = glob2rx("sisepuede_results_sisepuede_run_*"))) #path to model output runs/' $FILE_CB

    sed -i '22s/.*/primary_filename<-paste0(path_to_ssp_results, "ATTRIBUTE_PRIMARY.csv") #path to model output primary filename/' $FILE_CB

    sed -i '23s/.*/strategy_filename<-paste0(path_to_ssp_results, "ATTRIBUTE_STRATEGY.csv") #path to model output strategy filename/' $FILE_CB

    CB_CONFIG_FILE="/opt/sisepuede_costs_benefits/Main/cb_config.R"

    sed -i '19s/.*/sisepuede_data_git_path<-"\/opt\/sisepuede_data\/"/' $CB_CONFIG_FILE

    sed -i '20s/.*/ssp_costs_benefits_git_path<-"\/opt\/sisepuede_costs_benefits\/"/' $CB_CONFIG_FILE

    sed -i '195,$d' $FILE_CB

    cd /opt/sisepuede_costs_benefits/cost_factors
    bash append_newlines_to_csvs.sh

    cd /opt/sisepuede_costs_benefits/strategy_specific_cb_files
    bash append_newlines_to_csvs.sh

    cd /opt

    apt install -y unzip zip

    echo "
    #!/bin/bash
    cp \$1 /opt/ssp
    cd /opt/ssp
    unzip *
    
    LC_ALL=C.UTF-8 Rscript $FILE_CB
    " > ejecuta-cb

%environment
    FILE_CB="/opt/sisepuede_costs_benefits/Main/cb_calculate_costs_and_benefits_script.R"

%runscript
    bash /opt/ejecuta-cb $*
    country=$2
    zip "cb_${country}.zip" -r -j /opt/cb
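The ejecuta-cb script above is written with echo "...". An alternative for that part of %post is a heredoc, sketched below as a hypothetical function (write_ejecuta_cb is our name). Two assumptions: $1 is the path to the SISEPUEDE zip file, and, unlike the original unzip *, it extracts only that file (unzip -o). The unquoted EOF expands $file_cb at build time while \$1 stays literal for run time.

```shell
# Hypothetical alternative to the echo "..." > ejecuta-cb lines in %post.
# Assumption: at run time $1 is the path of the SISEPUEDE zip file.
# Differs from the original by extracting only that file (unzip -o).
write_ejecuta_cb() {
  local target="$1" file_cb="$2"
  # Unquoted EOF: $file_cb is expanded now (build time);
  # \$ sequences are written literally for run time.
  cat > "$target" <<EOF
#!/bin/bash
set -e
cp "\$1" /opt/ssp
cd /opt/ssp
unzip -o "\$(basename "\$1")"
LC_ALL=C.UTF-8 Rscript "$file_cb"
EOF
  chmod +x "$target"
}
```

In %post it would be called as write_ejecuta_cb /opt/ejecuta-cb "$FILE_CB" in place of the echo block; the %runscript line bash /opt/ejecuta-cb $* can stay as it is.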

Build the SIF

apptainer build cb_cl.sif cb_model.def

We can create a symbolic link to cb_cl.sif so that it can be executed like any other system executable:

SIF_PATH="$(pwd)/cb_cl.sif"
sudo ln -sv $SIF_PATH /usr/local/bin/cb_cl

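We must set an Apptainer environment variable for the SIF execution. APPTAINER_WRITABLE_TMPFS is the environment-variable form of the --writable-tmpfs flag: it adds a temporary, in-memory writable overlay, so the run script can write to /opt/ssp and /opt/cb (the changes are discarded when the container exits):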

export APPTAINER_WRITABLE_TMPFS="true"

Execute the SIF for the zip file ssp_armenia.zip that contains the outputs of the SISEPUEDE model:

./cb_cl.sif ssp_armenia.zip armenia

The execution returns a zip file, cb_armenia.zip, containing the Costs and Benefits module outputs (the contents of /opt/cb).
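To process several countries, the run can be wrapped in a loop. The sketch below rests on assumptions: the SISEPUEDE outputs follow the ssp_<country>.zip naming of ssp_armenia.zip, the SIF is at $SIF (default ./cb_cl.sif), and country_from_zip and run_all_countries are hypothetical names. It uses the --writable-tmpfs flag instead of exporting APPTAINER_WRITABLE_TMPFS; because the overlay is discarded after each run, every country starts from a clean /opt/ssp and /opt/cb.

```shell
# Hypothetical batch wrapper. Assumptions: inputs are named ssp_<country>.zip
# and the SIF path is $SIF (default ./cb_cl.sif).
country_from_zip() {
  local base="${1##*/}"          # strip directories
  base="${base%.zip}"            # strip extension
  printf '%s\n' "${base#ssp_}"   # strip the ssp_ prefix
}

run_all_countries() {
  local sif="${SIF:-./cb_cl.sif}"
  for zip in ssp_*.zip; do
    [ -f "$zip" ] || continue    # no matching files
    # --writable-tmpfs: fresh, non-persistent overlay for every run
    apptainer run --writable-tmpfs "$sif" "$zip" "$(country_from_zip "$zip")"
  done
}
```

Each iteration should leave a cb_<country>.zip file, as the single Armenia run does.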