Jupyter as a Research and Presentation Tool


By Dongming Jin, 20 June 2018

During my PhD studies, I kept asking myself: what is research?

It seems to me that research is about

  • looking at something,
  • over and over,
  • in all possible aspects.

During these iterations, we may

  • narrow assumptions,
  • apply different methods,
  • collect more data,

and arrive at new discoveries.

So research is more like re-search.

Reproducibility is the foundation on which all science is built.

Because they are intuitive, many of us have adopted Python and Jupyter for research, or are about to.

Okay. Python is great.

But without Jupyter, it is no better than C or Fortran!

Today, I would like to share my two cents after graduating from

  • my PhD at UT Arlington
  • LSST Data Science Fellowship Program

Containers and Docker

What are containers?

  • Like virtual machines (VMs) or virtual environments, they are sandboxes that isolate software from the host; unlike VMs, they share the host's kernel, which keeps them lightweight

Why should I use them?

  • with Docker, deployment takes minimal effort
  • cross-platform, even on a Raspberry Pi
  • package a development environment with all of its dependencies into a standardized unit

We will hear and learn more in the afternoon sessions, so here is what I'm going to demo.

How to put an elephant into a refrigerator, er, deploy a basic Jupyter environment

  1. [x] download and install Docker for your OS
  2. [x] start Docker
  3. [ ] in your favorite terminal: $ docker pull jupyter/base-notebook (see the documentation)

Ta-da!

The essential way

  • customize the Dockerfile as needed
# Copyright (C) 2018 by Dongming Jin
# Licensed under the Academic Free License version 3.0
# This program comes with ABSOLUTELY NO WARRANTY.
# You are free to modify and redistribute this code as long
# as you do not remove the above attribution and reasonably
# inform recipients that you have modified the original work

FROM jupyter/base-notebook

LABEL maintainer="Dongming Jin <dongming.jin@mavs.uta.edu>"

# Optional: switch to root to set a root password, then switch back
USER root
RUN echo "root:Docker!" | chpasswd

USER jovyan

# add the necessary ingredients (conda packages)
RUN conda install -y \
                matplotlib \
                seaborn \
                pandas \
                pillow \
                missingno \
                scikit-learn
  • build from the Dockerfile: $ docker build -t jupyslides path/to/dir (the directory that contains the Dockerfile)
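Noting down dependencies as you go is easier if the Dockerfile is generated from a list. The sketch below is a hypothetical helper (render_dockerfile is my own name, not part of Docker or Jupyter) that renders a Dockerfile like the one above from a base image and a list of conda packages. It assumes every package can be installed with conda install -y in the base image.

```python
# Hypothetical helper (not part of Docker or Jupyter): turn the list of
# conda packages noted down during research into Dockerfile text.


def render_dockerfile(packages, base="jupyter/base-notebook"):
    """Return Dockerfile text that starts FROM `base` and conda-installs `packages`."""
    lines = [f"FROM {base}", ""]
    pkgs = sorted(set(packages))  # de-duplicate; sorted order keeps diffs small
    if pkgs:
        lines.append("RUN conda install -y \\")
        for i, pkg in enumerate(pkgs):
            # every package line except the last ends with a shell line continuation
            continuation = " \\" if i < len(pkgs) - 1 else ""
            lines.append(f"        {pkg}{continuation}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["pandas", "matplotlib", "seaborn", "pandas"]), end="")
```

Write the returned text to a file named Dockerfile and run docker build on its directory. Regenerating the file from the list, rather than editing it by hand, keeps it in sync with what the research actually used.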

Ta-da!

How to use it

  • I don't know what I need yet.
    • just the jupyter env: $ docker run -it [--rm] -p 10000:8888 jupyslides
    • full access: $ docker run -it [--rm] -p 10000-10010:8888-8898 jupyslides /bin/bash
  • I know what I need
    • add local notebooks: -v path/to/notebooks:/home/jovyan/notebooks
    • add local data: -v path/to/data:/home/jovyan/data

Docker in brief (TL;DR)

Concepts

  • Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the jupyter/base-notebook image.
  • Containers - Created from Docker images and run the actual application. We create a container using docker run, as we did with the jupyslides image built above. A list of running containers can be seen using the docker ps command.
  • Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process running in the operating system that clients talk to.
  • Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too, such as Kitematic, which provides a GUI to the users.
  • Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and use them for pulling images.

Usages

# manage command
docker build [--no-cache] -t img_name path/to/context  # build an image from the Dockerfile in the context directory
docker images  # list all images
docker ps [-a]  # list running containers ([-a]: include stopped ones)

docker start -ai con_id  # restart an exited container and attach to it
docker attach con_id  # attach to a running container
docker stop con_id  # stop a container
docker rm con_id  # remove a container
docker rmi img_id  # remove an image

docker cp path/to/local con_id:path/to/target  # copy a file into a container

# execution command: docker run [OPTIONS] IMAGE [COMMAND]
docker run -it  # interactive terminal
           --rm  # remove the container on exit; use with caution
           -d  # detached; reconnect with docker attach con_id or docker exec -it con_id /bin/bash
           -p 8888:80  # forward host port 8888 to container port 80
           -e DISPLAY=$DISPLAY  # set an environment variable
           -u docker  # username/uid
           -v <data_location>:/data  # mount a host directory (use absolute paths)
           --name="rdev"  # container name
           ubuntu  # image name
           /bin/bash  # command

Okay. Docker is easy. Jupyter is awesome. What's the caveat?

Interactive Programming can be messy.

Execution order matters!

  • weird/wrong results: depressing
  • good result that cannot be replicated: desperate!
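One cheap guard against out-of-order execution is to inspect the saved notebook. After Kernel > Restart & Run All, the execution counts of the code cells should read 1, 2, 3, ... from top to bottom. The sketch below checks that using only the standard library. It assumes the .ipynb is ordinary nbformat 4 JSON. The names execution_order_problems, check_notebook and check_order.py are hypothetical, and the rule is a heuristic, not a proof of reproducibility.

```python
import json
import sys


def execution_order_problems(nb):
    """List code cells whose execution_count breaks the 1, 2, 3, ... sequence.

    nb is a notebook already parsed into a dict (nbformat 4 JSON).
    Empty code cells are ignored; this is a heuristic, not a proof.
    """
    problems = []
    expected = 1
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat may store source as a list of lines
            source = "".join(source)
        if not source.strip():
            continue
        count = cell.get("execution_count")
        if count is None:
            problems.append(f"cell {index}: never executed")
            continue
        if count != expected:
            problems.append(f"cell {index}: execution_count {count}, expected {expected}")
        expected = count + 1  # resynchronise so one gap is reported only once
    return problems


def check_notebook(path):
    """Load an .ipynb file and return its execution-order problems."""
    with open(path, encoding="utf-8") as f:
        return execution_order_problems(json.load(f))


if __name__ == "__main__":
    # usage: python check_order.py notebook.ipynb [more.ipynb ...]
    for path in sys.argv[1:]:
        for problem in check_notebook(path):
            print(f"{path}: {problem}")
```

An empty result means the saved counts look like one clean top-to-bottom run. It does not guarantee that a fresh kernel would reproduce the outputs.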

Pain in data loading and execution

  • I can stretch my legs for the moment
  • error out: okay, I can fix and continue
  • crashed: oh, crap!
  • last-minute check before report: ...

My workflow, learned through tears

  1. Start with the Docker image jupyter/base-notebook; note down dependencies in a Dockerfile
  2. Research phase
    • Jupyter Notebook: interactive programming
    • IPython: check that the code runs sequentially, top to bottom
  3. Presentation phase
    • construct Dockerfile, just add necessary ingredients
    • organize Notebook for presentation
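With the tools in this talk, one route to the sequential execution check is $ jupyter nbconvert --to script notebook.ipynb followed by $ ipython notebook.py. The sketch below approximates the same check with the standard library only. It runs the code cells top to bottom in one fresh namespace and reports the first cell that fails. Assumption: lines starting with % or ! (IPython magics and shell escapes) are simply dropped, so cells that depend on them may behave differently. The names run_code_cells and run_notebook_file are hypothetical.

```python
import json
import traceback


def run_code_cells(nb):
    """Run a notebook's code cells top to bottom in one fresh namespace.

    Returns (ok, failing_cell_index, traceback_text); index is None on success.
    Assumption: IPython magics (%...) and shell escapes (!...) are dropped,
    because plain Python cannot execute them.
    """
    namespace = {"__name__": "__main__"}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat may store source as a list of lines
            source = "".join(source)
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("%", "!"))]
        try:
            code = compile("\n".join(lines), f"<cell {index}>", "exec")
            exec(code, namespace)  # every cell shares the same globals, like one kernel
        except Exception:
            return False, index, traceback.format_exc()
    return True, None, ""


def run_notebook_file(path):
    """Load an .ipynb file and run it with run_code_cells."""
    with open(path, encoding="utf-8") as f:
        return run_code_cells(json.load(f))
```

A cell that only worked because an earlier cell was run out of order, or because a variable was left over from a deleted cell, fails here with a NameError or similar.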

Another example!

Now


Disclaimer: this is not the only way, but I really wish someone had given me this lecture in my first year.

I was lucky to learn from

  • LSST DSFP: introduced to Python and almost everything in data science
  • TACC symposia: HPC and data management in the petabyte era
  • Blue Waters webinars: dataflow, workflow and methodologies
  • YOU!

The main course on today's menu:

$ jupyter nbconvert --to FORMAT notebook.ipynb

nbconvert: readthedocs

  • HTML: --to html
  • LaTeX: --to latex
  • PDF: --to pdf
  • Slideshow: --to slides

Many of us are familiar with that already. But what is reveal.js for?

My motivation is hiding the code.

Why?

  • It distracts the audience: boring
  • It distracts the presenter: what if someone asks about it?
  • It's distracting!: what if I (or someone else) find something wrong!
  • ...

Ways to cover up the code

  • HTML!

    • Create a button
    • add <a href="javascript:code_toggle()"> [Toggle Code]</a> to each cell
  • nbextension

    • codefolding: $ jupyter nbextension enable codefolding/main
    • more extensions to explore: spellchecker, Table of Contents, Autoscroll,...
  • nbconvert with --template template.tpl

    • hide/remove code in one step
    • it's your call: extend the template for more functions
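Cell tags are another way to remove code in one step, without writing a template. The sketch below adds a tag to every code cell using only the standard library. The tag name remove_input and the helper names tag_code_inputs and tag_notebook_file are my own choices, not fixed by Jupyter. It assumes your nbconvert version's TagRemovePreprocessor supports remove_input_tags, so check its documentation before relying on the conversion step.

```python
import json


def tag_code_inputs(nb, tag="remove_input"):
    """Add `tag` to metadata.tags of every code cell; return how many were newly tagged."""
    tagged = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells keep their content and metadata untouched
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
            tagged += 1
    return tagged


def tag_notebook_file(src, dst, tag="remove_input"):
    """Write a tagged copy of the notebook at `src` to `dst`; the original is unchanged."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    count = tag_code_inputs(nb, tag)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return count
```

The tagged copy could then be converted with something like $ jupyter nbconvert tagged.ipynb --to slides --TagRemovePreprocessor.remove_input_tags='{"remove_input"}'. Some versions may also need --TagRemovePreprocessor.enabled=True. Working on a copy keeps the research notebook itself unchanged.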

While editing the cells and checking the results,

I found:

  • RISE: in situ, good for development/workshops

    • Usage: live rendering on the fly, based on reveal.js, more interactive
  • reveal.js: post-processing, good for production/presentations

  • Remark: if you know HTML and CSS well and really like styling (I don't).

How to?

My practical experience:

  • use RISE to edit the slideshow
    • alt-r: enter/exit the live reveal slideshow (start/stop)
    • shift-i: toggle slide
    • shift-o: toggle subslide
    • shift-p: toggle fragment
    • ,: help
  • use reveal.js to convert the slideshow
    • $ jupyter nbconvert notebook.ipynb --to slides --reveal-prefix reveal.js [--template hidecode/rmcode.tpl]
    • get reveal.js (3.6.0, the latest release at the time) into the same directory: $ wget -qO- https://github.com/hakimel/reveal.js/archive/3.6.0.tar.gz | tar -xvzf - && mv reveal.js-3.6.0 reveal.js
  • styling

    • open notebook.slides.html with text editor
    • edit the initialization script at the end
    • examples
      • transition: None - Fade - Slide - Convex - Concave - Zoom
      • themes: Black (default) - White - League - Sky - Beige - Simple - Serif - Blood - Night - Moon - Solarized
      • zoom.js: Point of View
      • and more...
  • start the Slideshow in browser / html server
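The RISE toggles above edit per-cell slideshow metadata. As far as I know, this is the same metadata.slideshow.slide_type field in the .ipynb JSON that nbconvert --to slides reads. Its values are "slide", "subslide", "fragment", "skip", "notes" and "-" (continue the current slide). For a long notebook, setting it in bulk can be quicker than toggling cell by cell. A minimal standard-library sketch follows. The heading-based rule in slides_from_headings is purely hypothetical, just one convention you might choose.

```python
VALID_SLIDE_TYPES = {"slide", "subslide", "fragment", "skip", "notes", "-"}


def set_slide_type(cell, slide_type):
    """Set metadata.slideshow.slide_type on one notebook cell (a dict)."""
    if slide_type not in VALID_SLIDE_TYPES:
        raise ValueError(f"unknown slide_type: {slide_type!r}")
    cell.setdefault("metadata", {}).setdefault("slideshow", {})["slide_type"] = slide_type


def slides_from_headings(nb):
    """Hypothetical rule: '# ' markdown headings start slides, '## ' start subslides."""
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat may store source as a list of lines
            source = "".join(source)
        is_markdown = cell.get("cell_type") == "markdown"
        if is_markdown and source.startswith("# "):
            set_slide_type(cell, "slide")
        elif is_markdown and source.startswith("## "):
            set_slide_type(cell, "subslide")
        else:
            set_slide_type(cell, "-")  # stay on the current slide
    return nb
```

After loading the notebook with json.load, running slides_from_headings and saving it back, the slide layout can still be fine-tuned cell by cell with the RISE shortcuts.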

An example of the initialization.

<!--reveal.js initialization-->
<script>
require(
    {
      // it makes sense to wait a little bit when you are loading
      // reveal from a cdn in a slow connection environment
      waitSeconds: 10
    },
    [
      "reveal.js/lib/js/head.min.js",
      "reveal.js/js/reveal.js"
    ],

    function(head, Reveal){

        // Full list of configuration options available here: https://github.com/hakimel/reveal.js#configuration
        Reveal.initialize({
            controls: true,
            progress: true,
            history: true,
            center: true,

            transition: "slide",

            // Optional libraries used to extend on reveal.js
            dependencies: [
                { src: "reveal.js/lib/js/classList.js",
                  condition: function() { return !document.body.classList; } },
                { src: 'reveal.js/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },            
                { src: "reveal.js/plugin/notes/notes.js",
                  async: true,
                  condition: function() { return !!document.body.classList; } },
                { src: 'reveal.js/plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } },
                { src: 'reveal.js/plugin/zoom-js/zoom.js', async: true }, // using zoom.js
            ]
        });

        var update = function(event){
          // getAllJax returns an array, so check its length before rerendering
          if(MathJax.Hub.getAllJax(Reveal.getCurrentSlide()).length > 0){
            MathJax.Hub.Rerender(Reveal.getCurrentSlide());
          }
        };

        Reveal.addEventListener('slidechanged', update);

        function setScrollingSlide() {
            var scroll = true
            if (scroll === true) {
              var h = $('.reveal').height() * 0.95;
              $('section.present').find('section')
                .filter(function() {
                  return $(this).height() > h;
                })
                .css('height', 'calc(95vh)')
                .css('overflow-y', 'scroll')
                .css('margin-top', '20px');
            }
        }

        // check and set the scrolling slide every time the slide change
        Reveal.addEventListener('slidechanged', setScrollingSlide); 
    }
);
</script>
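If you re-export often, hand-editing the initialization each time gets tedious. The sketch below rewrites the transition option shown above (transition: "slide") in an exported notebook.slides.html. It assumes the option appears once, in that quoted form, as in the example initialization. The function names are hypothetical, and the allowed names are the transitions listed earlier, in the lower-case form reveal.js expects.

```python
import re

# Transition names listed earlier, in the lower-case form reveal.js expects.
TRANSITIONS = {"none", "fade", "slide", "convex", "concave", "zoom"}

# Matches e.g.  transition: "slide"  or  transition: 'slide'
_TRANSITION_RE = re.compile(r"""\btransition:\s*(["'])\w+\1""")


def set_transition(html, transition):
    """Return `html` with the first transition option in Reveal.initialize replaced."""
    if transition not in TRANSITIONS:
        raise ValueError(f"unknown transition: {transition!r}")
    new_html, count = _TRANSITION_RE.subn(
        lambda m: f'transition: "{transition}"', html, count=1)
    if count == 0:
        raise ValueError("no transition option found in the HTML")
    return new_html


def set_transition_in_file(path, transition):
    """Rewrite a file such as notebook.slides.html in place."""
    with open(path, encoding="utf-8") as f:
        html = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_transition(html, transition))
```

For example, set_transition_in_file("notebook.slides.html", "zoom") right after each nbconvert run. The same pattern extends to other options in the initialization, such as controls or center.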

Wrap Up

  • Docker makes containers easy: reproducible environments
  • An IPython script is a good companion to a Jupyter notebook in research: reproducibility!
  • RISE + nbconvert + reveal.js + template = quick and neat Slideshow

Workflow

docker pull jupyter/base-notebook
docker run -it --rm -p 10000:8888 -v $(pwd)/local/notebook:/home/jovyan/work jupyter/base-notebook
# resolve dependencies; do research in the notebook + IPython in the terminal
# create template.tpl with New > Text File
jovyan/terminal: jupyter nbconvert notebook.ipynb --to slides --reveal-prefix reveal.js [--template hidecode/rmcode.tpl]
jovyan/terminal: wget -qO- https://github.com/hakimel/reveal.js/archive/3.6.0.tar.gz | tar -xvzf - && mv reveal.js-3.6.0 reveal.js
jovyan/terminal: vi notebook.slides.html  # edit the initialization script
# open notebook.slides.html
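A browser can open notebook.slides.html directly. However, the reveal.js documentation notes that some features (such as speaker notes and external Markdown) need the presentation to be served by a local web server. From a terminal, $ python -m http.server 8000 does this. The sketch below does the same from Python in a background thread, for example from a notebook cell. Assumptions: Python 3.7 or newer, and serve_directory is a hypothetical name. Inside the Docker container, you would bind host "0.0.0.0" and use a port published with -p, since 8888 is already taken by Jupyter.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, host="127.0.0.1"):
    """Serve files from `directory` over HTTP in a background thread.

    Returns the server; call server.shutdown() and server.server_close() when done.
    Inside a Docker container, pass host="0.0.0.0" and a port published with -p.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # daemon=True so the thread does not keep the interpreter alive on exit
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

For example, server = serve_directory(".", 8000), then browse to http://127.0.0.1:8000/notebook.slides.html. With the reveal.js folder next to the HTML file, the reveal.js prefix resolves against the same server.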

One more thing

A Jupyter slideshow should be reorganized for a serious presentation,

especially a dissertation defense!

  1. Suggestions for Giving Talks by Robert Geroch

    • Have a clear point, and keep it simple.
    • Figures are better than words, words are better than equations.
    • Every mark you put on a slide should serve a purpose, and be understood by the audience.
    • Don't let cool animations or clever presentation technology get in the way of delivering a clear message.
  2. Talk on Talks by Lucianne Walkowicz

    • Consider your audience with empathy
    • Modularize the content
    • and so on
  3. The order of the work is not necessarily the order of the talk!

Thank You

contact: domi.kingdom@gmail.com

Resume: 05/26/2018 cache

This talk: lsst.domij.info