Multithreading Theory

12 minute read

Concurrency: looking for a new job while working the current job, during office hours.

Parallelism: maintaining 2 jobs, with 2 managers, without telling either manager.

Asynchronous: while brewing coffee, read emails and get back to the coffee when it’s done.

In Java, about 1 MB of stack is allocated for each thread by default (tunable with the -Xss JVM flag). The OS mandates this up-front allocation because each Java thread is backed by an OS thread.

Note: Java threads are a thin layer on top of OS threads, so creating a Java thread creates an underlying OS thread under the hood.

Why we need Threads

Responsiveness (achieved by Concurrency)

  • Critical for applications with a user interface
  • Concurrency = multitasking in this context:
    • Responsiveness can be achieved by using multiple threads, with a separate thread for each task
  • NOTE: we don’t need multiple cores to achieve concurrency

Consider one thread serving one request at a time

Performance (achieved by Parallelism)

  • an illusion on a single core, truly parallel on multicore processors
  • Completing a complex task much faster
Core1 | Task1 -> Task2 -> Task1 -> Task2 -> Task1
Core2 | Task3 -> Task4 -> Task3 -> Task4 -> Task3

Caveat: Multithreaded Programming is fundamentally different from single-threaded programming

Process & Threads

All threads share Files, Heap, and Code

(Diagram: process.png — a process and its threads sharing files, heap, and code)

Instruction Pointer

  • address of the next instruction to execute
  • Associated with each thread

Multiple Threads in EACH PROCESS Share

  • the process’s open files
  • the process’s metadata
  • the heap
  • the code

Context Switching

Context switching is when the CPU switches from executing in the context of one thread to executing in the context of another.

The OS has to:

  • Stop thread 1, saving
    • the local data,
    • the instruction pointer, etc. of the current thread
  • Schedule thread 1 out
  • Schedule thread 2 in by loading
    • the local data,
    • the instruction pointer, etc. of the next thread to execute
  • Start thread 2

This is not a cheap operation; it is the price (tradeoff) of multitasking (concurrency).

  • like human beings, it takes time to refocus on the next task after switching from the first
  • store the data of the outgoing thread
  • restore the data of the incoming thread (back into the CPU and memory)

Issues with Context Switch

A large number of threads causes thrashing (spending more time on management than on real productive work).

  • Threads consume fewer resources than processes

Context switching between threads of the same process is CHEAPER than context switching between threads of different processes.

(Diagram: processWithThreads.png — context switching within and across processes)

Summary

Prefer Multithreaded architecture when

  • tasks share a lot of data
  • threads are much faster to create and destroy than processes
  • context switches between threads of the same process are faster

Prefer Multi-Process architecture when

  • Security and stability are of higher importance
    • Separate processes are completely isolated from each other
  • Tasks are unrelated to each other

Thread Scheduling

Each OS implements its own Thread Scheduling Algorithm

Scheduling can be either

  • preemptive (forcing the current task out), e.g. Shortest Remaining Time First
    • forcibly interrupt and suspend the currently running task to switch to another task
  • non-preemptive (waits for the current task to finish first), e.g. Shortest Job First

Assume 2 processes (A, B) with two threads (1, 2) each, running on a single-core processor.

  • each of the 4 threads has an arrival order and a length of execution time (CPU time).

| Thread | Arrival Order | CPU Time |
|--------|---------------|----------|
| A1     | 0             | 4        |
| A2     | 1             | 3        |
| B1     | 2             | 2        |
| B2     | 3             | 1        |

First-Come, First-Serve Scheduling

| Time 0 | Time 1 | Time 2 | Time 3 | Time 4 | Time 5 | Time 6 | Time 7 | Time 8 | Time 9 |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| A1     | A1     | A1     | A1     | A2     | A2     | A2     | B1     | B1     | B2     |

Problems

  • Thread starvation: short tasks starve behind long-running ones

Shortest Job First Scheduling

The Shortest Job First (SJF) scheduling algorithm selects the process with the shortest execution time from the ready queue.

Timeline Illustration with SJF (Non-preemptive)

First, the job already in hand (A1) runs to completion; the remaining jobs then run shortest-first.

| Time 0 | Time 1 | Time 2 | Time 3 | Time 4 | Time 5 | Time 6 | Time 7 | Time 8 | Time 9 |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| A1     | A1     | A1     | A1     | B2     | B1     | B1     | A2     | A2     | A2     |

A1 (4 units) runs to completion, then B2 (1 unit), B1 (2 units), and A2 (3 units).

Problems

  • If there are many short jobs, the longer jobs can face starvation.

How it really works

Inside the Linux 2.6 Completely Fair Scheduler

  • The OS divides time into epochs (moderately sized pieces)
  • In each epoch, the OS allocates a different time slice to each thread

Not all the threads get to run or complete in each epoch.

The decision on how much time to allocate to each thread is based on a dynamic priority that the operating system maintains for each thread:

$$ \text{Dynamic Priority} = \text{Static Priority} + \text{Bonus} $$

  • The static priority is set by the developer ahead of time.
  • And the bonus is adjusted by the operating system in every epoch for each thread

This way the operating system gives preference to interactive and real-time threads that need more immediate attention. At the same time, it gives preference to computational threads that **did not complete**, or did not get enough time to run, in previous epochs, to prevent starvation.

Thread Lifecycle

https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Thread.State.html

| State         | Description |
|---------------|-------------|
| NEW           | The thread is created but not yet started. |
| RUNNABLE      | The thread is either actively executing or ready and waiting for CPU time. |
| BLOCKED       | The thread is waiting for a monitor lock. |
| WAITING       | The thread is waiting indefinitely for another thread’s action. |
| TIMED_WAITING | The thread is waiting for another thread’s action for a specified time. |
| TERMINATED    | The thread has completed execution. |

Note: the JVM does not expose separate “running” or “yielding” states; a thread that is actually executing is still reported as RUNNABLE, and yield() (next section) is a scheduling hint, not a state.
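A minimal sketch of these transitions using getState() (the state printed right after start() depends on scheduler timing):

```java
public class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(500);              // worker parks in TIMED_WAITING here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        System.out.println(worker.getState()); // NEW: created but not started
        worker.start();
        System.out.println(worker.getState()); // usually RUNNABLE
        Thread.sleep(100);                      // let the worker reach sleep()
        System.out.println(worker.getState()); // TIMED_WAITING: inside Thread.sleep
        worker.join();                          // wait for the worker to finish
        System.out.println(worker.getState()); // TERMINATED
    }
}
```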

Yield

When a thread calls yield(), it suggests to the thread scheduler that it might be a good time to switch to another thread.

  • The thread gives up the CPU and goes back to waiting for CPU time, allowing other threads to be scheduled for execution.
  • However, it’s not guaranteed that
    • the current thread will stop running immediately, or
    • other threads will be scheduled right away.

Sleep

Thread.sleep(1000) instructs the operating system not to schedule the current thread until the given time passes.

During that time, the thread is not consuming any CPU.

History of multithreading in Java

Java 1: Threads

  • one set of APIs for all machines; hardware independent

Java 5 : ExecutorService API -> pool of threads

  • Issue 1: pool-induced deadlock (see the sketch below)
  • One thread breaks up the problem, throws the parts into the pool, and waits for the results to come back
  • All the threads in the pool just divide the work further, and no thread is left to actually solve the subproblems
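A deliberately pathological sketch of that failure mode, assuming a pool with a single worker thread (this program hangs forever):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolInducedDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1); // a single worker

        Future<String> outer = pool.submit(() -> {
            // The outer task submits a subtask to the SAME pool...
            Future<String> inner = pool.submit(() -> "sub-result");
            // ...and blocks waiting for it. The only worker thread is busy
            // right here, so the subtask can never run: deadlock.
            return inner.get();
        });

        System.out.println(outer.get()); // hangs forever
    }
}
```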

Java 7 : Fork Join pool

  • Work-stealing: the thread that divides the problem also solves one of the divided parts itself, while idle pool threads can steal the others
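A minimal fork/join sketch of that idea: the dividing thread forks one half and computes the other half itself, leaving the forked half available for idle workers to steal:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums the range [from, to) by splitting it in half until it is small enough.
class SumTask extends RecursiveTask<Long> {
    private final long from, to;
    SumTask(long from, long to) { this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {                    // small enough: just compute
            long sum = 0;
            for (long i = from; i < to; i++) sum += i;
            return sum;
        }
        long mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        left.fork();                                 // offer one half to the pool
        long right = new SumTask(mid, to).compute(); // this thread solves the other half
        return left.join() + right;                  // idle workers may have stolen `left`
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new SumTask(0, 1_000_000)));
    }
}
```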

Java 8 : Parallel Streams and CompletableFuture

  • both use the Java 7 fork/join pool (FJP)
  • specifically the common fork/join pool, shared across the JVM
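A short sketch; both APIs below default to ForkJoinPool.commonPool():

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Parallel stream: splits the range across the common fork/join pool.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();

        // CompletableFuture: supplyAsync also defaults to the common pool.
        CompletableFuture<Long> future =
                CompletableFuture.supplyAsync(() -> LongStream.rangeClosed(1, 1_000_000).sum());

        System.out.println(sum + " " + future.join());
    }
}
```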

Java 21 : Virtual Threads
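A minimal sketch of the Java 21 virtual-thread API; virtual threads are cheap enough to make thread-per-task viable again:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        // One cheap virtual thread per task; blocking parks the virtual thread
        // and frees its carrier OS thread to run other virtual threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(1_000); // only the virtual thread is parked
                    return 42;
                });
            }
        } // close() waits for the submitted tasks to finish
    }
}
```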

Handling the user requests - Architecture

A couple of different architectures are used by application servers for handling the user requests.

  • process-per-request model (old): [CGI (Common Gateway Interface)](https://en.wikipedia.org/wiki/Common_Gateway_Interface)
  • thread-per-request model

Process-per-request model

In the 1990s, a popular mechanism for handling user requests was CGI (Common Gateway Interface).

In this architecture, when a user sends a request to the web server, the web server will invoke the associated CGI script as a separate process.

  • Once the response is sent back, the process is destroyed, and this is an overhead.
  • This is an issue because a process is considered heavyweight by the operating system; starting and terminating a process for every request is inefficient.

To solve this, a variation of CGI called FastCGI was created.

  • It is similar to having a pool of processes, where each user request is routed to one of the available processes.
  • No extra time is spent starting a new process because it is already up.

Thread-Per-Request Model

In the thread-per-request model, each incoming request is assigned a separate thread.

  • This thread handles the request, performs the necessary processing, and then completes.
  • it allows concurrent handling of multiple requests by allocating a dedicated thread to each request

Apache Tomcat
  • Tomcat is a popular open-source web server and servlet container that follows the thread-per-request model.
  • Tomcat maintains a pool of worker threads.
    • When a request arrives, a thread from the pool is assigned to handle the request.
    • Once the request is processed and a response is generated, the thread is returned to the pool for reuse.
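A minimal thread-per-request sketch in plain Java (this is not Tomcat's actual code; the pool size of 200 mirrors Tomcat's default maxThreads):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPerRequestServer {
    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(200); // the worker pool
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket connection = server.accept();      // wait for the next request
                workers.submit(() -> handle(connection)); // dedicate a pooled thread to it
            }
        }
    }

    static void handle(Socket connection) {
        try (connection) {
            // read the request, process it, write the response...
            connection.getOutputStream().write("HTTP/1.1 200 OK\r\n\r\n".getBytes());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```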

Thread-Per-Task Threading model

Issues

  • Does not give optimal performance
    • when a thread is blocked on IO, it cannot be used for anything else
    • requires us to allocate more threads
    • consuming more resources
    • adding context-switch overhead

Thread-Per-Core with Non-Blocking-IO

  • provides optimal performance

Task Types

  • IO Bound (spends most of its time waiting on disk or network)
  • CPU Bound (Compute-intensive)

What is IO

The CPU can access the memory directly at any time, so our program can read from or write to variables from the memory without the operating system’s involvement.

The CPU doesn’t have direct access to peripheral devices (mouse, keyboard, NIC, monitor, disk drive). The CPU can only communicate with each device’s controller, to either send it some data or receive some data from it.

During the time that the specific IO device is doing its work, or when the device is idle, or the response from the network hasn’t arrived yet, the CPU can continue running other tasks.

Direct Memory Access (DMA) is also involved: a DMA controller moves data between the peripheral’s controller and RAM without occupying the CPU.

(Diagram: dma.png — DMA between the peripheral controller, RAM, and CPU)

IO Bound Application

public List<Dto> getData(RequestBody req) {
    Request request = parseIncomingRequest(req); // CPU-bound task
    Data data = service.getDataFromDb(request);  // IO-bound task; the thread is blocked until it completes
    List<Dto> dtoList = mapper.map(data);        // CPU-bound task
    return dtoList;
}

(Diagram: cpu-dma.png — the CPU keeps running other work while DMA handles the transfer)

Blocking vs Non-blocking IO

IO operations

  • socket reads/writes - used by DB calls, REST calls, anything to do with the network
  • file reads/writes - disk / slower storage access
  • concurrent locks - enforcing synchronization

Reactive Programming: created to overcome the scalability problems of blocking IO.

Blocking IO aka synchronous I/O

In Blocking-I/O operations, a thread that performs an I/O operation (such as reading from a file or network socket) is blocked until the operation completes.

  • During this time, the thread cannot perform other tasks.
  • This is a performance blocker

Synchronous call: consider API2 depending on the result of API1, so the thread must wait for API1 to complete before calling API2.

Non Blocking IO aka Asynchronous

In non-blocking I/O operations, a thread initiates an I/O operation and continues executing other code while the I/O operation is processed asynchronously.

  • This can improve performance and scalability, especially in systems with high I/O operations.

Asynchronous calls: consider API1 and API2 to be independent of each other (see the sketch below).

  • Futures and callbacks
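A sketch using CompletableFuture, where callApi1/callApi2 are hypothetical stand-ins for real remote calls:

```java
import java.util.concurrent.CompletableFuture;

public class IndependentCallsDemo {
    static String callApi1() { return "result1"; } // hypothetical remote call
    static String callApi2() { return "result2"; } // hypothetical remote call

    public static void main(String[] args) {
        // Fire both independent calls without blocking the current thread.
        CompletableFuture<String> api1 = CompletableFuture.supplyAsync(IndependentCallsDemo::callApi1);
        CompletableFuture<String> api2 = CompletableFuture.supplyAsync(IndependentCallsDemo::callApi2);

        // Combine the results when BOTH arrive; no thread sits blocked in between.
        api1.thenCombine(api2, (r1, r2) -> r1 + " + " + r2)
            .thenAccept(System.out::println)
            .join(); // only for the demo, so main doesn't exit early
    }
}
```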

In terms of threads, blocking vs non-blocking tasks would look like this: (Diagram: blockingVsNonBlocking.png)

Concurrency and Parallelism

Parallelism with a large, dividable task

  • Multiple related sub-tasks (a larger task divided into smaller, similar tasks), all executing at the same time
  • Multiple CPU cores involved
  • No parallelism on a single core

(Diagram: task-per-core.png)

Parallelism in Independent Tasks

  • Walk and Talk in parallel (2 independent tasks), exactly at the same time
  • 2 threads in 2 cores of the CPU
                       v
                       | [Time Slice v]
thread1-talk   T   T   T   T
thread2-walk   W       W
time  t=0--------------^-------------------->t

Concurrency: talk & drink, but not at the same time.

  • pause one task and resume it once the other is done
  • can happen on a single core (2 threads in one core)
                     v                    v
                     |Time slice          |Another timeslice
thread1 - Talk   T   T    T   | T   T   T |  T  T
thread2 - Drink      |        D           D
time  t=0------------^--------^-----------^-------->t

(Diagram: io-calls.png)

Asynchronous

Asynchronous means non-blocking:

  • when you make a method call, you don’t have to wait for it to complete
  • the calling thread is not blocked waiting for it to finish
  • however, by default a method call blocks the calling thread until it completes

For a thread to be non-blocking, these 2 properties should be met:

  • Responsiveness: the main thread should always delegate and be available for the next step
    • E.g. click on the download button and then move on to doing other stuff.
      • If the main thread took care of downloading, the application would be blocked until the download finished.
  • Preemptibility: the ability of a system or thread to be interrupted or preempted by other tasks or threads.
    • In a preemptible system, a running thread can be paused or stopped by the system scheduler to allow other threads or processes to execute.

Both parallel and async processes run in separate threads (other than the main thread).

For parallelism, the threads need to JOIN, i.e.

  • the slowest process/thread determines the overall speed (see the sketch after this list).
  • per-hour production: refills (10), caps (20), and bodies (50).
    • Total pens per hour = 10, limited by the slowest stage.
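A minimal sketch of the pen example with join(), using sleep times as stand-ins for each stage's work:

```java
public class JoinDemo {
    static void work(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread refills = new Thread(() -> work(5_000)); // slowest stage
        Thread caps    = new Thread(() -> work(2_500));
        Thread bodies  = new Thread(() -> work(1_000));

        long start = System.currentTimeMillis();
        refills.start(); caps.start(); bodies.start();
        refills.join(); caps.join(); bodies.join();   // JOIN: wait for ALL stages

        // Elapsed ~5s: the slowest thread determines the overall speed.
        System.out.println((System.currentTimeMillis() - start) + " ms");
    }
}
```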

Asynchronous: do not wait for completion; when the results arrive, continue working with them.

  • use a callback to receive the response
  • or use a promise

Asynchronous Task Execution Engine

The ExecutorService was introduced in Java 1.5.

Execution Engine has

  • Work Queue (Blocking Queue)
  • Completion Queue
  • Thread Pool

As soon as the work is placed on the work queue, you get a Future back.

A Future is a proxy or reference for the result that will be returned in the future, as in the sketch below.
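A minimal sketch of that interaction:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutionEngineDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService engine = Executors.newFixedThreadPool(4); // thread pool + work queue

        // submit() places the task on the work queue and returns a Future immediately.
        Future<Integer> future = engine.submit(() -> {
            Thread.sleep(500); // stand-in for the real work
            return 42;
        });

        Integer result = future.get(); // blocks until the result is available
        System.out.println(result);
        engine.shutdown();
    }
}
```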

Fork/Join framework (used behind the scenes by parallel streams) -> Java 7 (extends ExecutorService)

Async & non blocking programming Features

Callback

Callback - When the response is received, execute the function

    doSomething(data, response -> {...})

  • callbacks lack consistency
  • callbacks are really hard to compose
  • error handling is hard

This creates callback hell, as in the hypothetical sketch below.
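A hypothetical sketch of callback hell; fetchUser, fetchOrders, and fetchInvoice are made-up callback-style APIs:

```java
import java.util.function.Consumer;

public class CallbackHell {
    // Hypothetical callback-style API: the result only exists inside the callback.
    static void fetchUser(int id, Consumer<String> onSuccess, Consumer<Exception> onError) {
        onSuccess.accept("user" + id);
    }
    static void fetchOrders(String user, Consumer<String> onSuccess, Consumer<Exception> onError) {
        onSuccess.accept("orders-of-" + user);
    }
    static void fetchInvoice(String order, Consumer<String> onSuccess, Consumer<Exception> onError) {
        onSuccess.accept("invoice-for-" + order);
    }

    public static void main(String[] args) {
        // Each step can only continue inside the previous callback,
        // and every level needs its own error handler.
        fetchUser(1, user ->
            fetchOrders(user, orders ->
                fetchInvoice(orders, invoice ->
                    System.out.println(invoice),
                    err -> err.printStackTrace()),
                err -> err.printStackTrace()),
            err -> err.printStackTrace());
    }
}
```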

Promise

When done with the task, the promise is updated to one of three states:

  • Pending
  • Resolved
  • Rejected

A promise has 2 channels -> a data channel & an error channel

Railway Track pattern

Treat errors as just another form of data, as first-class citizens.

data track  -----f------f      recovering from the exception      f--- or continue with then methods ---
                         \                                       /
error track               f--- can return default data ---------f---- or handle the exception ----------

Happy path:   data -> function -> Promise -> f -> P -> f -> P -> f -> P -> f -> P
Unhappy path: an exception (E) drops processing onto the error track until it is recovered or handled
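The same railway idea sketched with CompletableFuture (fetchData, transform, and the "DEFAULT" value are hypothetical; exceptionally is the switch from the error track back to the data track):

```java
import java.util.concurrent.CompletableFuture;

public class RailwayDemo {
    static String fetchData() { return "data"; }                  // hypothetical source
    static String transform(String s) { return s.toUpperCase(); } // hypothetical stage

    public static void main(String[] args) {
        CompletableFuture.supplyAsync(RailwayDemo::fetchData)  // data track
                .thenApply(RailwayDemo::transform)             // runs only on the data track
                .exceptionally(ex -> "DEFAULT")                // error track: recover with default data
                .thenApply(RailwayDemo::transform)             // back on the data track
                .thenAccept(System.out::println)
                .join();
    }
}
```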

Daemon vs Non-Daemon threads

| User Threads | Daemon Threads (Background threads) |
|--------------|-------------------------------------|
| The JVM waits for user threads to finish their work; it will not exit until all user threads are done. | The JVM does not wait for daemon threads to finish their work; it exits as soon as all user threads are done. |
| User threads are high-priority threads. | Daemon threads are low-priority threads. |
| The JVM will not force user threads to terminate; it waits for them to terminate themselves. | The JVM will force daemon threads to terminate once all user threads have finished their work. |
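A minimal sketch; note that setDaemon must be called before start():

```java
public class DaemonDemo {
    public static void main(String[] args) {
        Thread cleaner = new Thread(() -> {
            while (true) { // background housekeeping loop
                try { Thread.sleep(1_000); } catch (InterruptedException e) { return; }
            }
        });
        cleaner.setDaemon(true); // must be set BEFORE start()
        cleaner.start();
        // main (a user thread) ends here; the JVM exits without waiting for `cleaner`.
    }
}
```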