What the Heck Is Threadsafe And Why Does CICS Care?

 

By

Don Fowler

MCE, Inc., June 2003

 

 

Threads in CICS

CICS TS 2.2 builds on the Open Transaction Environment (OTE) TCB mechanism introduced in CICS TS 1.3. The enhancements greatly reduce the CPU cycles spent switching a transaction between CICS and DB2 TCBs by allowing both the CICS and the DB2 portions of the transaction to execute under an OTE TCB instead of the traditional Quasi-Reentrant (QR) TCB.

A thread is the path a program takes while it runs, the steps it performs, and the order in which it performs the steps. A thread runs code from its starting location in an ordered, predefined sequence for a given set of inputs. The term thread is shorthand for "thread of control". You can use multiple threads to improve application performance by running different application tasks simultaneously.

A function is threadsafe if you can start it simultaneously in multiple threads within the same process. A function can be threadsafe only if all the functions it calls are also threadsafe.

In order to fully appreciate the ramifications of using multiple TCBs or “threads” for a CICS transaction one must first look at where CICS has been and where it is now in the area of multi-threading.

 

Originally CICS executed under a single operating system Task Control Block (TCB). CICS required all of its functions and transactions to be coded as quasi-reentrant. The operating system (OS, VS, MVS, z/OS) TCB is the application's execution control block and is associated with a single hardware processor. All CICS functions and applications ran under this single TCB, and CICS internally handled all prioritization and dispatching of the CICS tasks. This led to some situations where CICS users could be locked out or run slowly if certain issues arose.

 

Some of the more common issues were a runaway CICS task that locked all users in the CICS region associated with the currently operating TCB, or occasionally a CICS user task erroneously issued an operation that caused an operating system WAIT to be issued. Also the CICS environment could only be as fast as the speed of the processor where the TCB lived.

 

As newer versions of CICS were rolled out, additional TCBs were introduced to relieve some of the above constraints. First the CICS program loader was placed under its own TCB, which allowed a new program to be loaded for one user while a transaction for another user continued to execute under the QR TCB. Then VSAM and DB2 were set up to execute under their own separate TCBs. This moved the VSAM and DB2 workloads under separate task management at the operating system level, which gives users much more workload management control and discretion, and lets CICS regions draw on the cycles of multiple CPUs. The primary dispatch mechanism remains the QR TCB.

 

For DB2 and CICS, a separate TCB is created for each DB2 request from CICS. Consider this separate TCB a thread. The task is switched to the DB2 TCB for DB2 work, and the DB2 system code (overhead) runs on the DB2 TCB. This moved significant workload from the primary QR TCB to the DB2 TCB. The cost of this threading is the TCB switching done within the operating system task dispatcher between the QR and DB2 TCBs. This is a measurable amount of overhead in CPU cycles.

 

To resolve this TCB switching overhead issue, IBM developed the Open Transaction Environment (OTE) concept. The OTE concept is that the CICS transaction runs under its own OTE TCB and DB2 executes under this same TCB. This offloads workload from the QR TCB and makes better use of the multiprocessor environment, since the OTE TCB can execute on any processor within the complex. This is a true implementation of multitasking in CICS, since multitasking implies the operating system can treat discrete functions and facilities of CICS as unique tasks (threads) for dispatching wherever there are available CPU resources. Realize that the OTE TCB came into existence at CICS TS 1.3 for support of Java virtual machines. It just so happens that the same mechanism works very nicely for the DB2/CICS environment.

 

Figure 1 shows a graphic depiction of the OTE environment running multiple threadsafe transactions.

 

 Figure 1 – OTE Environment

 

The Basic Safety Problem With Threads

 

Thread safety means that nothing in the thread path (mainline, called subroutines, or functions) can impact or damage the data of any other thread. Take a CICS application that uses a common area such as the CWA to hold a counter used to make record keys unique. Prior to the introduction of multiple TCBs that allow multitasking, the CICS application would have relied on the single QR TCB as the serialization mechanism that ensured that one CICS task did not step on another CICS task's use of the common storage areas. In essence there was a single thread in the single TCB system.

 

A recent New England CICS Users Group presentation done by R.E. Evans Consulting on threadsafe CICS applications used the example of the CWA with the following source code in Application A:

            MOVE CWA-REC-COUNT TO KEY-UNIQUE-PORTION

            ADD 1 TO CWA-REC-COUNT

            EXEC CICS WRITE IMPORTANT_FILE

                      RIDFLD(KEY-COMPLETE)

 

Now take the case that there are at least two OTE TCBs (representing two threads through the same code path in Application A) in execution at the same time on two different processors. All the serialization that running under one TCB gave you is lost. One of the OTE threads could be updating the CWA record count while the other OTE thread is writing the important file record. Depending on timing, one thread could receive a DUPREC error, and the other might be using a key that is off by one or more because of the "add 1" done by the other thread. This leads to data or logic corruption situations.
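The interleaving described above can be replayed deterministically. The following Python sketch is an illustration only, not CICS code: the dictionary cwa stands in for the CWA, the set important_file for the file, and DuplicateRecord for the DUPREC condition. All of these names are assumptions made for the example.

```python
class DuplicateRecord(Exception):
    """Stands in for the CICS DUPREC condition (hypothetical name)."""


def read_counter(cwa):
    # MOVE CWA-REC-COUNT TO KEY-UNIQUE-PORTION
    return cwa["rec_count"]


def bump_counter(cwa):
    # ADD 1 TO CWA-REC-COUNT
    cwa["rec_count"] += 1


def write_record(important_file, key):
    # EXEC CICS WRITE ... RIDFLD(KEY-COMPLETE)
    if key in important_file:
        raise DuplicateRecord(key)
    important_file.add(key)


def replay_unsafe_interleaving():
    """Replay one bad ordering of two tasks running the same three steps."""
    cwa = {"rec_count": 100}
    important_file = set()

    key_a = read_counter(cwa)   # task A reads 100
    key_b = read_counter(cwa)   # task B reads 100 before A has added 1
    bump_counter(cwa)           # task A: counter becomes 101
    bump_counter(cwa)           # task B: counter becomes 102
    write_record(important_file, key_a)
    try:
        write_record(important_file, key_b)
        outcome = "ok"
    except DuplicateRecord:
        outcome = "DUPREC"
    return key_a, key_b, cwa["rec_count"], outcome


result = replay_unsafe_interleaving()
print(result)
```

Both tasks pick up the same key, so the second write fails, and the counter has still advanced by two. On the QR TCB this ordering cannot occur for the MOVE and ADD, because a quasi-reentrant task gives up control only when it issues a CICS command.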

Only a little over thirty of all unique EXEC CICS commands are deemed threadsafe. If CICS detects that the current command is a non-threadsafe command, it ensures the task is running under the CICS QR TCB.

The list of threadsafe commands is shown in Figure 2. Please note that even this list has some conditionally threadsafe commands. For example, if you use GETMAIN with the SHARED option, the command itself is threadsafe but the shared storage it returns is a thread safety concern.

The use of non-threadsafe exits in CICS should also be of concern. DB2-related CICS tasks will be switched to the QR TCB for the duration of the exit and then back to the OTE TCB. These switches can cost an increase of 20% or more in CPU utilization. All global user exits should be analyzed to ensure that they are threadsafe and that their PROGRAM definitions are changed to specify CONCURRENCY(THREADSAFE). The most important exits are XRMIIN, XRMIOUT, XEIIN and XEIOUT.

 

At the CICS TS 2.2 region level, the new SIT parameter FORCEQR=YES/NO specifies whether all programs run as non-threadsafe (YES) or each program follows the CONCURRENCY parameter on its program definition (NO). At the program level, the program definition carries CONCURRENCY=THREADSAFE/QUASIRENT. Specifying QUASIRENT causes the associated program to be dispatched on the QR TCB for the entirety of the task. Specifying THREADSAFE indicates that the program is considered threadsafe and can participate in the OTE environment. Remember that specifying THREADSAFE is a promise by you, the user, that the program is threadsafe; it is not an order to CICS to make it threadsafe.

 

Identifying Threadsafe Programs

Use the IBM-supplied utility, DFHEISUP, to scan for CICS commands commonly found in non-threadsafe applications. The default command table, DFHEIDTH, contains only the most common inhibitors to threadsafe CICS programs and is not all-inclusive. As you review your exits and applications, you may find a specific technique, built on a supposedly threadsafe command, that should be defined within this scan table. The command table as shipped by IBM looks like this:

########################################################################
# CICS LOAD MODULE SCANNER FILTER TABLE - THREADSAFE INHIBITORS        #
# This table identifies commands which "may" cause the program not to  #
# be threadsafe in that they allow accessibility to shared storage and #
# the application must have the necessary synchronization logic in     #
# place to guard against concurrent update.                            #
########################################################################
#
#
########################################################################
# The extract command gives addressability to a global work area of a  #
# GLUE or TRUE.                                                        #
########################################################################
EXTRACT EXIT GASET *
#
########################################################################
# Getmain shared storage can be shared between CICS transactions.      #
########################################################################
GETMAIN SHARED *
#
########################################################################
# The CWA is shared between all CICS transactions.                     #
########################################################################
ADDRESS CWA *

 

As you can see, the use of a global work area, of GETMAIN SHARED storage, or of the CWA is considered an inhibitor to thread safety.
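To make the idea of a filter table concrete, here is a minimal Python sketch that applies the same three filters to program source text. It is not DFHEISUP, which scans load modules rather than source, and it treats the trailing * as "any other options", which is an assumption of this sketch. The names FILTER_TABLE, parse_filters, find_inhibitors and the sample COBOL fragment are all hypothetical.

```python
import re

FILTER_TABLE = """\
# THREADSAFE INHIBITORS
EXTRACT EXIT GASET *
GETMAIN SHARED *
ADDRESS CWA *
"""


def parse_filters(table_text):
    """Return a list of keyword lists, ignoring comments and the '*' wildcard."""
    filters = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        filters.append([word for word in line.split() if word != "*"])
    return filters


def find_inhibitors(source_text, filters):
    """Return (command text, matching filter) pairs found in the source."""
    hits = []
    # Each command runs from EXEC CICS up to END-EXEC.
    pattern = re.compile(r"EXEC\s+CICS\s+(.*?)\s*END-EXEC",
                         re.DOTALL | re.IGNORECASE)
    for match in pattern.finditer(source_text):
        command = " ".join(match.group(1).split())
        # Drop parenthesized arguments so only the option names remain.
        bare = re.sub(r"\([^)]*\)", " ", command)
        keywords = {word.upper() for word in bare.split()}
        for wanted in filters:
            if all(word in keywords for word in wanted):
                hits.append((command, " ".join(wanted)))
    return hits


SAMPLE = """
    EXEC CICS ADDRESS CWA(CWA-PTR) END-EXEC.
    EXEC CICS GETMAIN SET(WS-PTR) FLENGTH(100) END-EXEC.
    EXEC CICS GETMAIN SET(SH-PTR) FLENGTH(100) SHARED END-EXEC.
"""

hits = find_inhibitors(SAMPLE, parse_filters(FILTER_TABLE))
for command, rule in hits:
    print(rule, "->", command)
```

The plain GETMAIN is not reported; only the ADDRESS CWA and the GETMAIN with SHARED match a filter. A hit means the program deserves review, not that it is necessarily unsafe.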

 

There may be many more issues within your CICS environment that would lead to unsafe threads. As you review an application to determine its CONCURRENCY option and find such issues, add them to the DFHEIDTH member.

 

Making Programs Threadsafe

When you create code that is threadsafe but still requires sharing data or resources between threads, the most important aspect of programming becomes the ability to synchronize the threads. Synchronization is the cooperative act of two or more threads that ensures that each thread reaches a known point of operation in relationship to other threads before continuing. Attempting to share resources without correctly using synchronization is the most common cause of damage to application data.

Typically, synchronizing two threads involves the use of one or more synchronization primitives. Synchronization primitives are low-level functions or application objects that an application uses or creates to provide the synchronization behavior the application requires.

The most common synchronization primitives used throughout programming systems are as follows, in order from least to most computationally expensive (CPU overhead):

·        Compare and Swap - You can use the Machine Level Compare and Swap (CMPSWP or CS/CDS) instruction to access data in a multithreaded program. CMPSWP (CS/CDS) compares the value of a first compare operand to the value of a second compare operand. If they are equal, the swap operand is stored in the second compare operand's location. If they are unequal, the second compare operand is stored into the first compare operand's location.  

When an equal comparison occurs, it is assured that no access by another CMPSWP instruction will occur at the second compare operand location between the moment that the second compare operand is fetched for comparison and the moment that the swap operand is stored at the second compare operand location.

When an unequal comparison occurs, no atomicity guarantees are made regarding the store to the first compare operand location and other CMPSWP instruction access. Thus only the second compare operand should be a variable shared for concurrent processing control.

·        Mutexes and threads - A mutual exclusion (mutex) is used cooperatively between threads to ensure that only one of the cooperating threads is allowed to access the data or run certain application code at a time. For the purposes of this introduction, you can think of mutexes as similar to critical sections and monitors.

The mutex is usually logically associated with the data it protects by the application. For example, my PAYROLL DATA has a PAYROLL MUTEX associated with it. My application code always locks the PAYROLL MUTEX before accessing the PAYROLL DATA. The mutex prevents access to the data by a thread only if that thread uses the mutex before accessing the data.

Create, lock, unlock, and destroy are the operations typically performed on a mutex. Any thread that successfully locks the mutex is the owner until it unlocks the mutex. Any thread that attempts to lock the mutex waits until the owner unlocks the mutex. When the owner unlocks the mutex, control is returned to one waiting thread, with that thread becoming the owner of the mutex. There can be only one owner of a mutex. Mutexes are used in Java and Pthread applications.

·        Semaphores and threads - Semaphores (sometimes referred to as counting semaphores) can be used to control access to shared resources. A semaphore can be thought of as an intelligent counter. Every semaphore has a current count, which is greater than or equal to 0.

Any thread can decrement the count to lock the semaphore (this is also called waiting on the semaphore). Attempting to decrement the count past 0 causes the thread that is calling to wait for another thread to unlock the semaphore.

Any thread can increment the count to unlock the semaphore (this is also called posting the semaphore). Posting a semaphore may wake up a waiting thread if there is one present.

In their simplest form (with an initial count of 1), semaphores can be thought of as a mutual exclusion (mutex). The important distinction between semaphores and mutexes is the concept of ownership. No ownership is associated with a semaphore. Unlike mutexes, it is possible for a thread that never waited for (locked) the semaphore to post (unlock) the semaphore. This could cause unpredictable application behavior.

·        Condition variables and threads - Condition variables allow threads to wait for certain events or conditions to occur and they notify other threads that are also waiting for the same events or conditions. The thread can wait on a condition variable and broadcast a condition such that one or all of the threads that are waiting on the condition variable become active. You can consider condition variables to be similar to using events to synchronize threads on other platforms.

Condition variables do not have ownership associated with them and are usually stateless. A stateless condition variable means that if a thread signals a condition variable to wake up a waiting thread when there currently are no waiting threads, the signal is discarded and no action is taken. The signal is effectively lost. It is possible for one thread to signal a condition immediately before a different thread begins waiting for it without any resulting action.

Locking protocols that use mutexes are typically used with condition variables. If you use locking protocols, your application can ensure that a thread does not lose a signal that was intended to wake it up.

·        Threads as synchronization primitives - Threads themselves can be used as synchronization primitives when one thread specifically waits for another thread to complete. The waiting thread does not continue processing until the target thread has finished running all of its application code. Compared to other synchronization techniques, there is little cooperation in this synchronization mechanism.

A thread that is used as a synchronization primitive does not have the concept of an owner, such as in other synchronization techniques. A thread simply waits for another to finish processing and end.

·        Space location locks - A space location lock puts a logical lock on any single byte of storage. The lock does not change the storage or affect your application's access to the storage. The lock is simply a piece of information that is recorded by the system.

Space location locks provide cooperative locking similar to that provided by mutexes.

·        Object locks - Object locks provide ways to acquire locks on specific system or application objects. In some cases, the system acquires object locks on behalf of actions a user takes against certain objects. The system respects and enforces object locks for some actions.

You can acquire object locks such that the lock is effective only within the thread (thread-scoped) or is effective within the process (process-scoped). If two threads in the same process each try to acquire a process-scoped lock to a system object, that lock is satisfied for both threads. Neither thread prevents the other from acquiring the lock if they are in the same process.

If you are using object locks to protect access to an object from two threads within the same process, you should use object locks that are scoped to a thread. A thread-scoped object lock never conflicts with an object lock scoped to a process that is acquired by the same process.
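The compare-and-swap item in the list above is usually used in a retry loop. The following Python sketch shows that loop under stated assumptions: Python has no direct equivalent of the machine instruction, so an internal lock stands in for the atomicity the hardware provides, and the names SharedWord, compare_and_swap, add_one and run_workers are hypothetical.

```python
import threading


class SharedWord:
    """A shared value that is updated only through compare_and_swap.

    The internal lock emulates the atomicity of the hardware instruction;
    it is not the real instruction.
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new_value):
        """Store new_value if the word still equals expected.

        Returns (True, new_value) on success, or (False, current) so the
        caller can retry with the refreshed value.
        """
        with self._guard:
            if self._value == expected:
                self._value = new_value
                return True, new_value
            return False, self._value


def add_one(word):
    """Retry loop: recompute and try again until the swap succeeds."""
    expected = word.load()
    while True:
        swapped, current = word.compare_and_swap(expected, expected + 1)
        if swapped:
            return current
        expected = current   # another thread got in first; try again


def run_workers(word, workers=4, per_worker=1000):
    def body():
        for _ in range(per_worker):
            add_one(word)

    threads = [threading.Thread(target=body) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return word.load()


counter = SharedWord(0)
total = run_workers(counter)
print(total)
```

No update is lost, because a thread whose expected value has gone stale simply picks up the current value and retries instead of overwriting it.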

 
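The ownership distinction drawn above between mutexes and semaphores can be observed directly. This Python sketch is an illustration under assumptions: threading.RLock is used as the mutex because it records its owning thread (a plain threading.Lock does not), and the helper name release_from_other_thread is hypothetical.

```python
import threading


def release_from_other_thread(primitive):
    """Acquire in the calling thread, then try to release from another thread.

    Returns the exception type raised in the other thread, or None if the
    release was accepted.
    """
    primitive.acquire()
    outcome = []

    def intruder():
        try:
            primitive.release()
            outcome.append(None)
        except RuntimeError as exc:
            outcome.append(type(exc))

    thread = threading.Thread(target=intruder)
    thread.start()
    thread.join()

    if outcome[0] is not None:
        # The release was refused, so the caller still owns the primitive.
        primitive.release()
    return outcome[0]


# An owned mutex refuses to be unlocked by a thread that does not own it.
mutex_result = release_from_other_thread(threading.RLock())

# A semaphore has no owner: any thread may post it.
semaphore_result = release_from_other_thread(threading.Semaphore(1))

print(mutex_result, semaphore_result)
```

The mutex rejects the foreign unlock with an error, while the semaphore silently accepts the post. That silent acceptance is the source of the unpredictable behavior mentioned above.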

 
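The lost-signal problem described above for condition variables is normally avoided by pairing the condition variable with a mutex and a piece of state that records whether the event has happened. A minimal Python sketch follows; the class name Mailbox and its methods are hypothetical.

```python
import threading


class Mailbox:
    """One-slot mailbox: the flag, not the signal, carries the state."""

    def __init__(self):
        self._cond = threading.Condition()   # condition variable plus its mutex
        self._ready = False                  # state the signal refers to
        self._item = None

    def post(self, item):
        with self._cond:                     # lock the mutex
            self._item = item
            self._ready = True               # record the event before signalling
            self._cond.notify_all()

    def wait_for_item(self, timeout=5.0):
        with self._cond:
            # wait_for checks the flag under the mutex before and after waiting,
            # so a signal sent earlier is not lost.
            if not self._cond.wait_for(lambda: self._ready, timeout=timeout):
                raise TimeoutError("no item posted")
            return self._item


box = Mailbox()
box.post("payroll batch 1")                  # signalled before anyone is waiting

received = []
waiter = threading.Thread(target=lambda: received.append(box.wait_for_item()))
waiter.start()
waiter.join()
print(received)
```

The waiter starts after the signal has already been sent, yet it still receives the item, because it tests the flag under the mutex rather than relying on the signal alone.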

The most common types of CICS synchronization or serialization implementations are:

1)      Wrap shared storage access in CICS ENQ/DEQ logic (space locking),

2)      Move the shared data to a serialized resource:

a.       Temporary Storage, or

b.       DB2 table, or

3)      Use the Named Counter facility.
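The first option, wrapping the shared storage access in ENQ/DEQ, has the following shape. This Python sketch is only an analogy to the CICS commands: the function enqueue, the resource name "CWA-REC-COUNT", and the dictionary cwa are assumptions made for the example.

```python
import threading
from contextlib import contextmanager

_resource_locks = {}
_table_guard = threading.Lock()


@contextmanager
def enqueue(resource_name):
    """Hold the named resource for the duration of the with-block."""
    with _table_guard:
        lock = _resource_locks.setdefault(resource_name, threading.Lock())
    lock.acquire()            # analogous to ENQ on the resource name
    try:
        yield
    finally:
        lock.release()        # analogous to DEQ


cwa = {"rec_count": 0}        # stands in for the shared CWA counter


def next_key():
    """Read and advance the counter as one serialized step."""
    with enqueue("CWA-REC-COUNT"):
        key = cwa["rec_count"]
        cwa["rec_count"] = key + 1
    return key


def generate_keys(workers=4, per_worker=500):
    keys = []
    keys_guard = threading.Lock()

    def body():
        mine = [next_key() for _ in range(per_worker)]
        with keys_guard:
            keys.extend(mine)

    threads = [threading.Thread(target=body) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return keys


keys = generate_keys()
print(len(keys), len(set(keys)))
```

Every key is unique because the read and the add happen while the resource is held. As with a mutex, the protection is cooperative: it works only if every program that touches the counter enqueues on the same resource name first.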

 

You always have the option of doing nothing and leaving the non-threadsafe programs as QUASIRENT. In this case CICS will switch to the QR TCB on the LINK or XCTL. This ensures that access to the shared storage is serialized, since only one instance of the thread can run under the QR TCB at a time.
     

Summary

 

OTE will be used more in the future, so consider threadsafe implications and issues now. Convert by application rather than by program, and make sure the exits are reviewed!
