**Insights from Modeling Input Variables**

**Process Simulation Models using the Excel Spreadsheet Tool**

by Stephen Hall

Most people agree that the assumptions, or inputs, that drive a process model are critically important. For example, to decide the appropriate size for a purified water storage and distribution vessel, you must know the water generation and usage rates. To optimize the tank size, you also need the time-of-day usage profile and knowledge of sanitization practices. You can model the behavior of a tank using these input data and construct a graph that depicts the water level in the tank over the course of a day. The graph changes whenever you change an assumption.

When inputs are treated as fixed values, the resulting model is deterministic: results are calculated directly from the input data. The calculations can be checked for their sensitivity to changes in the assumptions; this improves your understanding of the process and may flag areas that could benefit from further optimization.

The breakthrough in process understanding comes when you recognize that certain inputs to the model vary in accordance with probability distributions. For our water tank example, the user points are unlikely to draw water on exactly the same schedule day after day. The number of times per day that water is used may vary, as may the time of day and duration of a particular usage. Statistical variation of input variables, when incorporated into a process model, results in a stochastic simulation. Results from such models are typically presented as frequency histograms or probability distributions.
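The idea can be sketched outside Excel as well. The short Python snippet below is a minimal illustration of input modeling: the usage count per day and the volume per use are hypothetical placeholder values, and a normal distribution is assumed purely for demonstration.

```python
import random

random.seed(1)

def daily_usage_liters(mean_uses=12, sd_uses=3, liters_per_use=150):
    """Sample one simulated day of water consumption.

    The number of draws per day is treated as a random variable
    (normal distribution, clipped at zero) instead of a fixed input.
    """
    uses = max(0, round(random.gauss(mean_uses, sd_uses)))
    return uses * liters_per_use

# Repeating the experiment yields a frequency distribution, not one number.
samples = [daily_usage_liters() for _ in range(1000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Running the experiment many times produces the spread of outcomes that a single deterministic calculation hides.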

Microsoft Office Excel is extremely useful for process modeling. It is fast, flexible, and ubiquitous. Excel includes functions and tools designed for simulation, such as statistical distributions and Scenarios. Third party vendors provide add-in tools that support simulations. And with the use of Visual Basic for Applications (VBA), Excel can perform nearly any imaginable computing task.

Add-ins are available for simple uses such as extending the functionality of the random number generator. There are complex add-ins that greatly enhance the capability of Excel’s “Solver” function. And there are even add-ins that enable Excel to run models on many computers simultaneously (parallel processing) and proportionately decrease the execution time.

This article presents several case histories showing how Excel supported the solution of tactical and strategic business problems. In each case Excel is a tool; judicious use is the user’s responsibility. Each case required customization and approaches that dedicated simulation software could not easily satisfy. Add-ins are __not__ used in the examples because a primary goal is to allow dissemination throughout an organization.

The case histories illustrate some major benefits of using Excel, including:

- Models address the specific questions posed by the business and support informed decision making
- Input data can be very specific and detailed, or it can be generalized depending on the problem requirements. As a model grows, detailed data can be substituted for generalized information
- Graphs and charts are tied to data in the model
- Everyone already owns the tool so it is very easy to deploy

There are significant risks to consider when using spreadsheets. In particular, the very accessibility and ease of use that make spreadsheets attractive pose a serious risk, because avoiding and correcting errors is difficult and time consuming. Spreadsheet development is often haphazard. Although Excel contains cell protection and formula auditing tools, these are often misused or not used at all, so new errors can be introduced even after a workbook is debugged. Naming conventions and version control procedures can confuse users, who may not use the correct version. And the ease with which spreadsheets are copied and distributed results in many instances of a workbook that purport to be identical but are often subtly different.

### Water consumption histogram

Problem statement: A Purified Water system, consisting of generator and storage tank, provides water for formulation and cleaning in an oral solid dosage facility. Create a histogram showing water level in the tank under a variety of scenarios.

Discussion and Solution: The primary goal was to determine the optimal size for the water storage tank. Assuming the generation unit can operate at constant volume around the clock, the problem was reduced to an analysis of water usage.

The process engineer estimated the consumption of water at each drop on a per use basis. For example, each time an IBC was washed the washer would consume a certain amount of water. CIP cycles were also programmed to operate for defined times so water consumption per use could be calculated.

The team then used judgment to estimate the average number of uses per hour, with a standard deviation, or range, of uses. This is an example of “input modeling,” where a statistical variation is applied to a variable, in this case water consumption.

An Excel spreadsheet was created with a table listing hourly generation and consumption for one day. The consumption values in the table are determined using the statistical distributions and other rules (such as maximum number of IBCs that can be cleaned in an hour, or sanitization time when no water may be consumed). Excel calculates the material balance and lists the level in the storage tank at the end of each hour.

Finally, using a VBA subroutine, the consumption values are recalculated on demand. Clicking a button on the spreadsheet runs the VBA routine. This redraws the chart and lets the user walk through days or weeks of simulated time while observing the calculated tank level. By repeating the experiment with different tank volumes and generation rates, an optimal total tank volume is selected.
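The hourly material balance behind the VBA routine can be sketched in a few lines. The following Python version is an illustration only: the tank volume, generation rate, and consumption distribution are assumed values standing in for the team's estimates and rules.

```python
import random

random.seed(7)

TANK_VOLUME = 10_000          # liters (assumed)
GENERATION = 400              # liters/hour, constant around the clock (assumed)
MEAN_USE, SD_USE = 380, 150   # liters/hour consumption distribution (assumed)

def simulate_day(start_level):
    """Track the tank level hour by hour for one simulated day."""
    levels = []
    level = start_level
    for _ in range(24):
        # Consumption is sampled, not fixed; clip at zero.
        consumption = max(0.0, random.gauss(MEAN_USE, SD_USE))
        # Material balance, bounded by an empty and a full tank.
        level = min(TANK_VOLUME, max(0.0, level + GENERATION - consumption))
        levels.append(level)
    return levels

levels = simulate_day(start_level=5_000)
print(levels[-1])
```

Re-running the function corresponds to clicking the spreadsheet button: each run produces a new day of simulated levels from which a histogram can be built.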

*Figure 1: Water Tank Histogram*

### Time to heat or cool a jacketed stirred tank

Problem statement: Heat flux varies with the temperature of the tank contents and also with the viscosity of the material. Estimate the time to heat or cool a jacketed stirred tank taking the changing heat flux into account.

Discussion and Solution: Heat flux from a tank jacket is calculated using inside and outside heat transfer coefficients. The coefficients require data for thermal conductivity, specific heat, density, and viscosity of the fluids in the jacket and in the vessel. Of these, viscosity changes significantly with temperature. For example, an oil’s viscosity may increase more than seven-fold when its temperature is decreased from 100°C to 10°C. This could change the heat transfer coefficient by a factor of two. Similarly, heat flux is proportional to the temperature difference between the fluids in the jacket and vessel, as seen in the familiar formula Q = U A ΔT, where Q is heat flux, U is heat transfer coefficient, A is heat transfer area, and ΔT is temperature difference.

A workbook was designed to calculate Q, the heat flux, given the physical properties of the fluids, temperatures, and dimensional data. To estimate the time to heat or cool the vessel, a time increment calculation is performed. At the initial time, the properties input into the model are used to calculate Q. Knowing the mass of the tank’s contents and its heat capacity, the temperature change over a small time increment (*e.g.,* 1 minute) is computed. Then, the viscosity at the new temperature is estimated using a temperature-viscosity correlation and the new values for viscosity and temperature are inserted into the input fields on the worksheet. A new heat flux is calculated and the procedure is repeated until a predetermined time has elapsed (*e.g.,* 60 repetitions to simulate one hour at 1 minute increments).

VBA is used to perform the viscosity calculation, insert the revised data into the input cells, and collect the results which are filled into a table. Excel’s charting tool creates the graph.
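The time-increment procedure can be sketched as follows. This Python version is illustrative, not the workbook's actual calculation: the mass, heat capacity, area, jacket temperature, the Andrade-type viscosity correlation, and the cube-root dependence of U on viscosity are all assumed constants chosen only to show the marching scheme.

```python
import math

M, CP = 8000.0, 1900.0   # batch mass (kg) and heat capacity (J/kg*K), assumed
AREA = 12.0              # heat transfer area, m^2 (assumed)
T_JACKET = 10.0          # jacket temperature, deg C (assumed)
DT_STEP = 60.0           # time increment, seconds (1 minute)

def viscosity(temp_c):
    """Andrade-type temperature-viscosity correlation (assumed constants), Pa*s."""
    return 0.001 * math.exp(1800.0 / (temp_c + 273.15) - 1800.0 / 373.15)

def u_coefficient(temp_c, u_ref=350.0):
    """Assume U falls off with the cube root of the viscosity ratio (illustrative)."""
    return u_ref * (viscosity(100.0) / viscosity(temp_c)) ** (1.0 / 3.0)

temp = 100.0
for _ in range(60):                                      # one hour in 1-min steps
    q = u_coefficient(temp) * AREA * (temp - T_JACKET)   # Q = U A dT, in W
    temp -= q * DT_STEP / (M * CP)                       # temperature change this step
print(round(temp, 1))
```

Each pass recomputes Q with the updated viscosity and temperature, exactly as the VBA routine re-inserts revised values into the input cells before the next increment.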

*Figure 2: Physical Property Input Data*

*Figure 3: Cooling Timeline*

### Production model

Problem statement: A manufacturing process has two main parts: production of bulk material, and the filling and packaging of the bulk into blisters. Capacity predictions are needed for numerous permutations of equipment configuration (*e.g.,* number of filling lines, capacity per line), formulation recipes, production demand, operating philosophy (*e.g.,* production schedule), etc.

Discussion and Solution: Each aspect of the production process is modeled independently, with links incorporated as needed to tie the model together. Changing any assumption immediately changes the output.

This extensive Excel workbook integrates modules seen in Figure 4. Some of the modules use stochastic modeling, programmed in VBA, to estimate how equipment may interact (Figure 5).

*Figure 4: Scope of Production Model*

*Figure 5: Equipment Timeline, with Random Events, is Created using a VBA Model*

### Cell culture facility manufacturing success

Problem statement: A cell culture facility is designed with four bioreactor trains, each with seed reactors and a primary bioreactor, and a shared multi-step purification train. A deterministic simulation had already established that the plant had sufficient buffer, media, water and CIP (clean-in-place) capacity. This open question remained: “How many batches per year can be expected, as a minimum, knowing that random failures in the bioreactors occur and that downstream processing time variability forces the discarding of a batch if a queue at harvest exceeds a set value?”

Discussion and Solution: In this problem, the primary bioreactor is known to require 14.5 days +/- 1 day to complete. There are four bioreactors feeding into a single cell harvest train. In the event that two bioreactors finish within 12 hours of each other, one of the batches must be discarded.
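The harvest-queue rule can be illustrated with a small sketch. The completion time here is drawn from a normal distribution with mean 14.5 days and standard deviation 1 day (one reading of the "+/- 1 day" in the problem statement), and the staggered start offsets are assumed values.

```python
import random

random.seed(3)

def discarded_batches(start_offsets_days=(0.0, 3.5, 7.0, 10.5)):
    """Sample finish times for the four bioreactors and count collisions.

    A batch is discarded when it finishes within 0.5 day (12 hours)
    of the batch ahead of it in the harvest queue.
    """
    finishes = sorted(s + random.gauss(14.5, 1.0) for s in start_offsets_days)
    return sum(1 for a, b in zip(finishes, finishes[1:]) if b - a < 0.5)

print(discarded_batches())
```

Repeating this draw thousands of times gives the frequency with which the 12-hour collision rule actually costs a batch, which is the kind of question a deterministic schedule cannot answer.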

Random factors, such as bacterial contamination, cause a certain number (say, 10%) of bioreactor batches to fail. Management wanted to apply a 10% failure against the theoretical number of batches per year to obtain an “expected” throughput. What they failed to realize is that, with only 90 theoretical batches per year, there would be a statistical distribution around the failure rate that is difficult to quantify in this manner.
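The point about the spread around the failure rate can be made concrete with the binomial distribution. With 90 batches and a 10% per-batch failure probability (the illustrative numbers from the text), the expected count of failures is 9, but the probability of actually seeing 9 or fewer in a given year is well short of certainty.

```python
from math import comb

N, P_FAIL = 90, 0.10

def prob_failures(k):
    """P(exactly k failures out of N batches), binomial pmf."""
    return comb(N, k) * P_FAIL**k * (1 - P_FAIL)**(N - k)

expected = N * P_FAIL                                  # 9 failures on average
p_nine_or_fewer = sum(prob_failures(k) for k in range(10))
print(expected, round(p_nine_or_fewer, 3))
```

Simply subtracting 10% from the theoretical batch count therefore describes only the average year, not the year-to-year variation management actually cares about.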

With the supporting functions (buffer, media, water and CIP) shown to be adequate (using SuperPro Designer), the team developed the Excel model to answer the question posed above. The process was simplified into the block flow diagram shown in Figure 6. The green boxes represent the four bioreactors.

*Figure 6: Simplified Block Flow Diagram*

Visual Basic for Applications was used to create the relationship between the blocks in the diagram. Various user-defined inputs, such as the time duration and standard deviation for each block, are input on the Excel spreadsheet.

When a single year of production is simulated, a graph showing each batch as it progresses through the model is drawn on the screen. This is done using VBA. Failed batches stand out as blanks in the otherwise full array of equipment utilization. See Figure 7, which shows one year. The gray sections at left and right are the previous and following years. The pink vertical bar in the middle represents a total plant shutdown. The blank areas where you would expect to see a green bioreactor batch represent the random failures; each time the simulation is run, the number and location of the blanks change.

*Figure 7: Graphical Depiction of One Year of Production*

The model uses a number of different user-provided inputs, including: time duration and standard deviation for each of the blocks, frequency that batches are started, additional time for equipment turnaround, duration of the mid-year plant shutdown, success rate at each step, and maximum permissible delay (queue) when the bioreactor is ready to discharge to the centrifuge.

Using confidence interval and binomial distribution statistical functions in Excel, the results averaged over 250 simulated years for each of about 100 different combinations of input variables were analyzed to determine the probability of achieving any particular annual throughput. For example, in Figure 8, the raw results from the first five experiments are shown.
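The confidence-bound step can be sketched as follows. The simulated success percentages below are illustrative stand-ins for the model's output over 250 years, and a one-sided z value of 1.645 is assumed for the lower 95% bound.

```python
import random
import statistics

random.seed(42)
RUNS = 250
# Stand-in for 250 simulated years of batch-success percentage.
yearly_success = [random.gauss(81.4, 4.2) for _ in range(RUNS)]

mean = statistics.fmean(yearly_success)
stdev = statistics.stdev(yearly_success)
# One-sided lower 95% confidence bound on the mean (z = 1.645 assumed).
lower_95 = mean - 1.645 * stdev / RUNS**0.5
print(round(mean, 2), round(lower_95, 2))
```

This mirrors how the average over 250 runs is reported together with a lower confidence bound rather than as a bare point estimate.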

*Figure 8: Raw Results from Five Experiments*

In Figure 9, the statistical corrections for the same experiments are shown. Here you can see that the average success rate for the fourth experiment (blue line) is 81.4%, but, based on the standard deviation and number of runs (250), the lower 95% confidence interval is 80.96%.

*Figure 9: Statistical Corrections*

Finally, using the binomial distribution function, the probability that a certain number of batches (or more) will be produced is computed. This is shown in Figure 10.
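That final step amounts to summing binomial tail probabilities. In the sketch below the number of attempted batches per year and the per-batch success probability are assumed values chosen for illustration; the workbook used its own inputs and Excel's built-in binomial functions.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(k or more successes out of n attempts), binomial tail sum."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

N_BATCHES, P_SUCCESS = 78, 0.85   # assumed values for illustration
for k in (60, 66):
    print(k, round(prob_at_least(k, N_BATCHES, P_SUCCESS), 3))
```

Tabulating `prob_at_least` for each candidate throughput produces exactly the kind of probability table the figure describes.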

*Figure 10: Probability Table*

For the 85% success rate case (the blue highlighted line in the Figures), it is seen that, on average, there are about 66 successful batches per year (Figure 9). However, the statistical analysis shows that there is only a 52% probability that in any particular year there will be 66 or more successful batches (Figure 10). But there is a 95% probability that 60 successful batches will be obtained. This is the answer that addresses the question, “How many batches per year can be expected, as a minimum, knowing that random failures in the bioreactors occur and that downstream processing time variability forces the discarding of a batch if a queue at harvest exceeds a set value?”

In summary, the question was precisely answered within two weeks. The project team gained deeper understanding of the consequences of a batch failure and process variability.

### API multi-step synthesis

Problem statement: An 8-step API synthesis is known, with parallel and sequential steps. For a range of annual production volumes, what are the optimal reactor trains (size and configuration)?

Discussion and Solution: The intermediates manufactured in the batch process can be stored indefinitely until they become the raw material for a subsequent step. It’s permissible to pool separate batches for use in a single downstream batch. Each intermediate requires different operating parameters. There’s a wide range of production volume among the intermediates. The project team did not know how to address all of this uncertainty in an off-the-shelf simulation environment so they turned to Excel to model the process in pieces.

First, the process engineer followed a block diagram and created a workbook with stepwise material balances for each of the synthesis steps. Each step followed a typical sequence: add ingredients, react, crystallize, wash, centrifuge, and dry. Two or three reactors or primary holding vessels were required for each step; the material balance was scaled by the amount of intermediate produced. It also computed the maximum volume of material at each step that used a reactor or tank.

Next, the engineer created a worksheet that, given a value for production of the API (*e.g.,* 5,000 kg per year) and reactor sizes for each step, calculated the number of batches to be manufactured for each intermediate and the final product. This worksheet also calculated the time to make those batches, including set up for a campaign, synthesizing the intermediates, and cleaning the equipment.
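The batch-count arithmetic behind that worksheet can be sketched in a few lines. All of the numbers below (annual demand, yield per batch, hours per batch, campaign setup time) are hypothetical placeholders for the worksheet's inputs.

```python
import math

ANNUAL_KG = 5000.0       # annual API production target
KG_PER_BATCH = 120.0     # yield of one batch at the final step (assumed)
HOURS_PER_BATCH = 36.0   # cycle time including cleaning (assumed)
CAMPAIGN_SETUP_H = 48.0  # one-time setup per campaign (assumed)

# Round up: a partial batch still occupies the equipment.
batches = math.ceil(ANNUAL_KG / KG_PER_BATCH)
campaign_hours = CAMPAIGN_SETUP_H + batches * HOURS_PER_BATCH
print(batches, campaign_hours)
```

Repeating this calculation for each intermediate, at each candidate reactor size, is what let the team compare equipment configurations quickly.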

The team reviewed each step and agreed on equipment limits such as the maximum and minimum allowable reactor volumes, materials of construction, and hazardous classifications. This enabled them to evaluate the Excel worksheet by testing various combinations of reactor sizes; they settled on a standard 15,000 liter reactor, with a couple of 2,000 liter reactors for the smaller volume intermediate.

Finally, the team used Excel’s display functionality to create a timeline with reaction synthesis steps shown against a list of reactors, centrifuges, and dryers. The exercise resulted in a list of the minimum amount of equipment required to manufacture all of the intermediates and final product for a given annual production.

From the major equipment list, divided into reactor modules, the team made conceptual plant layouts and elevations. They used historical data to predict utility loads and estimating factors to create a conceptual cost estimate. They now have a tool, built around Excel workbooks, that lets them quickly assess the capability of any existing chemical plant to manufacture their product.

*Figure 11: Equipment Occupation Graphic*

### References

Albright SC. *VBA for Modelers: Developing Decision Support Systems*. 4th ed. South-Western College Pub; 2011.

ChemEngSoftware. TANKJKT Excel spreadsheet template.

Powell SB, Baker KR. *Management Science: The Art of Modeling with Spreadsheets*. 3rd ed. Wiley; 2010.

Ragsdale C. *Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science*. Revised ed. South-Western College Pub; 2013.

Toumi A, Juergens C, Jungo C, Maier BA, Papavasileiou V, Petrides DP. Design and Optimization of a Large Scale Biopharmaceutical Facility Using Process Simulation and Scheduling Tools. *Pharmaceutical Engineering* 2010;30(2):22-36.

Winter Simulation Conference, www.wintersim.org