In the above two tables, I have highlighted the column with the highest performance. You might nitice that the results for a row length of cells is suspiciously low in the single threaded scalar results, and suspiciously high in the single threaded vector results. I spent quite a while looking at the memory system performance of two MHz PowerPC with different types of memory and worked out exactly how many cycles it takes to load each vector bits or 16 bytes of data.
I eventually figured out three things:. Combining these techniques led to a substantial improvement, and with a more consistent performance for different row lengths:. As part of the investigation of cache performance I worked out that I could achieve a greater cache hit rate by performing more than one "caterpillar scan" in parallel.
This is easiest to visualise by imagining that there are two people going through the grid in parallel, with one "in the lead" and the other following just a few rows behind. This requires adding more extra rows to both the top and bottom of the grid, because if we're going to compute 2 new generations, we need 2 extra rows on each side. If you watch a real millipede walking, it's actually a lot like this: there are several places where the feet are off the ground, and these places travel like waves in the "backwards" direction while the while millipede moves forward.
Our data move like this: we scan from the top down, but the result generation 2 output ends up two rows higher.
The practical benefit of this is greater cache locality: to compute two generations we only need to scan the whole grid once. As long as 6 rows of data can fit in the cache, the cache will only "miss" once per row. I call this the zig-zag scan because of the order in which memory is being accessed. The above diagram represents a "double-interleaved" zig-zag scan, but it can be more than that. The optimum amount of "interleaving" depends on the size of the processor caches and the size of the grid being simulated.
As before, continued operation requires using modular coordinates and having the grid data "wrap around" to continually re-use a fixed amount of memory. Here are the results I got in For each different grid size, I use boldface to highlight which overscan value ran fastest:. To use multiple threads, the grid is divided into "stripes" with each stripe being computed by one thread. This is a common practice, but in our case the "stripes" do not remain fixed.
For the "millipede scan" to work, the stripes have to move through memory in lock-step. Each thread is performing a millipede scan, and the millipedes march together around the cylindrical grid.
The entire model will typically fit in the shared L3 cache of the multi-CPU system. The design is optimal on a variety of CPU, cache and memory configurations. I then moved to an Intel Core 2 Duo. This required two major changes.
I added a set of macros to hide the use of compiler intrinsics for vector operations. With my macros, which are available here , the same C source will work on PowerPC or Intel vector intrinsics and even in cases where neither are available. I made additional changes to handle the "little-endian" storage of data in memory which affects how vector loads and stores get 4 consecutive array elements, and matters if the array is also being accesses by non-vector code e.
Most of this was handled by the vector macros, which hide the fact that array element 0 is loaded into the lowest vector element on an Intel system, and into the highest element on a PowerPC system. As compared to the dual 2. Intel Core 2 Duo 2. On my 8-core system I have 16 MB of L3 cache, which is enough to hold all of the grid data for grids up to about x pixels with room to spare for code and everything else that needs to be cached.
I created many long-term time-lapse videos, most on a x grid; it was easy to run two or three of these at the same time, using 4 or 8 threads per each simulation. This is a dual Xeon E system.
Each CPU has 4 cores and runs at speeds from 2. Using just 2 threads should give more "performance per thread" than with 4 threads or more, both due to Amdahl's Law and because each CPU will have just one active core, allowing the clock rate to go up to the maximim 2. Based on clock speed alone I would expect to get about Mcgs at the large grid size. In contrast to the Core 2 Duo, on this dual Xeon system data needs to be transferred between the two CPUs throughout the test.
However, any slowdown from that is greatly outweighed by the DDR memory controller being right on the CPU die, and other Nehalem microarchitecture improvements. Thus, the 2-thread benchmarks shown here actually exceed the Mcgs estimate. Dual Intel Xeon E 2. To measure the impact of Amdahl's Law , I repeated almost every test using 4 threads, then again with 8 threads and finally with 16 threads except for the smallest gridsize. Due to resource contention between threads in Intel's "Hyperthreading", and because of the variable clock speed "Turbo Boost" , one would expect 16 threads to complete a task no morer than 5 times as fast as 2 threads, even if the task involves no appreciable memory transfer e.
Mandelbrot calculations. Here we achieved a ratio of In I prepared a scientific paper describing some of the more exotic patterns and my work to confirm or disprove their existence; see my formal scientific research page , and the figures and animations. I added corrections and in put it on arXiv. In I gave a talk in the Rutgers Mathematical Physics Seminar Series; the slides, links to videos, and associated notes are here: Rutgers talk.
PDF at cba. Wolfram, Universality and complexity in cellular automata, Physica D: Nonlinear Phenomena 10 Gray and S. Ouyang and H. Swinney, Transition from a uniform state to hexagonal and striped Turing patterns, Nature Lee, W.
McCormick, Q. Ouyang, and H. Swinney, Pattern formation by interacting chemical fronts, Science Pearson, Complex patterns in a simple system, Science McCormick, H. Swinney, and J. Pearson, Experimental observation of self-replicating spots in a reaction-diffusion system, Nature PDF at chaos. Doelman, T. Kaper, and P. Muratov and V. Sanderson et al. Munafo, Stable localised moving patterns in the 2-D Gray-Scott model The best standalone program for getting started with reaction-diffusion is Ready , a fast, easy to use and versatile reaction-diffusion explorer for Windows, Mac and Linux.
There are downloadable working versions, and it's an open-source project. U-skates in Ready. If you have a Mac running "Snow Leopard" or later you'll probably also appreciate my xmorphia PDE5 screensaver , which turns the webcam image i.
Xmorphia PDE5. The Abelson et al. It is a bit difficult to draw a pattern manually, but it is the only web-based simulator that supports different types of grids, and it doesn't require WebGL:. U-skates on a hex grid. Replicating my Patterns in Ready. The Ready program is quite versatile and has lots of options. If your computer has a reasonably modern GPU you can use the " Pearson Change the instructions as follows:. He also provides a 3-D version. View its December incarnation here.
This ignores the rate constant, reverse reaction, and many other details: the reaction-diffusion model above has been simplified by converting to dimensionless units, ignoring any heat, ionization, association, etc.
The law of mass action is a fairly rare example of a place in applied mathematics where addition and multiplication become multiplication and exponentiation respectively " U plus V times 2" becomes " u times v to the power of 2" 7.
It should work in all major desktop browser versions since Details here. This page was written in the "embarrassingly readable" markup language RHTF , and was last updated on Mar What Is It? Insight Into Biology The patterns created by this equation, and other very similar equations, seem to closely resemble many patterns seen in living things. Such connections have been suggested by: - Turing, [1] dappling; embryo gastrulation; multiple spots in a 1-D system - Bard and Lauder, leaves, hair follicles - Bard, [2] spots on deer and giraffe - Murray, butterfly wings - Meinhardt, [3] stripes, veins on leaf - Young, development of eye - Meinhardt and Klinger, mollusc shells - Turk, leopard, jaguar, zebra and many more in more recent years.
The Images As Art Some of the colour maps I tried In his original exhibit 3 , Roy Williams presented grayscale images for many pairs of k and F , each showing a "histogram-equalised view of the U component".
Another 4 trial colour maps The Formula The reaction-diffusion system described here involves two generic chemical species U and V , whose concentration at a given point in space is referred to by variables u and v.
This system was originally designed for discrete, deterministic cellular automata, but the principles can be expanded to continuous real-valued systems; the beginnings of such an expanded system are described here: Universality-Complexity Classes for Partial Differential Equation Systems.
Three-Dimensional Patterns In three dimensions, there are many more types of patterns, and the possibilities for complexity like that in U-skate world are much higher. The original implementation, July Over the following years I continued to maintain the project and implement improvements described below to deal with memory bandwidth limitations and take advantage of new SIMD instructions. I eventually figured out three things: Due to the organization of the cache 8-way set-associative, byte cache lines certain row lengths will make the L1 and L2 caches behave as if they are much smaller than 64K and K respectively.
In many cases performance is improved by interleaving the two floating-point values of each grid point so that all the data is in a single big array. Combining these techniques led to a substantial improvement, and with a more consistent performance for different row lengths: PDE2 on PowerPC Mhz , optimised single threaded vector row length equals grid size speed in Kcgs effective MFLOPs Millipede Scan As part of the investigation of cache performance I worked out that I could achieve a greater cache hit rate by performing more than one "caterpillar scan" in parallel.
For each different grid size, I use boldface to highlight which overscan value ran fastest: Millipede Scan Performance PowerPC Mhz single threaded vector grid size overscan 1 Mcgs In I gave a talk in the Rutgers Mathematical Physics Seminar Series; the slides, links to videos, and associated notes are here: Rutgers talk References [1] A. WebGL Reaction-Diffusion Explorer The best standalone program for getting started with reaction-diffusion is Ready , a fast, easy to use and versatile reaction-diffusion explorer for Windows, Mac and Linux.
U-skates in Ready If you have a Mac running "Snow Leopard" or later you'll probably also appreciate my xmorphia PDE5 screensaver , which turns the webcam image i. It is a bit difficult to draw a pattern manually, but it is the only web-based simulator that supports different types of grids, and it doesn't require WebGL: U-skates on a hex grid Replicating my Patterns in Ready The Ready program is quite versatile and has lots of options.
Change k to 0. Similarly, find "F 0. After running the pattern briefly, stop it. Now use the eye-dropper and paintbrush tools to edit the " b " area. Sample the light blue blob, and paint almost the whole thing that colour. Leave just a few spots of dark blue. Use the Run command again to continue running the pattern. The dark blue spots you left will become "soap bubbles" as seen in this YouTube video. Be patient.
It takes several minutes on my computer to get nice hexagonal bubbles. Change the instructions as follows: Set k and F as described above. In Info Pane, Render settings, next to "show multiple chamicals" click "edit" which will change the setting to "true" making b visible.
Then use "Fit Pattern" in the "View" menu to see both a and b. Do the paintbrush work before you run the pattern. Species that exist for promoting the clarity of the model code, but this will likely change in the future so that States can only be changed over time via rxd.
Rate objects and that rxd. Parameter objects will no longer occupy space in the integration matrix. The regions parameter is mandatory and is either a single rxd. Region or an iterable of them, and specifies what region s contain the species. Set that to either a constant or a function that takes a node defined below, and see also this example. To address this, use atolscale to indicate that the tolerance should be scaled for the corresponding variable; e.
The intended behavior is that this would reset the concentration to 0. How do they interact? Species interact via one or more chemical reactions.
The primary class used to specify reactions is rxd. Reaction :. Here lhs and rhs describe the reaction scheme and are expressed using arithmetic sums of integer multiples of a Species. For example, an irreversible reaction to form calcium chloride might be written:. Species instances and kf is the reaction rate. This corresponds to the system of differential equations:. While we can sometimes ignore the reverse reactions due to them having a high energy barrier, the laws of physics imply that all reactions are in fact reversible.
While mass-action works well for elementary reactions, this is often impractical for modeling intracellular dynamics where there are potentially many elementary reactions driving the observable reaction.
In this case, we often turn to phenomenological models, such as Michelis-Menten kinetics or the Hill equation. Note that using Michaelis-Menten kinetics for enzymatic reactions is only appropriate under certain conditions, such as that the concentration of enzyme is low relative to the concentration of the substrate. Note: rxd. Reaction describes a single molecular reaction.
For more background, see the Wikipedia article on the rate equation. Reactions conserve mass. This property is especially important for stochastic simulations. Often, however, one might wish to model a source or a sink e. In this case, use the non-conservative rxd. As with rxd. Reaction , the regions parameter can be used to restrict the action of an rxd. Rate to specific regions.
The previous methods assume that all the reactants exist in the same region. If they are not all present in a given location, then the reaction does not occur there. Pumps and channels can allow material to move between compartments. MultiCompartmentReaction and using square brackets to specify regions. The rate is proportional to the area of the membrane between the regions, which must be specified. Region objects er and cyt then a leak compartment across the ER membrane might be implemented as:.
This is explored further in the calcium wave example. For a given species s , s. NodeList of all of its nodes. NodeList objects are derived from the Python list, so they offer all the same methods, e. For example, to get lists of positions x and concentrations y suitable for plotting from the NodeList nl , use. Note that normalized positions always lie between 0 and 1, and so if there are multiple sections a more sophisticated approach is necessary. When assigning concentration or the diffusion constant, one can either set them equal to an iterable with the same length as the NodeList or to a scalar as in.
Section referred to by soma. As this is itself a NodeList, we can get a list of the nodes from nl in the soma section belonging to the rxd. Region referred to by er :.
Restrictions are implemented via the satisfies method of individual nodes. Sometimes though, it is important to work with an individual Node, such as for communication with nmodl or for plotting. Section soma :.
The latter two can also be used to set concentrations, but if you use this with the variable step solver, you must reinitialize via: h. In 1d, all three will always report the same value. In 3d, however, the values in the NEURON range variables are averaged values, and changing those will not change the 3d values. All the properties of NodeList are also available in individual Nodes, with the difference being that all values are scalars and not vectors. To use the results of your simulation with other graphical library or for your own analysis you can record relevant values with Vectors.
0コメント