Example of bad triaxial ellipsoid (20% error):
$ python -m sasmodels.compare background=0 triaxial_ellipsoid -ngauss=0,10000 -engine=single,single! -nq=30 -random=716856 -pars
Randomize using -random=716856
scale: 0.00343363
background: 0
sld: 11.2141
sld_solvent: 10.9297
radius_equat_minor: 41.5349
radius_equat_major: 9142.92
radius_polar: 74.0436
GPU[32] t=58.31 ms, intensity=33
DLL[32] t=12646.33 ms, intensity=33
|GPU[32]-DLL[32]| max:1.941e-03 median:1.907e-06 98%:1.849e-03 rms:6.002e-04 zero-offset:+2.745e-04
|(GPU[32]-DLL[32])/DLL[32]| max:1.884e-01 median:1.179e-06 98%:1.810e-01 rms:5.891e-02 zero-offset:+2.319e-02
The fixed 76 point integration scheme works better for this example (0.3% error). Maybe it is worth exploring Lebedev and other surface quadrature schemes for these nested integrals. It is messy, though, because not all of them are of the form ∫∫ F(q) sin(θ) dφ dθ. |
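For reference, a nested integral of the form ∫∫ F(q) sin(θ) dφ dθ can be written as a product rule like the sketch below. This is an illustration only, not the sasmodels kernel code: the names `sphere_average` and `triaxial_f2` are hypothetical, the SLD contrast and volume factors are left out, and the point counts are arbitrary. It uses Gauss-Legendre nodes in u = cos(θ), which absorbs the sin(θ) weight, and equally spaced points in φ.

```python
import numpy as np

def sphere_average(f, n_theta=20, n_phi=40):
    """Average f(theta, phi) over the unit sphere with a product rule.

    Cost is n_theta * n_phi evaluations of f, which is why nested
    integrals get expensive quickly.
    """
    # Gauss-Legendre nodes and weights in u = cos(theta) on [-1, 1].
    u, wu = np.polynomial.legendre.leggauss(n_theta)
    theta = np.arccos(u)
    # Midpoints of n_phi equal intervals in phi (periodic rectangle rule).
    phi = 2 * np.pi * (np.arange(n_phi) + 0.5) / n_phi
    values = np.broadcast_to(f(theta[:, None], phi[None, :]), (n_theta, n_phi))
    # sum(wu) == 2 and each phi point has weight 1/n_phi, hence the 2*n_phi.
    return np.sum(wu[:, None] * values) / (2 * n_phi)

def triaxial_f2(q, a, b, c):
    """Return F^2(theta, phi) for an ellipsoid with semi-axes a, b, c (q > 0)."""
    def f(theta, phi):
        r = np.sqrt((a * np.sin(theta) * np.cos(phi)) ** 2
                    + (b * np.sin(theta) * np.sin(phi)) ** 2
                    + (c * np.cos(theta)) ** 2)
        x = q * r
        return (3 * (np.sin(x) - x * np.cos(x)) / x ** 3) ** 2
    return f

# Orientation average for one q value; semi-axes are illustrative only.
avg = sphere_average(triaxial_f2(0.01, 40.0, 90.0, 75.0))
```

With equal semi-axes the integrand no longer depends on orientation, so the average should reduce to the sphere form factor, which makes a convenient sanity check.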
|
This was briefly discussed at today's fortnightly call and tagged as of interest to the upcoming camp. The question is whether it is a minimal change that provides a reasonable speedup. It is noted that this PR not only adds the new adaptive integration but also changes all the model files that currently use the GaussXX methods to use it. Probably would have been cleaner as two separate PRs? Also at issue is what to do with the integration speedup proposed a few years earlier and still sitting in #608.
|
This works well for rotationally symmetric shapes that only use 1D integrals. Performance is unsatisfactory on shapes such as triaxial ellipsoid that need 2D integrals. I could revert changes for those models until we've had a chance to explore other schemes such as Lebedev or Fibonacci. |
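As a rough idea of what a Fibonacci scheme would look like, here is a minimal sketch using only the standard library. It is not part of this PR; `fibonacci_sphere` and `fibonacci_average` are hypothetical names, and the point count needed for a given qr is not addressed here.

```python
import math

def fibonacci_sphere(n):
    """Return n (theta, phi) pairs spread nearly uniformly over the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # cos(theta) at the midpoints of n equal-area bands, from +1 down to -1.
        u = 1.0 - (2.0 * i + 1.0) / n
        # Rotate by the golden angle at each step so points do not line up in phi.
        points.append((math.acos(u), (golden_angle * i) % (2.0 * math.pi)))
    return points

def fibonacci_average(f, n):
    """Equal-weight estimate of the orientation average of f(theta, phi)."""
    return sum(f(theta, phi) for theta, phi in fibonacci_sphere(n)) / n

# Example: the average of x^2 over the unit sphere, which is 1/3 analytically.
estimate = fibonacci_average(
    lambda theta, phi: (math.sin(theta) * math.cos(phi)) ** 2, 1000)
```

The attraction is that the cost is a single point count n rather than a product of two, so it can be increased in small steps as qr grows.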
…sasmodels into ticket-535-adaptive-integration
|
List of shapes with 2D integrals:
For these shapes the computational cost is quadratic in the number of integration points, so it is not feasible to fit large shapes accurately. Consider returning NaN for q values that require more than a million evaluations to get better than 3e-3 accuracy. If these q are dropped from the residuals calculation the fit can still proceed for the low q points but the high q points will be ignored. This may end up biasing the fit toward large shapes since the estimated log likelihood will be reduced. Triaxial ellipsoid, the five rectangular prisms and the three elliptical cylinders should be reasonably accurate for dimensions below 1 μm, though they can take several seconds per evaluation. [I only tested triaxial ellipsoid, parallelepiped and elliptical cylinder; the others follow the same code patterns so they are probably good but should still be tested.] |
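The NaN idea above could look something like the following on the fitting side. This is a sketch under assumptions: NumPy arrays for the data, hypothetical function names `mask_expensive` and `masked_chisq`, and an estimated evaluation count per q point supplied by the caller. It is not the existing sasmodels or fitting code.

```python
import numpy as np

def mask_expensive(theory, n_evals, max_evals=1_000_000):
    """Replace I(q) by NaN wherever the estimated evaluation count is too high."""
    theory = np.asarray(theory, dtype=float)
    return np.where(np.asarray(n_evals) > max_evals, np.nan, theory)

def masked_chisq(data, sigma, theory):
    """Sum of squared normalized residuals over the finite theory points only.

    Also returns the number of points kept, so the caller can see how much
    of the curve was ignored.
    """
    data, sigma, theory = (np.asarray(v, dtype=float) for v in (data, sigma, theory))
    keep = np.isfinite(theory)
    resid = (data[keep] - theory[keep]) / sigma[keep]
    return float(np.sum(resid ** 2)), int(np.count_nonzero(keep))

# Third q point is too expensive, so it is dropped from the residuals.
theory = mask_expensive([1.0, 2.5, 9.0], [100, 5000, 2_000_000])
chisq, n_kept = masked_chisq([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], theory)
```

Because dropped points contribute nothing to the sum, a parameter set that masks more points gets a smaller total, which is the bias toward large shapes mentioned above.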
|
Would unrolling the integral to distribute it across GPUs help the speed? |
Yes, but not much. With 15000 cores and 150 q points evaluated in parallel we could potentially see a 100x improvement over the current speed. For a 1 μm cube this would turn a 5 s evaluation into a 0.05 s evaluation. But cost is growing as (qr)² or worse, so a 10 μm cube would be back at 5 s again. We need better algorithms for USAXS/USANS calculations. |
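The arithmetic behind those numbers, as a small sketch. It assumes cost scales exactly as (qr)² at fixed q, uses the 5 s figure for a 1 μm cube as the baseline, and the function name `eval_time` is made up for illustration.

```python
def eval_time(size_um, speedup=1.0, base_time_s=5.0, base_size_um=1.0, exponent=2):
    """Estimated evaluation time if cost grows as size**exponent at fixed q."""
    return base_time_s * (size_um / base_size_um) ** exponent / speedup

t_1um_gpu = eval_time(1.0, speedup=100.0)    # 5 s / 100 = 0.05 s
t_10um_gpu = eval_time(10.0, speedup=100.0)  # 5 s * 10**2 / 100 = 5 s again
```

A factor of 10 in size costs a factor of 100 in evaluations, which cancels the entire 100x parallel speedup.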
|
... except that USAXS/USANS will be at lower q, so in practice it shouldn't be a problem. The issue is with slit resolution, which pulls from very high q values. A couple of options:
All of these will require icky code in the interface between resolution function and model calculations. Given that it'll break USAXS/USANS, I don't think we should merge this PR until we figure out how to handle slit resolution. |
Alternative to #608 using a simple heuristic based on qr.
Implements adaptive integration for all shapes except superball. The paracrystal models (bcc, fcc, sc) need a different approach.
Accuracy is usually comparable to a 10000 point gaussian integration for every qr. The target is 0.1% difference, though it isn't always achieved. For example:
Because we include a 20 point gaussian integration scheme, speed is frequently faster than the fixed 76 point gaussian integration in master, at least for small shapes. For large shapes it can be several times slower than the fixed scheme, though the increase in accuracy easily justifies the cost.
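To illustrate the kind of qr heuristic meant here, the sketch below picks a Gauss-Legendre order from a small table and applies it to the 1D orientation integral of an ellipsoid of revolution. The order table, the `2*qr + 10` rule and the names `pick_order`, `gauss_average` and `adaptive_average` are assumptions made for this example; they are not the thresholds or function names used in the PR, and contrast and volume factors are omitted.

```python
import numpy as np

def pick_order(qr, orders=(20, 76, 150, 400, 1000)):
    """Pick the smallest tabulated order with some margin over qr.

    Hypothetical rule: the integrand oscillates faster as qr grows, so ask
    for at least 2*qr + 10 nodes. Falls back to the largest order, where
    accuracy is no longer guaranteed.
    """
    for n in orders:
        if n >= 2 * qr + 10:
            return n
    return orders[-1]

def gauss_average(q, r_polar, r_equat, n):
    """n-point Gauss-Legendre estimate of the integral of F^2 over u = cos(alpha) in [0, 1]."""
    t, w = np.polynomial.legendre.leggauss(n)
    u, w = 0.5 * (t + 1.0), 0.5 * w  # map nodes and weights from [-1, 1] to [0, 1]
    x = q * np.sqrt(r_equat ** 2 * (1.0 - u ** 2) + r_polar ** 2 * u ** 2)
    f = 3 * (np.sin(x) - x * np.cos(x)) / x ** 3
    return float(np.sum(w * f ** 2))

def adaptive_average(q, r_polar, r_equat):
    """Choose the order from qr, then integrate. Returns (value, order used)."""
    n = pick_order(q * max(r_polar, r_equat))
    return gauss_average(q, r_polar, r_equat, n), n

small, n_small = adaptive_average(0.01, 50.0, 50.0)   # qr = 0.5, cheapest scheme
large, n_large = adaptive_average(0.1, 400.0, 100.0)  # qr = 40, higher order
```

Small shapes get the cheap 20 point rule while large shapes pay for more points only when qr requires it, which is the trade-off described above.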
Shapes with nested integrals (e.g., triaxial ellipsoid) can be very slow. For example:
Because the cost of a 10000 point gaussian with nested integration is so high, these models have only been checked for accuracy at a few Q points.
Refs #248