- Added device concept to Embree to allow different components of an application to use Embree without interfering with each other.
- Fixed memory leak in twolevel builder used for dynamic scenes.
- Fixed bug in tessellation cache that caused crashes for subdivision surfaces.
- Fixed bug in internal task scheduler that caused deadlocks when using rtcCommitThread.
- Improved hit-distance accuracy for thin triangles in robust mode.
- Added CMake option to disable ray packet support.
- Fixed bug triggered by instantiating motion blur geometry.
- Fixed bug in hit UV coordinates of static subdivision geometries.
- Performance improvements when only changing tessellation levels for subdivision geometry per frame.
- Added ray packet intersectors for subdivision geometry, resulting in improved performance for coherent rays.
- Reduced virtual address space usage for static geometries.
- Fixed some AVX2 code paths when compiling with GCC or Clang.
- Bugfix for subdiv patches with non-matching winding order.
- Bugfix in ISA detection of AVX512.
- Major performance improvements for ray tracing subdivision surfaces, e.g. up to 2× faster for scenes where only the tessellation levels are changing per frame, and up to 3× faster for scenes with lots of crease features
- Initial support for architectures supporting the new 16-wide AVX512 ISA
- Implemented intersection filter callback support for subdivision surfaces
- Added `RTC_IGNORE_INVALID_RAYS` CMake option which makes the ray intersectors more robust against full tree traversal caused by invalid ray inputs (e.g. INF, NaN, etc)
- Added `rtcInterpolate` function to interpolate per vertex attributes
- Added `rtcSetBoundaryMode` function that can be used to select the boundary handling for subdivision surfaces
- Fixed a traversal bug that caused rays with very small ray direction components to miss geometry
- Performance improvements for the robust traversal mode
- Fixed deadlock when calling `rtcCommit` from multiple threads on the same scene
- On dual socket workstations, the initial BVH build performance almost doubled through a better memory allocation scheme
- Reduced memory usage for subdivision surface objects with crease features
- `rtcCommit` performance is robust against unset "flush to zero" and "denormals are zero" flags. However, enabling these flags in your application is still recommended
- Reduced memory usage for subdivision surfaces with borders and infinitely sharp creases
- Lots of internal cleanups and bug fixes for both Intel® Xeon® and Intel® Xeon Phi™
- Improved hierarchy build performance on both Intel Xeon and Intel Xeon Phi
- Vastly improved tessellation cache for ray tracing subdivision surfaces
- Added `rtcGetUserData` API call to query per geometry user pointer set through `rtcSetUserData`
- Added support for memory monitor callback functions to track and limit memory consumption
- Added support for progress monitor callback functions to track build progress and cancel long build operations
- BVH builders can be used to build user defined hierarchies inside the application (see tutorial [BVH Builder])
- Switched to TBB as default tasking system on Xeon to get even faster hierarchy build times and better integration for applications that also use TBB
- `rtcCommit` can get called from multiple TBB threads to join the hierarchy build operations
- Support for Catmull Clark subdivision surfaces (triangle/quad base primitives)
- Support for vector displacements on Catmull Clark subdivision surfaces
- Various bug fixes (e.g. 4-byte alignment of vertex buffers works)
- BVH builders more robustly handle invalid input data (Intel Xeon processor family)
- Motion blur support for hair geometry (Xeon)
- Improved motion blur performance for triangle geometry (Xeon)
- Improved robust ray tracing mode (Xeon)
- Added `rtcCommitThread` API call for easier integration into existing tasking systems (Xeon and Intel Xeon Phi coprocessor)
- Added support for recording and replaying all `rtcIntersect`/`rtcOccluded` calls (Xeon and Xeon Phi)
- Improved mixed AABB/OBB-BVH for hair geometry (Xeon Phi)
- Reduced amount of pre-allocated memory for BVH builders (Xeon Phi)
- New 64\ bit Morton code-based BVH builder (Xeon Phi)
- (Enhanced) Morton code-based BVH builders now use tree rotations to improve BVH quality (Xeon Phi)
- Bug fixes (Xeon and Xeon Phi)
- High quality BVH mode improves spatial splits, which results in up to 30% performance improvement for some scenes (Xeon)
- Compile time enabled intersection filter functions do not reduce performance if no intersection filter is used in the scene (Xeon and Xeon Phi)
- Improved ray tracing performance for hair geometry by >20% on Xeon Phi. BVH for hair geometry requires 20% less memory
- BVH8 for AVX/AVX2 targets improves performance for single ray tracing on Haswell by up to 12% and by up to 5% for hybrid (Xeon)
- Memory conservative BVH for Xeon Phi now uses BVH node quantization to lower memory footprint (requires half the memory footprint of the default BVH)
- Support for ray tracing hair geometry (Xeon and Xeon Phi)
- Catching errors through error callback function
- Faster hybrid traversal (Xeon and Xeon Phi)
- New memory conservative BVH for Xeon Phi
- Faster Morton code-based builder on Xeon
- Faster binned-SAH builder on Xeon Phi
- Lots of code cleanups/simplifications/improvements (Xeon and Xeon Phi)
- Support for motion blur on Xeon Phi
- Support for intersection filter callback functions
- Support for buffer sharing with the application
- Lots of AVX2 optimizations, e.g. ~20% faster 8-wide hybrid traversal
- Experimental support for 8-wide (AVX/AVX2) and 16-wide BVHs (Xeon Phi)
- New future proof API with a strong focus on supporting dynamic scenes
- Lots of optimizations for 8-wide AVX2 (Haswell architecture)
- Automatic runtime code selection for SSE, AVX, and AVX2
- Support for user-defined geometry
- New and improved BVH builders:
    - Fast adaptive Morton code-based builder (without SAH-based top-level rebuild)
    - Both the SAH and Morton code-based builders got faster (Xeon Phi)
    - New variant of the SAH-based builder using triangle pre-splits (Xeon Phi)
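Several of the builders listed in these notes are described as "Morton code-based". As a rough illustration of the underlying idea only, the sketch below quantizes a primitive centroid to 21 bits per axis, interleaves the bits into a 64-bit Morton code, and sorts primitives by that code so that spatially close primitives end up adjacent. This is a generic textbook construction, not Embree's implementation; the names `Centroid`, `quantize21`, `spreadBits21`, `morton64`, and `sortByMorton` are hypothetical, and centroids are assumed to be already normalized to the unit cube.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical centroid type; coordinates are assumed to lie in [0, 1].
struct Centroid { float x, y, z; };

// Map a coordinate in [0, 1] to a 21-bit integer (values are clamped first).
inline std::uint32_t quantize21(float t) {
    t = std::min(std::max(t, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(t * 2097151.0f);  // 2^21 - 1
}

// Move bit i of the low 21 bits of v to bit 3*i of the result.
inline std::uint64_t spreadBits21(std::uint32_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 21; ++i)
        r |= static_cast<std::uint64_t>((v >> i) & 1u) << (3 * i);
    return r;
}

// Interleave three 21-bit coordinates into one 63-bit code (x in the lowest bit).
inline std::uint64_t morton64(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return spreadBits21(x) | (spreadBits21(y) << 1) | (spreadBits21(z) << 2);
}

inline std::uint64_t mortonOf(const Centroid& c) {
    return morton64(quantize21(c.x), quantize21(c.y), quantize21(c.z));
}

// Order primitives along the Morton curve; a builder would then split the
// sorted range recursively to form the hierarchy.
inline void sortByMorton(std::vector<Centroid>& centroids) {
    std::sort(centroids.begin(), centroids.end(),
              [](const Centroid& a, const Centroid& b) {
                  return mortonOf(a) < mortonOf(b);
              });
}
```

The loop-based bit spreading is chosen for readability; a production builder would typically use a faster bit-manipulation scheme.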
BVH rebuild performance (including triangle accel generation, excluding memory allocation) for scenes with 2--12 million triangles:
- Intel® Core™ i7 (Haswell-based CPU, 4 cores @ 3.0\ GHz)
    - 7--8 million triangles/s for the SAH-based BVH builder
    - 30--36 million triangles/s for the Morton code-based BVH builder
- Intel® Xeon Phi™ 7120
    - 37--40 million triangles/s for the SAH-based BVH builder
    - 140--160 million triangles/s for the Morton code-based BVH builder
Rendering of the Crown model (`crown.ecs`) with 4\ samples per pixel (`-spp 4`):
- Intel® Core™ i7 (Haswell-based CPU, 4 cores @ 3.0\ GHz)
    - 1024×1024 resolution: 7.8 million rays per sec
    - 1920×1080 resolution: 9.9 million rays per sec
- Intel® Xeon Phi™ 7120
    - 1024×1024 resolution: 47.1 million rays per sec
    - 1920×1080 resolution: 61.1 million rays per sec
- Support for the Intel® Xeon Phi™ coprocessor platform
- Support for high-performance "packet" kernels on SSE, AVX, and Xeon Phi
- Integration with the Intel® SPMD Program Compiler (ISPC)
- Instantiation and fast BVH reconstruction
- Example photo-realistic rendering engine for both C++ and ISPC
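The notes above recommend enabling the "flush to zero" and "denormals are zero" flags in the application. On x86 targets these are bits in the MXCSR control register and can be set with the standard SSE intrinsic macros, as in the minimal sketch below. The wrapper name `enableFlushToZeroAndDenormalsAreZero` is hypothetical, the code assumes an x86 compiler that provides `xmmintrin.h` and `pmmintrin.h`, and because MXCSR is per-thread state the call would need to be made on every thread that performs builds or traces rays.

```cpp
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

// Set the FTZ and DAZ bits in the calling thread's MXCSR register.
inline void enableFlushToZeroAndDenormalsAreZero() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // denormal results become zero
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // denormal inputs are read as zero
}
```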