That wasp is one of the single most impressive pieces of computer graphics I have ever seen, and seemingly in contradiction also a fantastic piece of macro photography. The fact it renders in real time is amazing.
There was a discussion on here the other day about the PS6, and honestly, were I still involved in consoles/games production, I'd be looking seriously at how to incorporate assets like this.
Gaussian splats don't offer the flexibility required for your typical videogame. Since they aren't true PBR, the lighting is kind of hardcoded. Rigging doesn't work well with them. And editing would be very hard.
It's good for visualizing something by itself, but not for building a scene out of it.
People are working on recovering PBR properties, rigging, and editing. I think those are all solvable over time. I wouldn't start a big project with it today, but maybe in a couple years.
If you want a real cursed problem for Gaussian splats though: global illumination. People have decomposed splat models into separate global and PBR colors, but I have no clue how you'd figure out where that global illumination came from, let alone recompute it for a new lighting situation.
Some intrepid souls are trying to tackle the global illumination problem! https://arxiv.org/abs/2410.02619
Wow!
Also, since it's slightly hidden in a comment underneath the abstract and easy to miss, here's the link to the paper's project page: https://stopaimme.github.io/GI-GS-site/
Yeah no animation is a pretty big blocker. The tech can handle video clips tho.
I wonder if it's possible to do some kind of blendshape style animation, where you blend between multiple recorded poses.
Early 3D engines and of course all the 16 bit 2D games had “canned animation”. Half Life was an early example I can think of that used real IK rigging. Unreal 1 did not.
For Half-Life it would be FK (forward kinematics). IK, I assume, was introduced in HL2 (but I don't know that for a fact).
Even HL2 is mostly just normal (FK) animations. IK is just used for limited cases, namely making sure feet touch the ground on sloped surfaces.
It would need an extension and extra parameters, but plenty of AAA assets have had their shaders produced by cameras with fancy lighting rigs for many years.
Looks amazing. Some feedback on the website - black text on a dark grey background? I had to use reader mode.
The page saturation made me think something was highlighted in the foreground that I simply couldn't see, leaving the whole page as shaded "in the background."
I have the opposite experience to you. This website is one of the few websites I can read clearly without any blurred edges with my glasses on.
Then you need to turn down the brightness of your screen. You obviously have it set way too high.
This is objectively violating accessibility guidelines for contrast.
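For what it's worth, that claim is easy to check with the WCAG 2.x contrast-ratio formula; a minimal sketch in Python (the hex values are made-up stand-ins, not the site's actual colors):

```python
def srgb_to_linear(c8):
    # sRGB channel (0-255) -> linear light, per the WCAG definition.
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color):
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * srgb_to_linear(r) + 0.7152 * srgb_to_linear(g) + 0.0722 * srgb_to_linear(b)

def contrast_ratio(fg, bg):
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

# Hypothetical example: near-black text on dark grey comes out around 1.9:1,
# far below the 4.5:1 WCAG AA minimum for body text.
print(contrast_ratio("000000", "3a3a3a"))
```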
Right. Now try background color #767676 on the body element and see how much better it is.
Yeah. Even with very low brightness it works well for me.
The best thing about reader mode is that there’s now always an escape hatch for those who it doesn’t work for.
Same, I love it
This looks amazing; I never thought to combine macro photography and Gaussian splatting.
I'd also like to show my gratitude for you releasing this as a free culture file! (CC BY)
You can view the models in your browser:
https://superspl.at/view?id=1eacd61c wasp!
https://superspl.at/view?id=23a16d0e fly!
I wonder if there's research into fitting gaussian splats that are dependent on focus distance? Basically as a way of modeling bokeh - you'd feed the raw, unstacked shots and get a sharp-everywhere model back.
Multiple groups working on this:
https://dof-gs.github.io/
https://dof-gaussian.github.io/
Thanks for the links, that is great to know. I'm not quite sold that it's the better approach, though. You'd need to do SfM (tracking) on the out-of-focus images, which with a macro subject can be really blurry, and I don't know how well that works.. and you'd need a lot more images too. You'd have to group them somehow or preprocess.. then you're back to focus stacking first :-)
The linked paper describes a pipeline that starts with “point cloud from SfM” so they’re assuming away this problem at the moment.
Is it possible to handle SfM out of band? For example, by precisely measuring the location and orientation of the camera?
The paper’s pipeline includes a stage that identifies the in-focus area of an image. Perhaps you could use that to partition the input images. Exclusively use the in-focus areas for SfM, perhaps supplemented by out of band POV information, then leverage the whole image for training the splat.
Overall this seems like a slow journey to building end-to-end model pipelines. We’ve seen that in a few other domains, such as translation. It’s interesting to see when specialized algorithms are appropriate and when a unified neural pipeline works better. I think the main determinant is how much benefit there is to sharing information between stages.
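A minimal sketch of that in-focus partitioning idea, assuming OpenCV is available (the tile size and threshold are arbitrary placeholders, not values from the paper):

```python
import cv2
import numpy as np

def focus_mask(image_bgr, tile=64, threshold=50.0):
    """Mark tiles whose Laplacian variance (a common sharpness measure)
    suggests they are in focus."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = gray[y:y + tile, x:x + tile]
            if cv2.Laplacian(patch, cv2.CV_64F).var() > threshold:
                mask[y:y + tile, x:x + tile] = True
    return mask

# Idea: run SfM feature extraction only inside focus_mask(img), but feed the
# full image (all focus slices) to the splat optimizer afterwards.
```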
You can definitely feed camera intrinsics (lens, sensor size..) and extrinsics (position, rotation..) into the SfM. While the intrinsics are very useful, the extrinsics aren't actually worth that much. There's no way you can measure the rotation accurately enough to get subpixel accuracy. The position can be useful as an initial guess, but I found it more hassle than it's worth. If the images track well and have enough overlap, you can get exact tracking out of them without dealing with extrinsics. If they don't track well, extrinsics won't save you. That was at least my experience.
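Not tied to any particular SfM tool, but a minimal sketch of why that tends to be the case: the intrinsics sit in the projection every solver optimizes, so a good prior on them removes unknowns, while the rotation and translation get refined per image anyway:

```python
import numpy as np

def project(X_world, K, R, t):
    """Standard pinhole projection x = K [R|t] X.
    K (focal length, principal point) = intrinsics; R, t = extrinsics."""
    X_cam = R @ X_world + t            # world -> camera frame
    x = K @ X_cam                      # camera -> homogeneous image coords
    return x[:2] / x[2]                # perspective divide -> pixels

def reprojection_error(x_observed, X_world, K, R, t):
    """What bundle adjustment minimizes over all R, t (and optionally K)."""
    return np.linalg.norm(project(X_world, K, R, t) - x_observed)

# A measured rotation prior would have to be accurate to roughly a pixel's
# angular size (a few thousandths of a degree on a high-res sensor) to reach
# subpixel reprojection error, which is why it rarely helps in practice.
```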
How does it capture the reflection (the iridescence of the fly's body)? It's almost as if I can see the background through the reflection.
I would have thought that since that reflection has a different color in different directions, gaussian splat generation would have a hard time coming to a solution that satisfies all of the rays. Or at the very least, that a reflective surface would turn out muddy rather than properly reflective-looking.
Is there some clever trickery that's happening here, or am I misunderstanding something about gaussian splats?
The color is view-dependent, which also means the lighting is baked in and results in them not being usable directly for 3D animation/environments (though I’m sure there must be research happening on dynamic lighting).
Sometimes it will “go wrong”: you can see in some of the fly models that if you get too close, body parts start looking a bit transparent, because some of the specular highlights are actually splats on the back of an internal surface. This is very evident with mirrors - they are just an inverted projection which you can walk right into.
Feels like there must be some way to use "variability of colour by viewing angle" for tiny clusters of volumes in the object as a way to generate material settings when converting the Gaussian splat model to a traditional 3D model.
E.g. if you have a cluster of tiny adjacent volumes that have high variability based on viewing angle, but the difference between each of those volumes is small, handle it as a smooth, reflective surface, like chrome.
You can’t easily convert a Gaussian splat to a polygon-based model; the representation through blurry splats is the breakthrough.
Gaussian splats can have colour components that depend on the viewing direction. As far as I know, they are implemented as spherical harmonics. The angular resolution is determined by the number of spherical harmonic components. If this is too low, all reflection changes will be slow and smooth, and any reflection will be blurred.
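A minimal sketch of how that view dependence gets evaluated, using only the first two SH degrees (typical splat models go up to degree 3, i.e. 16 coefficients per colour channel; the constants follow the usual real-SH convention, and the signs and the 0.5 offset vary between implementations):

```python
import numpy as np

SH_C0 = 0.2820948  # degree-0 (constant) basis coefficient
SH_C1 = 0.4886025  # degree-1 basis coefficient

def splat_color(sh_coeffs, view_dir):
    """sh_coeffs: (4, 3) array - degree-0 and degree-1 SH coefficients per RGB.
    view_dir: direction from the camera towards the splat."""
    x, y, z = view_dir / np.linalg.norm(view_dir)
    color = SH_C0 * sh_coeffs[0]
    # The degree-1 terms let the colour vary linearly with the view direction,
    # which is what fakes specular/iridescent highlights.
    color += SH_C1 * (-y * sh_coeffs[1] + z * sh_coeffs[2] - x * sh_coeffs[3])
    # The 0.5 offset mirrors the common convention of storing colours
    # centred around zero.
    return np.clip(color + 0.5, 0.0, 1.0)
```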
FTA, "A Gaussian splat is essentially a bunch of blurry ellipsoids. Each one has a view-dependent color". Does that explain it?
See the section titled "View-dependant colors with SH" here: https://towardsdatascience.com/a-comprehensive-overview-of-g...
It’d be amazing to see a collab with the Exquisite Creatures Revealed artist. He preserves all kinds of insects and presents them in a way that highlights the color and iridescent effects nature offers. I was so blown away by the exhibit I went back. Artist: https://christophermarley.com/
That's quite the improvement over Stars/NoooN [1] showing off real-time rendering of (supposedly) 23,806 triangles on a 486.
[1] https://youtu.be/wEiBxHOGYps
When was that made? The YouTube video is 14 years old but it feels at least a decade older than that.
1995
The interactive rotatable demos work in realtime on my phone in the browser! I guess gaussian splats aren't that expensive to render then, only to compute.
The file sizes are impressive (as in small). I don't have the link right now but there are recent 4D splats that include motion (like videos but you can move around the scene) and they're in the megabytes.
Has recently been used to visit "The Matrix" again: https://www.youtube.com/watch?v=iq5JaG53dho&t=1412
Very cool, unfortunately I find the 3D completely unusable on mobile. The moment I touch it in orbit mode it locks to a southern pole view and whips about like crazy however I try rotate it.
Hello, playcanvas developer here. May I ask what phone/device you're on? Might be a bug. (No pun intended).
I experience the same thing on Fennec F-Droid 143.0.3 (Firefox) on Android 14.
Right, thanks for confirming. It seems firefox related. We'll get this patched asap!
Also experiencing this issue in Fennec F-Droid
No issues here on iPhone 12 running iOS 18.6.2 and Firefox 143.2 (62218)
The orbiting sensitivity is a bit high when zoomed in a lot, which can lead to the model spinning out of control, as the other user mentioned.
Still manageable though, just very sensitive.
Does anyone know if triangle splatting will revolutionize the field? https://trianglesplatting2.github.io/trianglesplatting2/
Love it!
https://superspl.at/view?id=ac0acb0e
I believe this one is misnamed
Thanks for pointing that out, fixed it.
It is remarkable that this is accomplished with relatively modest setup and effort, and the results are already great. Makes me wonder what you could get with high-end gear (e.g. 61mp sony a7rv and the new 100mm 1.4x macro) and capturing more frames. I also imagine that the web versions lose some detail to reduce size.
I presume these would look great on a good VR headset?
The results are incredibly clean! Feathers and flowers could be interesting.
Black text on a dark grey background is nearly unreadable - I used Reader Mode.
> Unfortunately, the extremely shallow depth of field in macro photography completely throws this process off. If you feed unsharp photos into it, the resulting model will contain unsharp areas as well.
It should be possible to model the focal depth of the camera directly, but perhaps that is not done in standard software. You'd still want several images with different focus settings.
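A minimal sketch of what modelling the focal depth directly could look like, using the standard thin-lens circle-of-confusion formula (the example numbers are illustrative, not from the article):

```python
def circle_of_confusion(d_subject, d_focus, focal_length, f_number):
    """Blur-circle diameter on the sensor for a point at d_subject when the
    lens is focused at d_focus (all distances in the same unit, e.g. mm)."""
    aperture = focal_length / f_number          # entrance pupil diameter
    return (aperture
            * abs(d_subject - d_focus) / d_subject
            * focal_length / (d_focus - focal_length))

# Illustrative macro setup: 100 mm lens at f/8 focused at 300 mm. A point just
# 10 mm behind the focus plane already blurs to ~0.2 mm on the sensor, i.e.
# dozens of pixels - which is why every raw frame is mostly out of focus.
print(circle_of_confusion(310, 300, 100, 8))
```

A trainer that knows each splat's depth could in principle apply this blur to its rendering before comparing against an out-of-focus photo, instead of requiring focus-stacked inputs.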
Really amazing results.
I wonder if one could capture each angle in a single shot with a Lytro Illum instead of focus-stacking? Or is the output of an Illum not of sufficient resolution?
That would be awesome if it worked - from a curious look I can't say why not. I'll have to investigate a bit more. Thanks for bringing it up.
Wow this would be lovely for my Drosophila lab.
Amazing work, I especially love that you put all of them online to view. The bumblebee is my favorite, so fuzzy
I agree. The fine detail on the insects' skin/shell is amazing.
I'd love to know the compute hardware he used and the time it took to produce.
Nothing fancy. Postshot does need an Nvidia card though; I have a 3060Ti. A single insect with around 5 million splats takes about 3 hours to train in high quality.
I still don't get the point of Gaussian Splats. How are they better than triangles?
I'm not an expert and have not yet worked with splats, but I understand that unlike super-sharp-edged triangles they can represent complicatedly-transparent 'soft' phenomena like fur or clouds, which would ordinarily need to be rendered with semi-transparent curves/sheathes (for fur/grass) or voxels for cloudy things like steam/mist. I gather splats can also represent a limited amount of view-dependent specularity; as other commenters have said, this is not dynamic and cannot easily deal with changing scene geometry or light sources. Still sounds like a fun research project to make it do more in terms of illumination though!
They are differentiable, which allows for image-based rendering by solving the inverse of the rendering function via gradient descent.
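A toy illustration of that point (not the actual 3DGS pipeline): because the "render" below is a smooth function of the splat's parameters, plain gradient descent can recover them from a target image alone. PyTorch, one 1-D splat for brevity:

```python
import torch

# Target: a blurry bump we pretend is a photo of a single "splat".
xs = torch.linspace(-1, 1, 200)
target = 0.8 * torch.exp(-((xs - 0.3) ** 2) / (2 * 0.05 ** 2))

# Learnable splat parameters: position, log-scale, brightness.
mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)
amp = torch.tensor(0.5, requires_grad=True)

opt = torch.optim.Adam([mu, log_sigma, amp], lr=0.05)
for _ in range(500):
    sigma = torch.exp(log_sigma)
    render = amp * torch.exp(-((xs - mu) ** 2) / (2 * sigma ** 2))
    loss = ((render - target) ** 2).mean()      # photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# mu, sigma and amp drift towards 0.3, 0.05 and 0.8: inverse rendering by
# gradient descent, which is what makes splats easy to fit to photos.
```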
It's really not a splat vs triangle thing. You're basically comparing point cloud data to triangles.
Likely triangles are used to render the image in a traditional pipeline.
It's just a simpler primitive I assume. Blurs and colors and angles are simpler than 3D geometries, so it's probably more aligned with working/thinking with other very low-level primitives with minimal dimensions (like the math of neural networks). I dunno, I'm kinda vibing a response here -- maybe someone else can give you a more authoritative answer
Cool! It looks awesome. I did see some "ghost legs" on the bumblebee. How does that sort of artifact happen?
The bumblebee was my first attempt; the tracking didn't quite work, so you get ghosting. Others have ghosting too - it usually happens when part of the insect moves while shooting (which takes 4h). They dry and crumble after a while.
Educational visualization seems like a really good use case for GS
This is awesome, thank you for sharing!
Your fluid simulation was pretty rad.
Pinhole lens + lots of light/long exposures to get sharp focus may help avoid some of the extra processing steps. He does mention he shot at a small aperture and that can cause diffraction effects - I guess that might be worse with a pinhole though.
It all kind of depends on each other. More light means longer recycle times on the speedlights, or higher ISO and more noise. Longer exposure isn't an option with speedlights, and using continuous light has its downsides too - things may start to shake..
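If anyone wants to put numbers on that trade-off, the knobs are all equivalent in stops (each stop doubles the light); a small sketch with made-up settings, not the actual ones used for the insects:

```python
import math

def stops_between(exposure_a, exposure_b):
    """Each 'stop' doubles the light: doubling the shutter time, doubling ISO,
    or opening the aperture by one f-number step are all one stop."""
    (t_a, n_a, iso_a), (t_b, n_b, iso_b) = exposure_a, exposure_b
    return (math.log2(t_b / t_a)            # longer shutter -> more light
            - 2 * math.log2(n_b / n_a)      # smaller f-number -> more light
            + math.log2(iso_b / iso_a))     # higher ISO -> brighter (and noisier)

# Hypothetical: going from 1/200 s, f/11, ISO 100 to 1/200 s, f/11, ISO 400
# buys two stops - the same gain as quadrupling the flash output.
print(stops_between((1/200, 11, 100), (1/200, 11, 400)))  # 2.0
```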