Remotely rendering high resolution models with VTK, Mesa 3D, CMake and Three.js

Working with truly large 3D models is something you don't hear much about on the web. So why don't we do it with some open source tools?

The open source community is a wonderful thing. Punch the right words into Google and you can find open source tools and projects that do amazing things when combined. Sure, it might be a stretch of what those tools were meant to do, but that's half the fun.

The use case

Let's say you have a 3D model. But it's not just any model; it's a HUGE model, something with 100 million polys and more data than you can shake a stick at. WebGL can do some amazing things…but loading a multi-gigabyte model file and then rendering it…that's a bit on the taxing side.

Now Justin, you say, people don't make models that big. To that I would say: yes, yes we do. The big industrial 3D scanners do all kinds of crazy things, not the least of which is a measly 100 million poly model (likely rendered and cleaned up from point cloud data). What are we to do?

I say let's remotely render a high resolution frame of that big, large 3D model in the cloud while someone manipulates a lower resolution, much more transportable version of it in WebGL.

Sounds hard. Not going to lie; not the easiest thing in the world.

It’s all in the secret sauce…except it’s not so secret

Cheap cloud servers don't exactly have large amounts of cheap GPU power available, so we can't use fancy OpenCL or CUDA to do our dirty work. What we need is something that'll run on all those relatively cheap CPU cycles. Welcome to the stage, VTK!

The Visualization Toolkit (VTK) is an open source system that lets you do all kinds of 3D, imaging, and visualization. I can’t even begin to scratch the surface of how much you can do with VTK, so I highly suggest that you read the FAQ.

What we’re going to use VTK for is its ability to render frames of 3D objects without the use of a graphics card. We can use all the things we’ve come to love in our 3D world, such as cameras, positioning, and lights, yet not have any graphical interface at all.

But Justin, VTK doesn't do off-screen rendering. Ah, but it does! You just have to tie it into Mesa 3D.

I need a driver for a video card, but I don’t have a video card

Mesa 3D, like VTK, can do a lot of things, ranging from software emulation to hardware acceleration for modern GPUs. What we want it for is off-screen rendering support, which we can then utilize in VTK.
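To make that concrete, here's a minimal sketch of what OSMesa (Mesa's off-screen interface) gives you underneath VTK: an OpenGL context that draws into a plain block of memory instead of a window. The buffer size and format here are just for illustration; once VTK is built against OSMesa, it manages all of this for us.

#include <GL/osmesa.h>
#include <GL/gl.h>
#include <cstdlib>

int main()
{
    const int width = 512, height = 512;

    // An RGBA context with a 16-bit depth buffer: no X server, no GPU
    OSMesaContext ctx = OSMesaCreateContextExt(OSMESA_RGBA, 16, 0, 0, NULL);

    // OSMesa renders into memory we hand it, not into a window
    unsigned char* buffer = (unsigned char*) malloc(width * height * 4);
    OSMesaMakeCurrent(ctx, buffer, GL_UNSIGNED_BYTE, width, height);

    // ...ordinary OpenGL calls go here (VTK will be issuing these for us)...
    glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glFinish();

    // buffer now holds the rendered RGBA pixels, ready to be written out
    free(buffer);
    OSMesaDestroyContext(ctx);
    return 0;
}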

I don’t see an exe file, how do we do this?

At this point, it sounds like if we start piecing this together, it’ll work. But how does one go about doing that? It’s time to break out your compilers.

You can do this a number of ways: you can compile locally and deploy to Amazon, or you can compile right on your AMI of choice. I compiled locally, as statically as possible, and then moved the build over to Amazon. Your mileage may vary.

Please also note that the versions you see below are not the bleeding-edge versions of any of these libs. These are the versions I used that worked (and trust me, some version combinations simply DO NOT work…you will be left banging your head on the table, screaming to the heavens).

So we start the build process. I’d grab a cup of coffee or the drink of your choice, because depending on what you’re compiling on, this could take a while.

# Go somewhere to play around
$ cd /tmp

# get Mesa
$ wget ftp://ftp.freedesktop.org/pub/mesa/7.11.2/MesaLib-7.11.2.tar.bz2

# get VTK and test data
$ wget http://www.vtk.org/files/release/5.8/vtk-5.8.0.tar.gz
$ wget http://www.vtk.org/files/release/5.8/vtkdata-5.8.0.tar.gz

# untar Mesa
$ tar -xjvf MesaLib-7.11.2.tar.bz2

# Configure and build Mesa
$ cd Mesa-7.11.2/
$ ./configure \
    --with-driver=osmesa \
    --with-gallium-drivers="" \
    --disable-egl
$ make -j4

# Untar VTK and VTKData
$ cd /tmp
$ tar -xzvf vtk-5.8.0.tar.gz
$ tar -xzvf vtkdata-5.8.0.tar.gz

# Configure and build VTK against OSMesa
$ mkdir VTK_Build
$ cd VTK_Build/
$ cmake \
    -D"VTK_DATA_ROOT:PATH=/tmp/VTKData" \
    -D"OPENGL_INCLUDE_DIR:PATH=/tmp/Mesa-7.11.2/include" \
    -D"OPENGL_gl_LIBRARY:FILEPATH=" \
    -D"OPENGL_glu_LIBRARY:FILEPATH=/tmp/Mesa-7.11.2/lib/libGLU.so" \
    -D"VTK_OPENGL_HAS_OSMESA:BOOL=ON" \
    -D"OSMESA_INCLUDE_DIR:PATH=/tmp/Mesa-7.11.2/include" \
    -D"OSMESA_LIBRARY:FILEPATH=/tmp/Mesa-7.11.2/lib/libOSMesa.so" \
    -D"VTK_USE_OFFSCREEN:BOOL=ON" \
    -D"VTK_USE_X:BOOL=OFF" \
    /tmp/VTK
$ make -j4
$ make test

We made it! Now what?

So the build finished, hopefully error free. Now what? We need to test to see if things are working. Just so happens, on the CMake wiki there is an example script that does this very thing. Drop that into a file, build it, and then run it from the command line.
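For reference, the CMakeLists.txt to build that script against our OSMesa-flavored VTK looks something like the sketch below. The OffScreenSphere names are placeholders for whatever you called the wiki example, and vtkHybrid is one reasonable VTK 5.8 kit to link against; configure with -DVTK_DIR=/tmp/VTK_Build so CMake finds our build and not some system VTK.

cmake_minimum_required(VERSION 2.8)
project(OffScreenSphere)

# Find the VTK we just built (configure with -DVTK_DIR=/tmp/VTK_Build)
find_package(VTK REQUIRED)
include(${VTK_USE_FILE})

add_executable(OffScreenSphere OffScreenSphere.cxx)
target_link_libraries(OffScreenSphere vtkHybrid)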

What you should end up with is something like this:

[image: a sphere on a white background, rendered off screen]

That little sphere on the white background…that’s success!

I’m going to need more than a sphere

So what, Justin, I don't need some low poly sphere. I need some high resolution action. So let's do that.

The following is a basic example engine that takes command line parameters, loads a model file, and then renders a particular view. Now, before you start screaming "that won't scale!" and "you have C++ issues," I'm well aware (anyone who's written even a little C++ will be quick to note them). It's a pretty crappy example for reasons I'm not at liberty to explain (read my memoirs after I die…the story is both sad and funny all at once).

#include <vtkXMLPolyDataReader.h>
#include <vtkPolyDataMapper.h>
#include <vtkPolyData.h>
#include <vtkProperty.h>
#include <vtkActor.h>
#include <vtkRenderWindow.h>
#include <vtkRenderer.h>
#include <vtkSmartPointer.h>
#include <vtkWindowToImageFilter.h>
#include <vtkGraphicsFactory.h>
#include <vtkImagingFactory.h>
#include <vtkCamera.h>
#include <vtkJPEGWriter.h>

#include <cstdio>   // popen, remove
#include <cstdlib>  // atof, atoi, rand, srand
#include <cstring>  // strcpy
#include <ctime>    // time
#include <iostream>
#include <string>

// forward declarations (definitions live below main)
std::string randomString(int length, bool letters, bool numbers, bool symbols);
std::string exec(char* cmd);

int main(int argc, char *argv[])
{
    // the model file to load
    std::string filename = argv[1];

    // Canvas size
    int canvas_width = atoi(argv[12]);
    int canvas_height = atoi(argv[13]);

    // Camera positions
    double camera_pos_x = atof(argv[2]);
    double camera_pos_y = atof(argv[3]);
    double camera_pos_z = atof(argv[4]);

    // Camera rotations
    double camera_yaw_y = atof(argv[5]);
    double camera_pitch_x = atof(argv[6]);
    double camera_roll_z = atof(argv[7]);

    // Focal point
    double camera_focal_x = atof(argv[8]);
    double camera_focal_y = atof(argv[9]);
    double camera_focal_z = atof(argv[10]);

    // SetViewAngle
    double camera_view_angle = atof(argv[11]);

    // UseTexture (parsed, but not wired up in this trimmed example)
    std::string model_texture = argv[14];
    std::string model_texture_file = argv[15];

    // Setup offscreen rendering
    vtkSmartPointer<vtkGraphicsFactory> graphics_factory =
        vtkSmartPointer<vtkGraphicsFactory>::New();
    graphics_factory->SetOffScreenOnlyMode(1);
    graphics_factory->SetUseMesaClasses(1);

    vtkSmartPointer<vtkImagingFactory> imaging_factory =
        vtkSmartPointer<vtkImagingFactory>::New();
    imaging_factory->SetUseMesaClasses(1);

    // Set my poly mapper
    vtkSmartPointer<vtkXMLPolyDataReader> reader =
        vtkSmartPointer<vtkXMLPolyDataReader>::New();
    reader->SetFileName(filename.c_str());
    reader->Update();

    vtkPolyData* polydata = reader->GetOutput();

    vtkSmartPointer<vtkPolyDataMapper> mapper =
        vtkSmartPointer<vtkPolyDataMapper>::New();
    mapper->SetInput(polydata);

    // Define the camera in the scene
    vtkSmartPointer<vtkCamera> camera = vtkSmartPointer<vtkCamera>::New();
    camera->SetPosition(camera_pos_x, camera_pos_y, camera_pos_z);
    camera->SetFocalPoint(camera_focal_x, camera_focal_y, camera_focal_z);
    camera->Yaw(camera_yaw_y);
    camera->Pitch(camera_pitch_x);
    camera->Roll(camera_roll_z);
    camera->SetViewAngle(camera_view_angle);
    camera->SetClippingRange(1, 1000);

    // Put my model into the scene, kill the backface
    vtkSmartPointer<vtkActor> actor = vtkSmartPointer<vtkActor>::New();
    actor->SetMapper(mapper);
    actor->SetScale(1);
    actor->GetProperty()->BackfaceCullingOn();

    // Create a renderer, give it the camera
    vtkSmartPointer<vtkRenderer> renderer = vtkSmartPointer<vtkRenderer>::New();
    renderer->SetActiveCamera(camera);

    // Our render "window" (it's not really a window)
    vtkSmartPointer<vtkRenderWindow> renderWindow =
        vtkSmartPointer<vtkRenderWindow>::New();
    renderWindow->SetOffScreenRendering(1);
    renderWindow->SetSize(canvas_width, canvas_height);
    renderWindow->AddRenderer(renderer);

    // Add the actors to the scene
    renderer->AddActor(actor);
    renderer->SetBackground(0, 0, 0); // background color black

    // Get me a frame!
    renderWindow->Render();

    // Dump that frame from the window
    vtkSmartPointer<vtkWindowToImageFilter> windowToImageFilter =
        vtkSmartPointer<vtkWindowToImageFilter>::New();
    windowToImageFilter->SetInput(renderWindow);
    windowToImageFilter->Update();

    // Damn it, there be something wrong with WriteToMemoryOn,
    // do something to fill the gap
    std::string fn = "/tmp/";
    fn += randomString(24, true, true, false);
    fn += ".jpg";

    vtkSmartPointer<vtkJPEGWriter> writer = vtkSmartPointer<vtkJPEGWriter>::New();
    //writer->WriteToMemoryOn();
    writer->SetFileName(fn.c_str());
    writer->SetQuality(65);
    writer->SetInputConnection(windowToImageFilter->GetOutputPort());
    writer->Write();

    // Shell out to base64 and wrap the result in a data URI
    std::string command64 = "base64 ";
    command64 += fn;
    char fc[100];
    strcpy(fc, command64.c_str());

    std::string datauri = "data:image/jpeg;base64,";
    std::string encoded = exec(fc);
    datauri += encoded;

    // return my data:uri
    std::cout << datauri << std::endl;

    // remove the temp file
    remove(fn.c_str());

    return EXIT_SUCCESS;
}

/*
 * A method to generate a random string in C++
 * Author: Danny Battison
 * Contact: gabehabe@googlemail.com
 */
std::string randomString(int length, bool letters, bool numbers, bool symbols)
{
    // the shortest way to do this is to create a string containing
    // all possible values, then pick random characters from it
    srand(time(NULL));
    std::string allPossible; // this will contain all necessary characters
    std::string str;         // the random string
    if (letters == true) {
        // if you passed true for letters, we'll add letters to the possibilities
        for (int i = 65; i <= 90; i++) {
            allPossible += static_cast<char>(i);
            allPossible += static_cast<char>(i + 32); // add a lower case letter, too!
        }
    }
    if (numbers == true) {
        // if you wanted numbers, we'll add numbers
        for (int i = 48; i <= 57; i++) {
            allPossible += static_cast<char>(i);
        }
    }
    if (symbols == true) {
        // if you want symbols, we'll add symbols (note, their ASCII values are scattered)
        for (int i = 33; i <= 47; i++) allPossible += static_cast<char>(i);
        for (int i = 58; i <= 64; i++) allPossible += static_cast<char>(i);
        for (int i = 91; i <= 96; i++) allPossible += static_cast<char>(i);
        for (int i = 123; i <= 126; i++) allPossible += static_cast<char>(i);
    }

    // get the number of characters to use (used for rand())
    int numberOfPossibilities = allPossible.length();
    for (int i = 0; i < length; i++) {
        str += allPossible[rand() % numberOfPossibilities];
    }
    return str;
}

std::string exec(char* cmd)
{
    FILE* pipe = popen(cmd, "r");
    if (!pipe) return "ERROR";
    char buffer[128];
    std::string result = "";
    while (!feof(pipe)) {
        if (fgets(buffer, 128, pipe) != NULL)
            result += buffer;
    }
    pclose(pipe);
    return result;
}

So let’s render something with it:

# let's render a model
$ ./RenderObj somemodel.vtp -3 21 39 0 0 0 0 0 0 30 600 600 true somemodel_texture.jpg
# ...
# lots of base64 data
# ...

The above command will result in a big dump of base64 data (which we'll use in a bit to send to a browser), but you'll note I kept the temp file so we can look at it now:

[image: our off-screen render…success!]
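If you'd rather not fish the temp file out of /tmp, you can also peel the data URI apart straight from the shell. Assuming GNU coreutils' base64, something like:

# strip the data URI prefix and decode the rest back into a JPEG
$ ./RenderObj somemodel.vtp -3 21 39 0 0 0 0 0 0 30 600 600 true somemodel_texture.jpg \
    | sed 's|^data:image/jpeg;base64,||' | base64 -d > check.jpg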

We’re cookin'. Let’s prep for the browser.

Captain, I don’t have the RAM

So now we've got this huge model on our backend, sitting on some instance, waiting to render and return frames. We need to load a smaller version of the model into the browser. These days, Three.js can load VTK ASCII data, so that could be one way to go, or you could use some other format. But before you get there, you need a smaller version of the model, and that requires decimation.
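As a rough sketch of that loading side, here's what pulling a decimated VTK ASCII file in with the VTKLoader from Three.js's examples looks like. The file name and material are placeholders, and the loader callback has shifted a bit between Three.js versions, so treat this as the shape of the thing rather than gospel:

// assumes three.js and examples/js/loaders/VTKLoader.js are on the page
var loader = new THREE.VTKLoader();
loader.load('models/somemodel_decimated.vtk', function (geometry) {
    var material = new THREE.MeshLambertMaterial({ color: 0xcccccc });
    var mesh = new THREE.Mesh(geometry, material);
    scene.add(mesh);
});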

Now, decimating models is as much art as science. Crush it too hard, and you end up with that horrible creature at the end of The Fly. You can do this a number of ways, with a number of programs (example: MeshLab). When you deal with the big models that require decimation, you will run into a problem no matter which program you use: RAM.

You think that little 16GB workstation is going to be enough? Unlikely. The reality is, big models take up lots of RAM, and when you're doing decimation, it's going to hurt. We have machines that do this sort of thing in our animation department, but let's say you go nah, I have Amazon, show me some other way. Just so happens, VTK can do that too:

#include <vtkPolyData.h>
#include <vtkQuadricDecimation.h>
#include <vtkPLYWriter.h>
#include <vtkPLYReader.h>
#include <vtkSmartPointer.h>
#include <vtkPolyDataMapper.h>
#include <vtkProperty.h>

#include <cstdlib> // atof
#include <iostream>
#include <string>

bool hasEnding(std::string const &fullString, std::string const &ending)
{
    if (fullString.length() >= ending.length()) {
        return (0 == fullString.compare(fullString.length() - ending.length(),
                                        ending.length(), ending));
    } else {
        return false;
    }
}

int main(int argc, char *argv[])
{
    std::string filename = argv[1];
    char* fnoutput = argv[2];
    double percReduce = atof(argv[3]);

    vtkSmartPointer<vtkPLYReader> reader = vtkSmartPointer<vtkPLYReader>::New();
    reader->SetFileName(filename.c_str());
    reader->Update();

    vtkSmartPointer<vtkPolyData> inputPolyData = vtkSmartPointer<vtkPolyData>::New();
    inputPolyData->ShallowCopy(reader->GetOutput());

    std::cout << "Before decimation" << std::endl << "------------" << std::endl;
    std::cout << "There are " << inputPolyData->GetNumberOfPoints() << " points." << std::endl;
    std::cout << "There are " << inputPolyData->GetNumberOfPolys() << " polygons." << std::endl;

    vtkSmartPointer<vtkQuadricDecimation> decimate =
        vtkSmartPointer<vtkQuadricDecimation>::New();
    decimate->SetTargetReduction(percReduce);
    decimate->SetInputConnection(inputPolyData->GetProducerPort());
    double test = decimate->GetTargetReduction();
    decimate->Update();

    vtkSmartPointer<vtkPolyData> decimated = vtkSmartPointer<vtkPolyData>::New();
    decimated->ShallowCopy(decimate->GetOutput());

    std::cout << "After decimation" << std::endl << "------------" << std::endl;
    std::cout << "Entered % target: " << argv[3] << "." << std::endl;
    std::cout << "Decimate % target: " << test << "." << std::endl;
    std::cout << "There are " << decimated->GetNumberOfPoints() << " points." << std::endl;
    std::cout << "There are " << decimated->GetNumberOfPolys() << " polygons." << std::endl;

    vtkSmartPointer<vtkPLYWriter> plyWriter = vtkSmartPointer<vtkPLYWriter>::New();
    plyWriter->SetFileName(fnoutput);
    plyWriter->SetInputConnection(decimate->GetOutputPort());
    plyWriter->Write();

    return EXIT_SUCCESS;
}

You'll be quick to note that this is very similar to VTK's own decimation example. Didn't I tell you VTK was great?

How much RAM this consumes depends on how big your model is. I've run it against a 2GB model with 95M polys, and it just about maxes out 64GB of RAM while it's working.

What I've found works best is actually creating a series of lower poly count models: say 1M, 3M, 10M…and so on. This lets us trade higher resolution frames against much faster render times on the server (which I will explain later). I then drop a lower res model into MeshLab (say the 1M), hand decimate it down to 100K polys, and convert that to Three.js's JSON format for easy loading.
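Generating that series is just a loop over the decimator we built above. I'm calling the binary DecimateModel here (name it whatever you like), with target reductions eyeballed for a roughly 95M poly source:

# hypothetical binary name; from a ~95M poly source,
# 0.99 -> ~1M polys, 0.97 -> ~3M, 0.90 -> ~9.5M
$ for r in 0.99 0.97 0.90; do ./DecimateModel big_scan.ply big_scan_${r}.ply ${r}; done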

Could you automate all of that? Of course.

Three.js and a little AJAX for good rendering

So you've got a remote renderer, you've decimated your huge model into smaller models, and you've converted one to your format of choice for loading in Three.js. Now what? Let's hook into our web model in WebGL and then load that remotely rendered frame!

I'm not going to go into the basics of Three.js; there are already tutorials and lots of examples covering the ins and outs of loading a model. Let's jump to the good stuff:

// other classy things

this.ajaxOut = function (camera, control, auto, polycount) {
    console.group("Prepping to get RemoteFrame");

    var viewportCamera, viewportControl, clp;

    if (camera != null) {
        viewportCamera = camera;
    } else {
        viewportCamera = this._viewport.getCamera();
    }

    if (control != null) {
        viewportControl = control;
    } else {
        viewportControl = this._controls;
    }

    // let's initially be sane and only autorender 1M frames
    if (auto) {
        clp = _loadModelPath + "1";
    } else {
        // burn those CPU cycles!
        clp = _loadModelPath + polycount;
    }

    var data = {
        cameraPositionX : viewportCamera.position.x,
        cameraPositionY : viewportCamera.position.y,
        cameraPositionZ : viewportCamera.position.z,
        cameraRotationX : viewportCamera.rotation.x,
        cameraRotationY : viewportCamera.rotation.y,
        cameraRotationZ : viewportCamera.rotation.z,
        cameraFOV : viewportCamera.fov,
        controlsX : viewportControl.target.x,
        controlsY : viewportControl.target.y,
        controlsZ : viewportControl.target.z,
        canvasW : document.getElementById("canvas_viewport").width,
        canvasH : document.getElementById("canvas_viewport").height
    };
    console.info("Set up data object for POST", data);

    var postData = JSON.stringify(data);
    console.info("Stringify JSON object", postData);

    var httpRequest = new XMLHttpRequest();
    httpRequest.onreadystatechange = function () {
        if (httpRequest.readyState === 4) {
            if (httpRequest.status === 200) {
                console.info("Looking good, AJAX returned 200!");

                var newDiv = document.createElement("div");
                newDiv.setAttribute("data-img", httpRequest.responseText);

                // meh
                var content = "Rendered Frame <button class=remoterender-view>View</button> | <button class=remoterender-save>Save</button>";
                newDiv.innerHTML = content;
                _renderResponseBlock.appendChild(newDiv);
                _renderResponseText.innerHTML = "";

                if (auto) {
                    var myCanvas = document.getElementById('canvas_viewport_renderoutput');
                    var ctx = myCanvas.getContext('2d');
                    ctx.canvas.width = window.innerWidth;
                    ctx.canvas.height = window.innerHeight;

                    var img = new Image();
                    img.onload = function () {
                        ctx.drawImage(img, 0, 0); // or at whatever offset you like
                    };
                    img.src = httpRequest.responseText;

                    $("#canvas_viewport").hide();
                    $("#canvas_viewport_renderoutput").show();
                    console.debug("rendering automagically!");
                }
            } else {
                _renderResponseText.innerHTML = "Could not get a remote frame at this time. Please try again later.";
            }
        }
    };

    httpRequest.open('POST', clp);
    httpRequest.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    httpRequest.send('data=' + encodeURIComponent(postData));
    console.info("Fire in the hole");
    console.groupEnd();
};

// other classy things

Pretty straightforward, right? We're polling the current viewport camera, controls, and field of view so that we can send them to our remote renderer. The remote renderer throws back some base64 data that we then use to draw on a secondary canvas, which we bring to the front (note: we don't draw on our WebGL context canvas; that will not do what you want it to).
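As for when to call ajaxOut: a cheap way to get onCameraMovementStop behavior is to debounce the controls' change events. This is a sketch that assumes your Three.js controls (Trackball, Orbit, etc.) dispatch 'change' while the camera moves, and viewer stands in for whatever object owns ajaxOut:

// fire a remote render only after the camera has sat still for half a second
var renderTimer = null;
controls.addEventListener('change', function () {
    clearTimeout(renderTimer);
    renderTimer = setTimeout(function () {
        viewer.ajaxOut(null, null, true, 1); // autorender against the 1M model
    }, 500);
});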

The example, in action

The following video shows the concept in action. Nifty.

And yes, that’s a shorter clip from a video I authored for a conference.

Where can I play with it?

At the moment, the demo is not online. Remote rendering large models on Amazon or your cloud provider of choice can be expensive.

If you were to write a proper C++ service that can be autoscaled on Amazon (which is what, cough, I would do), you have a number of things to handle beyond that. How do you get fast access to the files? Do you mount them off of S3? Do you shuffle them off of RAIDed EBS volumes? How do you decrease latency across regions?

Once you go down this road, things get complicated and performance can get pricey.

In real life, with a lot of testing, I've found that rendering 1M to 3M polys using binary data and RAIDed EBS volumes will get you renders in the 0.3 to 1.1 second range. Taking latency into account, you're looking at a 1.2 to 4 second round trip from request to response. This is why you'll note that the autorender flag in the JavaScript above was pinned to the 1M model; it's about as high as you can go for onCameraMovementStop-based rendering. Beyond that, there is a noticeable delay on frame return.

When you start rendering the big files, anything above 10M polys really, you're going to get vastly different render times. 95M polys can run anywhere from 50-90 seconds (which, if you think about it, is not terrible given the size…but for a user who is not used to big 3D assets, it's slower than dirt). In these sorts of cases, you've got to job queue.
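The server side of that queue is a story of its own, but the client-side shape is simple enough to sketch. Everything here is hypothetical: the /render/queue and /render/status endpoints don't exist in the demo above, they're just one way the handshake could look. The server answers the POST with a 202 and a job id, and the client polls until the frame is ready:

function requestBigRender(postData, onFrame) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 202) {
            var jobId = xhr.responseText;
            // poll every 5 seconds; a 95M poly render can take a minute or more
            var poll = setInterval(function () {
                var check = new XMLHttpRequest();
                check.onreadystatechange = function () {
                    if (check.readyState === 4 && check.status === 200) {
                        clearInterval(poll);
                        onFrame(check.responseText); // the data URI, same as before
                    }
                };
                check.open('GET', '/render/status/' + jobId);
                check.send();
            }, 5000);
        }
    };
    xhr.open('POST', '/render/queue');
    xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xhr.send('data=' + encodeURIComponent(postData));
}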

Conclusion

In a perfect world this would be a rather turnkey process, but it really depends on the type of models and data you're dealing with. Open source tools such as ParaView offer a similar set of functionality, including something more generally useful such as Point Cloud Library support (see PCL and ParaView – Connecting the Dots), but maybe you're only dealing with polys. If that's the case, then you can translate all of this into something workable on the web, using little more than the tools I've described above.