Wednesday, October 10, 2012

Bullet Native Client Acceleration Module


Native Client Acceleration Modules


90% Web App
Native Performance Where You Need It

Native Client (NaCl) Acceleration Modules (AM) are a new way of using Native Client. In a nutshell, they let you expose C/C++ code to JavaScript through a JavaScript API. The C/C++ code runs at native speed inside the NaCl sandbox, while the application logic runs in JavaScript.

For low-level details on how this works, check out my GDL talk.

To help show this technology off, I've created a demo that uses the Bullet Physics engine to simulate 400 rigid bodies in real time. On my machine, the simulation takes only 2 milliseconds. The time it takes to transfer the data from NaCl to JavaScript is negligible. All drawing is done using Three.js.




What's really fun about the demo is that you can create your own scenes and watch them run. Grab your text editor and paste this into it:


{
  "shapes": [
    {
      "name": "box",
      "type": "cube",
      "wx": 1,
      "wy": 1,
      "wz": 1
    }
  ],
  "bodies": [
    {
      "shape": "box",
      "position": {
        "x": 0.0,
        "y": 1.0,
        "z": 0.0
      },
      "rotation": {
        "x": 0.0,
        "y": 0.0,
        "z": 0.0
      },
      "mass": 1.0,
      "friction": 0.5
    }
  ]
}


First, let's discuss the "shapes" array. It contains shape objects, each of which must have a name and a type. There are four shape types:


  1. cube
  2. cylinder
  3. sphere
  4. convex

The "cube" shape type has the following parameters:

  1. wx
  2. wy
  3. wz

The three together specify the width, height, and depth of the cube.

The "cylinder" shape type has the following parameters:
  1. radius
  2. height

The radius specifies how thick the cylinder is and the height how tall it is.

The "sphere" shape type has the following parameters:
  1. radius

This sets the radius of the sphere.

The final shape type, "convex", has the following parameter:
  1. points

This is an array of points, like the following:

"points": [
  [0.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0],
  [0.0, 0.0, 1.0],
  [2.0, 5.0, 1.0],
  [1.0, 1.0, 1.0]
]


A convex hull is created out of these points, allowing for any convex shape to be created.
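The demo's own scene loader isn't shown. As an illustration of the parameter rules above, here is a minimal C++ sketch that checks whether an already-parsed shape entry carries the parameters its type requires. ShapeDesc and ValidateShape are hypothetical names, not part of the demo, and JSON parsing is assumed to have happened elsewhere.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry in the "shapes" array (sketch only).
struct ShapeDesc {
  std::string name;
  std::string type;                         // "cube", "cylinder", "sphere" or "convex"
  std::map<std::string, double> params;     // scalar parameters such as "wx" or "radius"
  std::vector<std::vector<double>> points;  // only used by "convex"
};

// Hypothetical validator: returns an empty string if the shape is well formed,
// otherwise a short description of the first problem found.
std::string ValidateShape(const ShapeDesc& s) {
  static const std::map<std::string, std::vector<std::string>> kRequired = {
      {"cube", {"wx", "wy", "wz"}},
      {"cylinder", {"radius", "height"}},
      {"sphere", {"radius"}},
      {"convex", {}},  // convex shapes use the "points" array instead
  };
  if (s.name.empty()) return "shape has no name";
  auto it = kRequired.find(s.type);
  if (it == kRequired.end()) return "unknown shape type: " + s.type;
  for (const std::string& key : it->second) {
    if (s.params.count(key) == 0) return s.name + " is missing " + key;
  }
  if (s.type == "convex") {
    if (s.points.empty()) return s.name + " has no points";
    for (const auto& p : s.points) {
      if (p.size() != 3) return s.name + " has a point without 3 coordinates";
    }
  }
  return "";
}
```

A loader could run this over every shape before creating any bodies, so a typo in the scene file produces a readable message instead of a missing object.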

After specifying the shapes, the bodies array specifies the actual physical bodies in the scene:

{
  "shape": "box",
  "position": {
    "x": 0.0,
    "y": 1.0,
    "z": 0.0
  },
  "rotation": {
    "x": 0.0,
    "y": 0.0,
    "z": 0.0
  },
  "mass": 1.0,
  "friction": 0.5
}


Each body specifies the name of its shape. The position's x, y and z give the location of the shape's origin. The rotation is given as Euler angles: yaw, pitch, and roll. Mass and friction determine how the object moves around in the world.
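Physics engines usually store orientation as a quaternion, so yaw, pitch and roll have to be converted at load time. The post doesn't say how the demo maps the JSON x, y and z fields onto yaw, pitch and roll, or whether it uses degrees or radians. The C++ sketch below assumes radians and follows the convention of Bullet's btQuaternion::setEuler (yaw about Y, pitch about X, roll about Z); Quat and QuatFromYawPitchRoll are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical quaternion type for this sketch.
struct Quat { double x, y, z, w; };

// Builds a unit quaternion from yaw (about Y), pitch (about X) and roll (about Z),
// all in radians (an assumption). The result equals q_yaw * q_pitch * q_roll,
// the same convention as Bullet's btQuaternion::setEuler(yaw, pitch, roll).
Quat QuatFromYawPitchRoll(double yaw, double pitch, double roll) {
  const double cy = std::cos(yaw * 0.5), sy = std::sin(yaw * 0.5);
  const double cp = std::cos(pitch * 0.5), sp = std::sin(pitch * 0.5);
  const double cr = std::cos(roll * 0.5), sr = std::sin(roll * 0.5);
  return Quat{
      cr * sp * cy + sr * cp * sy,  // x
      cr * cp * sy - sr * sp * cy,  // y
      sr * cp * cy - cr * sp * sy,  // z
      cr * cp * cy + sr * sp * sy,  // w
  };
}
```

All zero angles give the identity quaternion (0, 0, 0, 1), which matches the unrotated box in the example scene.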

Once you've created your JSON scene description, click the "Choose File" button in the demo and load it. To replay it, click the "Reload Scene" button.

Sunday, June 17, 2012

Bringing SIMD Accelerated Vector Math to the Web Through Dart

Recently, I have been working on a vector math library for Dart. Boringly, I named it Dart Vector Math. The latest version can be found on GitHub. My two biggest goals for Dart Vector Math are the following:

  • Near 100% GLSL compatible syntax. This includes the awesome vector shuffle syntax, and flexible construction of vectors and matrices.
  • Performance in terms of both CPU time and memory usage / garbage collection load.

Aside from a couple of quirks, Dart Vector Math is GLSL syntax compatible. It is possible to copy and paste GLSL code into Dart and, after a couple of tweaks, have it compile with Dart Vector Math. This makes debugging shader code easy.

Since Dart is a garbage-collected language, you want to avoid creating lots of objects to keep memory usage and garbage collection load down. To make that possible, Dart Vector Math offers many functions that work directly on already allocated vectors and matrices.

This weekend I started to look at CPU performance of Dart Vector Math versus glMatrix.dart (a port of glMatrix from JavaScript to Dart, the current champ of JavaScript vector math libraries). The initial results are heavily in favour of Dart Vector Math:

=============================================
Matrix Multiplication
=============================================
Avg: 14.59 ms Min: 10.161 ms Max: 22.927 ms (Avg: 14590 Min: 10161 Max: 22927)

=============================================
Matrix Multiplication glmatrix.dart
=============================================
Avg: 283.353 ms Min: 272.062 ms Max: 287.988 ms (Avg: 283353 Min: 272062 Max: 287988)

=============================================
mat4x4 inverse
=============================================
Avg: 28.289 ms Min: 21.019 ms Max: 34.891 ms (Avg: 28289 Min: 21019 Max: 34891)

=============================================
mat4x4 inverse glmatrix.dart
=============================================
Avg: 318.909 ms Min: 315.435 ms Max: 325.831 ms (Avg: 318909 Min: 315435 Max: 325831)

=============================================
vector transform
=============================================
Avg: 4.324 ms Min: 2.811 ms Max: 14.859 ms (Avg: 4324 Min: 2811 Max: 14859)

=============================================
vector transform glmatrix.dart
=============================================
Avg: 144.431 ms Min: 138.263 ms Max: 153.798 ms (Avg: 144431 Min: 138263 Max: 153798)

The code for 4x4 matrix multiplication in Dart Vector Math and glMatrix is practically identical, so on closer inspection the above numbers didn't make much sense. There is one key difference: Dart Vector Math uses a native Dart object to store the matrix, while glMatrix uses a Float32Array as storage. Digging into the disassembly, I discovered that indexing into a Float32Array is currently a slow path in the VM, which skews the results against glMatrix.dart. That is not a big deal: Dart is a new language, and the VM needs time to mature.

Once the performance issue with Float32Arrays is fixed, I want Dart Vector Math to use them for two reasons. First, they take up 50% less space (single- vs. double-precision floats). Second, WebGL needs Float32Arrays for uniform data, which means the matrix will eventually end up inside a Float32Array anyway, so we might as well keep it in one the whole time. There is no CPU performance benefit to using Float32Array as storage, because every operation promotes the floats to doubles, operates on them, and then stores the results back as floats.

My intention to move to Float32Array got me thinking and I ended up asking myself: Why doesn’t the browser offer an API for common vector math operations on Float32Array implemented efficiently with SIMD instruction sets? Well, I’m not sure why it is not offered, but I ended up spending the weekend implementing it for the Dart VM.

The API follows:

class SimdFloat32Array {
  // Declarations only: each method is backed by a native SSE implementation in the VM.
  external static void matrix4Inv(Float32List dst, int dstIndex, Float32List src, int srcIndex, int count);
  external static void matrix4Mult(Float32List dst, int dstIndex, Float32List a, int aIndex, Float32List b, int bIndex, int count);
  external static void transform(Float32List M, int Mindex, Float32List v, int vIndex, int vCount);
}


I do not want anyone to get hung up on the specific API or naming convention (let’s avoid bikeshedding). My three biggest goals for this API are the following:

  • Offer the important operations used by vector math libraries
  • Operate directly on floats instead of promoting to doubles
  • Design for bulk processing

So far I have exposed three of the important operations, but there are many more. Each of these functions is backed by an SSE implementation that operates directly on the Float32Array data. Notice that each of the methods takes a count argument, which allows a single call to do bulk work.
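The SSE code itself is not shown in this post. Purely as an illustration of the bulk calling convention, here is a C++ sketch of a matrix4Mult-style kernel. Everything in it is an assumption rather than the Dart VM's code: the hypothetical names Matrix4MultBulk, Mul4x4Sse and Mul4x4Scalar, column-major 4x4 matrices (the layout WebGL uniforms use), index arguments measured in floats, and matrix i of each array starting 16 floats after matrix i - 1. It uses SSE intrinsics when the compiler targets SSE and falls back to scalar code otherwise.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical scalar reference: d = a * b for one column-major 4x4 matrix,
// where element (row, col) lives at index col * 4 + row.
void Mul4x4Scalar(float* d, const float* a, const float* b) {
  for (int col = 0; col < 4; ++col) {
    for (int row = 0; row < 4; ++row) {
      float sum = 0.0f;
      for (int k = 0; k < 4; ++k) sum += a[k * 4 + row] * b[col * 4 + k];
      d[col * 4 + row] = sum;
    }
  }
}

#ifdef SKETCH_HAVE_SSE
// Hypothetical SSE version: column `col` of the result is a linear combination
// of A's four columns, weighted by the four entries of column `col` of B.
void Mul4x4Sse(float* d, const float* a, const float* b) {
  const __m128 a0 = _mm_loadu_ps(a + 0);
  const __m128 a1 = _mm_loadu_ps(a + 4);
  const __m128 a2 = _mm_loadu_ps(a + 8);
  const __m128 a3 = _mm_loadu_ps(a + 12);
  for (int col = 0; col < 4; ++col) {
    __m128 r = _mm_mul_ps(a0, _mm_set1_ps(b[col * 4 + 0]));
    r = _mm_add_ps(r, _mm_mul_ps(a1, _mm_set1_ps(b[col * 4 + 1])));
    r = _mm_add_ps(r, _mm_mul_ps(a2, _mm_set1_ps(b[col * 4 + 2])));
    r = _mm_add_ps(r, _mm_mul_ps(a3, _mm_set1_ps(b[col * 4 + 3])));
    _mm_storeu_ps(d + col * 4, r);
  }
}
#endif

// Hypothetical bulk entry point: dst[i] = a[i] * b[i] for i in [0, count).
// Assumes dst does not overlap a or b.
void Matrix4MultBulk(float* dst, std::size_t dstIndex, const float* a, std::size_t aIndex,
                     const float* b, std::size_t bIndex, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    float* d = dst + dstIndex + 16 * i;
    const float* ai = a + aIndex + 16 * i;
    const float* bi = b + bIndex + 16 * i;
#ifdef SKETCH_HAVE_SSE
    Mul4x4Sse(d, ai, bi);
#else
    Mul4x4Scalar(d, ai, bi);
#endif
  }
}
```

Because one call loops over count matrices, the fixed cost of crossing from the caller into native code is paid once per batch rather than once per matrix.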

The results of my implementation were very encouraging:

=============================================
Matrix Multiplication SIMD
=============================================
Avg: 8.702 ms Min: 8.475 ms Max: 9.217 ms (Avg: 8702 Min: 8475 Max: 9217)

=============================================
mat4x4 inverse SIMD
=============================================
Avg: 7.107 ms Min: 6.89 ms Max: 7.754 ms (Avg: 7107 Min: 6890 Max: 7754)

=============================================
vector transform SIMD
=============================================
Avg: 6.415 ms Min: 6.204 ms Max: 7.006 ms (Avg: 6415 Min: 6204 Max: 7006)

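Aside from the vector transformation operation, which got slower (6.4 ms vs. 4.3 ms; I think my SSE vector transform code is just slow), I got speedups over the plain Dart Vector Math code ranging from roughly 1.7x (matrix multiplication) to 4x (matrix inverse).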

Does this have legs? I hope so, but it's not my call. If you see value in exposing this acceleration architecture in the browser, speak up.

Anticipating some questions:

What about JavaScript? The API would be easy to expose in JavaScript.


What about hardware without SIMD instruction sets? Probably not an issue since ARM, x86, and PPC have excellent SIMD instruction sets. Other platforms can implement the API using scalar floating point instructions.
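A scalar fallback of the kind described above could look like the following C++ sketch of transform. It assumes, without confirmation from the post, that transform rewrites vCount consecutive 4-component vectors in place using one column-major 4x4 matrix, and that index arguments count floats; TransformScalar is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar fallback: transforms vCount consecutive vec4s in place by
// one column-major 4x4 matrix (element (row, col) at index col * 4 + row),
// using only plain float arithmetic.
void TransformScalar(const float* m, std::size_t mIndex, float* v, std::size_t vIndex,
                     std::size_t vCount) {
  const float* M = m + mIndex;
  for (std::size_t i = 0; i < vCount; ++i) {
    float* p = v + vIndex + 4 * i;
    // Copy the input first so writing p[row] doesn't disturb later rows.
    const float x = p[0], y = p[1], z = p[2], w = p[3];
    for (int row = 0; row < 4; ++row) {
      p[row] = M[row] * x + M[4 + row] * y + M[8 + row] * z + M[12 + row] * w;
    }
  }
}
```

With a translation matrix, a point (w = 1) is moved while a direction (w = 0) is left unchanged, which is a quick sanity check for any implementation of this operation.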

What about other browsers? Again, this API would be easy to expose if it gained support.

Fast vector math operations are a requirement if we are going to start writing amazing games in the browser. I hope my proposal can help make this possible.

Sunday, June 10, 2012

Dart Vector Math version 0.8


The Dart Vector Math Library is for game or WebGL programmers needing 3D vector math in Dart.


ChangeLog:

  • Serialization between Float32Array and Vectors/Matrices
  • Added many self* methods to matrix and vector classes:
    • selfAdd
    • selfSub
    • selfMultiply
    • etc...
    The self* methods avoid allocating new instances, reducing immediate overhead and, later, garbage collection overhead. Because they avoid that work, they are faster than the overloaded operators. Please use them whenever possible.
  • Matrix inversion
    • 2x2, 3x3, 4x4
    • Upper 3x3 of 4x4 (rotation submatrix)
  • Matrix rotation constructors
  • Matrix adjoint
  • Many bugs fixed
  • Specialized branchless constructors
  • Updated to support the latest version of Dart

Features


  • Almost 100% GLSL compatible syntax. This includes the awesome vector shuffle syntax, and flexible construction of vectors and matrices. Aside from constructor syntax (Dart requires the keyword new before the class name where GLSL does not), if you can't copy and paste your GLSL source code and have it compile, you've found a bug; please report it.
  • A quaternion class. Quaternions are a must when dealing with rotations in 3D. This library offers them as well as functions to convert between quaternions and rotation matrices.
  • OpenGL camera projection and look at matrix utilities.
  • Dart makes operator overloading possible and Vector Math takes full advantage of it.
  • Fully documented.
  • Somewhat tested (quickly expanding on this).
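The library's own conversion code isn't reproduced here. As an illustration of what converting a quaternion to a rotation matrix involves, here is a C++ sketch of the standard formula for a unit quaternion (x, y, z, w), writing a column-major 3x3 matrix as GLSL does. Quat and QuatToMat3 are hypothetical names, not Dart Vector Math's API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical quaternion type for this sketch; assumed to be unit length.
struct Quat { double x, y, z, w; };

// Converts a unit quaternion to a 3x3 rotation matrix stored column-major:
// element (row, col) lives at m[col * 3 + row], matching GLSL's mat3 layout.
void QuatToMat3(const Quat& q, double m[9]) {
  const double xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
  const double xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
  const double wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
  // Column 0: where the x axis ends up.
  m[0] = 1.0 - 2.0 * (yy + zz);
  m[1] = 2.0 * (xy + wz);
  m[2] = 2.0 * (xz - wy);
  // Column 1: where the y axis ends up.
  m[3] = 2.0 * (xy - wz);
  m[4] = 1.0 - 2.0 * (xx + zz);
  m[5] = 2.0 * (yz + wx);
  // Column 2: where the z axis ends up.
  m[6] = 2.0 * (xz + wy);
  m[7] = 2.0 * (yz - wx);
  m[8] = 1.0 - 2.0 * (xx + yy);
}
```

For example, the quaternion (0, 0, sqrt(0.5), sqrt(0.5)) is a 90 degree rotation about Z, and its matrix sends the x axis to the y axis and the y axis to the negative x axis.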

Types



  • vec2
  • vec3
  • vec4
  • mat2x2
  • mat2x3
  • mat2x4
  • mat3x2
  • mat3x3
  • mat3x4
  • mat4x2
  • mat4x3
  • mat4x4
  • quat

Coming Soon



  • Planes
  • Axis Aligned Bounding Box (AABB)
  • Orthonormalization and inverse of square matrices
  • Fully tested

Documentation

Dart Vector Math is fully documented and uses dartdoc to generate HTML documentation. Check out the documentation in the git repository.

Download

Dart Vector Math is available from the following github repository:

https://github.com/johnmccutchan/DartVectorMath

Feature Requests / Bug Reports

File feature requests and bug reports here:

https://github.com/johnmccutchan/DartVectorMath/issues

License

Dart Vector Math is licensed under the ZLIB license.

Authors

John McCutchan <john@johnmccutchan.com>

Sunday, April 1, 2012

#AltDevPanel Optimization

On Saturday March 31st I was invited to participate in a panel on optimization for #AltDevBlogADay.




Saturday, March 24, 2012

Control, Configure, Monitor and View Your Game Engine From the Web

Back in February I gave a talk at AltDevConf. The video stream (slides and my audio) is now available.



Controlling Your Game Engine Over WebSocket

In my previous article I introduced WebSocket and detailed building your own WebSocket server. I also explained that I have embedded a WebSocket server into my engine and use a web application to control, configure and monitor my engine. Check out my previous article for some concrete examples of what I have been doing. In this article I will show you how I do this by explaining the remote procedure call system I designed and implemented.

RPC

What exactly is a remote procedure call system? Simply put, it's a function call from one process to another. The processes do not have to be running on the same machine or share the same architecture. To help understand, first consider a local procedure call in C/C++. The function arguments are placed where the callee expects them (on the stack or in registers, depending on the calling convention), then the program branches to the address of the function being called. When the called function returns, the result is left in an agreed-upon register or stack location for the caller to access. This works because the caller, callee and data live in the same address space and agree on calling conventions. In a remote procedure call system, the caller, callee and data do not share an address space and thus cannot share data by memory address. To make a remote procedure call, the parameters must be marshalled for transmission over the network and packed into a message for the remote system. The return value(s) from an RPC come back the same way. Still with me? Don't worry, I will not spend too much time on the plumbing in this article.

The first step in building an RPC system is to pick the data format in which messages will be exchanged. Since we are talking about the web, I naturally chose JavaScript Object Notation, better known as JSON, as the data format for my RPC system. JSON is a simple, human-readable data format. It supports numbers, strings, booleans, arrays and key-value maps. Here is an example:

{
  "type" : "command",
  "command" : "Memory.Stats",
  "id" : "50"
}


Another benefit of JSON is that web browsers support it natively. Your browser can convert almost any JavaScript data structure into JSON with a single function call (JSON.stringify), and convert JSON back into a native JavaScript data structure just as easily (JSON.parse).

Now that the data interchange format has been settled, we need to come up with the RPC message framing, that is, the mandatory portion of every RPC message. Each RPC is a JSON map with two required keys: the message type and the message serial number.

{
  "type" : "<type>",
  "id" : "<id>"
}

The type field indicates the type of message. My system supports three types: "command", "result" and "report". I will explain these shortly. First, though, the id field is used to link related messages together. For example, this command has an id of "4":

{ "type" : "command", "id" : "4", … }

The result of the command sent back from the engine also has an id of "4", making it easy to match calls with their return values.

Getting back to the type field:

“command” : Commands initiate some sort of action or state change.
“result” : Result of a command. Linked by the id field. Commands are not required to reply with a result.
“report” : A regular, repeating message from a subscription. Clients can subscribe to a data stream and set an interval between reports.

The other, non-mandatory, fields in a message are type specific.
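The engine's serialization code isn't shown. Here is a minimal C++ sketch of the framing, assuming every field value is a string as in the examples above. JsonEscape and MakeMessage are hypothetical helpers; a real implementation would use a JSON library that also handles numbers, arrays and nested maps.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: escapes characters that may not appear raw in a JSON string.
std::string JsonEscape(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (static_cast<unsigned char>(c) < 0x20) {
          // Other control characters use the \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x",
                        static_cast<unsigned>(static_cast<unsigned char>(c)));
          out += buf;
        } else {
          out += c;
        }
    }
  }
  return out;
}

// Hypothetical helper: builds one framed message with the mandatory "type" and
// "id" keys first, followed by the type-specific fields in the order given.
std::string MakeMessage(const std::string& type, int id,
                        const std::vector<std::pair<std::string, std::string>>& fields) {
  std::string msg = "{\"type\":\"" + JsonEscape(type) + "\",\"id\":\"" +
                    std::to_string(id) + "\"";
  for (const auto& kv : fields) {
    msg += ",\"" + JsonEscape(kv.first) + "\":\"" + JsonEscape(kv.second) + "\"";
  }
  msg += "}";
  return msg;
}
```

For example, MakeMessage("command", 50, {{"command", "Memory.Stats"}}) produces the same message as the first JSON example, minus the whitespace.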

Examples

Some example commands that my system supports:

“Echo” - Replies with a copy of the “message” field.
"Subscribe" - Subscribes the caller to the reporter named in the "reporter" field.
"Unsubscribe" - Unsubscribes the caller from the reporter named in the "reporter" field.
“Memory.Stats” - Replies with the state of all registered allocators.
“Config.Set” - Sets “variable_name” to “variable_value”.
“Config.Delete” - Removes “variable_name” from the configuration variable set.
“Config.Get” - Returns the value of “variable_name”.
“Config.Dump” - Returns all registered configuration variables.
"Object.Create" - Creates a game object.
“Object.Set” - Sets “property_name” to “value”. “render_model” is an example property.
“Message” - Sends a message to another connected client.

I have two reporters:

“Memory” - Regularly reports the state of all registered allocators.
“Config” - Sends configuration variable change notifications.

Here is a screenshot of a live graph of memory allocators:


Quake Style Console

What developer interface would be complete without a Quake-style console triggered by the ~ key? For this, I used jQuery Terminal. It is a great library that makes adding custom commands to the terminal window really simple. I have also made it possible for the user to construct messages by hand from within the terminal, which makes the terminal window very powerful. The following screenshot shows a user sending a custom Echo message with the "dcc" terminal command, along with the response from the server.


C++ implementation notes

Connection Management

I started with the explicit goal of supporting multiple connected clients. My previous article discussed the importance of decoupling the code that waits for a connection over TCP from the code that manages a WebSocket connection. Because of that decoupling, supporting multiple connections was practically free.

Commanders

Support for commands is added by implementing the commander interface:

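// Defined elsewhere in the engine.
class doCommandCenterCommandPacket;
class doCommandCenterConnection;

class doCommandCenterCommanderInterface {
public:
  doCommandCenterCommanderInterface() {}
  virtual ~doCommandCenterCommanderInterface() {}

  virtual const char* CommanderName() const = 0;
  // Returns true if this commander handles the command in packet.
  virtual bool CanProcessCommand(const doCommandCenterCommandPacket* packet) const = 0;
  // Executes the command; the connection can be used to reply.
  virtual void ProcessCommand(doCommandCenterConnection* connection,
                              const doCommandCenterCommandPacket* packet) = 0;
};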


A commander can process one or many commands (CanProcessCommand). Each commander is registered with the central command center, which does the routing of messages from connected clients to the appropriate commander. doCommandCenterCommandPacket just contains a parsed JSON object and doCommandCenterConnection has the WebSocket connection and various buffers in it.
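The command center itself isn't shown. The following C++ sketch shows one way the routing could work. To stay self-contained it replaces doCommandCenterCommandPacket and doCommandCenterConnection with simplified hypothetical stand-ins (CommandPacket, Connection); the first-match routing policy and the EchoCommander reply format are also assumptions.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a packet holds already parsed JSON fields, and a
// connection just records the messages sent to it.
struct CommandPacket {
  std::string id;
  std::string command;
  std::map<std::string, std::string> fields;
};
struct Connection {
  std::vector<std::string> sent;
};

class CommanderInterface {
 public:
  virtual ~CommanderInterface() = default;
  virtual const char* CommanderName() const = 0;
  virtual bool CanProcessCommand(const CommandPacket& packet) const = 0;
  virtual void ProcessCommand(Connection& connection, const CommandPacket& packet) = 0;
};

// Hypothetical commander: answers "Echo" with a result that copies "message".
class EchoCommander : public CommanderInterface {
 public:
  const char* CommanderName() const override { return "Echo"; }
  bool CanProcessCommand(const CommandPacket& packet) const override {
    return packet.command == "Echo";
  }
  void ProcessCommand(Connection& connection, const CommandPacket& packet) override {
    auto it = packet.fields.find("message");
    const std::string message = it == packet.fields.end() ? "" : it->second;
    // Real code would build the reply with a JSON library and escape strings.
    connection.sent.push_back("{\"type\":\"result\",\"id\":\"" + packet.id +
                              "\",\"message\":\"" + message + "\"}");
  }
};

// Hypothetical command center: routes each packet to the first registered
// commander whose CanProcessCommand accepts it.
class CommandCenter {
 public:
  void RegisterCommander(std::unique_ptr<CommanderInterface> commander) {
    commanders_.push_back(std::move(commander));
  }
  bool Dispatch(Connection& connection, const CommandPacket& packet) {
    for (auto& commander : commanders_) {
      if (commander->CanProcessCommand(packet)) {
        commander->ProcessCommand(connection, packet);
        return true;
      }
    }
    return false;  // No commander understands this command.
  }

 private:
  std::vector<std::unique_ptr<CommanderInterface>> commanders_;
};
```

Adding a new command then means writing one more commander class and registering it; the routing code does not change.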

Reporters

Support for reports is added by implementing the reporter interface:
// Defined elsewhere in the engine, as is palTimerTick (the engine's timer tick type).
class doCommandCenterConnection;

class doCommandCenterReporterInterface {
public:
  doCommandCenterReporterInterface();
  virtual ~doCommandCenterReporterInterface() {}

  void Subscribe(doCommandCenterConnection* connection);
  void Unsubscribe(doCommandCenterConnection* connection);
  bool ShouldRefresh(palTimerTick tick);
  void ChangeRefreshDelay(doCommandCenterConnection* connection, palTimerTick delay);

  virtual const char* ReporterName() const = 0;
  // Update internal state
  virtual void Refresh() = 0;
  // Report state to all subscribed connections
  virtual void Report() = 0;
};


Each reporter is responsible for generating a single type of report. Similar to commanders, reporters are registered in the central command center.

Client commands to subscribe and unsubscribe are processed by a commander, like all other commands.
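The post doesn't say how ShouldRefresh and ChangeRefreshDelay interact. One plausible policy, sketched in C++ below purely as an assumption, is to refresh whenever the shortest delay requested by any subscriber has elapsed since the last refresh. RefreshSchedule is a hypothetical class, and a 64-bit Tick stands in for palTimerTick.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Tick = std::uint64_t;  // hypothetical stand-in for palTimerTick
struct Connection {};        // hypothetical stand-in; only its address is used

// Hypothetical scheduler: a reporter refreshes at the shortest delay requested
// by any of its subscribers. Assumes ticks never go backwards.
class RefreshSchedule {
 public:
  explicit RefreshSchedule(Tick default_delay) : default_delay_(default_delay) {}

  void Subscribe(const Connection* c) { delays_[c] = default_delay_; }
  void Unsubscribe(const Connection* c) { delays_.erase(c); }
  void ChangeRefreshDelay(const Connection* c, Tick delay) {
    auto it = delays_.find(c);
    if (it != delays_.end()) it->second = delay;
  }

  // Returns true, and records the refresh, once the shortest subscriber delay
  // has elapsed since the previous refresh. Never true without subscribers.
  bool ShouldRefresh(Tick now) {
    if (delays_.empty()) return false;
    Tick shortest = delays_.begin()->second;
    for (const auto& entry : delays_) {
      if (entry.second < shortest) shortest = entry.second;
    }
    if (now - last_refresh_ < shortest) return false;
    last_refresh_ = now;
    return true;
  }

 private:
  Tick default_delay_;
  Tick last_refresh_ = 0;
  std::map<const Connection*, Tick> delays_;
};
```

Under this policy, a fast subscriber also receives reports more often for everyone else; sending each connection only the reports it is due for would need per-connection timestamps instead.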

JavaScript implementation notes

Connection

My web application has a communication class ‘Connection’. The interesting methods are:

FireAndForget(command) - Send a command. No callback function is registered.
Fire(command, callback) - Send a command. Register a callback to be called when the result arrives.
Subscribe(reporter_name, report_callback) - Subscribes the client to the reporter. Registers a callback to be called whenever a report comes in.
Unsubscribe(reporter_name) - Unsubscribes the client from the reporter. Unregisters the report callback.

Message Processing

When a message arrives, the behaviour depends on the type of message. If it is a result and the caller registered a callback, the message is passed to the callback and the callback is removed from the callback table. If it is a report, the report callback is called.
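The client is written in JavaScript; to keep every sketch in one language, here is the same bookkeeping modeled in C++. Message, CallbackTable and its methods are hypothetical stand-ins for the Connection class, not its actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, already parsed form of an incoming message.
struct Message {
  std::string type;      // "result" or "report"
  std::string id;        // links a result to the command that caused it
  std::string reporter;  // which reporter produced a report
  std::string body;
};

using Callback = std::function<void(const Message&)>;

// Hypothetical model of the client's callback tables.
class CallbackTable {
 public:
  // Like Fire(): remember the callback under the command's id.
  void ExpectResult(const std::string& id, Callback cb) { pending_[id] = std::move(cb); }
  // Like Subscribe()/Unsubscribe(): report callbacks stay until removed.
  void Subscribe(const std::string& reporter, Callback cb) { reports_[reporter] = std::move(cb); }
  void Unsubscribe(const std::string& reporter) { reports_.erase(reporter); }

  void OnMessage(const Message& m) {
    if (m.type == "result") {
      auto it = pending_.find(m.id);
      if (it == pending_.end()) return;  // fire-and-forget: nobody is waiting
      Callback cb = std::move(it->second);
      pending_.erase(it);  // a result callback runs at most once
      cb(m);
    } else if (m.type == "report") {
      auto it = reports_.find(m.reporter);
      if (it != reports_.end()) it->second(m);
    }
  }

  std::size_t PendingCount() const { return pending_.size(); }

 private:
  std::map<std::string, Callback> pending_;
  std::map<std::string, Callback> reports_;
};
```

Removing a result callback from the table before invoking it means the callback can safely fire a follow-up command that registers a new entry.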

Modules

My web application consists of many modules, each covering a set of related tasks. For example, the Memory module can query for all open allocations and update a live graph of the memory used by each allocator. Each module registers subscriptions and command callbacks with the connection so that it can update itself when new messages arrive. Each module is also given an HTML <div> element to render itself into.

UI

As you have probably guessed, the UI for my web application is written with the help of jQuery. A couple of months ago I only had a vague idea of what jQuery is, but after using it I can say it is awesome and makes working with the DOM a cinch.

Why a web application?

There are many advantages to building development tools as a web application that works in concert with a running instance of the game. First of all, deployment is trivial: rolling out updated tools to the team happens transparently. The tools are platform independent, running on Mac, Linux, Windows and even smartphones.

Separating the tools UI from the game engine forces a less coupled design and requires a clean definition of how the tools communicate with the engine. The same tools UI can be used with multiple games, as long as they agree on the RPC mechanism and a common set of commands.

Forcing the tools to communicate with the engine over the network allows a developer to connect to an instance of the engine from anywhere, view the output, and make changes at run time. When working on console development, the game is not running on your computer, and interacting with it can be cumbersome. Moving the UI into the web browser solves that problem. Imagine showing a colleague the latest feature you have implemented while she is all the way across the building. With a remote viewer that runs in a web browser, she can just connect to the engine running on your development kit and see what is going on.

Finally, as I have discovered, modern web development is a great platform for building applications. Did you know that Chrome has a debugger, profiler and other tools built in? HTML5, WebGL, WebSocket, and Canvas make rich web applications possible. These are just some of the advantages of using a web application.

Conclusion

To control my game engine with a web application, I chose WebSocket as the medium. I built an RPC system that supports commands and subscriptions to updates, using JSON as the data interchange format. I chose JSON because it is the web's native data format. The UI was built with jQuery and some off-the-shelf jQuery libraries. My engine offers a rich set of commands and allows multiple simultaneous connections. I am very happy with this system because it is really easy to add new commands and reporters on the C++ side, and JavaScript is so easy it feels criminal. Next week I will be back with my final article in this series, explaining how I added simple streaming video to my game engine. Below is a screenshot of a screenshot of my game engine:


I gave a related talk at AltDevConf on Saturday, February 11th at 14:00 PST. A recording of the talk is available.

Tuesday, March 20, 2012

Announcing Dart Vector Math

The Dart Vector Math Library is for game or WebGL programmers needing 3D vector math in Dart.


Features


  • Almost 100% GLSL compatible syntax. This includes the awesome vector shuffle syntax, and flexible construction of vectors and matrices. Aside from constructor syntax (Dart requires the keyword new before the class name where GLSL does not), if you can't copy and paste your GLSL source code and have it compile, you've found a bug; please report it.
  • A quaternion class. Quaternions are a must when dealing with rotations in 3D. This library offers them as well as functions to convert between quaternions and rotation matrices.
  • OpenGL camera projection and look at matrix utilities.
  • Dart makes operator overloading possible and Vector Math takes full advantage of it.
  • Fully documented.
  • Somewhat tested (quickly expanding on this).
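As an illustration of the kind of camera projection utility listed above, here is a C++ sketch of the standard OpenGL perspective matrix (the one gluPerspective builds), stored column-major. MakePerspective is a hypothetical name, not Dart Vector Math's API, and unlike gluPerspective, which takes degrees, this sketch takes the vertical field of view in radians.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: fills m (column-major, element (row, col) at m[col * 4 + row])
// with an OpenGL-style perspective projection. fovy is the vertical field of view
// in radians; near_z and far_z are positive distances to the clip planes.
void MakePerspective(double fovy, double aspect, double near_z, double far_z, double m[16]) {
  const double f = 1.0 / std::tan(fovy / 2.0);
  for (int i = 0; i < 16; ++i) m[i] = 0.0;
  m[0] = f / aspect;
  m[5] = f;
  m[10] = (far_z + near_z) / (near_z - far_z);
  m[11] = -1.0;  // copies -z_eye into w_clip for the perspective divide
  m[14] = 2.0 * far_z * near_z / (near_z - far_z);
}
```

After the perspective divide, a point on the near plane (z_eye = -near_z) lands at normalized depth -1 and a point on the far plane (z_eye = -far_z) at +1.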

Types



  • vec2
  • vec3
  • vec4
  • mat2x2
  • mat2x3
  • mat2x4
  • mat3x2
  • mat3x3
  • mat3x4
  • mat4x2
  • mat4x3
  • mat4x4
  • quat

Coming Soon



  • Planes
  • Axis Aligned Bounding Box (AABB)
  • Orthonormalization and inverse of square matrices
  • Fully tested

Documentation

Dart Vector Math is fully documented and uses dartdoc to generate HTML documentation. Check out the documentation in the git repository.

Download

Dart Vector Math is available from the following github repository:

https://github.com/johnmccutchan/DartVectorMath

Feature Requests / Bug Reports

File feature requests and bug reports here:

https://github.com/johnmccutchan/DartVectorMath/issues

License

Dart Vector Math is licensed under the ZLIB license.

Authors

John McCutchan <john@johnmccutchan.com>