When last we left off, I had an OpenGL kinematics simulator for my current OCD fulfillment object, the Novint Falcon. I'll get to the math behind all that in the next post, but for right now, suffice to say, it just works. The kinematics loops can easily hit 40,000 iterations per second, which is more than enough for my first few basic projects, including mouse control. However, that really only mattered on Windows, because when I finished the simulator, Windows still had a 1000 Hz update rate, while Linux and OS X were still running at 70-150 Hz on a good day with the wind going in the right direction. I spent last week trying to get Linux and Mac up to the same speed as Windows. What follows is the story of that week.

But first off, let's rewind to August of last year. Before Novint put out their own SDK, I was bound and determined to get my own out, just because it seemed like a fun challenge. I managed to document the firmware download process, but not completely replicate it. My first program worked by starting up one of their test programs to load firmware, then using my own code after that to communicate with the falcon. The hack ended up being successful, if ridiculously unsafe to look at in a normal workplace. The video of the results is NSFW:

Anyways, while doing this, I took some notes on the startup sequence for firmware loading. Here's a picture of those notes:

Comm Trace notes for the Novint Falcon

Now, fast forward back to last week. Time to make everything equally fast.

To start, how the falcon communicates. Using the test firmware that came with the disc in the falcon box (currently the only thing I have mapped, and I honestly don't even know if there's other firmware out there), it receives 16 bytes of input, and upon receiving that, gives back 16 bytes as output (it's actually 18 bytes, due to the ftdi modem status being tacked on the front, but those are negligible). That's it.
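For the curious, here's a rough sketch of what peeling those two status bytes off a raw read might look like. It assumes the usual FTDI behavior of putting 2 status bytes at the front of every USB packet (64 bytes max on this endpoint); the function name and the dummy bytes are made up for illustration, and libftdi's own read call already does this for you.

```python
FTDI_STATUS_BYTES = 2    # modem/line status the FTDI chip puts at the front of each USB packet
FALCON_REPLY_BYTES = 16  # payload size of one falcon reply, as described above


def strip_modem_status(raw: bytes, max_packet_size: int = 64) -> bytes:
    """Drop the status bytes at the start of every USB packet in a raw read."""
    payload = bytearray()
    for start in range(0, len(raw), max_packet_size):
        packet = raw[start:start + max_packet_size]
        payload += packet[FTDI_STATUS_BYTES:]  # a status-only packet contributes nothing
    return bytes(payload)


# Dummy data: two made-up status bytes followed by a fake 16 byte reply.
raw_reply = bytes([0x01, 0x60]) + bytes(range(FALCON_REPLY_BYTES))
reply = strip_modem_status(raw_reply)
```

A read that comes back with only the two status bytes turns into an empty payload, which is how you can tell "nothing yet" apart from an actual reply.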

Let's look at the stats that test programs were putting out for each platform (thanks to people on the libnifalcon-devel list for helping out on this):

Linux:

Time between I/O Iterations: 0.015991s

Average Frequency: 62.534351 per second

Windows:

Time between I/O Iterations: 0.000942s

Average Frequency: 980.382853 per second

Yeah. Bit of a difference, eh? So something was causing I/O iterations on linux to take around 16ms each, while on windows, we were getting 1 iteration per millisecond, and achieving near the 1khz that's mentioned by Novint all the time. This is running on the EXACT same code, up to the point of the FTDI library being used. The Windows version was using the ftd2xx drivers, while linux was using libftdi. ftd2xx on linux seemed to run decently fast, though I don't have the number handy at the moment.

Now, I'd read a few things around the net about libftdi being ridiculously slow, mostly in terms of bitbanging though. Either way, I figured it was something in either libftdi or libusb that was causing the slowdown. Rebuilt everything with -pg and let gprof at it for a while, just to see that, nope, it was just sitting there waiting during the I/O loop, for 16ms at a time.

At this point, I start wondering if it's the synchronous nature of libusb-0.1 that's slowing me down. Luckily, libusb-1.0 is in development right now, which enables asynchronous transfers for usb. Pulled the dev branch of that, tried it out. Asynchronous sends, writes are superfast, reads... 16ms.

Damnit.

So, something in the read is taking 16ms. It's time to start playing with our read transfer and see what else we can change. First off, changing the read request size to 64 bytes, the maximum packet size for the endpoint.

Time between I/O Iterations: 0.000919s

Well, that made something happy. The problem there is, we're now sending 1000 input packets, and only getting back ~250 output packets. This means that there was something I was missing about sub-maximal packet sizes.

Much googling ensues. No information found. Finally, out of frustration, I just google "ftdi 64 bytes".

And I find the FTDI Addendum on Data Throughput, Latency, and Handshaking for the FT232 Series Chips (PDF)

There it is, clear as day. The chip holds on to the bytes it has buffered and only sends them to the host when one of three things happens:

  • A serial status line (DTR, RTS, etc...) is flipped
  • The buffer reaches maximum capacity (thus our results with the 64 bytes)
  • The latency timer overflows

We're obviously not playing with the first one. The second one we've seen the effects of, but we don't want to have to wait for 62 user bytes at a time.
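As a sanity check on the ~250 number from earlier, here's a back-of-the-envelope sketch. It assumes the chip only flushes once it has a full 64 byte packet's worth (62 user bytes plus the 2 status bytes) and ignores the latency timer entirely, so treat it as a rough model and not a description of what the hardware actually does.

```python
MAX_PACKET_BYTES = 64   # maximum packet size for the endpoint
STATUS_BYTES = 2        # FTDI modem status at the front of each packet
USER_BYTES_PER_FULL_PACKET = MAX_PACKET_BYTES - STATUS_BYTES  # 62


def full_packets(replies: int, reply_bytes: int = 16) -> int:
    """How many completely full packets a run of replies would fill."""
    return (replies * reply_bytes) // USER_BYTES_PER_FULL_PACKET


packets = full_packets(1000)  # 1000 replies of 16 bytes each
```

That comes out a little over 250, which is at least in the same ballpark as what I was seeing.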

What's the default starting value of the latency timer?

16ms.

If you scroll back up and look at my notes again, you'll see there's a line there that says "0x9 (latency timer?) 0x1". This was a control message sent over by the Falcon test program. 0x9 is the control message index, and 0x1 is the value. They set the latency timer to 1ms during the initialization stage. I totally skipped over that when transcribing the code for the initialization sequence, and it meant I was sitting on slow code for many, many months.
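If you're talking to the chip over raw USB instead of through libftdi, that missing line boils down to one vendor control transfer. This is a minimal sketch, assuming a pyusb-style device object with a ctrl_transfer method; the constants follow what I understand libftdi's set-latency-timer request to be (the 0x9 from the notes goes in the request field here), and the interface index of 0 is a guess for a single-interface chip. FakeDevice exists only so this can run without hardware.

```python
FTDI_VENDOR_OUT = 0x40        # bmRequestType: vendor request, host to device
SIO_SET_LATENCY_TIMER = 0x09  # the "0x9" from the comm trace notes


def set_latency_timer(dev, latency_ms: int = 1, interface_index: int = 0) -> None:
    """Ask the FTDI chip to flush its buffer every latency_ms milliseconds."""
    if not 1 <= latency_ms <= 255:
        raise ValueError("latency timer must be between 1 and 255 ms")
    # No data stage: the latency value itself rides in wValue.
    dev.ctrl_transfer(FTDI_VENDOR_OUT, SIO_SET_LATENCY_TIMER,
                      latency_ms, interface_index, None)


class FakeDevice:
    """Hypothetical stand-in that records control transfers instead of sending them."""

    def __init__(self):
        self.calls = []

    def ctrl_transfer(self, bmRequestType, bRequest, wValue=0, wIndex=0,
                      data_or_wLength=None, timeout=None):
        self.calls.append((bmRequestType, bRequest, wValue, wIndex))
        return 0


fake = FakeDevice()
set_latency_timer(fake, latency_ms=1)  # same value the Falcon test program sets
```

With libftdi itself you don't need any of this, since it already wraps the request as ftdi_set_latency_timer.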

I added that single line of code to the libusb-0.1.12/libftdi based libnifalcon libraries.

Time between I/O Iterations: 0.002019s

Yyyaaaaaayyyyyyyy!!waitaminute. 2 ms?

And thus, the synchronous call issue comes back to bite me in the ass, except this time, it's actually a correct diagnosis.

You see, when you send a USB message, it's packed into a USB frame. Each USB frame can carry multiple messages, but at a rate of 1 frame per millisecond (we're at USB 2.0 fullspeed here). libusb-0.1.12 only packs one request per frame, so we have 1ms input, 1ms output. 2ms overall, locking our I/O loop to 500hz. DAMNIT.
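The arithmetic, spelled out as a tiny sketch. It's a simplified model that assumes each transfer has to wait for its own 1ms frame unless the stack can pack more than one per frame; the function and variable names are mine.

```python
import math


def max_loop_rate_hz(transfers_per_iteration: int,
                     transfers_per_frame: int,
                     frame_ms: float = 1.0) -> float:
    """Upper bound on I/O loop rate when transfers are spread across USB frames."""
    frames_needed = math.ceil(transfers_per_iteration / transfers_per_frame)
    return 1000.0 / (frames_needed * frame_ms)


one_per_frame = max_loop_rate_hz(2, 1)  # write in one frame, read in the next
both_in_one = max_loop_rate_hz(2, 2)    # write and read sharing a frame
measured = 1.0 / 0.002019               # the timing I actually got above
```

The one-request-per-frame case caps out at 500hz, and the measured number sits just under that, which is about what you'd expect once you add a bit of overhead.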

So, the solution to this is to figure out a way to get both input and output in the same frame, which may be possible with libusb-1.0. That remains to be seen, and that code is still very, very alpha. I'll keep working with it, though.

Anyways, this still leaves a couple of questions. First off, when I was checking the ftd2xx drivers for linux, I decided to check their symbol table for the dynamic library...

...
         U time
         U tolower
         U toupper
00013673 T usb_bulk_read
0001357d T usb_bulk_write
00017d30 B usb_busses
000132ff T usb_claim_interface
00014309 T usb_clear_halt
000120e6 T usb_close
...

HEY! Those are libusb calls! So ftd2xx is at least partially based on libusb. How they're managing the superfast I/O, I'm not sure. Could be threading, could be they've got their own asynch thing going on.

Secondly, I obviously didn't have the latency timer set on the ftd2xx version of the drivers, either. Why did my drivers run so fast on windows without that? I'm guessing there could be a config file I was missing somewhere, or maybe their drivers just do it themselves on connection or something.

Anyways, the moral of the story: Read the god damn spec sheet. And all the addendums. And pay attention to your own notes.


OpenGL Falcon Direct Kinematics Solver

I swear this makes more sense if you actually see it moving, but nonetheless, I now have a neat little DK simulator for the Novint Falcon, actually controlled by the Falcon itself. The buttons on the falcon can rotate/zoom the camera on the model. The triangle in back is the fixed frame (which I have NO idea of the size of, since the falcon is set up weirdly. That's partially why I created this simulation, so I could just change numbers until things seem right.), the three big white spheres are the knee positions, and the one out in front is the end effector origin.

Outside of the fact that the simulator can't reliably pick which solution it wants to use (meaning sometimes it thinks the end effector ends up behind the fixed frame, oops), and that it's not mirroring the exact wonky angles the Falcon axes come out at, it works! And all in under a day's work. Not bad for being a little rusty on my OGL.


So, in my eternal quest to understand every single god damn thing about the Novint Falcon before ever using it for anything interesting, I've now dug up a bunch of information on Direct/Inverse Kinematic analysis of DELTA style parallel robots. Of course, at the end of all that digging, I found one paper that explains it pretty thoroughly.

Descriptive Geometric Kinematic Analysis of Clavel's Delta Robot, P.J. Zsombor-Murray, McGill University

I'll be doing an "unwind the retroencabulatoresque wording" post later, but for those of you used to reading engineering/math papers, check out the "Rationale" section. It contains a really awesome abstract algebraic version of "go fuck yourself".

Anyways, this paper has one problem. The pictures at the end are... engineery. Very, very engineery. Unreadably so. So, I decided to spend a little time working in that whole Second Life thing that I spend all of my day job time on, except actually doing something creative in it.

Diagram of IK/DK Derivations for a Clavel DELTA Robot

And thus, my reworking of the IK/DK derivation, which is a hell of a lot easier to understand visually (basically: Expand everything out to constraint spheres, find the meeting points of 2 of those spheres, logic out which is the correct point of the meeting of the 3 to save yourself the unnecessary processing). This can be seen at http://slurl.com/secondlife/Hyperborea/160/45/23/
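For anyone who'd like the constraint sphere idea in code form before the real thing shows up, here's a minimal sketch of a plain three-sphere intersection (standard trilateration). To be clear, this is not the paper's BASIC and not my finished solver; the function names are made up, and working out the sphere centers and radii from the actual Falcon geometry is left out entirely.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def intersect_three_spheres(p1, r1, p2, r2, p3, r3):
    """Return the 0 or 2 points where three spheres meet (centers must not be collinear)."""
    # Build a local frame: origin at p1, x axis toward p2, p3 in the x/y plane.
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1.0 / d)
    to_p3 = sub(p3, p1)
    i = dot(ex, to_p3)
    ey_raw = sub(to_p3, scale(ex, i))
    ey = scale(ey_raw, 1.0 / norm(ey_raw))
    ez = cross(ex, ey)
    j = dot(ey, to_p3)

    # Spheres 1 and 2 meet in a circle at this x; sphere 3 pins down y.
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2) / (2 * j) - (i / j) * x
    z_squared = r1 ** 2 - x ** 2 - y ** 2
    if z_squared < 0:
        return []  # the spheres don't all meet
    z = math.sqrt(z_squared)

    base = add(p1, add(scale(ex, x), scale(ey, y)))
    # Two mirror-image candidates, one on each side of the plane of the centers.
    return [add(base, scale(ez, z)), add(base, scale(ez, -z))]


# Made-up test geometry: three spheres that all pass through (1, 1, 1).
r = math.sqrt(3)
candidates = intersect_three_spheres((0, 0, 0), r, (2, 0, 0), r, (0, 2, 0), r)
```

You get two candidate points back, mirrored across the plane the three centers sit in. Picking the right one is the "logic out which is the correct point" part, and something like "whichever one is in front of the fixed frame" is probably the rule you want.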

Oh yeah. And actually, two problems with the paper, now that I think about it. Bricks of BASIC are not helpful as code. Ever. And this is literal brick, too. Whitespace is for bitches.

For anyone else wondering about the Falcon: the paper above expresses the direct kinematics in terms of hip angles. However, the falcon's a little different. It gives you back encoder values from the motors, so you're stuck with a single integer value that relates to the extension distance of the thigh. This is due to one of the nice features of the Falcon, though. The bent extension thigh (versus the usual static rotation-only bar thingy, or at least, that's my made up mechanical engineering term for it) gives you a smaller footprint for the workspace than usual haptics deltas would give you, which is important since this was made to go on a gamer's desk. It's probably a bit more rugged too, since those nice rotational sensors in $30,000 equipment probably weren't made to take cheeto dust either. So, good for you on that one, Novint.

Anyways, what that means is the encoder values refer to some point on a 4" arc that the knee traces on the hip constraint sphere, instead of an angle. You can either translate between those to get your angle back and plug into his code, or wait for me to just post my finished code here. The encoder value is probably a faster way to do this, too, since I'm betting you can avoid floating point math at some point with it.

This, of course, will all happen as soon as I decide to stop spending time making pretty pictures and posting about how proud I am of said pretty pictures.