Grant Olson

You Should Be Using Pogo Pin Receptacles with Pogo Pins

2023-09-08T00:00:00+00:00

There are many articles, videos, blog posts, etc, showing you how to use pogo pins to make programming and test fixtures for your electronics projects. But almost all are missing one critical component to make pogo pins work reliably: the Pogo Pin Receptacle. If you’re trying to build a fixture or jig that uses pogo pins, you should be mounting the receptacle part to your fixture, and not mounting the pogo pin directly.

For the generic pogo pins available on AliExpress, eBay, etc, sold with designations like P50, P75, and P100, there are corresponding parts R50, R75, and R100 that you will want to use. Unfortunately these can be harder to find than the pogo pins, but it’s well worth tracking the parts down and ordering.

It is a little easier to demonstrate how they work with a live video, so I made this one to explain how to dramatically improve both the ease of construction and the reliability of fixtures and jigs that use pogo pins to make contact with your PCBAs.

Elegoo Saturn 2 - Do I remove the "Please press firmly on the build plate when tightening the screws" film?

2023-05-25T00:00:00+00:00

TLDR: Do you remove the film that says “Please press firmly on the build plate when tightening the screws”? YES

I started up a new firmware job at PomSafe a few months ago. Our products are small and have some detailing that isn’t suitable for my normal FDM 3D printers. I decided to finally break down and get a resin printer, which I had been putting off because I wasn’t sure how messy and dangerous the resin would be.

I went with an Elegoo Saturn 2. When I first got it the instructions were good about indicating what plastic needed to be removed on the vat, but there was a piece of plastic on the build plate that said “Please press firmly on the build plate when tightening the screws”. I wasn’t sure if this was supposed to be there or not. I tried picking at it a little with my finger and thumb to see if it was supposed to come off. It didn’t budge and it looked like it had a precision fit so I left it on.

I had some success with printing: the Rook worked, the Cones of Calibration and Ameralabs test print worked. But I had strange issues where my custom STLs weren’t quite sticking to the build plate. I was able to get good prints if I, for example, used a raft and tilted the parts at 45 degrees, and set the early layer burns to 60 seconds, and all kinds of tweaks. To complicate things more, I’ve been trying to print out Orange Pi cases and consumer products, and a lot of the wisdom out there for these is aimed more at the tabletop miniatures crowd.

While trying different fixes I noticed that this film was starting to peel off at the edges. So I got back to wondering if this film was supposed to come off even though it wasn’t documented anywhere. Infuriatingly I couldn’t find any pictures of the base of the build plate on Elegoo’s site to see what the installed plate was supposed to look like. Searching on the exact term eventually got me to a few pages on reddit forums where people were saying the film should come off.

So I’m creating this page to hopefully create something definitive that will show high up in search results for others who have this problem. It’s a pretty simple fix, but I thought I should at least write up a story like those annoying recipes so that the search engines take it seriously!

Hope this helps another new Saturn 2 owner out there.

I Built a Robotic Aluminum Can Crusher

2022-11-29T00:00:00+00:00

I built a robotic aluminum can crusher from scratch, designing all of the software and firmware components, doing the 3D printing and laser cut aluminum, etc. Here is a video covering the entire project and some of my design philosophies.

Enjoy!

I also wrote detailed blog entries throughout the process:

All CAD files, electronic files, source code, etc, referenced in this post series are available on my github page.

Can Crusher Part 5 - Mechanical Updates

2022-11-11T00:00:00+00:00

Now that my software is in good shape and I can easily test out the can crusher, I’m starting to see a lot of problems with the initial design. This is expected. I just dove in and built the first prototype without much analysis. It’s time to take another pass on the mechanical design and get things whipped in to shape.

Let’s take a look one problem at a time:

The Can Will Pop Out of the Device

The first problem was relatively simple. When the crushing element applied pressure to the can it could come flying out of the front of the device!

The fix: I made a tray that holds the can.

Acrylic Structural Elements Still Have Too Much Flex

The next problem was that the acrylic still flexed a lot when pressure was applied to the can to crush it. It was much, much more rigid than a 3D print would be, but it was flexing enough to cause problems with the lead screws.

The fix: I got the structural elements made out of 1/4 inch thick laser cut aircraft aluminum.

I’d never done this before but decided to give OSHCut a try. I’ve always felt guilty for not using their sister service OSH Park to get my PCBs made, but the bids always ended up being an order of magnitude higher than the Chinese PCB vendors. OSHCut worked great! They have a nice site where you can drop in a .step file and get an instant quote. This is great if you’re like me and don’t know what the hell you’re doing. Other companies wanted me to talk to an engineer without even a ballpark quote!

The price here was a little at the high end for a hobbyist project and will probably be out of some people’s budgets. For the frame top and bottom, and crusher element I probably needed a 9 inch by 9 inch piece of stock. They have a back-of-the-line price that means it takes a few weeks to get your parts, vs 2 or 3 times as much to start production the next day. Great for someone on a hobby budget. Still, at $126 for a single run, this will probably be the single most expensive part of my project. But at this point I’m committed to getting results.

That price would be frustrating if I ordered something and I got the dimensions wrong and it was unusable and needed to order again. If I want to do aluminum parts in the future I think I’ll probably still do a test run on the CNC with cast Acrylic first, and make sure it’s perfect before risking a bad run on the laser cut aluminum.

Side-by-side: 3D printed version, acrylic version, and final aluminum version. And the installed platform.

The Steppers Don’t Create Nearly Enough Crushing Force

After getting a solid platform that didn’t flex, the next problem was that I didn’t have nearly enough crushing force. I expected things to be difficult, and anticipated having to add something that crushes the can sidewalls before applying pressure from the top. But even a can partially crushed by hand wasn’t getting anywhere.

One option to fix this would be to just buy bigger and bigger stepper motors until we had power. I’m currently using NEMA 17 sized motors. But that seemed a bit excessive.

After some research I learned about the different threading available on lead screws.

The More You Know…™

I was always confused that these lead screws were referred to as trapezoidal when they are clearly round. It turns out this refers to the shape of the thread. Rather than coming to a sharp point like a normal screw these are flattened for a good amount of the thread. 1 mm on my 2 mm threads. This creates a more robust mating with the nut that moves the platform up and down.

My normal 3D printer lead screws had a 2 mm pitch, but they had 4 ‘starts’. This means there are 4 sets of threads instead of 1 like a normal screw. The threads are intertwined like the stripes on a candy cane. (Not sure if that analogy is useful?) So a single rotation of the stepper motor causes the screws to move the platform up or down 8 mm instead of 2. They do however sell lead screws with 1, 2, or sometimes even 3 ‘starts’. I was able to buy some screws with a single start, meaning a rotation now moves the platform 2 mm.

This should give 4 times the force from the same motors, with the downside being that the crusher element moves at 1/4 the speed. Installing these finally got me to my first crush of a can that was even close to acceptable! If I want to get even more power there are screws with a 1mm lead.
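The arithmetic behind that trade-off can be sketched in a few lines of Python. This is only an idealized illustration: the function names are made up for this example, and the force ratio ignores friction, which will change the real numbers.

```python
def lead_mm(pitch_mm, starts):
    """Linear travel per full revolution of the screw."""
    return pitch_mm * starts


def ideal_force_ratio(old_lead_mm, new_lead_mm):
    """Force multiplier at the same motor torque, ignoring friction (an idealization)."""
    return old_lead_mm / new_lead_mm


four_start = lead_mm(2.0, 4)    # the stock 3D printer screws
single_start = lead_mm(2.0, 1)  # the replacement screws

print("4-start lead: %.1f mm per revolution" % four_start)
print("1-start lead: %.1f mm per revolution" % single_start)
print("ideal force ratio: %.1fx" % ideal_force_ratio(four_start, single_start))
print("speed ratio at the same RPM: %.2fx" % (single_start / four_start))
```

The same motor speed that moved the platform 8 mm per revolution now moves it 2 mm, which is where both the extra force and the slower travel come from.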

Note the image on the right. These rods have the same thread size, but the one on the left has a steeper angle. This means it has more starts and goes further per revolution.

The Crushing Platform Has Many Alignment Problems

The next problem was that the crushing power would often cause the screws to come out of alignment, and cause the crusher element to stop being level. With the 2 mm screws this would become so severe the system would bind. On the 8 mm screws I could power down the stepper motors and spin the screws manually to re-level the element. But with the 2 mm screws I had to actually remove the steppers from their holders to release enough stress to allow me to align things several times. That’s obviously not going to work.

One design element on my 3D printers started making a lot more sense. The higher end ones have straight rods of hardened steel that guide and align the parts. Bearings reduce friction. The screws then have a lot more leeway in terms of their positioning. They are moving the platform up and down, but they are no longer responsible for part alignment.

I added a straight rod on each side on the back of the structure. Then I 3D printed out some holders for some vertical bearings. This meant that I needed a new crusher element that didn’t interfere with the rods and had mounting holes for the bearing holders.

Which unfortunately meant another order from OSHCut a week after the first batch came in! After some test 3D prints and acrylic, I was back to OSHCut for a redux in aircraft aluminum.

The Lead Screws Have Alignment Problems

The last big problem was that it was difficult to get proper alignment of the lead screws. I imagine this was always a problem, but it was particularly obvious once the straight rods were helping keep the platform straight.

As I moved down the platform it would get harder to spin the lead screws. Just a little bit on most of the way down, but it got especially hard at the very bottom. This was a big problem because my stepper drivers use current values to determine if they have stalled out. The difference was enough that the value changed depending on whether I was at the bottom of the platform, in the middle or at the top, making it impossible to do stall detection or auto-home reliably.

I was getting in to an issue where we were dealing with tight tolerances. It seemed like I was < 0.5 mm from getting the lead screws straight. I could have kept moving things around just a little, printing, and testing the range of motion. That would be time-consuming and wouldn’t really fix the problem. There are lots of things that introduce tolerance errors:

My 3D prints only have 0.4 mm resolution. Parts may be microscopically off from a different 3D printer.
My professionally cut laser parts seem to have more accurate and slightly different sizing.
I’m not using precision assembly or measurement. Parts are held together with T-Slot screws. Having the vertical alignment of the left and right side servo holders off by 0.1 mm or less might change the proper location of the motors.
I could get everything perfect, then drop the thing on a concrete floor, lose a small amount of alignment, and be back to printing new variations.

I needed a solution where I can manually square the device, rather than assuming all my parts are printed at perfect sizes, assembled in the same way, and aligned, to account for all these various factors.

I changed the mounting holes for the stepper motors to slots and came up with a procedure to square the machine. I would do an initial install of the steppers, keep the screws loose, then manually lower the crusher platform as far as it would go. This would move the steppers in to proper alignment and I could tighten them. Then I could do a few manual tests and make sure that I felt the same tension across the range of motion, and slowly lock the screws down.

That introduced the problem that the rod holders I placed on the top of the unit had a fixed position that couldn’t be adjusted. And slots don’t make sense because they may need some play on both the X and Y axes. Looking at my 3D printer for inspiration I noticed that it didn’t really even hold the lead screws in place at the top. The holders at the top of the lead screws have several millimeters of clearance and are just there to catch the screws if something goes really, really, really wrong and the machine is falling apart. I decided to borrow that approach.

With all these fixes I was able to go back to running my test script to find appropriate stall current settings. Things were much more reliable. I produced the same values no matter where I was vertically on the device. And the results were reproducible. The downside is that I may need to occasionally re-square the device if it gets a lot of use.

And Back to the Rod Lead Size

Once my new batch of aluminum for the crusher plates arrived I installed it. I haven’t mentioned this before, but I currently have two can crushers. The first one lets me do a lot of rough testing and the second has a more stable configuration. One is the alpha and the other is the beta version of the product.

I upgraded my alpha crusher first. This one still had the fast-yet-less-powerful 8 mm lead screws. It turns out that with all the rigidity in place, and all the changes I made, these were actually much, much better at crushing cans than they were in earlier tests. The slower-more-powerful lead screws on the beta crusher had been extremely slow, maxing out at 8-9 mm per second, and very difficult to dial in the stall settings.

Based on that I decided to go back to the original 8 mm lead screws and use those as I move forward on the project.

Next Up

I now have a mostly working system but I’m running it via a USB cable on my computer manually. I originally expected to have a high level control computer talking to my low level Pico PCB. This would allow the high-level computer to handle the user interface and more complicated calculations. I’ve decided it’s time to take a first pass at that. My goal is to get to the point where I have a self-contained unit that can run at the press of a button, and not a CLI script. Then I’ll have my first full version of the entire stand-alone can crusher.

All CAD files, electronic files, source code, etc, referenced in this post series are available on my github page.

Can Crusher Part 4 - Towards Production Software

2022-10-31T00:00:00+00:00

At this point I have a working dev board that makes it much easier to write code than it was on the breadboard. I’m able to deploy code faster and the can crusher is less precarious, but development is still slow going. I’m going to start working on higher quality firmware, focusing on a control interface that allows me to send commands over UART to the board. This will make it much easier to try things out as I develop the system.

There’s always a trade off between abstractions and system control. Python is a great language for development because I can write code and test very quickly. But it’s not such a good language for embedded systems because I lose a lot of predictability and speed that’s required for motor control, sensors, etc. For that you need a lower level language which slows development time.

The plan is to keep a smaller core of low-level code that’s easy to follow, and then add an interface so that I can use higher level code while testing various things out. When all is said and done, then I either have a pretty good reference implementation of the higher level activities to port to C, or I can just keep the higher level code running on another control board that isn’t as sensitive to real-time requirements.

While I’m working on the software I also want to see if I can take advantage of the PIO feature on Pico, but I’ll get in to those details later in the post. For now, the…

Command Interface

Basic Design

Send commands over UART.
Should be extremely simple language so I don’t waste time writing an elaborate parser in C.
Although I will be able to control directly from a terminal, it’s anticipated I’ll end up using some sort of middle-ware in python to build and send commands. The language doesn’t need to be pretty.
The interface should be able to report success or failure.
The interface should be able to return results as data when needed.

Getting real. I’m doing this project for fun and to stretch my skills. I’m pretending to be each and every member of a team building out a full product.

If this was a real product, there’s a perfectly good motor control language called Gcode that probably covers all the features listed above and more. Even better there’s a great open source implementation called Marlin. It’s targeted at 3D printers but has been customized to run a lot of machines. And on top of that there are many control boards, some as cheap as $10, that can run Marlin.

Realistically, I should just be using both that hardware and software to control my device. It would dramatically simplify development time. But since this is a learning experience we’ll pretend none of that exists.

Control Interface

The goal here is to keep things as simple as possible. We won’t have anything like variables, conditionals, loops, turing-completeness, etc. We’ll just send a message and get something back. We will also make it very easy to parse.

At the core the format will be either COMMAND to do something, or COMMAND ARG1 ARG2 if we want to send data. There may be a return value in the form of COMMAND: VALUE. There will always be an indication of success or failure, with either OK or ERR followed by a numeric code, for example ERR 132.

An example communication session would be something like:

> WAKE
OK
> HOME
OK
> POSITION?
POSITION: 150
OK
> MOVE 50 10
ERR 109
> POSITION?
POSITION: 173.9
OK
> MOVE -125 20
OK
> SLEEP
OK

In this session I:

Wake up the device so the motors are powered. Steppers can run hot and shouldn’t be left on 24/7.
Auto-home the device so we know where the crushing platform is.
Check the current position, in millimeters, from the base after homing.
Move the platform up 50 mm at 10 mm per second, but we encounter an error indicating that the motors have stalled, meaning we tried to go higher than is possible and hit the ceiling.
Check to see how far we really moved. 173.9 - 150 means I moved 23.9 mm before failing.
Move the platform down 125 mm at 20 mm per second, so I’m not touching the ceiling.
Go back to sleep to save power.

This is extremely simple to parse but I can still do everything I need to build up a complex system.
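To show how little parsing the format needs, here is a minimal Python sketch of reading one reply on the host side. The function name and the tuple it returns are assumptions made for this example, not the actual firmware or test code, and it assumes an ERR line always carries a numeric code as in the session above.

```python
def parse_response(lines):
    """Parse one reply: optional 'NAME: VALUE' lines, then 'OK' or 'ERR <code>'.

    Returns (ok, error_code, values). The names here are hypothetical.
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if line == "OK":
            return True, None, values
        if line.startswith("ERR"):
            return False, int(line.split()[1]), values
        # Anything else is treated as a data line such as "POSITION: 173.9".
        name, _, value = line.partition(":")
        values[name.strip()] = value.strip()
    raise ValueError("reply ended without OK or ERR")


print(parse_response(["POSITION: 173.9", "OK"]))
print(parse_response(["ERR 109"]))
```

Because every reply ends in either OK or ERR, the reader never has to guess when a response is complete.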

Property bag

The interface dramatically improves things. While testing it quickly becomes apparent that I still need to recompile to change default values. Now it’s time to throw all of those values in to a property bag.

I create an interface that allows you to get and set all the properties in a centralized location. It allows you to do this with either a C enum or the name of the property. Then I add a quick interface to the language PROP= PROPERTY_NAME PROPERTY_VALUE to set and PROP? PROPERTY_NAME to get.

I add this and move all the appropriate properties to this system and things are looking good. Saving this to the Pico’s flash so it survives reboots should be simple. I just need to:

Choose a region of flash.
Add some magic value to the beginning so I know if it’s been initialized yet.
Add a version number so the code can be smart when we add more values.
Save the raw values.
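The layout in that list can be modeled in Python with the struct module. This is a sketch of the idea only: the magic value, the version number, and the choice of 32-bit integers for the properties are all assumptions for illustration, and the real code is C running on the Pico.

```python
import struct

MAGIC = 0x43414E21               # hypothetical marker value
VERSION = 1                      # hypothetical layout version
HEADER = struct.Struct("<IH")    # little-endian: uint32 magic, uint16 version


def pack_props(values):
    """Header followed by the raw property values as little-endian int32."""
    return HEADER.pack(MAGIC, VERSION) + struct.pack("<%di" % len(values), *values)


def unpack_props(blob, count):
    """Return the stored values, or None if the region was never initialized."""
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        return None
    if version != VERSION:
        raise ValueError("unknown property layout version %d" % version)
    return list(struct.unpack_from("<%di" % count, blob, HEADER.size))


erased = b"\xff" * 256           # erased flash reads back as all 0xFF
print(unpack_props(erased, 3))
print(unpack_props(pack_props([171, 10, 50]), 3))
```

The magic check is what lets the firmware tell a freshly erased region apart from saved settings, and the version field leaves room to handle older layouts later.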

The Pico as usual has a great API and documentation on doing this but I ran in to a few problems.

Memory Offsets

First I didn’t realize that to read data you are supposed to read from an absolute address, and to write data you are supposed to write to a relative address on the external flash storage:

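#include "hardware/flash.h"

// Last sector of the 2 MB flash. Outer parentheses keep the macro safe in expressions.
#define PROP_OFFSET ((1024 * 1024 * 2) - FLASH_SECTOR_SIZE)
#define PROP_ADDRESS (XIP_BASE + PROP_OFFSET)

// Read access - USES ABSOLUTE ADDRESS
const uint8_t* flash_bytes = (const uint8_t *) PROP_ADDRESS;

// Write access - USES RELATIVE OFFSET
// Note: {0xFF} only sets data[0]; the remaining bytes are zero-initialized.
uint8_t data[FLASH_PAGE_SIZE] = {0xFF};
flash_range_program(PROP_OFFSET, data, FLASH_PAGE_SIZE);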

Note that in the last line you pass PROP_OFFSET, not PROP_ADDRESS, to write to memory. I spent way too much time figuring this out. It makes sense though. A CPU will map various chunks of memory, be they ROM, RAM, etc, to certain offsets. But the chips that actually hold values are only internally aware of their local addresses. Internally everything will map starting at address 0, and externally the CPU will decide to route (for example) all addresses from 0x02000000 - 0x02010000 to that memory bank. Trying to write to the absolute address caused my program to crash hard, essentially segfaulting, which made it difficult to debug.
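To make the two kinds of address concrete, here is the same arithmetic worked through in Python. The constants are the RP2040 flash mapping base and the Pico SDK sector size as I understand them, with a 2 MB flash assumed as in the defines above; the helper name is made up for this example.

```python
XIP_BASE = 0x10000000          # where the RP2040 maps external flash for reads
FLASH_SIZE = 2 * 1024 * 1024   # assuming the standard 2 MB Pico flash
FLASH_SECTOR_SIZE = 4096       # erase granularity used by the Pico SDK

prop_offset = FLASH_SIZE - FLASH_SECTOR_SIZE   # chip-relative: used to erase and program
prop_address = XIP_BASE + prop_offset          # CPU-visible: used to read


def to_offset(address):
    """Convert a CPU-visible flash address back to a chip-relative offset."""
    return address - XIP_BASE


print(hex(prop_offset))    # 0x1ff000
print(hex(prop_address))   # 0x101ff000
print(to_offset(prop_address) == prop_offset)
```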

Flash programming is blow-fuse style

In general the Pi Pico has great documentation and example code. The best around. I would have saved myself a lot of time on the previous problem if I had run the example code earlier. But here I ran in to a problem that was a little too subtle to show up in the docs I read.

If you’ve ever worked on a device with OTP (One-Time Programmable) Memory, you know that the name isn’t entirely accurate. All the bytes are filled with 0xFF and to change any bit to zero you blow the fuse. Once that’s done you can never set the fuse back to the 1 state; you’re stuck at 0 for eternity. You can take advantage of this feature to do some neat tricks. For example you can reserve a bank of memory for later use as long as you don’t write any values initially. It will just stay full of bytes of 0xFF. Then you can have some marker in the main program that checks to see if that memory has been burned later. If it’s 0xFF, run the normal program. If you write an upgrade, blow the first bit so that it reads 0xFE, then run the program starting at memory address 0x2000. Blow the next bit so that it reads 0xFC, then run the program at 0x4000, etc. Neat trick. The only problem is you can’t roll back to running the code at 0x2000 if there’s a problem with the new code.

Although not OTP, the flash memory on the Pico works the same way until it is erased. This means you must run an erase operation before reprogramming a chunk of memory. If you don’t you end up munging numbers together. Most obviously, if you previously wrote 0x0 you will never be able to write anything else to that memory address without erasing it first. More confusing if you don’t understand fuse blowing: If you have written 0xAA (binary 10101010) and then try to write 0xF0, you’ll end up with contents of 0xA0 as the various fuses are blown.
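The munging is easy to model: programming can only clear bits, so the result is the bitwise AND of the old and new contents. This is a tiny Python simulation of the 0xAA then 0xF0 case, not code that touches real flash.

```python
ERASED = 0xFF


def program_byte(current, new):
    """Programming can only clear bits, so the stored result is a bitwise AND."""
    return current & new


cell = program_byte(ERASED, 0xAA)   # first write onto erased flash
print(hex(cell))                    # 0xaa

cell = program_byte(cell, 0xF0)     # second write without an erase
print(hex(cell))                    # 0xa0

cell = program_byte(ERASED, 0xF0)   # erase first, then write
print(hex(cell))                    # 0xf0
```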

Moral of the story, always erase memory before re-programming on a Pi Pico.

Python testing framework

Now my control interface is getting sophisticated and I can do a lot. But I’m still annoyingly typing commands in to a terminal session with basic capabilities: it doesn’t like special keys, I can’t up-arrow to run the last command, etc. This still isn’t quite where I want to be to develop quickly. I need a higher level interface.

I’m able to whip one up in python. I can take advantage of some of python’s advanced hooks so I don’t need to update my code every time I add a command, and get to the point where I can run the can crusher through some sophisticated programs.
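One way to get that "no code change per command" behavior is Python's __getattr__ hook, which runs whenever an unknown attribute is looked up. The sketch below is an assumption about how such a wrapper could work, not the actual serial_cli module; the class, exception, and fake transport are all invented for the example, and query commands ending in "?" would need extra handling.

```python
class CommandError(Exception):
    """Raised when the device replies with something other than OK."""


class SketchCLI:
    """Turns cli.move(50, 10) into 'MOVE 50 10' without a method per command."""

    def __init__(self, transport):
        # transport: callable taking a command string, returning reply lines
        self.transport = transport

    def __getattr__(self, name):
        # Only called for attributes that are not defined normally.
        if name.startswith("_"):
            raise AttributeError(name)

        def send(*args):
            command = " ".join([name.upper()] + [str(a) for a in args])
            reply = self.transport(command)
            if reply[-1] != "OK":
                raise CommandError(reply[-1])
            return reply[:-1]

        return send


def fake_transport(command):
    # Stand-in for the serial port: log the command and always succeed.
    print("> " + command)
    return ["OK"]


cli = SketchCLI(fake_transport)
cli.home()
cli.move(50, 10)
```

Adding a new firmware command then needs no change on the Python side, since any method name is simply upper-cased and sent.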

Now I can easily test how fast I can safely go up and down:

from serial_cli import *

cli = SerialCLI("/dev/ttyUSB0")

cli.home()
start_position = cli.position()

# See where our speed maxes out. incrementing speed
# by 10 mm per second each run.
for i in range(10,100,10):
  try:
    # Jog up and down
    cli.move(50, i)
    cli.move(-50, i)
  except SerialException as ex:
    print("Failed at speed %d with error %s" % (i, ex))
    current_position = cli.position()
    print("Moving to start position at safe speed.")
    mm_to_start = start_position - current_position
    cli.move(mm_to_start, 10)
    

Real World Example - Better tuning of crash detection

Let’s return to a problem that was difficult when I was exercising the stepper motor drivers a few weeks ago. I chose these stepper drivers because they have stall-detection. This allows me to detect when the crusher platform hits either the top or bottom of the structure, as well as when it hits a can. This is done by setting a value for the current drawn per step. If our current draw drops below that value then the system decides the motor can’t advance.

Unfortunately the actual value is a bit of a magical number. It’s a current draw value, but because these drivers can be used with a variety of motors with different specifications that are used for a variety of purposes, there is no one-size-fits-all way to determine what the value should be. The datasheet isn’t able to include any formulas, for example to have torque threshold of X, use formula Y. The correct values must be obtained experimentally and tuned for your particular application.

Early on when I was writing unabstracted C without a control language this was extremely time consuming and frustrating. I would need to:

Set a test value.
Recompile.
Reset the Pico and deploy code.
Have some sort of test action that hits the limit of motion.
Hope the motors don’t keep running forever when they encounter resistance.

I did manage to find a value for movement that worked but surely wasn’t ideal. I also ran in to problems because the value seems to change as we change the speed, so really I will need to come up with some sort of function to calculate something along the lines of at speed X mm per second, use value Y. Additionally, if this was a real product, we would probably want some sort of field calibration in case the unit gets knocked around or performance changes with age.

I was able to use python to write a much better system to tune the numbers than throwing virtual darts in code. It works by:

Setting the motors to a given speed.
Maxing out the threshold so the motor will stall.
Trying to move the platform up and down 10 mm.
Lowering the threshold value until the platform can complete the range of motion.

This is much, much better, but it’s still slow. It can run up to 256 times while narrowing in on the value. So I added code to find approximate ranges and narrow in on them. First in groups of 64, then 16, then 1, to speed things up. For example:

Test 255, stall. Test 191, stall. Test 127, PASS.
Test 192, stall. Test 176, stall. Test 160, PASS.
Test 176, stall. Test 175, stall. Test 174, stall. Test 173, stall. Test 172 PASS.
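The narrowing idea can be tried out without hardware by replacing the motor test with a function. This is a simplified Python sketch under the assumption that every value at or below some edge passes and everything above it stalls; the function name is made up, and real readings are noisier than this.

```python
def coarse_to_fine(passes, top=255, steps=(64, 16, 1)):
    """Find the highest value for which passes(value) is True.

    Walks down from the last known stall in ever smaller steps.
    Returns (value, number_of_tests), or (None, tests) if nothing on the grid passes.
    """
    tests = 0
    start, floor = top, 0
    result = None
    for step in steps:
        result = None
        for value in range(start, floor - 1, -step):
            tests += 1
            if passes(value):
                result = value
                break
        if result is None:
            return None, tests
        # The edge lies between this pass and the stall one step above it.
        start, floor = min(result + step, top), result
    return result, tests


# Simulated motor: anything at or below 172 completes the move.
value, tests = coarse_to_fine(lambda v: v <= 172)
print("found %d in %d tests" % (value, tests))
```

With the simulated edge at 172 this takes 10 tests, where stepping down from 255 one value at a time would take 84.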

I can run the test 5 times, come up with an average, and add in a little bit of a leeway (say 5%) and use that as the value. On top of it I can run the code over a series of speeds quickly, say from 5 mm per second to 30, in intervals of 5 mm per second.

Now I could write all of this in C, but it would be time consuming. In fact I might really want it in C later for field calibration. But at this point I’m still not sure that this is the ideal algorithm and what other problems I’ll encounter. It’s really nice to test quickly, lock down the procedure, then either leave as is or re-write in C.

I’m able to write a quick python script to do this in less than 15 minutes and less than 100 lines of code:

#!/usr/bin/env python

from serial_cli import *
import statistics
from time import sleep
import sys

cli = SerialCLI("/dev/ttyUSB0")

def retry_wake():
  try:
    cli.wake()
  except SerialException:
    cli.wake()

def narrow_sg_range(bad, good, step, speed):
  sys.stdout.write("Trying ")
  for i in range(bad, good-1, step):
    cli.set_prop("STALLGUARD_THRESHOLD", i)
    cli.sleep()
    sleep(1)
    sys.stdout.write("%i. " % i)
    sys.stdout.flush()
    retry_wake()
    try:
      cli.move(-10, speed)
      cli.move(10, speed)
      print()
      return (i - step, i)
    except SerialException as ex:
      pass # Later check for stall
  print()
  print("Failed to find inflection point!")
  return (0,0)


def find_range_once(speed):
  bad, good = 255 , 0
  bad, good = narrow_sg_range(bad, good, -64, speed)
  if bad == 0: return 0
  bad, good = narrow_sg_range(bad, good, -16, speed)
  if bad == 0: return 0
#  bad, good = narrow_sg_range(bad, good,-8, speed)
#  if bad == 0: return 0
  last_good = good
  bad, good = narrow_sg_range(bad, good, -1, speed)

  # If we didn't get it on 1 we might be on the very edge of stall
  # detection. Try again along that range.
  
  if bad == 0:
    bad, good = narrow_sg_range(last_good+3, last_good-3, -1, speed)
  print("SPEED: %d BAD %d, GOOD %d" % (speed, bad, good))
  return good
  

def find_range(speed):
  results = [find_range_once(speed) for x in range(0,5)]
  print("RAW RESULTS: %s" % repr(results))
  results = [x for x in results if x != 0]
  if len(results) < 3:
    raise RuntimeError("BAD DATA POINTS!")
  average = statistics.mean(results)
  print("AVERAGE: %f" % average)
  safe_average = average * 0.95 # give an extra 5%
  safe_average = int(safe_average)
  print("FINAL: %d" % safe_average)
  return safe_average  # without this, the caller stores None for every speed

values = []

# 5 - 84
# 7 - 109
# 10 - 132
# 12 - 144
# 15 - 157
# 17 - 158
# 20 - 171
# 22 - 151
# 25 - 135 ?
# 30 - ???

for i in range(28,36,5):
  cli.set_prop("STALLGUARD_THRESHOLD", 171)
  cli.sleep()
  cli.wake()
  cli.move(10,20)
  cli.move(-10,20)

  res = find_range(i)
  values.append( (i,res) )

  ten_speed = 131 #values[0][1]
  cli.set_prop("STALLGUARD_THRESHOLD", ten_speed)
  cli.sleep()
  cli.wake()
  cli.home()
  sleep(1.0)
  cli.move(50, 10)
  
print(repr(values))

Uncovered problems

Now that my testing is much more systematic and less ad-hoc I identify a few problems:

The stall value seems to change as I get lower and lower on the platform. This makes me think that the threaded rods aren’t properly aligned and are at slight angles that I can’t see. I’ll need to investigate and redesign the holders.
We can’t travel nearly as fast as I expect. I suspect it’s because at higher speeds we need to accelerate, and my current algorithm is either on-at-full-speed or off-at-zero. The TMC2209 datasheet does indeed indicate that to move swiftly you need some acceleration algorithm, and this is up to you to write.
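As a starting point for that acceleration work, a linear ramp is about the simplest option: raise the commanded speed a little on every tick until the target is reached. This Python sketch only computes the speed profile; the function name, tick size, and acceleration figure are placeholders, not values from the TMC2209 datasheet or from the firmware.

```python
def ramp_speeds(target_mm_s, accel_mm_s2, tick_s=0.01, start_mm_s=0.0):
    """Speeds to command on each tick while accelerating linearly to the target."""
    speeds = []
    speed = start_mm_s
    while speed < target_mm_s:
        speed = min(speed + accel_mm_s2 * tick_s, target_mm_s)
        speeds.append(speed)
    return speeds


# Placeholder numbers: ramp to 20 mm per second at 10 mm/s^2, updating twice a second.
for speed in ramp_speeds(20, 10, tick_s=0.5):
    print("%.1f mm per second" % speed)
```

Decelerating would be the same profile run in reverse before the end of the move.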

PIO Stepper Control

Since I’m in software mode and still have a few days set aside in my make-believe sprint, I move on to another feature I wanted to get working on the Pico. One of the major reasons I wanted to use a Pi Pico was to get an opportunity to play with the Programmable I/O (PIO).

General High Level PIO Justifications

The RP2040 has two dedicated sub-processors that are optimized for dealing with input and output. They have a very small footprint, memory, and set of assembly instructions, and are very specialized. But the advantage is that they run completely independently of the main CPU, and each instruction takes exactly 1 clock cycle, so the execution time is extremely fast and predictable.

That’s the high level explanation that is given by the Pi Foundation. After working through the datasheet explanation and SDK examples, it becomes apparent that these processors are extremely optimized to turn bytes in traditional memory in to signals on GPIO lines, and vice-versa. I think that’s the best way to think about how to take advantage of them. How do I turn bytes in to signals, and 1 or 2 signal lines in to bytes?

One SDK example is UART control. I think this is a really good one. If you’ve played around with GPIO pins you’ve likely played around with a few standard interfaces like SPI or I2C that are easy to ‘bit-bash’. They don’t have really tight timing requirements and since you control the clock you can just flip things up and down to make things happen:

def send_bit(bit):
    bit_pin.set(bit)
    clock_pin.set(1) # force it high
    clock_pin.set(0) # force it low.

But UART is actually extremely timing sensitive. There is no clock line at all: the start bit at the front of each byte is the only timing reference you get, and you need to be there to pick up the data on that exact timing. Similarly you need to send data with very exact timing, which is difficult to do even in a low level language like C.

Another SDK example is WS2812 LED light strips. These are connected in serial and you need to send an extremely specific set of highs and lows to set possibly hundreds of lights to the correct color. The exact algorithm is:

Send high signal followed by low.
To send 0, go high for 0.35 uSec and low for 0.8 uSec.
To send 1, go high for 0.7 uSec and low for 0.6 uSec.
All signals expected to have +/- 150ns accuracy.
Repeat 100s or thousands of times to set whole light strand.

We’re certainly not easily bit-bashing that! I’ve actually tried to do this for a single WS2812 LED on an under-powered 16 MHz processor using NOP commands to get the timing exactly right, and it was just plain impossible to get accurate timing. But since you can set an independent frequency for your PIO controller, and calculate the exact time it takes for each instruction to execute, since it’s one clock cycle, it’s really easy to get that timing dialed in.
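To put rough numbers on those pulse widths, here is a minimal sketch of the arithmetic in C. It assumes the state machine runs at 125 MHz (a common Pico system clock, but check your own setup), and the helper name ns_to_cycles is made up for illustration.

```c
#include <stdint.h>

// Hypothetical helper: convert a duration in nanoseconds to a whole number
// of clock cycles, rounding to the nearest cycle. clock_hz is whatever
// frequency the PIO state machine is actually clocked at.
static uint32_t ns_to_cycles(uint32_t ns, uint32_t clock_hz) {
    return (uint32_t)(((uint64_t)ns * clock_hz + 500000000ull) / 1000000000ull);
}
```

At 125 MHz one cycle is 8 ns, so the 0.35 uSec high time is about 44 cycles and the +/- 150ns tolerance is roughly 19 cycles of slack. That is why one-instruction-per-cycle timing makes this comfortable.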

But that’s enough with the SDK examples.

My PIO based stepper clock signal

What I want to do is drive a square wave generator to spin the stepper motors. Depending on both the speed in mm per second I want to go, and the mm per step, I can calculate an exact clock frequency. I can also immediately detect a stall because we have an instruction JMP PIN that will immediately respond to a pin going high in the code.
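The frequency calculation can be sketched like this. It is only an illustration: the function names are hypothetical, and the lead screw travel per revolution is a value you would need to measure or look up for your own hardware.

```c
#include <stdint.h>

// Hypothetical helper: distance the platform moves per step pulse, given the
// lead screw travel per revolution and the driver's step settings.
static double mm_per_step(double lead_mm_per_rev, uint32_t full_steps_per_rev,
                          uint32_t microsteps) {
    return lead_mm_per_rev / ((double)full_steps_per_rev * (double)microsteps);
}

// Each pulse moves the platform one step, so the pulse rate is simply
// speed divided by distance-per-step, rounded to the nearest Hz.
static uint32_t step_frequency_hz(double speed_mm_per_s, double step_mm) {
    if (speed_mm_per_s <= 0.0 || step_mm <= 0.0) {
        return 0; // nothing sensible to generate
    }
    return (uint32_t)(speed_mm_per_s / step_mm + 0.5);
}
```

As an example, with an assumed 8 mm of travel per revolution, 200 full steps per revolution, and 8 microsteps, moving at 10 mm per second works out to a 2000 Hz step clock.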

I have 4 registers to work with:

OSR - Output shift register - Send data from normal memory to PIO.
ISR - Input shift register - Send data from PIO to normal memory.
X - scratch register
Y - scratch register

This isn’t much but it’ll do.

I can send two values to the PIO, each a 32 bit word:

Number of steps.
Number of clock cycles to wait to achieve the proper frequency.

This is a little different than the PIO wants. Remember I said it’s optimized to turn bytes directly in to GPIO, and GPIO directly in to bytes. Here I’m sending intermediate values. But I am able to work within the confines of the minimal provided assembly language to get what I want.

Things are also a little complicated because I didn’t think to put the step pins from the left and right motors next to each other. The PIO can deal with up to 4 pins, but wants them to be sequential. Luckily the language includes a ‘side pin’ feature for cases like this.

I also have a different pin for stall detection, but the ‘jump pin’ is also treated as a different bank of pins. It is a problem that I can only test one pin so I’ll need to fix that in hardware with an OR gate later, so either the left or right motor stalling will abort the code. For now I’ll just pick one.

The basic algorithm:

C program pushes number of steps and pre-calculated timing.
PIO waits until it receives both.
PIO goes in to a loop making a square wave, waiting pre-calculated number of instructions after both setting pin High and then low.
PIO does another loop consuming number of steps.
If stall detection trips we exit both loops.
PIO sends the remaining number of steps (-1 if done, X if stalled) so the C program knows how far we actually moved, and if we completed the requested movement safely.

Here’s a quick listing of the code. You’ll need some familiarity with assembler to follow along. Here are some specific PIO instructions to help you along:

pull block grabs data that the main program put in to the OSR, waiting for the data.
out y, 32 copies 32 bits from the OSR to the scratch y register.
pull noblock an important hack. Try to grab data for the OSR, but if it’s not there use whatever is in the X register. This effectively allows me to save the X register for reuse later.
set pins ... update GPIO pins with values, optionally using the ‘side pins’ I needed due to my pin assignment.
jmp x-- lp1 Decrement the register, jump UNLESS register was 0 then fall through.
jmp pin lp3 Jump if the jump pin is currently high, else fall through.

;
; Drive stepper motors with PIO so the clocks are consistent and on time.
;
; Push in a number of steps to take, and the number of cycles to burn
; to get the correct frequency, then pull out the number of remaining steps
; so we can see if we stalled.

.program step_both
.side_set 1 opt

.wrap_target
    pull block             ; Get number of steps
    out y, 32
    pull block             ; Get clock cycles to burn to obtain correct frequency

lp0:
    out x, 32              ; save to x
    pull noblock           ; copy x back in to OSR to use each loop
    set pins, 1 side 0x1   ; Clock ON
lp1:
    jmp x-- lp1            ; Delay for (x + 1) cycles, x is a 32 bit number
    out x, 32              ; grab saved copy of burns
    pull noblock           ; copy x back to osr
    set pins, 0 side 0x0   ; Clock OFF
lp2:
    jmp x-- lp2            ; Delay for the same number of cycles again
    jmp pin lp3            ; Abort if we report stall
    jmp y-- lp0            ; count as one full cycle
lp3:

    mov isr, y             ; Move remaining cycles in to isr
    push block             ; Send off to main program
.wrap                      ; Wait for next set of instructions

Then all we need to kick things off in C:

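#include "hardware/clocks.h"
#include "hardware/pio.h"

// Convert a step frequency in Hz to the delay count the PIO program burns
// in each half of the square wave.
uint32_t step_clocks_for_frequency(uint frequency) {
  uint32_t clocks = (clock_get_hz(clk_sys) / (2 * frequency)) - 11; // 11 to account for control clock cycles
  return clocks;
}

// Queue one move: the step count first, then the per-half-cycle delay.
void step_x_times(PIO pio, int sm, uint steps, uint frequency) {
  pio->txf[sm] = steps;
  pio->txf[sm] = step_clocks_for_frequency(frequency);
}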

And to get the result to determine if we stalled:

int32_t remaining_ticks = pio_sm_get_blocking(pio, sm);
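A minimal sketch of how that word might be interpreted, based on the "-1 if done, X if stalled" convention described earlier. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool stalled;        // true if the loop bailed out on the stall pin
    uint32_t steps_left; // step count still in the Y register at that point
} step_result_t;

// remaining is the raw word read back from the state machine's RX FIFO.
static step_result_t decode_step_result(int32_t remaining) {
    step_result_t r;
    r.stalled = (remaining != -1); // -1 means the counter ran all the way down
    r.steps_left = r.stalled ? (uint32_t)remaining : 0u;
    return r;
}
```

Calling decode_step_result(remaining_ticks) right after the blocking read gives the caller one place to decide whether to retry, re-home, or report an error.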

Now we have an extremely well timed square wave that will look perfect on an oscilloscope! Better than anything we could just do directly in C. For now we block waiting for results. In the future we can run some minimal housekeeping code in the foreground while waiting for results.
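Before reaching for the oscilloscope, the loop can also be checked on a desktop machine. The following is a rough host-side model of the listing above, not something from the original project. It assumes one cycle per instruction and that jmp y-- always decrements but only branches when the register was non-zero beforehand. The stall_after parameter is a made-up test knob.

```c
#include <stdint.h>

typedef struct {
    uint32_t pulses;    // rising edges produced on the step pin
    uint64_t cycles;    // state machine cycles spent inside the loop
    int32_t remaining;  // word pushed back to the C program
} sim_result_t;

// Host-side model of the step_both loop. The jump pin is treated as high
// once stall_after pulses have been produced (UINT32_MAX: never stalls).
static sim_result_t simulate_step_both(uint32_t steps, uint32_t delay,
                                       uint32_t stall_after) {
    sim_result_t r = {0, 0, 0};
    uint32_t y = steps;
    for (;;) {
        r.cycles += 3 + ((uint64_t)delay + 1); // out, pull, set high, lp1 delay
        r.cycles += 3 + ((uint64_t)delay + 1); // out, pull, set low, lp2 delay
        r.pulses++;
        r.cycles += 1;                         // jmp pin
        if (r.pulses >= stall_after) break;    // stall: leave with y untouched
        r.cycles += 1;                         // jmp y--
        uint32_t before = y;
        y--;                                   // the decrement always happens
        if (before == 0) break;                // branch only if y was non-zero
    }
    r.remaining = (int32_t)y;
    return r;
}
```

If this reading of the instruction semantics is right, each full pulse costs 2 * delay + 10 cycles and a request for N steps produces N + 1 pulses, so both are worth comparing against a scope capture or a step counter.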

Next Steps

Now I’m finally at the point where I can easily explore the can crusher as a full unit. I can test the entire system quickly and see what’s going on. As expected there are several problems with the initial design. There always are. The biggest problems are:

There seem to be alignment problems as the motors stall more easily as the platform gets closer to the steppers.
We’re not getting nearly enough power to crush cans easily.
When the system is stressed the motors can get out of sync, and move each side of the crushing platform far enough apart that things seize and I can’t manually reset the plate without disassembly.

It seems like all of the problems are in the mechanical design. I’ll focus on upgrades with that next.

All CAD files, electronic files, source code, etc, referenced in this post series are available on my GitHub page.

Can Crusher Part 3 - Development PCBA

2022-10-18T00:00:00+00:00

At this point I’ve built out the frame for my can crusher, and I wrote a proof of concept stepper motor control program with a Raspberry Pi Pico and TMC2209 stepper driver boards. But the board was getting very delicate and fragile and didn’t have all the features to make it easy to develop better software and do testing. I decided it was time to do round one of a proper PCBA to make it easier to move forward.

If I was actually building this for a company, there would be a good chance the hardware person would pass off the development to a firmware person, mechanical people would want test units, etc. A PCBA makes that a lot easier for all parties involved. In this case it’s just me, but it still makes my life much easier.

I’ll use KiCad for all the PCBA work.

Basic Design

I want a PCBA that could fit either in the base or top of the machine. That limits the size. In addition to that I want it to:

Have a plug for the 12 Volt power supply that works with a normal power adapter instead of a bench supply.
Have an easy way to reboot. I’m currently unplugging and plugging the USB cable to reboot and that is annoying.
Optional Pico power supply pins to run without USB cable.
Explicit enable of 12 volt power to the stepper subsystem.
Provide a UART control connection.
Provide a very simple user interface with feedback.
Expose any unused GPIO pins for additional features, such as a can sensor.
Mostly use through hole parts.

Soft Reboot

This is easy enough. You just need to pull the RUN pin to ground. I added a button that can be pressed to reboot.

12 Volt enable

The stepper controllers and motors draw a lot of current and can generate a lot of heat. I don’t want them doing that continuously if I accidentally leave the crusher plugged in while going away for the weekend.

I’ll build out a MOSFET switch that defaults to keeping the 12 volt supply OFF. It can only be explicitly enabled when a properly running program on the Pico does so. Rebooting, deploying bad code, sitting in bootloader mode, etc, should default to turning the power off.

UART Control

I’m testing a lot of different things at this point. I want to test the hardware, software, and the actual functionality of crushing a can. For the latter I thought it would be nice to have a mini language that I can send over a UART connection, with commands like UP 20mm and HOME. It can also return status updates like the current position of the crushing platform, and if the motors have detected stalls. This will let me test various crushing algorithms without having to recompile code.
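A minimal sketch of what parsing such commands could look like. The grammar here is a guess based only on the two examples above (UP 20mm and HOME); the DOWN verb, the type names, and the function name are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef enum { CMD_INVALID, CMD_HOME, CMD_UP, CMD_DOWN } cmd_type_t;

typedef struct {
    cmd_type_t type;
    int distance_mm; // only meaningful for CMD_UP and CMD_DOWN
} command_t;

// Parse one line such as "HOME" or "UP 20mm". Anything else is rejected.
static command_t parse_command(const char *line) {
    command_t cmd = { CMD_INVALID, 0 };
    char verb[8] = {0};
    int mm = 0;
    // %7s reads the verb, %d the distance; the trailing "mm" is optional.
    int fields = sscanf(line, "%7s %dmm", verb, &mm);
    if (fields >= 1 && strcmp(verb, "HOME") == 0) {
        cmd.type = CMD_HOME;
    } else if (fields == 2 && mm > 0 && strcmp(verb, "UP") == 0) {
        cmd.type = CMD_UP;
        cmd.distance_mm = mm;
    } else if (fields == 2 && mm > 0 && strcmp(verb, "DOWN") == 0) {
        cmd.type = CMD_DOWN;
        cmd.distance_mm = mm;
    }
    return cmd;
}
```

Keeping the parser separate from the motor code means the same function could be exercised on a desktop build before it ever sees a real UART.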

It also provides the opportunity to provide an advanced control computer later that can use the same control port to interact with that system. I could have a Single Board Computer with a fancy display, touch screen, bluetooth access, etc. Separating those functions out from the Real-Time functions of controlling the motors is also good system design.

Primitive UI

Once again I want help testing things out without having to either:

Recompile.
Or read log entries to get feedback.

I’ve added a very simple UI consisting of an RGB LED and a single button. I can try to crush a can when the button is pressed. It can show a green light if things are good and red if the motors have stalled. Pressing the button again could cause the platform to lift itself back to the original position.
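The button-and-LED behavior described above is small enough to sketch as a state machine. This is only an illustration: the state names are hypothetical, and turning the LED off while waiting is an assumption.

```c
#include <stdbool.h>

typedef enum { UI_READY, UI_CRUSHED, UI_STALLED } ui_state_t;
typedef enum { LED_OFF, LED_GREEN, LED_RED } led_t;

// Advance the UI on a button press. crush_stalled is the outcome of the
// crush attempt and is only consulted when leaving UI_READY.
static ui_state_t ui_on_button(ui_state_t state, bool crush_stalled) {
    if (state == UI_READY) {
        return crush_stalled ? UI_STALLED : UI_CRUSHED;
    }
    return UI_READY; // second press: lift the platform back up
}

// Map each state to the LED color it should show.
static led_t ui_led(ui_state_t state) {
    switch (state) {
        case UI_CRUSHED: return LED_GREEN;
        case UI_STALLED: return LED_RED;
        default:         return LED_OFF; // assumed: LED off while waiting
    }
}
```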

Nothing too sophisticated, and I might not even use it, but might as well throw it on since it’s cheap and I have GPIO pins to spare.

Expansion Port

Here I’ll just add a header with access to every unused GPIO pin, ground, and 3.3 volt power supply. This will allow me to add one or two more devices without having to make a new board. I still haven’t decided how the machine knows a can has been inserted.

Through hole parts

Let’s keep things easy to assemble, and more importantly make it easier to modify a board if needed. I can handle the bigger SMT parts just fine on assembly, but often the first draft of a board will have one or two minor problems. Nice big holes on the board will make it easier to jury rig fixes.

Basic non-design

For now this is simple. In a final product I’ll want:

A buck converter to take 12 Volts to 4-5 volts to power the Pico directly. But I don’t know what chip I want to use so I’ll hold off.
A reverse polarity protection diode. This protects the board from a mis-wired power supply.
Capacitors, capacitors, capacitors? We should probably have a decoupling capacitor on the power supply. I’ve noticed 3D printer boards that take these stepper boards all have pretty big decoupling capacitors as well. I need to review datasheets, do math, look at best practices etc, to pick the right ones.
A hard power switch.

I’ll deal with all that later.

From Idea to PCB

Getting a physical PCB made is surprisingly easy and affordable. There are manufacturers that get as low as $2-3 for multiple copies of a simple circuit board. The shipping often costs more than the actual boards if you’re trying to get them quickly. If you’re a hobbyist and can wait a few weeks, there’s always the slow boat from China Post.

To get a board I need to:

Design a schematic of the circuits I want.
Export the netlist from the schematic in to the PCB editor.
Lay out the parts in the PCB editor.
Export ‘gerber’ files which tell machines how to do the board layout.
Upload the gerber files to a manufacturer’s site, get a quote, pay.
Wait for delivery.

The Schematic

I’ve whipped together some quick schematics before to get some simple boards built. Now that I’ve been professionally working with hardware for a bit, I’ve come to appreciate that the schematic isn’t just a prerequisite for PCB generation. It is the primary source of documentation of the hardware. If there’s an equivalent to source code for PCBs, as far as I’m concerned that’s the schematic.

“Programs must be written for people to read, and only incidentally for machines to execute.” Harold Abelson

And just like software, I’ve seen schematics that the author believed were self-documenting and intuitive when in actuality it wasn’t easy to understand the logic, what a sub-component was trying to accomplish, and why some little tricks were performed.

Although I feel this is a very simple board I went out of my way to try to focus on making the layout of the schematic clean and neat. I also used plenty of gulp words to explain functionality, even though that’s ultimately totally irrelevant to laying out the PCB.

In general things went smoothly with only a few complications.

Schematic Complication 1 - Missing symbols and footprints.

KiCad has a very large set of symbols for all sorts of resistors, capacitors, transistors, and chips. But it of course can’t have every part in existence. It was missing footprints for the Pi Pico (because it’s so new) and the BigTreeTech TMC2209 board (since it is a custom board). I needed to deal with that.

Luckily the Pi Pico is popular enough that all the files I needed were available on GitHub. But it still wasn’t quite perfect so I had to fork the code. My version is here.

There wasn’t even a starter for the BigTreeTech boards. I had never built out a symbol and footprint from scratch before so I was a little intimidated. Luckily it was pretty painless to do this. KiCad was set up to make it very easy to deal with any sort of chip or board that has a moderately normal configuration, such as the 2.54mm headers that the BigTreeTech board had.

Schematic Complication 2 - 12 Volt Enable Circuit

I wanted a circuit to explicitly enable the 12 volt power that gets to the stepper drivers and eventually the motors. As I mentioned above, I wanted it to be a safe switch, one where undefined or unexpected conditions caused it to not power the unit.

To complicate matters the stepper driver boards have two sources of power coming in:

The variable higher-voltage high-current motor supply.
VDD for the logic levels.

Because of this I didn’t want to put the switch in the normal position (between the boards and ground) because then there could be times where the chip was getting 3.3 volt VDD but wasn’t properly grounded. Instead I put the switch in front of the load (between +12 volts and the boards).

This meant using a P-Channel MOSFET. And since the Pico runs at 3.3 volts, which is a little too low to trigger a power MOSFET, I needed to add another N-Channel MOSFET switch. The basic design is:

The Pico sets a GPIO pin to desired state.
This activates a BS170 N-Channel MOSFET which is happy with 3.3 volts.
This in turn pulls the P-Channel gate, which a pull-up normally holds at 12 volts, down to ground.
This turns on the P-Channel MOSFET and the full current flows all the way to the stepper motors.
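The chain of events above can be written down as a simplified logic-level model. It is a sketch only: it assumes the BS170 gate sits low whenever the Pico pin is not actively driven high (for example via a pull-down, which is my assumption), and the turn-on threshold is an illustrative number, not a datasheet value.

```c
#include <stdbool.h>

typedef enum { PIN_LOW, PIN_HIGH, PIN_FLOATING } pin_state_t;

// Illustrative gate-source voltage at which the P-Channel part is treated
// as fully on. Hypothetical value; check the real part's datasheet.
#define ASSUMED_VGS_ON (-4.0)

// Voltage on the P-Channel gate: pulled to ground when the N-Channel
// switch conducts, otherwise held at the supply by the pull-up.
static double p_gate_voltage(pin_state_t gpio, double supply_v) {
    bool n_fet_on = (gpio == PIN_HIGH);
    return n_fet_on ? 0.0 : supply_v;
}

// The P-Channel source sits at the supply, so Vgs = gate - supply.
static bool steppers_powered(pin_state_t gpio, double supply_v) {
    double vgs = p_gate_voltage(gpio, supply_v) - supply_v;
    return vgs <= ASSUMED_VGS_ON;
}
```

The property that matters is that only an actively driven high pin powers the steppers. A low or floating pin, as during reboot or bootloader mode, leaves the 12 volt rail off.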

To make sure I got things right I did a mini test of just this circuit on a breadboard, using two MOSFETs, one resistor, and a jumper wire.

Schematic Results

Here’s the final schematic. Keep in mind that once I was in the PCB layout phase the process of PCB and schematic design was iterative. I’d realize I needed to change a pin layout and would go back to the schematic and edit and re-import to the PCB editor. This schematic isn’t the rough draft, it’s the first draft. For example, I inverted the Right Z-Axis Stepper Control schematic symbol to accommodate a good PCB layout and to keep my schematic clean.

PCB Layout

General PCB Layout Goals

In general there are a few goals any time you lay out a circuit board. In no particular order:

Minimizing PCB Layers. Traces connecting components can’t touch. They can’t cross paths. Eventually you get painted in to a corner and can’t connect two parts because there are traces in the way. The fix for this is to move the trace to a different PCB layer. The most obvious example is moving the trace from the top of the board to the bottom. But eventually that bottom layer might get filled up and you need to move to a 4 layer PCB design (or more) which increases production cost and board complexity. Based on the simplicity of my design I felt confident I could stick with a two layer board.

Organization of electronic components. If a datasheet for a chip recommends a capacitor on the power line you’ll want that close to the actual chip, not on the other side. If you have a differential pair of traces, like the D+ and D- signals of USB, they should be next to each other. There may also be requirements to make traces between two components as short as possible. All these things need to be taken in to consideration when laying out the board.

Organization of other components. The power plug should be on the edge of the board. If you have a light that shows that the unit is on, or a USB connector to program, those parts will also need to be on the edge.

Laying Out the Can Crusher Board

When you start laying out the PCB it creates a rats-nest of parts. The program just drops them on the board with little lines indicating where parts need to be connected, but they’re a mess and they overlap. It’s up to you to figure out where to go from there.

The first issue that became apparent was that the pins I used to connect my Pi Pico to the TMC2209 boards on my breadboard were not ideal for the PCB. There were way too many crossed traces. I suspected this was going to be a problem on early drafts of the schematic, but decided not to sort it out until I could visualize the layout. Once I saw the parts and could position them on the board, it was easy to reassign the pins on my schematic and get a much cleaner board layout.

The second issue was that I’ve never run power through a PCB before. KiCad has great defaults for microcontroller projects, but each stepper motor I have can draw up to 1.5 amps. That’s up to 3 amps total coming from the power supply. So what’s the problem?

A trace is just a flat wire. Just like wire, the size is only rated to carry a certain amount of current. If you go over that limit by too much for too long, the wire will heat up, burn up, or even melt! I found a reference chart and it recommended 0.76mm traces for 2 amps and 1.25mm traces for 3 amps. I increased the appropriate wire sizes. Then I had to reroute because thickening the existing wires caused them to touch other wires and parts.

ProTip™: In the past I got very annoyed when I needed to move a trace in the PCB editor. KiCad treats each line segment as its own trace. I would always have to delete 5 line segments and would always miss the very small sub-trace that made the final connection to the component pad.

This popped up again when I was increasing trace sizes; selecting a trace, right clicking properties, etc, was annoying. After some google-fu I learned that I could select a sub-trace and then hit the u key a few times to expand the selection to include the rest of the traces in the network.

The last issue was that I wanted to take advantage of the silkscreen to make the board self documenting. Explaining which pins on the power header were positive and negative, explaining what GPIO pins we hooked in to on the expansion header, etc. It turned out KiCad has a pretty nice solution to this. If you choose Edit Footprint on an individual component in the PCB Editor it modifies that part only. You don’t need to edit the original footprint, or create a new official footprint in your library, just to have the right labels on an eight pin header.

In spite of all the issues it wasn’t too difficult to get things laid out, pass the DRC checks, and get a final version of revision 1 of the board. I was proud that I only had one trace that needed to jump from one side of the board to the other to avoid hitting other traces.

Ordering the PCB

I ran a simple export of files from KiCad and sent them off to the vendor Wednesday afternoon EDT, about 3-4 A.M. in China. They were able to manufacture the boards on Thursday and Friday their time, get things sent off to DHL, and somehow amazingly I received the boards on Monday by noon. Less than a week turnaround. Five boards total. The price was $3 for the boards and $19.05 for the expedited shipping. It’s a great day and age to be a maker!

The unpopulated boards:

Assembly

Assembly was straightforward. All the components were through hole. The footprint for the Pi Pico was nice. It was set up to use either headers, or solder directly to the board. For my test unit I added headers so I can swap the boards. This also allowed me to test the 12 Volt Power circuit one last time without inserting either the Pico or the TMC2209 boards.

Once testing was done it was easy to plug in the final components and start using the board.

Next Steps

The new board is working well and makes it much easier to push new versions of the firmware. However, it’s still pretty slow when I’m trying to exercise the range of motion, test sensorless homing, etc. My hard-coded scripts of action are too basic and then when I hit the base or the motors lose sync, I need to write a new hard-coded script to fix it.

To make things easier I’ll focus on adding a mini control language accessible via the UART port. Then I can test the actual movement without having to hard code a sequence of events in the code base. This should make it easier for me to focus on algorithms for running the motors, find the right sensitivity settings for sensorless stalling, etc.

All CAD files, electronic files, source code, etc, referenced in this post series are available on my GitHub page.

Can Crusher Part 2 - Stepper Drivers and Controls

2022-10-09T00:00:00+00:00

Now that I’ve built a frame for my can crusher, the next big task is to write out the drivers to control the stepper motors and test things out. At this point everything will be on a breadboard. The goal is to make sure I know how the driver boards work, and test out the motors. It is not the production-level implementation yet.

Basic Design

Two independent stepper motors are on each side of the unit.
On boot the device should home by going down until the base of the unit is identified by both stepper motors.
From there move up X mm to make space for a can.
When can is detected, lower and crush the can.
There will probably be additional logic when we first touch the can and feel resistance.

Motor Controller boards

I decided to use some TMC2209 driver boards manufactured by BigTreeTech. These are built for 3D printers and CNC machines and have a configuration that allows you to plug them in to the existing control boards for these machines. I’m going to use these driver boards but create my main control board from scratch.

The TMC2209 was chosen because it has built in stall-detection, and I’m hoping to get auto-homing, like you see on higher end 3D printers like Prusas. I’ll detect the end of the range of motion when stall detection kicks in.

The cheap solution is to install limit switches that are just some bits of ribbon wire that get physically pressed to complete a circuit. These are okay, but I find them annoying because they usually act as a proxy for the actual limit, aren’t so accurate, and can move a little bit over time. The stall detection approach is much more elegant.

I’m also anticipating a stall when the plate initially hits the can so if that’s the case I’ll also be able to detect when we are held up on a can and not actually hitting the base of the unit.

Initial Verification

I wanted to test out the boards before hooking up to my Pi Pico. The simplest possible working configuration should be:

A stepper motor is hooked up.
Power is applied to both the control portion and the motor portion, probably at different voltages.
A variety of pins need to be set to either GND or VCC to enable desired behavior.
A steady clock signal should be applied to the STEP pin and we should see the motor move.

This is a good example where accumulating gear over the years helps out. In my earlier days I would have skipped this step and gone straight to hooking up to my Pico (or Arduino, etc) and would bang my head against the wall until things worked. I also would have done something silly to deal with the fact that the motors have different power requirements than the logic portion of the board, like connecting a nine-volt battery. And then when things didn’t work I would never know if the problem was on the software side or hardware side. There would be too many variables to make debugging easy.

With a bench power supply and a signal generator I was able to create an initial test rig that didn’t require any microprocessor or any coding. I sent 5 volts to the VCC, 9 volts to the VM (Voltage Motor) pin, and did some quick math to determine that I wanted my signal generator to send out a square wave at 1600 Hz to rotate the motor at one Revolution Per Second. (Test motor specs indicate 1.8 degrees rotation per step, 200 steps for a full circle, and the TMC2209 defaults to 8 sub-steps, so 200 * 8 = 1600)

Annoyingly the BigTreeTech boards have a pin configuration that won’t let you plug them directly in to a breadboard without shorting 3 pins. Luckily the pins in question are also accessible from top side so I was able to cut off two pins on the bottom side:

Even with the simple test setup, I managed to fry a board (two if I’m honest) before getting the proper configuration. One thing that complicates these stepper boards is that the motor side of the chip can want to draw 1 or 2 amps of current, at a higher voltage than the logic side supports. A problem with the wiring can send way too much voltage and current to the logic side, frying it. Normally I would avoid this by keeping the current limits on my bench power supplies low, but in this case high current is required to spin the motors.

In the end my test setup worked and the rotational speed looked good to the eye.

Raw UART Control

The TMC2209 chips have an unusual UART setup for use of more advanced features. It uses a single wire shared among the TX and RX lines, and all communication is accomplished by getting or setting register values. That makes the operation very similar to an I2C device, but instead of SDA and SCL we have a bidirectional TX/RX pin.

I was proud of myself for doing a good test setup on the basic stepper control and decided to do something similar with the UART. I set up the chip on my breadboard and interfaced it with a generic FTDI UART controller. In this configuration I didn’t even hook up a motor or the Motor Voltage. I decided there was no point in having all that current risking damage when I didn’t even have motors plugged in.

That turned out to be a mistake! I wasted a lot of time with the device infuriatingly not responding at all. Combing over the datasheet again and again, I decided my problem was the order in which either the bytes or the bits were set, LSB vs MSB. I even broke out my DSO Labs logic analyzer and tried to read the signals. And after exhausting all possibilities I decided to try one last thing, and added back in the Motor Voltage power from my external power supply. And things suddenly worked as expected! Reviewing the datasheet it looks like there is a 5 volt regulator on that side that powers some of the internals, and the VM power doesn’t simply feed directly to the motors and nothing else as I thought.

Next up was sorting out how to read and write all the values. I did make a stupid bit order mistake here. Here is the datasheet entry:

It clearly shows the bit order, but my mind still interpreted the picture incorrectly. I also think that because I kept comparing this UART protocol to a poor-man’s I2C interface, where the read/write bit is indeed the Least Significant Bit, it added to my confusion. I decided that I should send the read address with (addr << 1) and the write address with (addr << 1) + 1. Looking at the middle entry this seemed correct reading left-to-right, but looking at the section listing with bits, this is clearly wrong. The correct read address is just addr and the correct write address is addr + 128 to set the HIGH bit to 1.
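The corrected addressing can be captured in two tiny helpers. This is a minimal sketch with hypothetical names. It only builds the register address byte and says nothing about the rest of the datagram.

```c
#include <stdint.h>

#define TMC_WRITE_BIT 0x80u // the HIGH bit marks a write access

// Read access: the 7 bit register address with the top bit clear.
static uint8_t tmc_read_addr(uint8_t reg) {
    return (uint8_t)(reg & 0x7Fu);
}

// Write access: the same address with the top bit set (addr + 128).
static uint8_t tmc_write_addr(uint8_t reg) {
    return (uint8_t)((reg & 0x7Fu) | TMC_WRITE_BIT);
}
```

For register 0x40, for example, the read address byte stays 0x40 and the write address byte becomes 0xC0.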

Stall Detection

With the UART enabled I was ready to tackle stall detection. This took a lot of trial and error to get right. One thing that’s annoying about the datasheets that come with many chips is that they have hundreds of pages, and are very exhaustive, but they still don’t tell you how to do the things you actually want to do.

In this case I want a stall to throw the DIAG pin high so I can catch it. I was left with this chart:

And it seemed like I needed to just set 0x40 to an appropriate value and it would magically work. I tried high values, I tried low values, I tried middle values, still nothing. After some googling I found some working reference code. I learned I also needed to set the first register in the section (0x14) to enable all the StallGuard capabilities. My reading of the section made it sound like you only needed to set 0x14 to disable functionality in some cases, but you also need to set it to enable it in other cases.

In any case I was able to get working stall detection with some randomly picked values for both registers. Once I’m further along I’ll go back to the datasheet and try to calculate some smarter values for those registers.
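For picking less random values later, it helps to write down what the threshold actually compares against. The sketch below reflects my reading of the TMC2209 datasheet, where 0x14 is named TCOOLTHRS, 0x40 is named SGTHRS, and a stall is flagged when the measured load value SG_RESULT falls to twice the threshold or below. Treat the register names and the comparison as assumptions to verify against your datasheet revision. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TMC2209_REG_TCOOLTHRS 0x14 // must be set for StallGuard output to be enabled
#define TMC2209_REG_SGTHRS    0x40 // stall detection threshold

// Assumed comparison: a lower SG_RESULT means a higher motor load, and a
// stall is reported once it drops to 2 * SGTHRS or below.
static bool stall_flagged(uint16_t sg_result, uint8_t sgthrs) {
    return sg_result <= (uint16_t)(2u * sgthrs);
}
```

Under that reading, raising SGTHRS makes stall detection more sensitive, since the load reading does not have to fall as far before DIAG goes high.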

UART Mode - Device ID assignment

The UART mode does support having up to 4 TMC2209 chips on the same bus. In theory you set each one to a unique device ID that is included in the register requests. Unfortunately this chip reuses the same two pins that are used to determine the amount of sub-steps that the driver provides.

If you want two motors to have the same stepping speed on the same UART bus, you need to either:

Add in some sort of external switching network to activate and deactivate connections to the UART pins. Or,
Make sure the motors aren’t enabled and getting step signals, change the pin states on each to give them different addresses, send the appropriate commands, then restore the old state back to the desired step level.

I went with option two which meant I needed to use 4 more GPIO pins on the Pi Pico. That took me to a total of 12 GPIO pins just for these two chips’ GPIO requirements, and 2 more for the UART connection.

Pi Pico software

I started running my tests on a Pi Pico somewhere in the middle of testing out UART. I just wrote simple test code to exercise all of the functionality. I was able to hook up two steppers with lead screws and do some basic exercising of the motors and stall sensing.

Currently the code sets up the stall detection and then provides a function to move up or down X number of mm at a rate of Y mm per second. It’s good enough to write proof of concept homing/leveling code and test movement.

Next steps

I’ve gotten as far as a Pico program that provides enough control to test. But my test setup is getting really messy.

There are many problems with my current breadboard:

My breadboard is getting to be a real mess.
Having a bench power supply for the motors is annoying.
I had to hot-glue down the JST-XH plugs that hold the servo connectors since they only sort of fit in to the breadboard and would pop out.
I needed to plug and unplug the USB on my Pi Pico to reboot the device to deploy and run code again.
Wires everywhere, afraid I’m going to somehow mess up the setup when I don’t notice a wire coming loose.

This is all getting in the way of working on the software development. I want to build out a dev PCB to eliminate most of these problems. This will make things a lot less fragile, and will add some features to make it easier for me to redeploy and test code quickly.

All cad files, electronic files, source code, etc, referenced in this post series are available on my github page.

Can Crusher Part 1 - Building the Frame

2022-10-03T00:00:00+00:00

I have some free time so I decided to do a project for fun where the goal was to build something from start-to-finish, doing as much as possible. All my public projects are software-only. Now that I’m doing more stuff with robotics, hardware, and consumer products, I thought it would be nice to do something to showcase all of those skills.

The project is an aluminum can crusher. Not just any crusher, but the most technologically advanced robotic can crusher the world has ever seen!

I hope to:

Design all the structural elements myself in a CAD program.
Fabricate it all in-house (literally my house) with 3D printing, CNC, etc.
Design a custom interface to control stepper motor drivers from a RP2040, aka Raspberry Pi Pico.
Design a PCB myself to hold the Pi Pico, Stepper Drivers, and associated electronics.
Get the PCB manufactured and assemble it in-house.
Provide a complete slick industrial-yet-commercial looking product.

My biggest concern is wondering if the thing will actually have enough power to crush cans. But we’ll worry about that later. For now, the frame…

Basic Goals for the Frame

Do a serious CAD project in FreeCAD.
Don’t just make everything a 3D print:
- Use Aluminum Extrusions for the frame.
- Use CNCed Acrylic for the frame ends for additional strength, and to get experience with a home CNC machine I own but haven’t really used yet.
Use affordable over-the-counter hardware, nothing too extravagant.
Provide files so someone else can print and build.

FreeCAD

In the past I’ve used OpenSCAD for personal projects but had mixed feelings about it. As things got more complex it was hard for me to visualize and build out objects in code. I decided I would give FreeCAD a try for this project.

I’d briefly tried using it without much success in the past. The recommended tutorial didn’t click and it was unclear how I’d turn the result of a sketch in to a working project. I was probably too hasty. This time I found an excellent series of video tutorials from Flowwie that were enough to get up to speed and hit the ground running. I was surprised how quickly I was able to build out all the parts and generate STLs for 3d Printing.

Aluminum Extrusions

I bought 3 400mm pre-cut 2040 Aluminum Extrusions to provide most of the strength for the frame. These are easy to find on Amazon/eBay/Ali-Express and I think I’ll be using them more in the future.

The only trouble is that the holes at the ends of the extrusions aren’t pre-threaded like they would be in a 3D printer kit. That’s easy enough to fix with a $10 screw tapping kit with an M5 sized tap. On my 2040 extrusions the hole was already a decent size. There was no need to drill a pilot hole before using the tap.

It requires surprisingly little force to tap the ends. The tap will be a little loose at first since it’s tapered to make it easy to get started. Once the tap is past the tapered part and not wobbling, I just kept twisting one full revolution to thread, then one half revolution back to clear out the aluminum you just cut. The aluminum is surprisingly soft and can even clump together in balls almost as if it’s melted together.

ProTip™: If you start to feel more resistance while tapping, don’t power through it! Unscrew the tap a few times. If that still feels stiff go back-and-forth in the already-threaded area quickly a few times. That should dislodge any fragments of aluminum and let you resume tapping.

3D Prints

The 3D Printing was straight-forward for me. I’ve done plenty of that before. I used PETG to get better strength than PLA. Really the only trick was to make test prints when testing things like screw hole size and position. Then I could test in 20 minutes instead of waiting 3 hours to find out things were misaligned by 1 mm.

CNC

I bought a Genmitsu 3018 PROVER a bit ago that I hadn’t put to much use. I decided I would use that to mill out a few plates from acrylic instead of 3D printing. 3D printed parts aren’t always the strongest. I wanted something more solid than a layered 3D print.

The CNC machine takes a .gcode file, but it’s different enough from 3D printing that I needed to use a different program to generate it than I use for my 3D prints. FreeCAD has a Path Workbench that is supposed to generate good code for CNC purposes. I decided to go with that.

Unfortunately, I lost a day trying to use FreeCAD 20.1 to generate the required gcode. There were all sorts of problems on multiple machines, hard crashes, etc. This was a real shame because in general I can’t recommend FreeCAD enough. It’s a great product. Hopefully the bugs will get worked out. Until then, I was able to use FreeCAD 19.04 to keep moving. I exported .step files and imported them in to a clean FreeCAD 19.04 project on a different computer.

The next problem was that some test parts had dimensions that were off by a half millimeter or so. Since this controls the positioning of the aluminum extrusions, it’s critical that the dimensions be correct. After some debugging I determined the CNC simply required manual calibration since it still had its out-of-the-box settings. After measuring expected vs actual movement I needed to change the steps-per-millimeter settings from the default 800 to about 794.5 for both the X and Y axes. That’s about 0.7% error but it does add up over 100 or 200 mm.
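The calibration itself is just a ratio: scale the current setting by how far you asked the machine to move versus how far it really moved. Here is a minimal sketch. The 100 mm commanded and 100.69 mm measured figures are made-up example numbers chosen to land near my result, not my recorded measurements.

```python
def corrected_steps_per_mm(current_steps_per_mm, commanded_mm, measured_mm):
    """Scale steps-per-mm so a commanded move matches the measured move."""
    if measured_mm <= 0:
        raise ValueError("measured distance must be positive")
    return current_steps_per_mm * commanded_mm / measured_mm


# Hypothetical measurement: asked for 100 mm, the machine moved 100.69 mm.
new_setting = corrected_steps_per_mm(800, 100.0, 100.69)
print(round(new_setting, 1))
```

Measure over the longest move you can manage, since a longer move makes the measurement error a smaller part of the result.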

After that it was smooth sailing and my CNC was just big enough to profile out all the parts. And the screw holes lined up perfectly with a 3D printed base with feet that went under the bottom frame holder.

Assembly

Assembly was straightforward. Just put the parts together and secure with M5 bolts, either directly in to the tapped aluminum extrusions, or with some T-Slot Nuts to position things on the length of the rails. Having a printed base under the structural acrylic with the bolts running through both worked better than expected. I attached some NEMA 17 stepper motors temporarily to the motor holders with some M3 bolts to verify the design.

The results:

Next up…

A proof of concept stepper motor driver powered by a Raspberry Pi Pico and some TMC2209 driver boards. These should drive the stepper motors. I chose TMC2209 drivers so I could try to create a self-homing machine. This avoids annoying mechanical limit switches that are difficult to position correctly.

All cad files, electronic files, source code, etc, referenced in this post series are available on my github page.

KC3MLC's Magloop for Amateur Radio

2019-11-12T00:00:00+00:00

I finally built my own MagLoop and wanted to share plans, build tips, theory, and performance with everyone else who is thinking about trying to make one.

The design is moderately portable and defaults to a 4 foot diameter loop. In addition, the loop can be swapped out for a smaller 2 foot loop to work higher frequencies up to 10 meters, and a larger 8 foot loop to get better efficiency on 30-80 meters.

I’ve made plenty of FT-8 contacts from 15 to 80 meters on four continents (mine included) and have been happy with the results. I hope someone else can find it useful too.

Want to skip ahead?

Design
Build
Installation
Tuning
Performance

Basic Magloop Theory

One problem I had while learning about magloops was that things just didn’t make sense! Everything I’d read before said antennas need to be 1 wavelength loops, 1/2 wavelength dipoles, or 1/4 wavelength verticals with ground radials. How on earth can a loop that’s only 1/8th to 1/4th of a wavelength, fed by another element that’s 1/5th the size of that (so 1/40th to 1/20th of a wavelength) even work at all?

Let’s get the very brief theory of operation out of the way as painlessly as possible:

Start with the main loop. You have a single loop of wire attached to a capacitor opposite the driven element. We know that a coil of wire creates an inductor. In this case the loop is a coil with exactly one turn and is indeed an inductor! Combined with the capacitor, we now have a LC circuit which will resonate at appropriate frequencies.
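If you want to put rough numbers on that LC circuit, here is a small sketch. The resonance formula is the standard f = 1 / (2π√(LC)). The loop inductance function uses a common textbook approximation for a single circular turn, and the 4 mm conductor radius is a guess on my part. Real loops also have stray capacitance and losses, so treat the output as a ballpark only.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def loop_inductance_h(loop_radius_m, conductor_radius_m):
    """Approximate inductance of a single circular turn of round conductor."""
    return MU_0 * loop_radius_m * (
        math.log(8 * loop_radius_m / conductor_radius_m) - 2
    )


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC circuit."""
    return 1 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Assumed geometry: 4 foot diameter loop (0.61 m radius), 4 mm conductor radius.
inductance = loop_inductance_h(0.61, 0.004)
for picofarads in (10, 100, 500):
    freq = resonant_frequency_hz(inductance, picofarads * 1e-12)
    print(f"{picofarads} pF -> {freq / 1e6:.1f} MHz")
```

The thing to notice is the square root: you need four times the capacitance to cut the frequency in half.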

Next up is the driven element, the loop of wire connected to your transmitter and placed inside the main loop. Once again, even though this is a single loop, it’s also a 1 loop coil of wire, making it another inductor. And what happens when you place two coils of wire next to each other? You get a transformer. The driven element puts out the RF energy, where it is transferred to the loop, and we have radio waves.

Hopefully that makes things seem slightly less mysterious.

Design Goals and Decisions

It All Begins With an Absurdly Large Capacitor!

As I read up on literature and ran through online magloop calculators, a recurring theme was that a magloop can generate extremely high voltages, 3000-5000 volts and more! A standard air variable capacitor, where you rotate two sets of metal plates, can’t handle those voltages with even moderate power. This isn’t just a case of the literature being overly conservative. A previous magloop I built with such a capacitor would generate blue electric arcs when my transmitter hit even 15 to 20 watts.

I’m only planning to run 100 watts, but needed a capacitor that could handle more power. I needed a vacuum-sealed capacitor which removes the easy path for high voltage electric arcs between plates. The best way to get one that can do this and is affordable to the hobbyist is to order old Soviet-Era surplus vacuum variable capacitors off of eBay.

As I browsed the various listings from sellers in Ukraine and the Russian Federation, I kept up-selling myself. Only $10 more to handle 10,000 volts! Sold! Only another $20 for another 100 picoFarads! Done! And before I knew it I had purchased the biggest monstrosity of a variable capacitor that the finest minds in Soviet engineering could produce: A 10-500 picoFarad capacitor with a 10 kV rating. I didn’t realize how big I had gone until the thing arrived. It was huge! 10 inches long, five inches wide each way, and weighing in the range of 6 pounds!

This forced me to change my initial design. Most loop builders indicate that they’ve gotten better results with the capacitor at the top of the loop and the driven element at the bottom. That was out based on the weight. And I originally hoped to get some height off the ground with a simple mast or tripod. Things would also be too top heavy for that.

I decided to turn the size of the capacitor from a weakness to a strength. Instead of attaching to a mast, I would create a base unit that could stand on its own. The weight of the capacitor itself would help stabilize the antenna. The base could then be placed on a picnic table, the roof of a parked car, or even an upside down bucket for actual usage.

Portable

I wanted my design to be portable in several ways. Many designs out there involve making an 8 or 16 foot high octagon out of copper pipe that has been brazed together! I wanted something that I could throw in to my car and hopefully try a POTA excursion one day. And I at least wanted to be able to get the thing inside my house or garage without having to take a cutting torch to it!

Based on this, the main loop is just good old RG-213 coax with the shield acting as the loop. I can coil up the loop when not in use. As an added bonus, this allowed me to make different swappable variations of loop sizes easily.

Temporary

This antenna is intended to be used on site temporarily. It’s not intended to be mounted permanently. As such I tried to make it a little weather resistant in case there’s some light rain, but made no attempt to make it fully water or wind proof. If storms are coming the antenna goes inside.

No Motors

Many designs include a motor that spins the tuning capacitor at a distance. Some include a rotator to take advantage of the antenna directionality.

I had serious problems with a motor attached to a smaller capacitor on a previous antenna. It caused all kinds of stray capacitance and bizarre changes to SWR at random inexplicable times. I didn’t know if it was the control cable for the motor, the coils of wire in the motor, the connection to the capacitor, or what.

For now I will tune the antenna by hand, and position it by hand. This can be a little annoying, but I don’t intend to hunt-and-seek contacts. I plan to sit on a frequency, such as an FT8 frequency, or possibly calling CQ on a single frequency in a future POTA excursion. And running FT8 has worked just fine.

I’ll probably reconsider this at some point, but at least then I will have a good handle on my baseline expectations for the antenna and a better feel for if and what problems are caused by a new motorized attachment.

Build

Capacitor Mounting and Base

As I mentioned, when my capacitor from Ukraine arrived after six long weeks, it was bigger than I expected by far! But I was ready to get to work. So I went to my local big box hardware store and found a 12 inch by 12 inch plastic electrical junction box. If I had been more patient, I probably could have found something cheaper.

To make the basic base first mount the SO-239 adapters:

Mark the centers of the adapters on the outside of the base.
Drill pilot holes.
Drill out the large holes until the threaded part of the SO-239 can fit through them.
Insert the adapter through the hole backwards, so it faces inside the case. Mark the location of the four mounting holes in the adapter. Remove the adapter.
Drill the mounting holes.
Place the adapters in the correct way and mount them with nuts and bolts.
Use one extra long bolt on each adapter so you can eventually wire up the capacitor.

ProTip™: If you’re starting to make a bunch of amateur radio gear and you don’t have a STEP DRILL BIT you’ll want to get one as soon as possible. It allows you to drill large holes for things like SO-239 adapters without having to switch out bits repeatedly and without melting ABS plastic. A set is a few dollars and probably available at your local Harbor Freight.

To make the support holder:

Drill three sets of holes: top, middle, and bottom, that will each fit a 1 inch U-bolt.
Insert the U-Bolts and thread them in from the inside.

To attach the capacitor:

Cut two 8-inch lengths of 14 Gauge insulated stranded wire.
Strip a 1/2 inch or so off of each wire.
On the long bolt on each adapter:
- Add one washer.
- Wrap the exposed wire around.
- Add another washer.
- Add a nut and tighten to get a good connection.
Take two hose clamps and put them on the ends of the capacitor. Screw them down until there’s a half inch of slack.
Place the capacitor in the base and position it.
Trim and strip the other ends of the 14 gauge wires so that they will reach the capacitor and have enough stripped wire to wrap around the hose clamps twice.
Remove hose clamps from the capacitor, wrap the wire around twice, re-attach and screw them down to get good contact between the wire and capacitor.

This completes a usable base.

This was enough to start using things, but it eventually became frustrating to tune the capacitor as it wobbled around. To reduce the wobbling, I bought a piece of 6 inch wood, and cut it to fit inside the box. I then used some old nylon straps to secure the capacitor to the board with a few screws.

ProTip™: Often times when you’re reading antenna plans you’ll be presented with a detailed manifest and list of parts. This is nice to be exhaustive, but often times leads to the impression that certain parts were carefully spec’ed out and tested, rather than just being materials available to the author.

The wooden base for my capacitor is just such a thing. It was wobbling, I wanted to stop it, and I didn’t want to use metal or drill holes in the exterior case. This was the solution I came up with. If I had a 3D printer I probably would have come up with something better. I’m hesitant to give detailed instructions since it was hacked together.

In short, feel free to improvise with any element of the plans presented here, and particularly with the capacitor support which I just threw together in a weekend. Even now I find myself wondering if I could make a base out of a beer cooler that’s big enough to hold all the cables and supports.

PVC Supports

I made a set of composable PVC support pipes that would allow me to easily set up loops of either 2, 4, or 8 foot diameters. My big box hardware store had pre-cut 2 foot sections of pipe, and I went for this rather than cut them myself. I also bought adapters to get the following combinations. The pipes were 3/4 inch and had an outer diameter of 1 inch.

One pipe with a four way adapter attached to one end which I’ll refer to as the crossbeam.
Two pipes with T adapters to support the sides of the RG-213 loop which I’ll refer to as the side supports.
One pipe with a modified T adapter to support the top of the loop which I’ll refer to as the top support. A slot was cut in to this so the main loop, along with the driven element could be hung on the adapter without having to pass through it. To do this I put the adapter in a vice and cut it with a hacksaw before attaching it to the pipe section.
Four pipes with a standard coupling attached to one end which I’ll refer to as the extensions.

The attachments were glued on with PVC primer and glue so that when I disassemble the supports the right parts stay together and the wrong parts don’t get stuck.

ProTip™: DO THIS OUTSIDE! I made the mistake of doing this on our enclosed porch, and fumes still managed to pervade the entire first floor of our house. We had to open the windows and air it all out. It is highly recommended that you have more ventilation.

Loops

I will generally refer to the loops by diameter which is also the height of the PVC supports. These are nice round numbers where the actual size of the loop is diameter times Pi. In addition there is some amount subtracted from the ideal cable length to account for the area of the loop where the capacitor sits and there isn’t any wire.

The loops are made of RG-213 coax with PL-259 plugs attached at each end. This allows me to insert the loop into the support frame and screw it in to the base.

The actual cable length was:

2 Foot Loop - 5 foot, 9 inches
4 Foot Loop - 12 foot, 8 inches
8 Foot Loop - 23 foot, 6 inches

The 8 foot loop ended up being slightly shorter because it was drooping more, and I went with more of a diamond shaped loop to avoid this.

While building your own antenna, rather than measuring out to these lengths, you should attach a PL-259 adapter to one end of the coax, feed it through the supports and screw it in to one side of your base, and then mark off the appropriate place to cut the other end of the cable. This will give a better loop if your project box or SO-239 placement is different than mine.

Driven Elements

The driven elements are made from RG-8X and should be 1/5 the size of the main loop coax. I did not try to factor in the full loop size including capacitor and just used the physical coax length divided by 5. There also needs to be a connector to hook the driven element up to your feed line, so when you cut the cable add 6 to 8 inches or more depending on how confident you are about getting the coax stripped and adapter installed on the first try. I used a female BNC adaptor but feel free to use whatever adapter you want.
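As a quick worked example of that divide-by-5 rule, here is a sketch that runs it against the main loop cable lengths listed earlier. The helper names are my own, and the 8 inch connector allowance is only an example value from the range mentioned above.

```python
MAIN_LOOP_CABLE = {          # (feet, inches) of coax for each main loop
    "2 foot": (5, 9),
    "4 foot": (12, 8),
    "8 foot": (23, 6),
}


def to_inches(feet, inches):
    return feet * 12 + inches


def driven_loop_inches(main_cable_inches):
    """Driven element loop is 1/5 of the physical main loop coax length."""
    return main_cable_inches / 5


CONNECTOR_ALLOWANCE_IN = 8   # example extra length for the feed line connector

for name, (feet, inches) in MAIN_LOOP_CABLE.items():
    loop = driven_loop_inches(to_inches(feet, inches))
    cut = loop + CONNECTOR_ALLOWANCE_IN
    print(f"{name}: loop {loop:.1f} in, cut about {cut:.1f} in")
```

The loop figure is where the center conductor gets soldered back to the shield; the cut figure is how much cable to start with.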

There are many confusing designs available¹ for the driven element. In the end I chose the least confusing one. This involved soldering the center conductor of the coax to the shield where the loop will be the appropriate size.

To make one:

Cut a cable with an extra 6 to 12 inches to mount a connector to the feedline.
Strip one end of the coax with a standard stripping tool providing enough exposed center conductor to wrap around the cable.
Cut away the exposed copper shield at the end so it doesn’t accidentally contact anything it shouldn’t.
Measure back to the appropriate length and carefully remove 1 cm of casing without damaging the copper shield.

I found it was best to score the jacket on each side creating visible cut lines without cutting all the way to the copper, cut a slot out between the two lines, and then peel the rest of the jacket off.
Wrap the exposed center conductor around the exposed shield and solder it in place.
Use silicone self-amalgamating tape to seal the connection. I imagine heat shrink tubing or electrical tape instead would also be fine.
Connect a BNC Female connector or adapter of your choice to the exposed end.

Installation

At this point you should be ready to do an initial smoke test. I would recommend using the 4 foot antenna and setting the assembly on a table in your shack before trying to use it outside.

Setting up the antenna

I would suggest starting with the 4 foot loop, which can get good reception on 20, 40, and 80 meters. I would also suggest using a workspace such as a table in your hamshack since it will take some time to perform initial configuration of the driven element. Once the driven element is configured it will be easier to set up in a more desirable location.

To assemble in 4 foot mode:

Add the two side supports and top support to the crossbeam.
Insert the crossbeam in to the U bolts and tighten the wing nuts to hold it in place.
Hang the loop cable inside the top support, and thread the ends through the side supports.
Screw the PL-259 connectors in to the base unit.
Attach the feed line to the driven element.

Initial configuration the first time only:

Insert the driven element in to the top support under the main loop.
Use an antenna analyzer to roughly adjust the tuning capacitor to the desired frequency.
Experiment with positioning as described in the tuning section securing the driven element with temporary tape.
Once you find a workable position, permanently tape the driven element to the main loop in a way both pieces of coax can easily be placed on the top support.

It should look like this:

After initial testing you may want to set up the two foot loop. The procedure is basically the same, but you’ll use the Top Support only on the base:

Positioning

To get lowest SWR, the loop must be off of the ground. While testing on my back porch, I just sat it on a bar-height table. When I use the antenna outside, I set it on an upside-down utility bucket which is about 18 inches high. Any lower than that and the SWR started creeping up again. I suspect even higher is still better, as you’ll get better angles for DX takeoff, but the bucket was adequate to hit Europe and South America from Pittsburgh.

The antenna also has some directionality, with the strongest signal shooting off of the ends of the loop. It’s only somewhat directional, so I generally just point it either North/South or East/West and don’t try to dial in on an exact bearing.

Tuning

Tuning the Capacitor

Initial tuning is done by adjusting the variable capacitor. I attached a key ring to mine to make it easier to turn. When the capacitor is fully retracted and has the least amount of capacitance, you’ll be closest to the ideal efficiency² for the given loop size. At this point, very small changes in capacitance will have a large impact on the frequency sweet spot. As you add more capacitance and the frequency goes down, you’ll find that the bandwidth decreases and that you’ll need to move the capacitor more to move the optimal frequency.

Tuning the Driven Element

When you initially tune the antenna, you’ll probably have a poor best-case SWR. Once you’re within range of the frequency you want to transmit on, you’ll need to adjust the driven element positioning to get the best possible SWR. This is done through experimentation and in my experience relies on two factors:

How much of the top of the driven element contacts the loop.
How far up or down the bottom of the driven element is from the main loop.

You’ll need to experiment for yourself. Here are some notes on what worked for me. These are not intended to be prescriptive; you’ll need to find your own positioning. They will just give you a starting point for adjustments to try:

The driven elements should be close to the plane of the main loop, but don’t worry about the support pipes preventing you from getting that last 1/2 inch.
My driven element for the 2 foot loop most resembles what you’ll see in diagrams and is a nice round circle attached to the top of the antenna. Even still, some movement up and down on the bottom half helps as I switch between various bands.
My driven element for 4 foot worked best with the most surface area possible attached to the main loop, and the bottom half located extremely high, making a crescent shape.
The driven element for the 8 foot worked best in a kite shape, with barely any contact on the top of the loop. This also benefited from adjusting the position of the bottom on different frequencies. In general as the frequency goes down, pulling the bottom down helped on both this and the two foot antenna.

As pictured, you can see velcro holding the loop in a position which is best for 40 and 80 meters, and I move it up for 30 meters.

Tension lines

The 8 foot loop requires tension lines to hold the PVC supports in place. Drill 1/4 inch holes in the support elements that paracord can pass through. Holes should be drilled through both sides of the pipe so the cord can be run through the pipe.

Hole Placement:

Side and Top Supports: A hole close to and parallel with the T Adapters.
Crossbeam Support: Two sets of holes, near the base with enough room for the extension element, perpendicular to each other with a half inch space between them.
Three Extensions: One hole drilled all the way through near the coupling.
The Base Extension: Two sets of holes like the crossbeam support, but located so that they are above the capacitor housing so it doesn’t interfere with the tension ropes.

Review the picture of the assembled antenna if my placement directions don’t make sense.

Cut appropriate lengths of rope and melt the ends shut with a grill lighter. If you smell burning plastic and see smoke, you’ve melted them too much. Wrapping the cord around the pipe takes surprising amounts of cord, so start off with more cord than you think you’ll need and trim it back after your initial assembly.

First assemble the inner set of supports:

Attach the crossbeam and three extensions with normally placed holes.
Tie a knot in one end of the paracord and insert it through one set of holes at the base of the crossbeam.
Work through the other three supports running the line through the hole, pulling it tight, and tying a clove hitch.
Once these three supports are secure, run the line through the second set of holes on the crossbeam, tie a tautline hitch, and pull it tight.

The end supports can be attached and tied with the same procedure. It is best to (1) do this outdoors, and (2) only insert the supports one at a time as you’re ready to tie them in place.

Next mount the support in to the base and tighten down the U Bolts. Although you’ll need guy lines for long term installation, this will usually sit fine on flat land without tipping over. Still, release the supports slowly as you let go the first time.

ProTip™: Install the main loop and attach the driven element to the feed line before inserting the support in to the base.

Guy lines

The 8 foot loop also needs guy lines. To prepare the guy lines, cut paracord to three appropriate lengths, melt the ends closed, and tie a bowline knot on one end.

To install the guy lines, loop them around the center mast, place the antenna on some sort of support, secure the lines to stakes with tautline hitches, and apply tension.

A helper is extremely useful here especially when the land is not flat, but you can usually find one line to secure first while holding it tight and then go back and get the other two lines in to place after the fact.

For the initial driven element configuration, you will probably also want a step stool to find optimal placement.

Performance

ProTip™: pskreporter.info provides a great way to see how your antenna is doing when you’re working digital modes. Various stations send signal reports to a centralized location where you can view how well things are propagating, even when people are ignoring your CQ calls.

Note that if you want to help out others and send your FT8 reception reports, you must enable this in WSJT-X settings under Reporting.

All results shown are from grid square EN90 running a Yaesu FT-857d at 100 Watts, operating in November 2019 in the deepest darkest reaches of the end of Solar Cycle 24.

Band
15	2 Foot / North-South / Morning QSOs with Germany, France, and Italy, along with a new DXCC entry at Bosnia-Herzegovina.
20	4 foot / North-South / Afternoon. First Alaska contact and a couple hits in Brazil.	2 foot / North-South / Dusk. Not as bad as expected.
30	8 foot / East-West / Afternoon. QSOs with Italy, Croatia, Crete, and the Balearic Islands.	4 foot / NorthEast-SouthWest / Afternoon. Pointing directly to London helps us get to Europe, but still not as far inland as the 8 foot loop.
40	8 Foot / East-West / Morning. We are getting further than ground-wave, but only high angles are making it back, giving an effective radius of somewhere between 750-1000 miles with a few outliers.	8 Foot / East-West / Sunset Now we're seeing good results to Europe!
80	8 Foot / East-West / 10 PM

Reference

¹ Inductive Coupling Designs: This article has a wealth of information on all things magloop, with a particularly detailed description of various ways to drive the magloop.
² Online Efficiency Calculator and AA5TB’s Excel Application allow you to estimate how much of your power makes it to the airwaves for a given loop and frequency.

Using MiniCom with FlashForth on OSX

2019-09-04T00:00:00+00:00

I’ve been spending some of my free time lately playing with Arduinos and Raspberry Pis. They are dirt cheap, have GPIO ports exposed, and there are plenty of peripheral devices available from China for as low as 1 or 2 dollars. Controlling the hardware is different enough from my day job that it makes a great engineering hobby.

The Arduino IDE is nice if you’re trying to bring up some hardware quickly, but I was a little annoyed at the way all the actual hardware access was hidden behind library code that you don’t see in day-to-day development. My adventures took me to FlashForth, an implementation of Forth targeted for the microprocessors that run the Arduinos. Forth has always been on my languages-to-learn list. It’s supposed to be a high-level language that can be implemented in a small footprint (as small as 8K!), has interactivity, and gives you direct access to the bare metal. In the past I’d just be using it in a sandbox without touching hardware, but now I’ve found a good excuse to play around with it on these devices with only 32k of memory and utilize the bare metal access.

Things had been going well as I worked on tutorials and got the basics of Forth down, but came to a screeching halt when I tried to start developing a real project (an sd card driver) and was writing my code in files and sending various code blocks to the device as fast as my computer would let me. FlashForth would just start printing a bunch of ||||| and wouldn’t receive the input. This problem is documented on the homepage:

Normally communication with the PC and writing to flash works very reliably, but…

If to you see a vertical bar | output from FlashForth, it means that the UART RX interrupt buffer has overflowed.

It is usually caused by the PC reacting slowly on XOFF. setserial /dev/ttyS0 low_latency improves the situation on Linux.

On Windows, disabling the UART buffers improves the situation. Another alternative is to use TeraTerm with an intercharacter delay of a few milliseconds.

Unfortunately the fixes cover Windows and Linux, with nothing for OSX. The setserial command isn’t included on OSX. I eventually decided I needed to find a terminal program that ran on OSX and allowed you to set the intercharacter delay of a few milliseconds mentioned above. This proved easier said than done. After spending time trying a plethora of programs, both Open Source and commercial demos, I finally found out that minicom could do what I needed, and was available via brew.

Configuration is a little tricky though. There were a few settings I needed to change to make things work. And unfortunately the pause-between-characters setting doesn’t get saved in configuration, so I need to set that up every time I fire up the app. But once I’m up and running it works well. Here are the two phases of setup:

Initial Configuration

Install minicom: brew install minicom.
Start minicom: minicom -b 38400 -d /dev/cu.YOURDEVICE
ESC-Z brings up menu.
1. O for cOnfigure Minicom.
2. Down arrow to Serial port setup, hit ENTER.
  1. E to set baud rate.
  2. Optionally A to set serial device, but mine changes enough that I use the -d command line flag.
  3. ESC to go up a menu level.
3. Down arrow to Screen and keyboard, hit ENTER.
  1. P to set Add linefeed to No so you don’t double space.
  2. R to turn Line Wrap on so a command like words doesn’t run off the page.
  3. ESC to go up a menu level.
4. Down arrow to Save setup as dfl, hit ENTER.

At this point all of the savable settings are stored in your config for later. We still haven’t solved the original problem of adding a delay though. This will need to be done every time at minicom startup.

Add intercharacter delay

ESC-Z brings up menu.
T for Terminal Settings.
F for Character tx delay (ms).
I went with 10 ms for best results on my system. Lower numbers may work for you.
ESC to leave menu and return to main screen.

At this point you should be able to safely paste large chunks of code for processing by the FlashForth interpreter.
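If you would rather script the upload than paste it, the same intercharacter delay can be sketched in a few lines of Ruby. This is only an illustration of the idea, not how minicom implements it: send_with_delay is a hypothetical helper, and it assumes the serial port has already been configured (baud rate, flow control) by something else.

```ruby
require 'stringio'

# Hypothetical helper: write text to an IO one character at a time,
# pausing between characters so the receiver's UART buffer can drain.
def send_with_delay(io, text, delay_ms: 10)
  text.each_char do |ch|
    io.write(ch)
    io.flush
    sleep(delay_ms / 1000.0)
  end
  text.length
end

# Demo against an in-memory IO; for real hardware you would pass
# something like File.open('/dev/cu.YOURDEVICE', 'w') instead.
fake_port = StringIO.new
forth_line = ": hello .\" hi\" ;\n"
sent = send_with_delay(fake_port, forth_line, delay_ms: 1)
puts "sent #{sent} characters"
```

The 10 ms default mirrors the value that worked above; as with minicom, a lower number may be enough on your system.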

Have Fun!

Now I just need to work on direct Emacs integration instead of using Ctrl-C Ctrl-V. If anyone has tips let me know.

Help! Google Adwords API Keys Stopped Working August 22nd!

2018-08-23T00:00:00+00:00

[TLDR? Fix is at bottom of page.]

We just spent two days debugging a problem with our google adwords API keys and finally got things working. We’re not sure if this problem is affecting people globally, but it was particularly difficult to debug so I wanted to get this information out there. Let me know if it helped you out.

We’ve been using API keys to generate google ads for several years now. If you’re using them, you have a basic understanding of how things work in ruby-land. There is a configuration file adwords_api.yml that you set up with basic values, then run the setup_oauth2.rb script included in the github repository for Google’s gems. This has you do browser-based authentication, and then the file will have a refresh token. When this file is re-used, the refresh token is used to request an access token as needed, and this access token is used to generate ads.

Yesterday morning, things blew up horribly and all of our processes generating ads failed! We went into major fire-fighting mode and did some basic debugging. We decided that a three-year-old refresh token might be the problem, and regenerated one locally, and things seemed to be working again. But then, after an hour, all our processes would blow up again. We could fix the tokens every hour, but this obviously wasn’t a sustainable solution.

As we tracked things down and went through many red herrings, we came to realize that normally Google’s code would figure out that the current access token was expired, and would request a new one via the refresh token. As we dug into the Google code, we finally made our way to the oauth2_handler.rb file in ads_common, in particular the get_token method on line 89:

# Overrides base get_token method to account for the token expiration.
def get_token(credentials = nil, force_refresh = false)
  token = super(credentials)
  token = refresh_token! if !@client.nil? &&
      (force_refresh || @client.expired?)
  return token
end

Using bundle open google-ads-common we were able to go in and change force_refresh to default to true. Lo and behold, authentication was working! So forcing the creation of a new token solved the immediate problem, but we still wanted to have a better idea of what was happening, and we were reluctant to monkey-patch Google’s gem only to have things break in future versions.

There was one interesting thing we noticed as we looked through various SOAP output. Getting back to the original inputs, there are two values attached to the access token: the creation date, and the time, in seconds, until it expires. As we looked at the output, we noticed that it was returning expires_in values that were, counterintuitively, increasing rather than decreasing. We would have expected a token that started at 3600 to return 3540 when called a minute later. Instead it was returning 3660, and climbing until it hit 7200 one hour later, at which point the access token would be expired, but our code would not generate a fresh token, and our app would start blowing up.

Unfortunately, we were unable to tell if the value always worked this way, or if there was a new breaking change introduced yesterday morning when we first encountered errors.

As mentioned, we were reluctant to monkey-patch the core Google libraries and have to maintain that. So, armed with the knowledge that our gems weren’t calculating the expiration date correctly, and wanting them to realize that they needed to generate a new access token, we tried modifying our config files and found a fix that worked without having to patch Google’s gems.

The fix:

Original adwords_api.yml, as generated from setup_oauth2.rb:

---
:authentication:
  :method: OAuth2
  :oauth2_client_id: REDACTED
  :oauth2_client_secret: REDACTED
  :developer_token: REDACTED
  :client_customer_id: REDACTED
  :user_agent: WebKite_Radius
  :oauth2_token:
    :access_token: REDACTED
    :refresh_token: REDACTED
    :issued_at: 2018-08-23 13:10:13.299601000 -04:00
    :expires_in: 3600
    :id_token: 
:service:
  :environment: PRODUCTION
:connection:
  :enable_gzip: false
:library:
  :log_level: INFO

Note that the pertinent information is :issued_at: and :expires_in:. The fix is to switch :expires_in: to 0:

---
:authentication:
  :method: OAuth2
  :oauth2_client_id: REDACTED
  :oauth2_client_secret: REDACTED
  :developer_token: REDACTED
  :client_customer_id: REDACTED
  :user_agent: WebKite_Radius
  :oauth2_token:
    :access_token: REDACTED
    :refresh_token: REDACTED
    :issued_at: 2018-08-23 13:10:13.299601000 -04:00
    :expires_in: 0
    :id_token: 
:service:
  :environment: PRODUCTION
:connection:
  :enable_gzip: false
:library:
  :log_level: INFO

After that, things worked perfectly! The gems assumed the current access_token was already expired since it had a lifetime of 0, and forced generation of an up-to-date access token.
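To see why zeroing the lifetime works, here is a minimal sketch of the expiry arithmetic. It assumes the library treats a token as expired once issued_at + expires_in has passed; the Token struct and its expired? method are stand-ins written for this example, not the gem's actual classes.

```ruby
# Stand-in for the stored token; not the gem's real class.
Token = Struct.new(:issued_at, :expires_in) do
  # Assumed rule: expired once issued_at + expires_in is in the past.
  def expired?(now = Time.now)
    now >= issued_at + expires_in
  end
end

issued = Time.now - 90 * 60 # token written to the config 90 minutes ago

normal   = Token.new(issued, 3600) # lifetime as generated
inflated = Token.new(issued, 7200) # the oddly growing value we observed
zeroed   = Token.new(issued, 0)    # the config-file fix

puts normal.expired?   # true: the hour is up, so a refresh is triggered
puts inflated.expired? # false: the stale token keeps being reused
puts zeroed.expired?   # true: refreshed on first use, whatever the clock says
```

Under that assumption, an :expires_in: of 0 means the check can never be fooled by an inflated lifetime, at the cost of one extra refresh request at startup.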

Let me know if this helped you out.

This was a really unusual bug for us, and we were surprised that it wasn’t all over StackOverflow or the Google forums since it completely took our production services down and was extremely difficult to resolve.

I’m still curious if we just have something really odd going on in our local setup, or if this was a more widespread problem. So please shoot me an email if you found this information helpful, or if you can shed any additional light on the sudden change in our production environment.

Thanks!

-Grant

Universal History of Bitcoin Infamy

2018-04-06T00:00:00+00:00

Presented at the Crypto For The Community Conference, Pittsburgh,

April 2018

Slides for presentation.

Help! I fried my postgres install on homebrew!

2017-05-08T00:00:00+00:00

Did you:

Get an error about readline when running psql?
Quickly do a brew upgrade postgres?
Think everything seemed fine until you rebooted your computer?
At that point learn that postgres wasn’t running because of incompatible file versions?
Enter a world of hurt when you started reading up on the fact that an upgrade from postgres 9.3 to postgres 9.4 required a manual DB upgrade?
Experience shock to learn that you couldn’t even install postgres 9.3 after brew installed postgres 9.6?
Curl up in a fetal position, covered in cold sweat, wondering how the hell you’re going to have time to rebuild your complicated, er… you mean sophisticated, development environment from scratch when there’s so much work to be done?

If so, I feel your pain. Hopefully I can help.

Fix to migrate your old postgres 9.3 databases to 9.6 in homebrew.

mv /usr/local/var/postgres/ /usr/local/var/postgres.old

# Get old versions

brew tap petere/postgresql
brew install postgresql@9.3
brew install postgresql@9.6

# Install 9.6 db
initdb /usr/local/var/postgres/


# Stop the running 9.6 instance
sudo brew services stop postgres

# Migrate to new version
# You may need to look in /usr/local/Cellar to get the exact directories
pg_upgrade -b /usr/local/Cellar/postgresql@9.3/9.3.16/bin/ -B /usr/local/Cellar/postgresql/9.6.2/bin/ -d /usr/local/var/postgres.old/ -D /usr/local/var/postgres

# Drink a coffee (or beer) or three or six while there's a migration

# Start up the server
brew services start postgres

# Verify server is running
psql

But now it blows up because of postgis!

I had two old databases that used PostGIS. They caused the migration to fail. Attempts to get postgis to install on the 9.3 version of postgres failed. Unfortunately I don’t have a fix for that, but can tell you how to at least delete the offending databases if you don’t care about them, like I didn’t. If you do need the postgis enabled databases, the documentation for the tap indicates that you can use a utility called pex to install things, but I didn’t bother figuring that out.

To delete the old offending databases:

brew unlink postgresql@9.6
brew link -f postgresql@9.3

# Need to start manually because the files aren't where we expect
pg_ctl -D /usr/local/var/postgres.old/ -l /usr/local/var/postgres/server.log start

psql # Do DROP DATABASE etc

pg_ctl -D /usr/local/var/postgres.old/ stop

brew unlink postgresql@9.3
brew link -f postgresql@9.6

A little annoyed at brew right now.

Brew has given me more than enough that I can’t stay mad at it for long, but I’m a little disappointed that:

It silently upgraded readline which introduced a bunch of errors to old versions of software. I actually thought this blew up when I did an OS upgrade to OSX.
It makes no attempt to warn me that upgrading from 9.3 requires some serious manual intervention, the second time it’s silently updated a version of software to a version that’s incompatible with everything I have installed.
It allows no way out-of-the-box at this point for me to install the 9.3 binaries to do the upgrade from 9.3 to 9.4.
It seems to have broken old ways of installing old software by checking out an old commit of a particular brew file. Even though I tracked down the commit for 9.3, we now seem to autoupgrade and always install 9.6.2.
It has deprecated vast swaths of homebrew commands, inexplicably (to me). I’m sure the developers have their reasons and if I was on top of things it would make sense, but it’s frustrating to find four alternate solutions on StackOverflow that should magically fix your problem only to be told politely by brew, “Sorry, that command just doesn’t work anymore. Try again!”

It would be nice if there was some sort of --force or Are you sure?(Y/N) prompt for these more disruptive upgrades, and I wish the 9.3 version of postgres had stayed around a bit longer so I could have fixed the problem without resorting to third-party taps.

And annoyed at myself.

For not properly investigating the broken readline stuff I’d been dealing with off and on for a bit and ‘fixed’ by rebuilding my rubies in rvm.
For running brew commands willy-nilly.
For not having a setup where it wasn’t a problem to blow away my dev dbs and start from scratch. I should have either had backups, or been able to work from clean databases without affecting my productivity.

And thankful to Peter Eisentraut

Whose homebrew tap saved the day. Thanks Peter!

And four hours later on a Monday afternoon

That twelve character bug fix worked! I’m off to my next coding adventure. Maybe now would be a good time to upgrade to Sierra. What’s the worst that could happen?

Update 2018-06-05 - Brew does it again!

I just tried installing pg_top, a utility that lets you view active connections to your database. It should be a simple tool, but brew decides to:

Upgrade from 9.6.2 to 10.4 without saying anything!
Restart the service instead of leaving the old background one in place!
Make no attempt to migrate existing databases to the new version!

Fortunately, this time brew was at least kind enough to keep the old version around. I was able to fix it with:

brew switch postgres 9.6.2
brew services stop postgres
brew services start postgres

Brew, please quit auto-upgrading services in a way that leaves things in an inconsistent state. If I need to manually migrate between major postgres versions, you shouldn’t just automatically update when I’m installing a small utility that has postgres as a dependency.

Update 2019-01-09

I just built some new OSX systems from scratch and had the opportunity to think about swapping brew out for Mac Ports or other solutions. I’ve been happy with brew 99.9% of the time, but still occasionally encounter problems with background servers. I continued to stick with brew for all my libraries, but used Postgres.app for my database install. It integrates extremely well with brew and the rest of my tool chain, and gives me control over when and where to upgrade.

Certificate Chains, Amazon EC2, and You!

2015-03-19T00:00:00+00:00

Are you getting https sec_error_unknown_issuer Error in Firefox? Did you add https at your EC2 Load Balancer? Well then Amazon lied to you.

We just dealt with a really frustrating error over at WebKite. Our site was suddenly broken with a bad ssl certificate, but only on Firefox. To make matters more confusing, things worked fine on all our Firefox installs in the office, but only blew up on clean installs.

If you’re reading this, and you’re seeing a (Error code: sec_error_unknown_issuer) in Firefox, and you’re hosting your site on EC2, then I can hopefully help you out.

TLDR: Amazon Lied to You When They Said the Certificate Chain Was Optional

Well I don’t know if I’d say they lied exactly, but at the time we set up the new certificate in the EC2 dashboard, Amazon showed us this:

We didn’t add the allegedly optional certificate chain. On some installs of Firefox, we now get the above error. Creating a new certificate for the load balancer that included a valid certificate chain fixed the problem.

What is the certificate chain?

I’ve said it before and I’ll say it again: Crypto is easy, authentication is hard.

It’s easy enough to encrypt browser connections so that nobody can read the traffic going on between you and a server when you, for example, send them your credit card number or social security number or mother’s maiden name. But you need some method of authenticating the encryption keys so you know that they belong to the server you’re talking to, and not a hacker or government agency trying to snoop on you. That is: You need to trust the encryption keys you’re using to be able to trust the safety of your encrypted communication, by authenticating them as valid and from a trusted source. If you can’t do that, the green lock on your browser means nothing.

https solves this problem with Certificate Authorities. These are authorities who are trusted to vouch for other certificates by signing them. A browser or other consumer of https decides which authorities it trusts. This initial trust is written in stone. Although some certifications are performed, from the perspective of you the user sitting on your computer, these CAs are trusted because Firefox says they’re trusted, and that’s all that you need to know. In that sense that really makes your browser the ultimate authority, and then it delegates that authority to the various root certificates of Certificate Authorities that it trusts.

These root certificates then vouch for other sub-authorities. There are several reasons to do this, but as a security concern it allows you to compartmentalize damage if part of the system is compromised. That brings us to the real-world analogy I like to use to explain the system of trust I’ve already hinted at by calling it vouching:

Organized crime. A mob boss has lieutenants who work for him. These lieutenants have their crews. These crews might have people working for them. Each step along the way, introductions are made by vouching for someone. You might tell your boss, “I know a guy. He’s a good guy. We can trust him.” and your buddy joins the crew. The mob boss doesn’t need to know about your buddy at all, but if the system has integrity, starting with the mob boss and working the way down, you’ve established a chain of trust that leads all the way down to you from an undisputed authoritative source. Now if the system doesn’t have integrity, and your boss is a rat, you all get killed, but the rest of the system and trust is still in place. And the big boss doesn’t need to know about the intimate details of things going on 3 levels underneath him to maintain the integrity of the system.

So back to the browser, it has several (actually hundreds) of lieutenants who vouch for encryption keys. Someone else has vouched for your encryption key. You’re just not important enough to get to meet the big bad CAs themselves. An indeterminate number of layers between the root certificate and your certificate create a chain of trust that can be followed all the way from the big boss to little old you. This is the certificate chain.

Why didn’t this break on company computers?

Without that chain of trust, things should have blown up everywhere. But they didn’t. They were working just fine on our machines. This was particularly annoying because I use Firefox every day, but it wasn’t broken on my machine. If it had been a Chrome or Safari issue another coworker would have caught the error as well. But we were all working blissfully unaware of the problem until I fired up my backup laptop to deploy a quick hotfix at home.

This bothered us enough that we decided to reproduce the error in staging. We:

Moved back to the old certificate.
Deleted the existing certificate store per this article.

Boom! Things were broken on previously working machines. So the site would break for people using Firefox, but only if they were accessing it for the first time after we updated our certificate. But that’s still ~11% of web users. Ugh!

I originally thought that a new version of Firefox had locked down SSL security settings. But now I don’t believe that. I was making things more complicated than they were. My current unproven working theory:

Last year’s certificates on EC2 had the proper certificate chain.
When we accessed the site in the past, the certificate chain was stored in Firefox.
When we requested new certificates from the same provider, it had the same chain of signing certificates.
Our installations of Firefox were able to perform validation because they had access to the pre-existing signing certificates in the certificate db.
Installations of Firefox that had never visited the site before did not, and complained.
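The theory above can be reproduced in miniature with Ruby's OpenSSL bindings. This is a sketch under stated assumptions, not a model of Firefox internals: the three certificates are throwaway ones generated on the spot, make_cert is a helper invented for this example, and an OpenSSL::X509::Store stands in for the browser's certificate database.

```ruby
require 'openssl'

# Build a throwaway certificate. Self-signed when no issuer is given.
def make_cert(cn, key, serial, issuer: nil, issuer_key: nil, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version    = 2 # X.509 v3
  cert.serial     = serial
  cert.subject    = OpenSSL::X509::Name.parse("/CN=#{cn}")
  cert.issuer     = issuer ? issuer.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after  = Time.now + 3600
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate  = issuer || cert
  constraint = ca ? 'CA:TRUE' : 'CA:FALSE'
  cert.add_extension(ef.create_extension('basicConstraints', constraint, true))
  cert.sign(issuer_key || key, OpenSSL::Digest.new('SHA256'))
  cert
end

root_key, mid_key, site_key = Array.new(3) { OpenSSL::PKey::RSA.new(2048) }
root = make_cert('Example Root CA', root_key, 1, ca: true)
mid  = make_cert('Example Intermediate CA', mid_key, 2,
                 issuer: root, issuer_key: root_key, ca: true)
site = make_cert('www.example.test', site_key, 3,
                 issuer: mid, issuer_key: mid_key)

# A "clean install": only the root is trusted.
clean = OpenSSL::X509::Store.new
clean.add_cert(root)
no_chain   = clean.verify(site)        # server sends only its own certificate
with_chain = clean.verify(site, [mid]) # server also sends the intermediate

# An "office machine": the intermediate is already known locally.
office = OpenSSL::X509::Store.new
office.add_cert(root)
office.add_cert(mid)
cached = office.verify(site)

puts "clean install, no chain:   #{no_chain}"
puts "clean install, with chain: #{with_chain}"
puts "office machine, no chain:  #{cached}"
```

Only the first check fails: with nothing but the root to go on, there is no path from the site certificate back to a trusted authority unless the server supplies the intermediate or the client already has it.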

Back to the organized crime example: Someone tells you, “Hey this is my buddy Vladimir. You guys should talk. I think you could do some good for each other.” You reply, “Vladimir? We go way back. Remember that thing? No the other thing. Yeah, yeah, good times!” There’s no need to re-establish trust for someone you’ve already decided is trustworthy.

Some warning signs in retrospect

Here are some things that should have tipped us off to the problem in advance. They seem like obvious warning signs after the fact, but you know what they say about hindsight.

The directions to get SSL set up on Cloudfront for our static assets in s3 provided a sequence of instructions that required a certificate chain. It wouldn’t let us treat that component as optional.

ProTip™: This provided us with a really quick fix on our end, as the certificates we built for that were now in the drop-downs for the EC2 load balancers, so we didn’t need to figure out how to rebuild new certificates. If you did the same setup for Cloudfront with the same certificate, just use that one in EC2.

In addition, it seems our static assets loaded just fine in versions of firefox where the main site was breaking.
Someone experienced problems accessing the site on a phone. For some reason the errors read like an expired certificate, so we developed elaborate hypotheses about their cell phone network caching old pages to save bandwidth. A bad certificate is a more reasonable explanation.
We needed to manually install the certificates on our linux boxes so our RPC calls wouldn’t fail. Requests between our back-end services started failing after we got new certificates. We identified this on staging and figured out how to manually install the certificates, and then we no longer got ssl errors as the boxes talked to each other.

The fact that reasonably up to date machines didn’t have the certificates installed should have made us think more about why they weren’t part of the default certificate store on Ubuntu. But we wrote that off as openssl being really flaky and fragile when you’re doing stuff from the command line, so we just added the certs to prod boxes and went along our merry way.

In reality this was the same problem as described in the section above. If we had included the full certificate chain with our certificate, then our RPC calls would have been able to provide full authentication up to the root certificate that was already in the system’s certificate store. But in this case the system won’t cache intermediate certificates as it gets them, because you need root access to store them in /etc and to run the update-ca-certificates command that generates the system’s master list.

Why didn’t the other browsers complain?

Good question. Firefox has been getting much more strict on these sorts of validations and the process of vouching. In fact, not only did Chrome refuse to complain here, it refused to complain when our ssl certificates on staging expired unexpectedly! That’s right. The certificates were expired and invalid and Chrome kept loading the site without complaining.

As some of my previous blog posts and work have explained, I’m not a huge proponent of the Certificate Authority model, but if you’re going to do it you should do it right. If not, you might as well start trusting self-signed certificates in the browser. A certificate is only valid if you can validate all the certifications all the way up to one that you trust, including things like expiration date and the general authenticity of each signing certificate.

Back to the (growing tired and old) analogy, a stranger walks up to you on the street and tells you his friend Eve is a great safecracker. Why would you trust Eve? You wouldn’t. (Unknown chain.) Or what about the same from your friend Pedro you haven’t seen in 15 years? Even though he’s never done you wrong, some time might cause your trust to expire. (Expired signing certificate.)

Hope that helped

I’m a bit curious if our problem was unique, or if a lot of sites are blowing up because they weren’t configured in a way that worked with the newest versions of Firefox. If you encountered this problem, or a similar one that wasn’t directly related to EC2, please shoot me an email.

Grant

Addendum

A co-worker found this article from 2012, or almost 2 and 1/2 years ago!

Pertinent lines:

Don’t be fooled by the AWS dialog, the certificate chain isn’t really optional when your ELB is talking directly to a browser. The certificate chain is the part that verifies that fully verifies which certificate authority issued the certificate and therefore whether or not the browser can trust that the domain certificate is valid. Different browsers handle things in different ways, but if you are missing the certificate chain and firefox, you get a pretty scary warning page.

Doh!

Reason 938 to Make Sure Your Test Fails Before It Passes

2014-05-13T00:00:00+00:00

Here’s a quick example showing why you want to see your test fail before you see it pass. This verifies that you’re actually testing what you think you’re testing. This rspec test was passing just fine before I realised I didn’t even test to see if the result was true:

it "detects normal zip" do
  Geomancer.zip_code_only?("15217").should 
end

I only noticed it when I wrote the next test which also passed when it should have failed.

it "doesn't detect bad zip" do
  Geomancer.zip_code_only?("123456").should 
end

(x.should is perhaps even more enigmatic than x.should be, which is actually valid and useful rspec syntax.)

I was honestly a little surprised that this didn’t fail with some sort of runtime error, but rspec works in mysterious ways. I still haven’t decided if this is a feature or a bug, but I think it would probably be nice if this threw a runtime error. I can’t think of a case where the above syntax would be useful.
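My best guess at the mechanism, assuming the rspec 2 style should syntax: calling should with no matcher hands back a helper object that is waiting for an operator such as ==, which is what makes x.should == y work. If no operator ever arrives, nothing is checked and nothing fails. The toy classes below are written to mimic that behaviour; they are not rspec's real implementation.

```ruby
# Toy stand-in for the object a bare `should` hands back.
class ToyOperatorMatcher
  def initialize(actual)
    @actual = actual
  end

  # The check only happens if an operator is actually called.
  def ==(expected)
    unless @actual == expected
      raise "expected #{expected.inspect}, got #{@actual.inspect}"
    end
    true
  end
end

# Toy version of `should`: run a matcher if one is given,
# otherwise return an object that waits for an operator.
module ToyShould
  def toy_should(matcher = nil)
    return ToyOperatorMatcher.new(self) if matcher.nil?
    raise "matcher rejected #{inspect}" unless matcher.call(self)
    true
  end
end

Object.include(ToyShould)

false.toy_should # no matcher, no operator: silently "passes"

begin
  false.toy_should == true # an operator finally triggers the check
rescue RuntimeError => e
  puts "failed as it should: #{e.message}"
end
```

Either way, the practical fix is the same: finish the expectation with a matcher or an operator, and watch it fail once before trusting it.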

Did Julius Caesar Predict the World Would End in 3268 AD?

2014-04-28T00:00:00+00:00

One of the nice things about dynamic languages like ruby is the REPL: the Read-Evaluate-Print Loop, also known as the interactive console. In ruby you fire it up with irb. Sometimes it’s easier to fire this up to learn about the implementation than to actually, ugh, read the documentation.

I was messing around with dates, and wanted to get an idea of how dates were formatted:

grant@john-icicleboy:~$ irb
2.1.1 :001 > require 'date'
 => true 
2.1.1 :002 > puts Date.new
-4712-01-01
 => nil 

I was really surprised to see that the date given with no arguments provided was 4712 BC. Now I suspected that Date.new was really shorthand for Date.new(0) and that this value was actually the epoch for ruby’s Date class, similar to the way Unix uses an epoch of 1970-Jan-01 and stores dates as the number of seconds relative to this. (2 equals 2 seconds after Jan 1, 1970, etc.)

But why does ruby choose year -4712? That seems suspiciously as if ruby assumes the world is only 6 or 7 thousand years old! Instead of using this to troll people about creationism on twitter, I decided to dig in and RTFM. This does indicate that this year was intentionally and specifically chosen, and talks about various calendar systems throughout the ages, but isn’t useful in answering the question at hand. What is so important about -4712?

For this we have to turn to wikipedia. The article on the Gregorian Calendar isn’t particularly useful. Neither is the one on the Julian Calendar. But I add 4712 to my google searches, and I finally get to the page I’m looking for. It’s about the Julian Day. It explains that the Julian Day Number 0 is assigned to 4713 BC. It also goes on to explain that the Julian Period has an interval of 7980 years.

The first thing an attentive reader will notice is that the Julian Period begins in 4713 BC, and I’ve been spouting off about 4712 BC. How could the ruby implementation know about all these details and then get the year off by one? It didn’t. So why is it different? Because there’s no year zero in the calendar system. We go from 1 BC to 1 AD. However, we can specify the number zero as an offset in the Date class, which ends up representing 1 BC. So we need to subtract another year to represent these early dates, and year -4712 becomes 4713 BC.
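The off-by-one is easy to check from irb. The sketch below uses only ruby's standard Date class; bc_label is a hypothetical helper, written for this example, that converts a year number with a year zero into BC/AD labels without one.

```ruby
require 'date'

# Hypothetical helper: year numbers in Date include a year 0,
# BC/AD labels do not, so year 0 is 1 BC, year -1 is 2 BC, and so on.
def bc_label(year)
  year > 0 ? "#{year} AD" : "#{1 - year} BC"
end

epoch = Date.new          # no arguments, as in the irb session above
puts epoch.jd             # 0, the first Julian Day Number
puts bc_label(epoch.year) # 4713 BC
puts bc_label(0)          # 1 BC
```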

Now think about all the hype about the Mayans predicting the end of the world on December 21st, 2012? It was the same scenario. This date was actually the date when the 5,126 year long calendar looped around and started over. It wasn’t considered the end of time any more than the end of one year and the start of the next. And yet a bunch of people were still saying the Mayans thought the world was going to end!

I think it’s interesting that the Julian Period also has an end. The period ends in 3268 AD. That’s over a millennium away from today’s date. By then we could be using some new calendar system. Star Date. Metric Time. Who knows? Today’s religions could seem ancient and silly. The Roman Empire itself could seem as distant culturally as the Mayan Empire does to us now. Will someone stumble upon these old articles about the Julian Period? Will they interpret the end of the cycle as the end of the world? Will the headlines read:

Julius Caesar Predicted It! The End Is Near!

We shall see.

Upstart Configuration for God

2014-03-20T00:00:00+00:00

I thought I’d follow up my completely impractical post on god with a practical one. I needed to write an upstart script for god and couldn’t find any examples out there. Here’s what I ended up doing.

Full Script

# /etc/init/god.conf

start on runlevel [2345]
stop on runlevel [06]

setuid webkite
setgid webkite

respawn
respawn limit 10 60

env HOME=/home/webkite

exec bash -l -c 'cd /opt/node/apps/god && exec bundle exec god -c my.god.rb -l /opt/node/log/god.log -P /opt/node/pids/god.pid -D'

The Breakdown

start on runlevel [2345]
stop on runlevel [06]

Run all the time, unless we go into single-user mode or shut down.

setuid webkite
setgid webkite

Run as an unprivileged user. Don’t run as root.

On the negative side: We can’t take advantage of the event driven conditions in god, such as ‘kill process if memory exceeds a half a gig’.

On the positive side: We don’t run as root. We don’t need a system rvm. And we don’t need to run rvm as root.

respawn
respawn limit 10 60

Have upstart respawn the process if it dies unexpectedly, but don’t let it go into death throes and overwhelm the server if it’s just plain broken.

env HOME=/home/webkite

We end up running a bash login shell to load rvm functions, but even that assumes that you have a decent $HOME variable. We don’t without this.

exec bash -l -c 'cd /opt/node/apps/god && exec bundle exec god -c my.god.rb -l /opt/node/log/god.log -P /opt/node/pids/god.pid -D'

Actually start god. This was the really tricky part.

We need bash -l -c so rvm works. rvm use won’t work in sh.
Upstart uses magic to track the process id. If you fork or daemonize, this changes. Upstart provides the two options expect fork and expect daemon, which work in most cases. Or so I’m told. But we still lost the proper process id with god for unknown reasons. So we needed to:
- Use exec so bash doesn’t start its own process.
- Specify -D (no-daemonize) even though we are daemonized, so that god doesn’t fork on its own and upstart gets the correct process id.

Hope this helps someone.

Process Management, Virtualization, Religion and God

2014-03-16T00:00:00+00:00

Sometimes I think about things a little too much.

In this particular case, I was configuring god to watch the components of our new software stack at WebKite. God is process management software that lets you start, stop, and restart programs. Most importantly it will automatically restart a dead process so I don’t get paged at 3:17 am on a Sunday.

Software developers are known for coming up with overly clever names for their creations, and god is no exception. It sits there watching over the world, bringing its children to life, keeping a benevolent eye on them, sometimes killing them dead in their tracks, and sometimes resurrecting and healing them when misfortune arises. Pretty clever, right?¹

I had god set up and working, but there was just one problem. God cannot monitor and restart himself when he dies due to some unforeseen misfortune. For example, a server reboot. For this, I needed to configure upstart since we’re running ubuntu. This turned out to be surprisingly time consuming (ever use rvm and bundler on a server?), but like a lot of dev ops stuff it wasn’t particularly intellectually challenging. Tweak one setting, reboot the server, see if it works. Repeat 50 times.

And this is where my mind started to wander. God only thinks he’s all powerful and almighty. But he’s not. He’s just another program. Maybe a little more powerful than most, still just a userland program. He’s not the One True Transcendent God, creator of all, timeless, formless, boundless. He’s the demiurge!

I imagine at this point most readers are unimpressed and asking, “What the hell is the demiurge?” The demiurge is a deity in Gnostic cosmology. Okay, that probably doesn’t help unless I explain who the Gnostics were.

There were a wide variety of Christian sects with drastically different beliefs in the period between Christ’s death and the time, some 300 years later, when Christianity was established as the official religion of the Roman Empire and the First Council of Nicaea established proper Christian orthodoxy. They all wrote gospels to spread the Word as they interpreted it. There were hundreds of Gospels. The ones we settled on (Matthew, Mark, Luke, and John) and put in the Bible weren’t written by anyone who knew Christ directly. They were written somewhere between 40 and 150 years after Christ walked the earth.²

There’s an open problem in Judeo-Christian belief systems. In simple terms: Why do bad things happen to good people? More explicitly: If the universe was created by a benevolent, loving, supreme and perfect being, how can it possibly contain any imperfection?

The various Gnostic sects found an interesting solution to this problem: It wasn’t! The universe was created by a flawed deity who created the material world. He only thinks he’s the supreme being. He was an aborted creature, left for dead, who managed to survive, and being alone assumed he was the highest power in all of creation. He then went on to create the physical universe which inherited his flaws. This flawed deity is called the demiurge.

This provides an interesting explanation to the dichotomy of the vengeful god that exists in the Old Testament (flooding the world, destroying Sodom and Gomorrah, punishing and vindictive) and the loving god that Christ preaches about. Christ is teaching us about the real transcendent god who exists outside of this realm. He is trying to teach us how to unlock the divine spark that lives inside us, through gnosis or knowledge, so that we too can transcend the cage that is the imperfect physical universe created by the imperfect demiurge, and reunite the spark with the genuine supreme being.

And this gets us back to the god running our EC2 server. He sits there thinking he’s running the show, that he’s in charge, but in reality he’s a prisoner within a virtual machine that exists within one of thousands of physical servers that exist within a server room within a world he’s completely unaware and oblivious of. He sits there managing these lesser processes, making sure they are cared for, even if they do contain bugs and other imperfections. Yet he can’t remove the imperfections. He can’t fix them. He can only keep the applications going.

If this process is the demiurge in this scenario, what is the real supreme being? I don’t know. But what I do know is that I had a few terminal windows open. I rebooted the servers not by running /sbin/shutdown, but by rebooting them in Amazon’s AWS web console. And lo and behold, the following message appeared on the terminals before they shut themselves down:

Someone pressed Control-Alt-Delete.
System rebooting now!

What could reach into the virtual machine and push a Control-Alt-Delete button on a keyboard that never existed? A more powerful monitoring process? One that lives in a heavenly place known only as the cloud? One that was able to reach into a server room, into a physical server, into an imaginary virtual server and touch an untouchable keyboard to initiate a system reboot? One that has no earthly name?

The sad thing is that this supreme process probably thinks he’s omnipotent as he smites the universe that god has been happily monitoring with an unexpected reboot. At least until lightning bolts thrown down from the sky smite this seemingly supreme process as well.

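¹ Well, at least until you try to search for technical information and Richard Dawkins keeps showing up in your results.
² In fact, one interesting thing to note is that when the Gospels quote Jesus, they do so word for word. This leads some scholars to think there is a mythical Q source, a single document that the following gospels used as their source. So even after you account for the fact that you’re not reading them in their original language, the quotes from Jesus are a reflection of a reflection of what he might have said.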

What Does Yahoo BOSS Really Think of Cleveland?

2014-02-08T00:00:00+00:00

We had an interesting problem at work. Any time a user did a location-based search, we returned the same result set: Lakewood, Ohio. It didn’t matter what the user searched by. “Beverly Hills 90210” returned Lakewood, Ohio. “221B Baker Street” returned Lakewood, Ohio. Lakewood is part of the Cleveland Metropolitan Area, and as such we’ll just refer to it as “Cleveland” for the purposes of this essay.

My initial thought: Something is screwed up with caching. Someone did a search on Ohio, and the results somehow got stuck in one of our many caching layers. Subsequent requests kept returning the same cached result. We do have a couple of employees from Ohio, so it wouldn’t be entirely outlandish for a Cleveland-area search to be the first search performed after a caching bug was introduced.

After digging through the code path, it turned out that Yahoo BOSS, our geolocation provider, was returning the same results no matter what location we submitted. Per the documentation, you perform a search by submitting a request formatted as either placefinder?location=<address> or placefinder?q=<address>. Our code was using the location parameter. I tried changing this to the q parameter, and suddenly geolocation requests magically worked.

I made a quick fix to the code, and did something you should never do: I pushed out an untested release directly to production at 6 PM on a Friday. Our test and staging environments don’t do real geolocation since each request costs money. They simply extract the zip code and look that up. It would take a while to get a test environment with real geolocation set up, and changing the word location to q in the bowels of some obscure code didn’t seem likely to bring down the entire site. At worst, location searches would just still be broken. This wouldn’t be any worse than constantly returning Cleveland. What’s the worst that could happen? So I pushed the code.

And…

Of course…

It worked! Problem solved. I entered the weekend breathing a deep sigh of relief. But then I found myself wondering: Why Cleveland? Why, of all the places on Earth, would Yahoo BOSS use Cleveland as the default location for all invalid queries? There are three options I can think of:

- Yahoo BOSS considers Cleveland to be the center of the universe. When you don’t provide an appropriate geolocate-able address, it decides it only makes sense to return the center of the universe, the most important place on earth.
- Yahoo BOSS somehow knows that Cleveland is some sort of nexus point between dimensions, a.k.a. the Hellmouth.
- Yahoo BOSS, after stripping invalid parameters, is left with the impossible job of geolocating the null address. When provided a set of null inputs, it must return the one area on Earth that most ideally represents pure and unending emptiness, the absence of anything and everything, the state of total nothingness. And in its infinite wisdom, it returns Cleveland.

There are simply no other options. Being a Pittsburgher, I have my own opinions on which of the three options is unlikely, which is possible, and which is probable. But I’ll let you decide for yourself.

(As Yakov Smirnoff once said, “In every country, they make fun of city. In U.S. you make fun of Cleveland. In Russia, we make fun of Cleveland.”)

Using Your IronKey on 64-bit Ubuntu 13.10

2014-02-01T00:00:00+00:00

The linux IronKey executable is 32 bit, so you can’t unlock the IronKey on 64 bit linux. The traditional fix is to install ia32-libs:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install ia32-libs

This no longer works. The internet tells us that this meta-package was removed so that you don’t install a bunch of garbage, and you should just install the specific packages needed for your app.

Unfortunately there’s no way to tell what packages are required. The internet says you should just run ldd filename. This produces an error on the ironkey executable.
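One way to at least confirm the 32-bit diagnosis without ldd is to look at the ELF header yourself. The fifth byte of every ELF file (EI_CLASS) is 1 for a 32-bit binary and 2 for a 64-bit one. Here is a minimal Ruby sketch; the helper name elf_class is made up for illustration and is not part of any IronKey tooling:

```ruby
# Hypothetical helper: reports whether a file is a 32-bit or 64-bit ELF
# binary by reading the first five bytes of its header.
# Returns 32, 64, or nil when the file is not a recognizable ELF binary.
def elf_class(path)
  header = File.binread(path, 5)
  return nil unless header && header.bytesize == 5
  return nil unless header[0, 4] == "\x7FELF".b # ELF magic number

  # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
  case header.getbyte(4)
  when 1 then 32
  when 2 then 64
  end
end
```

Calling elf_class on the unlocker binary (wherever it lives on your IronKey) should return 32 if the diagnosis above is right. It won't tell you which libraries are missing, only that you need the 32-bit variants.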

After some trial and error, I found out you need the 32-bit gcc libraries:

sudo apt-get install lib32gcc1

After this, I was able to mount my IronKey.

I don’t think that I needed to add the i386 architecture for this command to work, but I’m not about to re-install my OS to test that. So if the above command doesn’t find a package, perhaps you should try adding the i386 architecture and updating apt.

Hope this helps another poor soul out there,

Grant

Mounting Encrypted lvm Volumes From a Live CD

2014-02-01T00:00:00+00:00

I once again fried my system and had to recover some files from an encrypted filesystem via a Live CD. It’s a frustrating experience. I tried both Debian Wheezy and Ubuntu 13.10.

First, after you boot it will show an encrypted filesystem icon. If you click on that the system will prompt for your password. After you enter your password, it will complain that the partition is invalid and can’t be mounted.

I’ve seen this before. In the past I would simply run sudo apt-get install lvm2 and the volumes would appear.

This time around, I was having no such luck. After some surfing, I found this Ubuntu Forums thread.

It basically worked. Here’s the exact sequence I followed on my system:

sudo apt-get install cryptsetup
sudo modprobe dm-crypt
sudo apt-get install lvm2
sudo modprobe dm-mod
sudo vgscan # this outputs the volume name
sudo vgchange -a y volume_name_from_above

After the last step, my encrypted volumes magically appeared on my desktop.

Hope this helps some other poor soul,

Grant

Nobody Cares About Signed Gems

2013-09-29T00:00:00+00:00

This postmortem originally appeared on the rubygems-openpgp-ca.org site that contained a proof-of-concept system to cryptographically sign ruby software packages so they could be authenticated on download. The system was developed after rubygems.org, the primary distribution network, was hacked and compromised gems were uploaded.

This was by far the most toxic experience I’ve ever had trying to write Open Source software, but here I just tried to focus on how uninterested developers are in software authentication, and how quickly a funnel of tens of thousands of programmers visiting the site resulted in only a half dozen or so actual test drives of the software.

Nobody Cares About Signed Gems

The signing key for the CA expired on August 17, 2013. As an experiment, I decided to leave the key in an expired state and see if and when anyone would notice or complain. Today (Sept 29, 2013) someone finally asked about it on the mailing list. Just over 45 days.

Why would I let the key expire?

I originally wrote rubygems-openpgp a few years ago because I wasn’t happy with the existing signing solution and was looking for a side project to work on. No one paid attention. It was clear that I was the sole user. So the project sat there in maintenance mode.

Then… A few years later… rubygems.org got hacked! There was no way to tell if the gems on the site had been compromised. Suddenly there was interest in signing gems. A few people found my project and submitted pull requests. Now that there were a few users, I dived back in and decided to take the gem from the proof-of-concept stage to a stable piece of software. I spent the next month doing so.

Along the way one-too-many people said that OpenPGP wasn’t useful because most end users couldn’t get into the strong set in the Web-of-Trust, ignoring the fact that distribution systems such as apt silently use OpenPGP behind the scenes. So I created this site as a proof-of-concept CA. The implementation was simple. I didn’t even really need a rails website. The content is exclusively jekyll. Basically a user would sign up. I would manually verify that they had control of their email and signing key, and had published gems on rubygems.org. I would then manually sign off on their key with a smart-card set aside for that purpose. I got an MVP up and running, and started circulating the link.

This brought tens of thousands of users to the site. Plenty of upvotes on reddit. Success, right?

Wrong!

One metric to measure success would be the number of people who requested certification. That number was less than a dozen.

But we need to keep in mind these are just gem authors, right? Those should be orders of magnitude rarer than the actual users, right?

Wrong!

I provide a test gem called openpgp-signed-hola. It is the standard “Hello World” gem with the addition of a digital signature. All the documentation referred users to this gem to see how things work. Rubygems.org has nice charts that show how many times a version of a gem has been downloaded. Of course this number includes bots and other automated retrievals in addition to actual human users testing out rubygems-openpgp. But it does provide an upper bound on the number of people who tried to verify the test gem.

No more than two dozen people tried to manually verify the test gem. To be honest, I think even that number is probably high, and the real figure was much lower.

Honestly, I found this disappointing. It takes less than 5 minutes to test gem verification. You would think that number would be at least equal to the number of upvotes on reddit. That people would actually read the site and try things out, instead of hitting the upvote button and going away. You would think the people writing blog posts about how important signing was would take 5 minutes to try out the software. But alas, they didn’t.

But it takes time for software adoption, right?

I had a few interested users. There were finally signed gems on rubygems.org signed by people other than me. I expected that would be enough that I’d get a trickle of signups over the course of the next year. But after the initial burst of interest, activity came to a halt. After several months without receiving a single sign-up on the site, inquiry on the mailing lists, or issue in github, I found myself wondering why I was paying $20 a month to heroku for https hosting. I went ahead and canceled that. And that’s when I decided to let the signing key expire.

Why would I let the key expire again?

The key was set up to expire every 30 days. This was basically a way to enforce a revocation policy. If the key itself was compromised (unlikely, it’s on a smart card), or if I was forced to issue revocations on the CA’s behalf, a periodic expiration would force users to retrieve updated certificates and hence any revocations.

If there was a small community of people who were using the CA keys, I would quickly get an email once they started noticing that all their software was expired. It would at least provide some indication that I should continue to maintain the CA.

45 days later someone finally noticed.

That doesn’t prove nobody cares about signed gems, it just proves nobody cares about rubygems-openpgp

True.

But I haven’t seen any activity on the X.509 front either. After the rubygems compromise, things were supposed to change. That was finally the kick-in-the-pants the community needed to fix things and take gem authentication seriously.

- The rubygems-trust project was started to set up replacement rubygems with CA capabilities. Activity fizzled out after a month with no visible results.
- A few people tried to start signing their gems with X509, but most gave up because it was impractical.
- The X509 code in rubygems itself has essentially the same TODO list as it did when the code was initially merged in 2007.

The above points, like the rest of this essay, are NOT an attempt to call anyone out; they’re simply what I’ve observed. Getting X509 signing and verification of gems to actually be used isn’t any farther along than it was before the rubygems.org hack either.

In Conclusion

I’m primarily documenting my experiences with the project so they’re available if/when there is a push to start signing gems in the future.

This post is negative, but I hope it doesn’t come across as bitter. I don’t regret any of the time I spent on rubygems-openpgp or the CA. It was fun! And I’ll continue to maintain rubygems-openpgp if it’s needed. (The CA, on the other hand, will probably go away when the domain expires and/or I want to use my free heroku hours for another project.)

I do wish people were more interested in signing their gems one way or another, but then again I wish more people (especially techies) would encrypt their damn emails! Instead they’ll write blog posts and tweet about the importance of doing so, but won’t actually change their habits.

-Grant

It Begins...

2013-08-27T00:00:00+00:00

I decided to move my main site away from Google Apps. I don’t think I plan to blog regularly, but Jekyll is quick and convenient, and the Lagom theme looks nice. Let the migration begin.

Setting up an OpenPGP smartcard and IronKey on Debian Wheezy

2013-06-16T00:00:00+00:00

My computer just died. I threw the hard drive into another computer. Everything looked good until it tried to fire up X and then I just got a blank screen. You know what that means. Time to reinstall the OS. There were a few gotchas that I thought I’d document here.

Live CD doesn’t mount encrypted partitions

I run full disk encryption. After my computer died I wanted to grab a few files and back up the most important ones before re-installing the OS. I grabbed the Debian Live DVD image with xfce.

Everything booted. I clicked on my encrypted partition. I was prompted for a password. The password was accepted. But then the GUI complained that it couldn’t mount the filesystem.

After some trial-and-error, I learned that I needed to install lvm2:

sudo apt-get install lvm2

Then I was able to access my encrypted partitions and get the backup files that I needed.

OpenPGP smartcard

After that I reinstalled the OS and all my favorite packages: gnupg2, enigmail, thunderbird, keepassx, etc. But after that my smartcard wouldn’t work. I run into this problem every time I reinstall Debian!

One long-standing issue is that scdaemon, the driver for the smartcard, isn’t installed unless you install the gpgsm package:

apt-get install gpgsm

I’ve done that before. But I still couldn’t use the card unless I was root. I also needed to install libccid and pcscd:

apt-get install libccid pcscd

After that I was good.

On this install I’m just running xfce. In the past I’ve had problems with gnome taking over the smart card. See my previous post on Using an OpenPGP Smartcard on Ubuntu 12.10 if you’re still having problems.

Getting IronKey working

I also have an IronKey, which is a handy USB drive that has hardware encryption and (like an OpenPGP smartcard) will self-destruct if someone tries to brute force it.

Normally I just use the software included on the drive to mount the partition. But lately I’ve run into problems where the program cryptically doesn’t run. This is because the software is 32 bit and I’m running a 64 bit install.

You’ll need to enable multi-architecture installs for 32 bit software and install the 32 bit software to get the IronKey working:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install ia32-libs

After that the provided software should work.

How to Break Rails 3.2 But Not 3.0 on Linux But Not OSX

2013-03-12T00:00:00+00:00

Over at WebKite, we finally got around to updating our rails stack to 3.2 last week. This was of course long overdue. But it’s one of those things that takes a non-zero amount of time, and doesn’t provide any immediately visible features, so we’ve been pushing it back.

Testing went fine. Everything looked good. The code got merged to mainline. And our CI server decided it couldn’t run tests anymore. I’ll spare you the full traceback, but the basic error was:

cannot load such file -- multi_json/engines/Yajl

Soon after that, a developer pushed a feature branch to our staging server so team members could review it. Or tried to. That server blew up with the same error.

We found ourselves facing a problem that would only manifest itself on Linux boxes, but on none of the developers’ MacBooks. And it was one of those nasty ones that doesn’t include any of our application code in the traceback.

And when a search on an error message doesn’t even give a single stackoverflow page, you know you’re in for a long day.

multi_json Looks Suspicious

The traceback did involve the multi_json gem, which dispatches your json calls to whatever library you want to use. A little research on that shows that it prefers the oj gem over the yajl-ruby gem.

I check to see if we’ve explicitly specified yajl-ruby in the Gemfile, or if it’s just a dependency of another gem. We actually have explicitly chosen that gem version, which seems a little suspicious.

I go to the commit for that. Sure enough, this was done after we were forced to upgrade to 3.0.20 for a security issue.

Upgrading to 3.0.20 Broke Our App

The initial upgrade from 3.0.19 to 3.0.20 broke our test suite. I didn’t think that minor-minor releases were supposed to do that. It turns out that to deal with some yaml exploits, the fix swapped out the JSON back end to one that didn’t work with our app.

The replacement back end didn’t properly serialize and then deserialize a string, as demonstrated by the following:

1.9.3p392 :001 > ActiveSupport::JSON.decode(ActiveSupport::JSON.encode("foo"))
ActiveSupport::OkJson::Error: unexpected "foo"
    from /Users/grant/.rvm/gems/ruby-1.9.3-p392@webkite/gems/activesupport-3.0.20/lib/active_support/json/backends/okjson.rb:69:in `textparse'
    from /Users/grant/.rvm/gems/ruby-1.9.3-p392@webkite/gems/activesupport-3.0.20/lib/active_support/json/backends/okjson.rb:47:in `decode'
    ...

Apparently there’s some disagreement about whether a bare string is valid json. It seems clear to me that it is when looking at json.org. And if it’s not, then shouldn’t ActiveSupport::JSON.encode throw an error? But I digress…
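The disagreement comes from the specs themselves: RFC 4627 limited a JSON text to an object or an array, while the grammar on json.org and the later RFC 7159 allow any value, including a bare string, at the top level. As a minimal sketch of the round trip that our app depended on, assuming a reasonably recent Ruby whose bundled json library is version 2.x (as far as I know, the older 1.x releases rejected bare values by default), and using a made-up helper name:

```ruby
require "json"

# Hypothetical helper: encode a value to JSON text and decode it again.
# A backend that agrees with itself should hand back an equal value.
def round_trip(value)
  JSON.parse(JSON.generate(value))
end

puts round_trip("foo").inspect        # bare string at the top level
puts round_trip(["foo", 1]).inspect   # array, valid under every spec
```

This is the standard library rather than ActiveSupport::JSON, so it only illustrates the property; it says nothing about how any particular ActiveSupport backend behaves.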

The short story is that only yajl-ruby parsed json in a way that was compatible with our app. So we added the following line of code to our initialization:

ActiveSupport::JSON.backend = "Yajl"

And went on our merry way. Now that seems to be causing problems on rails 3.2.

But Clearly Other People Are Using yajl-ruby

This is a widely used gem. And no one is reporting errors. So what’s up?

Well first I need to talk to the developer who did the original fix. He refreshes my memory on why we needed to use a specific json parser to begin with. Then he notes that the capital Y in ActiveSupport::JSON.backend = "Yajl" looks suspicious.

OSX Sorta Kinda Has a Case-Insensitive Filesystem

If you want to drive yourself crazy, start using caps for directories and file names:

johnmudhead:pikimal grant$ pwd
/Users/grant/src/pikimal
johnmudhead:pikimal grant$ cd /USERS/GRANT/SRC/PIKIMAL
johnmudhead:PIKIMAL grant$ cd ..
johnmudhead:SRC grant$ cd ..
johnmudhead:GRANT grant$ cd ..
johnmudhead:USERS grant$ cd ..
johnmudhead:/ grant$ 

If you want to drive rvm crazy, mix it up a bit:

johnmudhead:pikimal grant$ pwd
/Users/grant/src/pikimal
johnmudhead:pikimal grant$ cd ../Pikimal
==============================================================================
= NOTICE                                                                     =
==============================================================================
= RVM has encountered a new or modified .rvmrc file in the current directory =
= This is a shell script and therefore may contain any shell commands.       =
=                                                                            =
= Examine the contents of this file carefully to be sure the contents are    =
= safe before trusting it! ( Choose v[iew] below to view the contents )      =
==============================================================================
Do you wish to trust this .rvmrc file? (/Users/grant/src/Pikimal/.rvmrc)
y[es], n[o], v[iew], c[ancel]> y
johnmudhead:Pikimal grant$ cd ../PiKiMaL
==============================================================================
= NOTICE                                                                     =
==============================================================================
= RVM has encountered a new or modified .rvmrc file in the current directory =
= This is a shell script and therefore may contain any shell commands.       =
=                                                                            =
= Examine the contents of this file carefully to be sure the contents are    =
= safe before trusting it! ( Choose v[iew] below to view the contents )      =
==============================================================================
Do you wish to trust this .rvmrc file? (/Users/grant/src/PiKiMaL/.rvmrc)
y[es], n[o], v[iew], c[ancel]> 

And that was the problem!

OSX could require ‘Yajl’ because it doesn’t think it’s any different than ‘yajl’. However, Linux thinks they’re totally different names. A one letter fix magically restored all of our Linux boxes to good health:

johnmudhead:pikimal grant$ git log -p 0464fd2acc8d4c38f212dc8376dd0e80795b1cc5
commit 0464fd2acc8d4c38f212dc8376dd0e80795b1cc5
Author: Grant Olson <grant@pikimal.com>
Date:   Mon Mar 11 13:01:14 2013 -0400

    Don't break rails on 3.2 but not 3.0 and linux but not OSX

diff --git a/config/initializers/yajl_as_json_backend.rb b/config/initializers/yajl_as_json_backend.rb
index 3f75b9d..b204999 100644
--- a/config/initializers/yajl_as_json_backend.rb
+++ b/config/initializers/yajl_as_json_backend.rb
@@ -5,4 +5,4 @@
 #
 # See http://weblog.rubyonrails.org/2013/1/28/Rails-3-0-20-and-2-3-16-have-been-released/ 
 # for details as to why the JSON backend was changed.
-ActiveSupport::JSON.backend = "Yajl"
+ActiveSupport::JSON.backend = "yajl"
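If you want to catch this class of bug on a Mac before it reaches a Linux box, you can check names against what the directory listing actually contains instead of asking the filesystem whether the file exists. The following is a rough sketch, not anything rails or multi_json provides; both helper names are made up for illustration:

```ruby
require "tmpdir"

# Hypothetical helper: true when the filesystem holding `dir` treats names
# that differ only by case as the same file (the usual OSX default).
def case_insensitive_fs?(dir)
  Dir.mktmpdir("casecheck", dir) do |tmp|
    File.write(File.join(tmp, "yajl.rb"), "")
    File.exist?(File.join(tmp, "Yajl.rb"))
  end
end

# Hypothetical helper: true only when the file name at `path` matches an
# entry in its directory exactly, including case, on any filesystem.
def exact_case_match?(path)
  Dir.children(File.dirname(path)).include?(File.basename(path))
end
```

On a case-insensitive volume File.exist? happily answers true for Yajl.rb, while exact_case_match? compares against the real directory entry and answers false, which is how Linux will see it.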

Now the only question I’m left with: Why did this work correctly on rails 3.0? If you have any ideas I’d love to hear them.

Using an OpenPGP Smartcard on Ubuntu 12.10

2013-03-09T00:00:00+00:00

I’m currently adding a key continuity feature to rubygems-openpgp. It works similar to the way that ssh stores copies of known host keys, and warns you if the key has changed.
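To make the idea concrete, here is a minimal trust-on-first-use sketch. It is not the rubygems-openpgp implementation; the class name, the JSON file format, and the owner/fingerprint strings are all assumptions made up for illustration:

```ruby
require "json"

# Hypothetical known-keys store, in the spirit of ssh's known_hosts file.
class KnownKeys
  def initialize(path)
    @path = path
    @keys = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns :new the first time an owner is seen (and remembers the key),
  # :match when the fingerprint equals the stored one, and :changed when
  # it differs, which is the case that deserves a loud warning.
  def check(owner, fingerprint)
    known = @keys[owner]
    return :match if known == fingerprint
    return :changed if known

    @keys[owner] = fingerprint
    File.write(@path, JSON.pretty_generate(@keys))
    :new
  end
end
```

A caller would run check for the signer of each gem it verifies and refuse to proceed, or at least warn, on :changed. Note that the stored fingerprint is deliberately not overwritten in that case.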

This is the first time I’m trying to store any changes locally, and I was a bit worried about the directories being created properly on Windows. So I decided to set up a VirtualBox install of Windows 8. My current hard drive was out of space, so that gave me an excuse to buy a nice new SSD drive. And that led to installing the latest version of Ubuntu. And now my Saturday is almost gone.

I had a little trouble getting my OpenPGP smartcard set up, so I thought I’d write about it here.

Problem 1 - scdaemon is in the Wrong Package

This is actually a problem with the Debian packages that has existed for many years. If you want to use gpg2, the scdaemon won’t get installed unless you install the gpgsm package:

sudo apt-get install gpgsm

That one I was expecting. But I thought I’d document it here anyway.

Problem 2 - Can’t Access the Card

This one I hadn’t seen before:

I got the following error with gpg2:

grant@johnicicleboy:~$ gpg2 --card-status
gpg: selecting openpgp failed: Unsupported certificate
gpg: OpenPGP card not available: Unsupported certificate

gpg fails as well:

grant@johnicicleboy:~$ gpg --card-status
gpg: selecting openpgp failed: unknown command
gpg: OpenPGP card not available: general error

There were a few areas where this same issue was reported, but I couldn’t find any resolution to the problem.

After some extensive googling, I was able to find out that the gnome-keyring-daemon now decides to grab control of your smartcard reader. Sure enough, I killed the process and gpg2 --card-status started working:

grant@johnicicleboy:~$ gpg2 --card-status
Application ID ...: D2760001240102000005000009200000
Version ..........: 2.0
Manufacturer .....: ZeitControl

General key info..: pub  2048R/A18A54D6 2010-03-01 Grant T. Olson (Personal email) <kgo@grant-olson.net>
sec#  2048R/E3B5806F  created: 2010-01-11  expires: 2014-01-03
ssb>  2048R/6A8F7CF6  created: 2010-01-11  expires: 2014-01-03
                      card-no: 0005 00000920
ssb>  2048R/A18A54D6  created: 2010-03-01  expires: 2014-01-03
                      card-no: 0005 00000920
ssb>  2048R/D53982CE  created: 2010-08-31  expires: 2014-01-03
                      card-no: 0005 00000920

Now I began the search for ways to disable the smartcard functionality on gnome-keyring-daemon. Couldn’t find anything. There were ways to switch off its ssh-agent replacement, which I wanted to do anyway since I ssh authenticate via my smartcard. There were some other settings about pkcs11 and secrets that seemed promising. So I ran the following commands to disable these features:

gconftool-2 --type bool --set /apps/gnome-keyring/daemon-components/ssh false
gconftool-2 --type bool --set /apps/gnome-keyring/daemon-components/secrets false
gconftool-2 --type bool --set /apps/gnome-keyring/daemon-components/pkcs11 false

But disabling them didn’t do the trick.

Next I went with a hack fix and basically nuked the gnome-keyring-daemon:

sudo mv /usr/bin/gnome-keyring-daemon /usr/bin/gnome-keyring-daemon.bak

This didn’t seem to have broken anything too horribly, and I never liked the gnome keyring or seahorse to begin with. So I decided to write a blog post for the sake of the interwebz.

But Then, A Complication

After all that I went to write things up. I decided to re-break things so I could obtain the error message that gpg --card-status threw. So I moved the gnome-keyring-daemon back into place.

Lo and behold, everything worked! Both gpg and gpg2 were able to access the card just fine.

I thought that maybe after I configured gpg-agent to act as the ssh-agent, it was grabbing my smart-card before gnome-keyring-daemon could. So I commented out the entries for that, and sure enough card reading was broken again.

The Proper Fix (or is it?)

Add this to ~/.gnupg/gpg-agent.conf to enable ssh support:

enable-ssh-support

Add this to ~/.bashrc to use gpg-agent for ssh instead of gnome-keyring-daemon, substituting your host name:

if [ -f "${HOME}/.gnupg/gpg-agent-info-HOSTNAME" ]; then
    . "${HOME}/.gnupg/gpg-agent-info-HOSTNAME"
    export GPG_AGENT_INFO
    export SSH_AUTH_SOCK
fi

Another Complication!

Everything seemed to be working, but then I got this generic error message from Enigmail:

No SmartCard 
could not be found in your reader 
Please insert your SmartCard and repeat the operation.

After enabling a debug log, it turned out the error was the same unsupported certificate error I was getting before, even though signing still worked from the command line. Killing the gnome-keyring-daemon process allowed me to sign emails again.

So, I went back to:

sudo mv /usr/bin/gnome-keyring-daemon /usr/bin/gnome-keyring-daemon.bak

And everything seems to be working… for now.

That’s All for Now

If you’ve encountered the same problem, hopefully this will help.

-Grant