SLML Part 3 - JPEG to SVG to Stroke-3

singleline
machinelearning
slml
Converting JPEG scans to vector paths
Author

Andrew Look

Published

November 7, 2023

This post is part 3 of “SLML” - Single-Line Machine Learning.

To read the previous post, check out part 2.

If you want to keep reading, here is part 4.

Most computer vision algorithms represent images as a rectangular grid of pixels on a screen. The model that the Magenta team trained, SketchRNN, instead interprets the drawings as a sequence of movements of a pen. They call this “stroke-3 format”, since each step in the sequence is represented by 3 values:

  • delta_x: how far did the pen move left-to-right?
  • delta_y: how far did the pen move up-and-down?
  • lift_pen: was the pen down (continuing the current stroke) or was the pen lifted (moving to the start of a new stroke)?
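To make the format concrete, here is a minimal sketch of turning strokes given as absolute (x, y) points into stroke-3 rows. The function name is hypothetical, and two conventions are my assumptions rather than details from this post: the pen starts at the origin, and lift_pen is set to 1 on the last point of each stroke (the convention I believe SketchRNN follows).

```python
def strokes_to_stroke3(strokes):
    """Convert strokes (lists of absolute (x, y) points) into stroke-3 rows.

    Each row is [delta_x, delta_y, lift_pen].
    """
    rows = []
    prev_x, prev_y = 0, 0  # assumption: the pen starts at the origin
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            # Assumption: lift_pen is 1 on the final point of a stroke,
            # meaning the pen is lifted before moving to the next point.
            lift_pen = 1 if i == len(stroke) - 1 else 0
            rows.append([x - prev_x, y - prev_y, lift_pen])
            prev_x, prev_y = x, y
    return rows


# Two strokes: an L-shape, then a separate vertical line.
example_strokes = [
    [(0, 0), (10, 0), (10, 5)],
    [(20, 5), (20, 15)],
]
example_rows = strokes_to_stroke3(example_strokes)
```

Because every row stores an offset from the previous point, the jump between the two strokes shows up as an ordinary delta that follows a row whose lift_pen is 1.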

First, I had to convert my JPEG scans into the “stroke-3” format. This involved:

  1. converting the files from JPEG to SVG
  2. converting SVG to stroke-3
  3. simplifying the drawings to reduce the number of points

JPEG to SVG

When I first started converting to SVG, I had trouble finding a tool that would give me a single, clean stroke for each line. Eventually I found a tool called autotrace that was able to correctly do a “centerline trace”.
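As a rough illustration, a centerline trace could be scripted from Python along these lines. This is a sketch, not the exact command I ran: the function names and file names are hypothetical, and the flags (`-centerline`, `-output-format`, `-output-file`) should be checked against the help output of your autotrace build. Which input formats autotrace accepts depends on how it was compiled, so a JPEG may need converting to a bitmap format first.

```python
import subprocess


def build_autotrace_command(input_path, output_path):
    """Build the argument list for a centerline trace to SVG.

    Assumption: the installed autotrace supports these flags.
    """
    return [
        "autotrace",
        "-centerline",            # trace the middle of each line, not its outline
        "-output-format", "svg",
        "-output-file", output_path,
        input_path,
    ]


def trace_to_svg(input_path, output_path):
    """Run autotrace; raises CalledProcessError if the command fails."""
    subprocess.run(build_autotrace_command(input_path, output_path), check=True)


# Hypothetical file names, for illustration only.
command = build_autotrace_command("scan.bmp", "scan.svg")
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the command before processing a whole folder of scans.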

(a) potrace
(b) autotrace
Figure 1: Comparison of Vectorization Tools.

SVG to Points

Then I used a Python library called svgpathtools to take the resulting SVG files and convert each path to a sequence of points. This step is necessary because SVG paths are often represented as Bezier curves, while stroke-3 needs a list of discrete points.
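To show what "converting a curve to points" means, here is a small standalone sketch that samples a cubic Bezier curve at evenly spaced parameter values. It is plain Python rather than the svgpathtools code I used, and the function name and the number of samples are hypothetical. Points are written as complex numbers (x + yj), which is also how svgpathtools represents them.

```python
def sample_cubic_bezier(p0, p1, p2, p3, num_points=10):
    """Sample a cubic Bezier curve at evenly spaced t values in [0, 1].

    Control points are complex numbers (x + yj). Returns a list of
    (x, y) tuples. num_points must be at least 2.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    points = []
    for i in range(num_points):
        t = i / (num_points - 1)
        u = 1 - t
        # Standard cubic Bezier polynomial in Bernstein form.
        point = (u**3) * p0 + 3 * (u**2) * t * p1 + 3 * u * (t**2) * p2 + (t**3) * p3
        points.append((point.real, point.imag))
    return points


# An arch from (0, 0) to (10, 0) with control points above the baseline.
arch = sample_cubic_bezier(0 + 0j, 0 + 10j, 10 + 10j, 10 + 0j, num_points=5)
```

Sampling evenly in t does not space the points evenly along the curve, and it tends to produce more points than needed on nearly straight segments, which is one reason a simplification pass is useful afterwards.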

One problem I noticed was that the drawings were represented as many separate strokes rather than one continuous line. For example, in the image below, each color represents a separate pen stroke.

separate strokes

Line Simplification

Finally, I applied the Ramer-Douglas-Peucker (“RDP”) algorithm to the resulting points. It uses an adjustable “epsilon” parameter to simplify the drawings by reducing the number of points in a line’s path.

RDP example

This is important because the SketchRNN model has difficulty with sequences longer than a few hundred points, so it’s helpful to simplify the drawings down by removing some of the very fine details while preserving the overall shape.
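For readers who have not seen RDP before, here is a minimal pure-Python sketch of the algorithm. It is not the implementation I used for the dataset, the function names are hypothetical, and the epsilon in the example is arbitrary. The idea: keep the two endpoints, find the point farthest from the straight line between them, and if it is farther away than epsilon, keep it and recurse on both halves. Otherwise, drop everything in between.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        # start and end coincide: fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Simplify a polyline with the Ramer-Douglas-Peucker algorithm."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, max_index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            max_dist, max_index = dist, i
    if max_dist > epsilon:
        # Keep the farthest point and simplify each side independently.
        left = rdp(points[: max_index + 1], epsilon)
        right = rdp(points[max_index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [start, end]


line = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = rdp(line, epsilon=0.5)
```

In this example the small wobble at (1, 0.05) is removed while the spike at (3, 3) is kept. A larger epsilon removes more detail. Because the sketch is recursive, very long paths could hit Python's recursion limit, so an iterative version or a library implementation is a better fit for real data.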

phase1-sample-0177-epoch-01700-orig

Next in my SLML series is part 4, where I experiment with hyperparams and datasets in training SketchRNN.