Every time I hear people talking about ensuring software correctness, it reminds me of this story: https://horningtales.blogspot.com/2006/09/exhaustive-testing...
A university group in the 1960s finds a vendor-supplied binary-to-BCD conversion function sometimes produces an off-by-one error.
They devise a simple fix - but find it adds an extra 'drum revolution' - and so write an even more refined fix that produces the right answer without taking any extra time.
Then they test it, over the course of several weeks, counting from 0 to 9,999,999, both in binary and in BCD, converting the binary to BCD, and comparing the results.
They proudly send the perfected implementation to the vendor - who sends it on to other users of the machine. Soon after they receive a phone call: "Were you aware that your new routine drops the sign of negative numbers?"
When I was an intern in the mid-80s we were cross-compiling for a 68000-based device (on a VAX, no less) but had our own version of the C library, supposedly because of some legal issues with the compiler vendor.
Part of this library was the routine to divide 32-bit integers, since the only native divide was 32/16 -> 16. Our implementation was the textbook shift-and-subtract algorithm, which took something like 1000 cycles on that CPU. Our routing algorithm did enough division that this ended up being a big deal, so they asked their intern to optimize it.
I came up with a version using the hardware instruction that was maybe 5-10x faster; they tested it a bit and sent a software update out to customers, after which the "phone home" monitoring (using real phone lines and modems in those days) started blowing up.
If you divide by zero, naive shift-and-subtract will return MAXINT, while the hardware instruction will raise a divide-by-zero exception. On startup we had a bunch of averages that were default-initialized to zero, and we'd divide by some of them. Getting 32-bit MAXINT wasn't a really big deal - it put a big spike into some moving averages, which decayed pretty quickly. An unexpected exception, though, was a big problem...
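The failure mode is easy to reproduce. Here's a minimal sketch of the same contrast in C# (matching the code sample later in the thread, not the original 68000 routine): with a zero divisor the shift-and-subtract loop never fails its compare, so every quotient bit gets set and you get MAXINT back, while the native divide operator throws.

    using System;

    public class DivideDemo
    {
        // Textbook restoring (shift-and-subtract) division of unsigned 32-bit integers.
        public static uint ShiftSubtractDivide(uint dividend, uint divisor)
        {
            uint quotient = 0;
            ulong remainder = 0;
            for (int i = 31; i >= 0; i--)
            {
                // Bring down the next dividend bit.
                remainder = (remainder << 1) | ((dividend >> i) & 1);
                if (remainder >= divisor)   // always true when divisor == 0
                {
                    remainder -= divisor;
                    quotient |= 1u << i;    // so all 32 quotient bits end up set
                }
            }
            return quotient;
        }

        public static void Main()
        {
            Console.WriteLine(ShiftSubtractDivide(1000, 0)); // 4294967295, i.e. 32-bit MAXINT
            uint zero = 0;
            Console.WriteLine(1000u / zero);                 // throws DivideByZeroException
        }
    }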
So it's not really leftpad that's broken; it's that each language has a different definition of "String.length", generally corresponding to the underlying encoding when dealing with accents, emojis, foreign characters, etc., and that definition notably does not always correspond to the number of spaces a string occupies in a monospace font.
IOW the proofs are correct: leftpad will result in spaces on the left, input string on the right, and String.length as specified. It's the spec itself that was incorrect: the last requirement should be based on "number of spaces occupied by the string in a monospace font", not "string.length", if that's what the desired requirement is.
That said, I think that's largely the author's point. You can prove that your code meets the spec, but you can't prove that your spec meets what you actually intended.
> The leftpad function provided[1] didn’t take a string, but a char[]. Thankfully, it’s easy to convert from String objects, using the .toCharArray() function. So I did that.
Java's Unicode handling is a monumental footgun of the most devastating variety: it works for most common cases, but almost all code that isn't written with care for how characters are represented will mishandle code points that need more than one UTF-16 char (i.e. more than 2 bytes) to represent.
If you insist on using a char array (which is a bit unidiomatic), you should be using Character#codePointCount[2] to figure out how many code points are in the array, and even then you're probably SOL with regards to non-trivial emojis. String#codePointCount[3] is also an option if you want to use String and StringBuilder to do the padding, which would arguably be more idiomatic.
[1] https://github.com/hwayne/lets-prove-leftpad/blob/ea9c0f09a2...
[2] https://docs.oracle.com/en/java/javase/25/docs/api//java.bas...
[3] https://docs.oracle.com/en/java/javase/25/docs/api//java.bas...
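Java isn't the only language with this trap; C# (used for the code sample further down) shares the UTF-16 char model, so the same distinction shows up there. A rough sketch of what the codePointCount-style methods are measuring, just for illustration (EnumerateRunes needs .NET Core 3.0 or later):

    using System;
    using System.Globalization;
    using System.Linq;

    public class CodePointDemo
    {
        public static void Main()
        {
            // U+1D11E (musical G clef) needs a surrogate pair in UTF-16.
            string s = "a𝄞b";
            Console.WriteLine(s.Length);                                // 4 UTF-16 code units (what Java's String.length() counts too)
            Console.WriteLine(s.EnumerateRunes().Count());              // 3 code points (what codePointCount reports)
            Console.WriteLine(new StringInfo(s).LengthInTextElements);  // 3 grapheme clusters; multi-code-point emoji can still differ here
        }
    }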
The “random” choice of swift was quite fortunate since what this really seems to be testing is the ergonomics of the Unicode “character” definition in the standard libraries used, and swift has the best defaults of the languages mentioned haha
> Swift’s string implementation goes to heroic efforts to be as Unicode-correct as possible. […] This is great for correctness, but it comes at a price, mostly in terms of unfamiliarity; if you’re used to manipulating strings with integer indices in other languages, Swift’s design will seem unwieldy at first, leaving you wondering.
> It’s not that other languages don’t have Unicode-correct APIs at all — most do. For instance, NSString has the enumerateSubstrings method that can be used to walk through a string by grapheme clusters. But defaults matter; Swift’s priority is to do the correct thing by default.
> Strings in Swift are very different than their counterparts in almost all other mainstream programming languages. When you’re used to strings effectively being arrays of code units, it’ll take a while to switch your mindset to Swift’s approach of prioritizing Unicode correctness over simplicity.
> Ultimately, we think Swift makes the right choice. Unicode text is much more complicated than what those other languages pretend it is. In the long run, the time savings from avoided bugs you’d otherwise have written will probably outweigh the time it takes to unlearn integer indexing.
— https://oleb.net/blog/2017/11/swift-4-strings/
The author explicitly notes
> The Swift implementation was indeed written by ChatGPT, and it got it right first time, with just the prompt “Implement leftpad in Swift”. However: Swift is the only language I know where an implementation that does what I wanted it to do is that simple.
Reading between the lines, that definitely seems like an intentional choice
> swift has the best defaults
Not really. Swift's defaults happen to match these particular requirements best. Change the task and you will find other languages have the "best" defaults.
Hillel Wayne posted a followup https://buttondown.com/hillelwayne/archive/three-ways-formal... which may be interesting. Essentially the issue is "what does the Length of a string mean?"
For anyone else who was curious C#/.NET:
By default string.Length counts UTF-16 code units (char values), but System.Globalization.StringInfo is provided to work with "text elements" (grapheme clusters).
Unlike System.String, StringInfo doesn't have a built-in PadLeft function in its base library. But it gets the length "right" by the author's standard.
Code to show lengths:

    using System;
    using System.Globalization;

    public class Program
    {
        public static void Main()
        {
            // Strings whose UTF-16 char count and grapheme-cluster count can disagree.
            var weirdStrings = new string[] {"𝄞","Å","𓀾","אֳֽ֑","résumé","résumé"};
            foreach (var weirdString in weirdStrings)
            {
                var asStringInfo = new StringInfo(weirdString);
                Console.WriteLine($"{weirdString.PadLeft(10,'-')} {weirdString.Length} {asStringInfo.LengthInTextElements}");
            }
        }
    }
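With these inputs the columns are the char-padded string, the char count, and the text-element count. For 𝄞, which is a single text element encoded as a surrogate pair, Length reports 2 while LengthInTextElements reports 1, so the char-based PadLeft ends up one column short (assuming the glyph renders one column wide).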
If you skimmed the article and missed the subtle notes, the author deliberately chose the wrong way to use the Rust version to make a point. Even ChatGPT told him it was wrong for his use case:
> As mentioned, the other way to use the Rust version has the same behaviour as the Haskell/Lean/etc versions. ChatGPT did actually point out to me that this way was better, and the other way (the one I used) was adequate only if the input was limited to ASCII.
Yes, there was vibe coding throughout.
Should've probably been an article about how vibe coding results in incorrect software.
It’s more than that. ChatGPT actually gave the right answer but they chose not to use it to make a point.
Not quite right... as there would still be 3 failures in the tests, like Haskell.
Realistically, I would expect a couple of different methods to be available... one that splits a string into groups of code units that each represent a single displayed unit, along with a display width per unit (0-2). Alternatively, another method that just counts the display width in character cells (assuming monospacing). Then you could apply the padding more naturally.
The grouping of characters would be more useful for other display purposes beyond left padding.
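For what it's worth, .NET's StringInfo already gives you most of the first half of that: you can split a string into text elements (grapheme clusters) and pad by their count. A sketch along those lines, with made-up names; a real per-cluster display width (0-2 columns) would also need East Asian Width and emoji data, which is omitted here:

    using System;
    using System.Collections.Generic;
    using System.Globalization;

    public static class GraphemePadding
    {
        // Split a string into the units a display would (roughly) treat as single characters.
        public static IEnumerable<string> TextElements(string s)
        {
            var e = StringInfo.GetTextElementEnumerator(s);
            while (e.MoveNext())
                yield return e.GetTextElement();
        }

        // Left-pad by grapheme-cluster count instead of UTF-16 char count.
        public static string LeftPad(string s, int width, char pad = ' ')
        {
            int len = new StringInfo(s).LengthInTextElements;
            return len >= width ? s : new string(pad, width - len) + s;
        }
    }

This still equates one cluster with one column, which is the same simplification the monospace-width framing above runs into.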
They were definitely trying to make Rust look worse than it is. They even sarcastically acknowledge that:
> I didn’t deliberately pick the one which made Rust look even worse than all the others, out of peevish resentment for every time someone has rewritten some Python code (my go-to language) in Rust and made it a million times faster – that’s a ridiculous suggestion.
And
> Rust, as expected, gets nul points. What can I say?
To be fair, you'd have to also import this crate to get the result he wanted:
https://docs.rs/unicode-segmentation/latest/unicode_segmenta...
Further reading on the Knuth correctness quote https://staff.fnwi.uva.nl/p.vanemdeboas/knuthnote.pdf