I’m not sure why I took so long to start using GitHub Copilot, but I’ve been completely whelmed by it so far. It sometimes saves entire seconds in my workflow!

In contrast, GPT-4 has produced more of those “wow” moments and saves me hours of work.

That said, in practice Copilot works surprisingly well for Beancount ledger files. Not super intelligently. But it does work.

Duplicating regular transactions from the same ledger file

Copilot working on a beancount ledger file

The first major caveat is that Copilot appears to draw only from the open file. It doesn’t follow Beancount include directives (and definitely doesn’t go up the ledger tree if you’re working in an included file).
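Beancount stitches a ledger together from include lines, and that tree of files is exactly the context Copilot never sees. Here is a minimal sketch of walking it. It assumes each include sits at the start of a line with a quoted relative or absolute path (glob patterns are not expanded), and include_targets and ledger_tree are hypothetical helpers, not part of Beancount.

```python
import re
from pathlib import Path
from typing import Optional

# Beancount pulls other files into a ledger with lines like: include "accounts.beancount"
INCLUDE_RE = re.compile(r'^include\s+"([^"]+)"', re.MULTILINE)


def include_targets(text: str) -> list[str]:
    """Return the paths named by include lines in a ledger file's text."""
    return INCLUDE_RE.findall(text)


def ledger_tree(path: Path, seen: Optional[set[Path]] = None) -> list[Path]:
    """Depth-first list of every file reachable from `path` through includes.

    Relative paths are resolved against the including file's directory;
    glob patterns are not expanded. `seen` guards against include cycles.
    """
    seen = set() if seen is None else seen
    path = path.resolve()
    if path in seen:
        return []
    seen.add(path)
    files = [path]
    for target in include_targets(path.read_text()):
        files += ledger_tree(path.parent / target, seen)
    return files
```

Pointing ledger_tree at the root file (for example ./money/ledger.beancount, the ledger queried later in this post) lists every file Copilot is blind to while you edit just one of them.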

If you have a semi-regular order from a restaurant, it will parrot that entry perfectly.
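For comparison, the same parroting can be done without an LLM. This minimal sketch copies the last transaction with a matching payee and changes only its date. It assumes entries are separated by blank lines, the file is in date order, and headers use a * or ! flag (not the txn keyword) followed by a quoted payee; last_entry_for and repeat_entry are hypothetical helpers.

```python
import re
from typing import Optional

# A transaction header looks like: 2023-03-11 * "Payee" "Narration"
HEADER_RE = re.compile(r'^\d{4}-\d{2}-\d{2} [*!] "([^"]*)"')


def last_entry_for(ledger: str, payee: str) -> Optional[str]:
    """Return the last transaction block in the file whose payee matches, or None."""
    found: Optional[str] = None
    block: list[str] = []
    # A trailing "" flushes the final block.
    for line in ledger.splitlines() + [""]:
        if line.strip():
            block.append(line)
        elif block:
            header = HEADER_RE.match(block[0])
            if header and header.group(1) == payee:
                found = "\n".join(block)
            block = []
    return found


def repeat_entry(ledger: str, payee: str, new_date: str) -> Optional[str]:
    """Copy the last matching transaction, changing only its date."""
    entry = last_entry_for(ledger, payee)
    if entry is None:
        return None
    # Swap the leading YYYY-MM-DD (10 characters) for the new date.
    return new_date + entry[10:]
```

Unlike Copilot, this never guesses: a payee it hasn't seen simply returns None.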

Mostly new transactions

For entries it hasn’t seen before, Copilot does a good job of guessing the account names and memo from just the payee.

Copilot guessing my credit card accounts to use

At restaurants I regularly use my Amex Blue or gift cards and expense the transaction to Expenses:Business:Meals:Restaurants. Copilot lands in the ballpark by picking among those (seemingly at random).


Similarly, it correctly infers that I use my Wells Fargo 2% card for general purchases. This ledger has no previous reference to Home Depot, but expensing it to Expenses:Personal:Home (which does exist) is also a fair guess.

Obviously the amounts for new transactions are unknowable, but otherwise it’s dead on.

The computer knows arithmetic?

On “new” accounts, Copilot always suggests balance entries that are completely wrong. However, it can correctly carry a balance forward from one assertion to the next, even in my extremely crowded yearly ledgers.

For example, my Assets:GiftCards:Amazon account has a balance of $0 on March 9th. I assert this with:

2023-03-09 balance Assets:GiftCards:Amazon        0 USD

I then make a couple transactions on this gift card:

$ bean-query ./money/ledger.beancount 'SELECT date, account, position, balance FROM OPEN ON 2023-03-11 WHERE account ~ "Assets:GiftCards:Amazon"'

   date            account          position   balance
---------- ----------------------- ---------- ---------
2023-03-11 Assets:GiftCards:Amazon  42.25 USD 42.25 USD
2023-03-11 Assets:GiftCards:Amazon  21.74 USD 63.99 USD
2023-03-11 Assets:GiftCards:Amazon  29.04 USD 93.03 USD
2023-03-15 Assets:GiftCards:Amazon -24.12 USD 68.91 USD
2023-03-18 Assets:GiftCards:Amazon -17.45 USD 51.46 USD
2023-03-22 Assets:GiftCards:Amazon   2.56 USD 54.02 USD
2023-04-01 Assets:GiftCards:Amazon -51.32 USD  2.70 USD

After those 7 transactions (and no other assertions), the card has a calculated balance of $2.70.

Now on April 2nd I want to assert this balance.

What does Copilot suggest?

Copilot balancing correctly

It does the math correctly!!

It calculated the running total perfectly, from the opening assertion through all 7 transactions to this new balance.
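To double-check that arithmetic, here is a minimal sketch in plain Python (no Beancount install needed). It uses Decimal, as Beancount does internally, so cents don't drift the way floats would. The posting list is copied from the bean-query output above, and running_balances and balance_directive are hypothetical helpers. It also encodes one Beancount rule worth knowing: a balance assertion is checked against the balance at the start of its date, which is why the April 1st purchase counts toward an April 2nd assertion.

```python
from decimal import Decimal
from itertools import accumulate

# Postings to Assets:GiftCards:Amazon, copied from the bean-query output above.
POSTINGS = [
    ("2023-03-11", "42.25"), ("2023-03-11", "21.74"), ("2023-03-11", "29.04"),
    ("2023-03-15", "-24.12"), ("2023-03-18", "-17.45"), ("2023-03-22", "2.56"),
    ("2023-04-01", "-51.32"),
]
OPENING = Decimal("0")  # asserted by the 2023-03-09 balance directive


def running_balances(opening, postings):
    """Balance after each posting, like bean-query's balance column."""
    return list(accumulate((Decimal(a) for _, a in postings), initial=opening))[1:]


def balance_directive(account, opening, postings, on, currency="USD"):
    """Build a balance assertion dated `on`.

    Beancount checks an assertion against the balance at the start of its
    date, so only postings dated strictly before `on` are counted.
    ISO dates compare correctly as plain strings.
    """
    total = opening + sum((Decimal(a) for d, a in postings if d < on), Decimal("0"))
    return f"{on} balance {account}  {total} {currency}"


print(running_balances(OPENING, POSTINGS)[-1])  # 2.70
print(balance_directive("Assets:GiftCards:Amazon", OPENING, POSTINGS, "2023-04-02"))
# 2023-04-02 balance Assets:GiftCards:Amazon  2.70 USD
```

Dating the same assertion April 1st instead would expect 54.02 USD, because the -51.32 USD posting on that day would not be counted yet.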

Copilot gets completions wrong 75% of the time, but I find it so incredibly impressive when it’s right.


Copilot hallucinating

As with all LLMs, the cracks start to show when your prompts suck.

With less information, it will just make up account names. While Assets:Banks:SF:Checking does exist, those two sub-accounts do not. Granted, it can’t know that, since all my account declarations are in a separate file. It did correctly interpret my memo of “Withdraw $40 cash” by writing 40 USD as the amount.
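Because the declarations live in a separate file, one cheap guard is to check a suggestion against them before accepting it. This is a minimal sketch assuming the declarations are plain open directives and the ledger uses Beancount's default root names (Assets, Liabilities, Equity, Income, Expenses); the account regex is only a rough approximation of Beancount's naming rules, and declared_accounts and unknown_accounts are hypothetical helpers. bean-check should reject postings to unopened accounts once the whole ledger loads; this just flags them while you're still looking at the completion.

```python
import re

# Rough match for account names under the default root names.
ACCOUNT_RE = re.compile(
    r"\b(?:Assets|Liabilities|Equity|Income|Expenses)(?::[A-Z0-9][A-Za-z0-9-]*)+"
)
# open directives look like: 2015-01-01 open Assets:Banks:SF:Checking USD
OPEN_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+open\s+(\S+)", re.MULTILINE)


def declared_accounts(declarations: str) -> set[str]:
    """Collect every account opened in the declarations file's text."""
    return set(OPEN_RE.findall(declarations))


def unknown_accounts(entry: str, declared: set[str]) -> list[str]:
    """Account names in a suggested entry that were never opened, in first-seen order."""
    used = dict.fromkeys(ACCOUNT_RE.findall(entry))
    return [account for account in used if account not in declared]
```

Run against the Venmo suggestion below, this would flag Assets:Banks:Venmo:Checking, since only Assets:Banks:Venmo is declared.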

Another example:

When completing a balance for an account I have never asserted before, it really starts to flail in the dark:

Copilot hallucinating

  1. I don’t have an Assets:Banks:Venmo:Checking account.
  2. I don’t have 1k in that non-existent account.
  3. My actual balance in the existing parent Assets:Banks:Venmo is a hot $3.43.

I wonder if there’s a future in which PayPal bribes Microsoft to encourage beancounting Copilot users to make more Venmo $$$ deposits? Surely not…

My Confidence

While the specific account names and transaction amounts are more often wrong than right, in my opinion Copilot is still worth it for autocompleting the structure of entries.

It cannot be trusted to get the details right.

It only has a good grasp of what the ledger file should look like and how to nudge you there.