I’m not sure why I took so long to start using GitHub Copilot, but I’ve been completely whelmed by it so far. It sometimes saves entire seconds in my workflow!
In contrast, GPT-4 has induced more of those “wow” moments and saves me hours of work.
That said, in practice Copilot works surprisingly well for Beancount ledger files. Not super intelligently. But it does work.
Duplicating regular transactions from the same ledger file.
The first major caveat is that Copilot appears to draw only from the immediate file and doesn’t respect imports (and definitely doesn’t go up the ledger tree if you’re working in an imported file).
If you have a semi-regular order from a restaurant, it will parrot that entry perfectly.
Mostly new transactions
For entries it hasn’t seen before, Copilot does a good job at guessing the account names and memo from just the Payee.
At restaurants I regularly use my Amex blue or Gift cards and expense the txn to Expenses:Business:Meals:Restaurants. Copilot is in the ballpark by guessing those (seemingly at random).
Similarly, it correctly interprets that I use my Wells Fargo 2% card for unspecific purchases. This ledger has no previous reference to Home Depot, but it’s also a fair guess to expense that to Expenses:Personal:Home (which does exist).
Obviously the amounts for new transactions are unknowable, but otherwise it’s dead on.
The computer knows arithmetic?
On “new” accounts, Copilot always suggests balance entries that are completely wrong. However it can correctly calculate between two balance statements even in my extremely crowded year ledgers.
For example, my Assets:GiftCards:Amazon account has a balance of $0 on March 9th. I assert this with:
2023-03-09 balance Assets:GiftCards:Amazon 0 USD
I then make a couple transactions on this gift card:
$ bean-query ./money/ledger.beancount 'SELECT date, account, position, balance FROM OPEN ON 2023-03-11 WHERE account ~ "Assets:GiftCards:Amazon"'
date       account                 position   balance
---------- ----------------------- ---------- ---------
2023-03-11 Assets:GiftCards:Amazon  42.25 USD 42.25 USD
2023-03-11 Assets:GiftCards:Amazon  21.74 USD 63.99 USD
2023-03-11 Assets:GiftCards:Amazon  29.04 USD 93.03 USD
2023-03-15 Assets:GiftCards:Amazon -24.12 USD 68.91 USD
2023-03-18 Assets:GiftCards:Amazon -17.45 USD 51.46 USD
2023-03-22 Assets:GiftCards:Amazon   2.56 USD 54.02 USD
2023-04-01 Assets:GiftCards:Amazon -51.32 USD  2.70 USD
After those 7 transactions (and no other assertions) the card has a calculated balance of $2.70.
Now on April 2nd I want to assert this balance.
What does Copilot suggest?
It does the math correctly!!
It perfectly calculated a running total between the balance, all 7 transactions, and this new balance.
Copilot gets completions wrong 75% of the time, but I find it so incredibly impressive when it’s right.
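If you want to double-check a suggested balance yourself, the running total is easy to reproduce. This is a minimal sketch in Python, not anything Copilot or Beancount runs internally: the amounts are copied from the bean-query output above, and the helper names (`running_balance`, `balance_directive`) are made up for illustration.

```python
from decimal import Decimal

# Postings copied from the bean-query output above (amounts in USD).
postings = [
    ("2023-03-11", "42.25"),
    ("2023-03-11", "21.74"),
    ("2023-03-11", "29.04"),
    ("2023-03-15", "-24.12"),
    ("2023-03-18", "-17.45"),
    ("2023-03-22", "2.56"),
    ("2023-04-01", "-51.32"),
]


def running_balance(opening, amounts):
    """Return the balance after each posting, starting from `opening`."""
    total = Decimal(opening)
    balances = []
    for amount in amounts:
        # Decimal avoids the rounding noise that floats add to currency math.
        total += Decimal(amount)
        balances.append(total)
    return balances


def balance_directive(date, account, amount, currency="USD"):
    """Format a Beancount balance assertion line."""
    return f"{date} balance {account} {amount} {currency}"


# The March 9th assertion says the card starts at 0 USD.
balances = running_balance("0", [amount for _, amount in postings])
final = balances[-1]

print(balance_directive("2023-04-02", "Assets:GiftCards:Amazon", final))
# prints: 2023-04-02 balance Assets:GiftCards:Amazon 2.70 USD
```

The intermediate values in `balances` line up with the balance column of the query output, so a wrong suggestion is quick to spot.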
Hallucinating
As with all LLMs, the cracks start to show when your prompts suck.
With less information it will just make up account names. While Assets:Banks:SF:Checking does exist, those two sub-accounts do not. Granted, it doesn’t know that, since all my account declarations are in a separate file. It did correctly interpret my memo of “Withdraw $40 cash” by writing 40 USD as the amount.
Another example:
When completing a balance for an account I have never asserted before, it really starts to flail in the dark:
- I don’t have an Assets:Banks:Venmo:Checking account.
- I don’t have 1k in that non-existent account.
- My actual balance in the existing parent Assets:Banks:Venmo is a hot $3.43.
I wonder if there’s a future in which PayPal bribes Microsoft to encourage beancounting Copilot users to make more Venmo $$$ deposits? Surely not…
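Since Copilot can’t see the account declarations, made-up names like these are worth catching mechanically. Running bean-check on the full ledger should flag unknown accounts, but here is a rough standalone sketch of the same idea in Python. Everything in it is an assumption for illustration: the open dates, the date and exact amount of the suggested line, and the helper names. It also assumes the default root account names (Assets, Liabilities, Equity, Income, Expenses).

```python
import re

# Hypothetical stand-in for the separate file holding account declarations.
# The open dates are made up; only the two account names come from the post.
DECLARATIONS = """\
2020-01-01 open Assets:Banks:SF:Checking
2020-01-01 open Assets:Banks:Venmo
"""

# A completion like the one described above (date and exact amount made up).
SUGGESTION = "2023-04-02 balance Assets:Banks:Venmo:Checking 1000 USD"

# Matches "YYYY-MM-DD open Account:Name" at the start of a line.
OPEN_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+open\s+(\S+)", re.MULTILINE)

# Matches account names under the default Beancount root accounts.
ACCOUNT_RE = re.compile(
    r"\b(?:Assets|Liabilities|Equity|Income|Expenses)(?::[A-Z0-9][A-Za-z0-9-]*)+"
)


def declared_accounts(text):
    """Collect every account named in an open directive."""
    return set(OPEN_RE.findall(text))


def unknown_accounts(entry, declared):
    """Return accounts used in `entry` that were never opened."""
    return sorted(set(ACCOUNT_RE.findall(entry)) - declared)


declared = declared_accounts(DECLARATIONS)
print(unknown_accounts(SUGGESTION, declared))
# prints: ['Assets:Banks:Venmo:Checking']
```

A check like this only validates names. It says nothing about whether the suggested amount is right.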
My Confidence
While the specific account names and transaction amounts are more often wrong than right, in my opinion Copilot is still worth it for autocompleting the structure of entries.
It cannot be trusted to get the details right.
It only has a good grasp on what the ledger file should look like and how to nudge you there.