March 2025 (in progress)
Roles: Research, Design, Prototyping
An AI-assisted receipt reader and expense tracker mobile app.
As artificial intelligence has advanced over the past several years, tool makers of all kinds have gained the ability to process non-textual information. In personal finance, where a large number of tools already exist, the prevalence of OCR technology has long enabled a basic level of receipt and invoice parsing. However, the advent of large language models (LLMs) has made parsing and interpreting visual sources much easier. This experimental app tests how well LLMs can interpret and process receipts without specialized OCR services, and aims to discover gaps in user experience when a tool relies on LLM-based technology rather than traditional OCR services.
The app will be built with React Native to ensure cross-platform compatibility between iOS and Android. It will call a webhook on an n8n workflow that handles receipt reading (OCR), data parsing, and data backend functions via a ChatGPT-based LLM. The initial proof of concept will rely on Airtable for data backend functions and storage.
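As a rough illustration of the app-to-webhook call, here is a minimal TypeScript sketch. The webhook URL, the JSON payload shape (`image`, `mimeType`), and the function names are all assumptions made for illustration; the actual n8n endpoint and its expected input are not yet defined in this write-up.

```typescript
// Hypothetical endpoint: the real n8n webhook URL is not specified yet.
const WEBHOOK_URL = "https://n8n.example.com/webhook/receipt";

interface ReceiptRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request the app would send once a receipt photo is captured.
// The payload shape (base64 image plus MIME type) is an assumption.
function buildReceiptRequest(
  imageBase64: string,
  mimeType: string = "image/jpeg"
): ReceiptRequest {
  return {
    url: WEBHOOK_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: imageBase64, mimeType }),
  };
}

// Minimal shape of a fetch-like function, so the call can be exercised
// without a network connection.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send the receipt and return whatever JSON the workflow responds with.
async function submitReceipt(
  imageBase64: string,
  post: HttpPost
): Promise<unknown> {
  const { url, ...init } = buildReceiptRequest(imageBase64);
  const response = await post(url, init);
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
  return response.json();
}
```

In the app itself, the global `fetch` provided by React Native could be passed as the `post` argument. Keeping request construction separate from the network call makes the payload easy to change once the workflow's real input format is settled.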
The logic workflow in the initial proof of concept will take an agentic AI approach using n8n. A ChatGPT-based LLM will act as the "brain" of the workflow: it reads and parses the receipt, determines whether this is a new expense item or an update to an existing expense, and performs data processing.
n8n AI agent workflow (work in progress)
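One way the app could consume the agent's decision is to have the workflow return a small JSON object and validate it before use. The sketch below assumes a hypothetical response shape (`action`, `expenseId`, `expense`, and the field names inside `expense`); none of these are defined by the workflow yet. It only shows the idea of treating LLM output as untrusted input.

```typescript
// Hypothetical fields the agent extracts from a receipt.
interface ExpenseFields {
  merchant: string;
  date: string; // ISO 8601 date, e.g. "2025-03-14"
  total: number; // amount in the receipt's currency
  currency: string;
}

// The agent either proposes a new expense or an update to an existing one.
type AgentResult =
  | { action: "create"; expense: ExpenseFields }
  | { action: "update"; expenseId: string; expense: ExpenseFields };

// Runtime check that a value has every expected field with the right type.
function isExpenseFields(value: unknown): value is ExpenseFields {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.merchant === "string" &&
    typeof v.date === "string" &&
    typeof v.total === "number" &&
    Number.isFinite(v.total) &&
    typeof v.currency === "string"
  );
}

// Parse the raw response text; return null for anything malformed so the
// app can fall back to manual entry instead of crashing.
function parseAgentResult(raw: string): AgentResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const expense = d.expense;
  const expenseId = d.expenseId;
  if (!isExpenseFields(expense)) return null;
  if (d.action === "create") {
    return { action: "create", expense };
  }
  if (d.action === "update" && typeof expenseId === "string") {
    return { action: "update", expenseId, expense };
  }
  return null;
}
```

Returning `null` on any malformed response keeps the failure mode simple: the form stays empty and the user enters the expense by hand.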
Drawing on personal and work experience with several expense tools and apps, as well as feedback gathered on existing tools, I created a preliminary UI design.
Initial UI design sketch
Preliminary UI Design
The design centers on receipt capture. Upon opening the app, the user is taken directly to the receipt capture screen, where they can photograph a receipt right away. The captured receipt is then sent to the cloud for reading and processing, and the parsed expense information is automatically populated on the Add Expense form. If everything looks good, the user can then save the expense item.
In addition to entering expense items, the user can note who paid for a particular expense item, which allows the app to calculate how costs should be shared among participants in an expense group. This will be useful for splitting bills among friends or sharing household expenses among family members.
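The sharing calculation could look something like the following sketch. It assumes every expense is split equally among all participants in the group and that amounts are stored as integer cents; both are simplifying assumptions, and the type and function names are hypothetical. Uneven splits would need additional data per expense.

```typescript
// Hypothetical record of one expense: who paid and how much (in cents).
interface SharedExpense {
  paidBy: string;
  amountCents: number;
}

// Compute each participant's net balance under an equal split.
// Positive = the group owes this person; negative = this person owes.
function computeBalances(
  participants: string[],
  expenses: SharedExpense[]
): Record<string, number> {
  if (participants.length === 0) {
    throw new Error("An expense group needs at least one participant");
  }
  const balances: Record<string, number> = {};
  for (const p of participants) balances[p] = 0;

  const n = participants.length;
  for (const e of expenses) {
    if (participants.indexOf(e.paidBy) === -1) {
      throw new Error(`Unknown payer: ${e.paidBy}`);
    }
    // Split in whole cents; leftover cents go to the first participants
    // so the shares always add up to the exact amount.
    const baseShare = Math.floor(e.amountCents / n);
    const remainder = e.amountCents - baseShare * n;

    balances[e.paidBy] += e.amountCents;
    participants.forEach((p, i) => {
      balances[p] -= baseShare + (i < remainder ? 1 : 0);
    });
  }
  return balances;
}

// Example: Ana pays 30.00 and Ben pays 10.00 in a group of three.
const exampleBalances = computeBalances(
  ["ana", "ben", "cy"],
  [
    { paidBy: "ana", amountCents: 3000 },
    { paidBy: "ben", amountCents: 1000 },
  ]
);
// exampleBalances: { ana: 1666, ben: -333, cy: -1333 }
```

Working in integer cents avoids floating-point rounding drift, and the balances always sum to zero, which is a convenient sanity check.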
At every step, the user can take over and enter the fields manually. As with any AI-assisted tool, the "human-in-the-loop" philosophy applies: the AI is used only to parse data intelligently and never submits that data without user confirmation. As a matter of safety, the user should always have the final say on whether the data is correct before saving.
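The human-in-the-loop rule can be expressed directly in the form's state handling. The sketch below is one possible shape, with hypothetical field and function names: parsed values pre-fill the Add Expense form, anything the user has typed takes precedence over the AI's output, and saving requires an explicit confirmation.

```typescript
type FieldName = "merchant" | "date" | "total";
type ExpenseDraft = Record<FieldName, string>;

interface FormState {
  draft: ExpenseDraft;
  // Tracks which fields the user has typed into themselves.
  editedByUser: Partial<Record<FieldName, boolean>>;
}

function emptyForm(): FormState {
  return { draft: { merchant: "", date: "", total: "" }, editedByUser: {} };
}

// Record a manual edit; the field is now owned by the user.
function editField(state: FormState, field: FieldName, value: string): FormState {
  return {
    draft: { ...state.draft, [field]: value },
    editedByUser: { ...state.editedByUser, [field]: true },
  };
}

// Pre-fill from AI-parsed values, but never overwrite a user-edited field.
function applyParsedFields(
  state: FormState,
  parsed: Partial<ExpenseDraft>
): FormState {
  const draft = { ...state.draft };
  (Object.keys(parsed) as FieldName[]).forEach((field) => {
    const value = parsed[field];
    if (value !== undefined && !state.editedByUser[field]) {
      draft[field] = value;
    }
  });
  return { ...state, draft };
}

// Saving needs both a complete form and an explicit user confirmation.
function canSave(state: FormState, userConfirmed: boolean): boolean {
  const complete =
    state.draft.merchant !== "" &&
    state.draft.date !== "" &&
    state.draft.total !== "";
  return userConfirmed && complete;
}
```

Because the functions return new state objects rather than mutating in place, they would slot into a React `useReducer` or `useState` setup without changes.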
With the UI design and workflow nearly complete, the next step is to develop the actual mobile app and connect it to the AI process in the backend. While the logic is relatively straightforward and AI-assisted coding can help with mobile app development (particularly for someone like me, who has only the most elementary understanding of React Native), several potential risks will affect whether the application is ultimately successful:
- Scalability of n8n as an AI logic platform
- Database complexity for storing large amounts of expense data
- The potential need to move to a more enterprise-level backend platform
A feasibility study will need to be conducted before the data and logic layers can be finalized.