VoiceHac architecture

Welcome to the high-level architecture overview for the conversational AI system I’m developing to control Home Assistant. In this post, I’ll outline the main components that make up the system and provide a brief description of their roles. Please note that designing the architecture is an iterative process, and I anticipate updating it as the project evolves. This overview will focus on the broader architecture without delving into the specific interfaces.

Stay tuned for updates on the architecture and the project’s progress as I continue to refine the design and implementation. By sharing this journey, I hope to inspire and inform fellow developers and enthusiasts interested in building similar AI-driven solutions.

Mycroft:

Mycroft can be seen as a microservice responsible for several key tasks, such as speech-to-text (STT), text-to-speech (TTS), wake word detection, and handling audio hardware interactions.

Home Assistant:

Home Assistant operates as a microservice that functions as the actuator, interacting with and controlling various home automation devices. It also serves as the centralized data server, storing and providing access to a variety of information like temperature data, to-do lists, etc.

NLP Component:

The text processing module can be split into multiple microservices, each focused on a specific skill or feature:

Named Entity Recognition (NER): Responsible for identifying and categorizing entities in the user’s speech.
Home Assistant Direct (HADirect): This microservice is further divided into modules, each handling a specific type of entity (doors, lights, windows, heating, locks, etc.). Each module would be independently responsible for processing requests related to its assigned entity.
GPT-2.0 Local AI: Used for processing more complex or Home Assistant specific queries that require a deeper level of understanding.
GPT-4.0 Cloud-based AI: Used for processing general knowledge queries that fall outside the scope of Home Assistant.
To-Do Handling Feature: This would be responsible for managing to-do lists and related requests.
Grocery List Feature: Handles requests related to the grocery list.
Media Player Feature: Handles all the media playing requests.

Orchestrator:

The Orchestrator is a crucial component in a microservices architecture. It is responsible for managing the different services, ensuring that they function together seamlessly. It routes requests to the appropriate service and ensures proper communication between services.

This Microservices Architecture will allow each component to be developed, deployed, and scaled independently. This can lead to a more robust system and quicker iteration times as changes can be made to individual services without affecting the entire system.

However, it’s worth noting that a Microservices Architecture also introduces its own set of complexities, including increased difficulty in managing and coordinating services, network latency, and data consistency. These challenges should be considered and addressed in your architectural planning and design process.

Context Manager:

This is an independent component responsible for maintaining conversational context. It interacts with the NLP Component, PyTorch GPT-2 Model, and OpenAI GPT-4 Model to handle follow-up questions and provide more relevant responses based on previous interactions.