Building an AI System for Mental Health

This research explores how a web platform can support mental health professionals working with patients who have mental disorders. These patients often struggle to express their emotions or thoughts, which makes it difficult for therapists to fully understand their mental state. To address this, the study focuses on an intuitive interface that lets professionals dynamically generate, modify, approve, and save AI-generated illustrations representing patients’ emotions and thoughts. The idea behind the project is simple: give therapists a tool that creates illustrations with emotional expressions from textual descriptions, supporting their communication with patients safely, privately, and in real time.
The project combines a React frontend, a Python backend, and a locally hosted Stable Diffusion model, all running on a secure, controlled server. This lets mental health professionals explore AI-assisted visualization without depending on external cloud services.
The system architecture is composed of three core layers:
1. Frontend Interface (React):
The web interface acts as the primary interaction point for clinicians. It enables professionals to describe patient emotions, thoughts, body postures, or contexts through text inputs. The interface is designed to be intuitive, with clearly structured fields that guide users in formulating descriptive prompts.
2. Backend Processing (Python + FastAPI):
The backend acts as the operational engine of the system. Once a therapist submits a description, the backend processes the prompt and communicates with the AI model. It receives the generated image, encodes it efficiently, and returns it to the frontend.
3. Local AI Model (Stable Diffusion + LoRA Fine-Tuning):
The core of the platform is a Stable Diffusion model optimized for GPU environments. A custom LoRA model, trained on the character "Eir", enables the system to produce consistent and meaningful representations aligned with therapeutic goals.
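To make the backend step in layer 2 more concrete (receive the generated image, encode it, return it to the frontend), here is a minimal sketch using only the Python standard library. The text does not say which encoding the project uses, so Base64 inside a JSON payload is an assumption, and the function and field names (`build_image_response`, `decode_image_response`, `image_base64`, `mime_type`) are hypothetical.

```python
import base64
import json


def build_image_response(image_bytes: bytes, prompt: str,
                         mime_type: str = "image/png") -> str:
    """Wrap raw image bytes in a JSON string a web frontend could consume.

    Assumption: the image is sent as Base64 text inside JSON. The actual
    project may use a different encoding or a binary response instead.
    """
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    # Base64 turns arbitrary bytes into ASCII text that is safe inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "prompt": prompt,
        "mime_type": mime_type,
        "image_base64": encoded,
    }
    return json.dumps(payload)


def decode_image_response(body: str) -> bytes:
    """Inverse of build_image_response: recover the original image bytes."""
    payload = json.loads(body)
    return base64.b64decode(payload["image_base64"])
```

In a FastAPI application, a route handler could return such a payload after the model call finishes, and the React frontend could render it through a `data:` URL built from `mime_type` and `image_base64`. Base64 adds roughly a third to the payload size, so a binary response may be preferable for large images.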
When a therapist enters a textual description, the prompt typically contains the three structured components used during model training: body posture, facial expression, and emotional expression. This structure keeps the generated images clear and consistent, helping therapists understand the details of an emotional state.
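As an illustration of the three-part prompt structure, the sketch below assembles a prompt string from body posture, facial expression, and emotional expression. The class name `IllustrationPrompt`, the comma-separated format, the component order, and the use of the character name as a leading token are all assumptions; the source does not specify how the prompt text is actually composed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IllustrationPrompt:
    """Hypothetical container for the three structured prompt components."""
    body_posture: str
    facial_expression: str
    emotional_expression: str
    character: str = "Eir"  # assumed trigger token for the LoRA character

    def to_text(self) -> str:
        """Join the components into one comma-separated prompt string."""
        parts = [
            self.character,
            self.body_posture,
            self.facial_expression,
            self.emotional_expression,
        ]
        cleaned = [part.strip() for part in parts]
        # Reject empty fields so every prompt keeps the trained structure.
        if any(not part for part in cleaned):
            raise ValueError("all prompt components must be non-empty")
        return ", ".join(cleaned)


prompt = IllustrationPrompt(
    body_posture="sitting with slumped shoulders",
    facial_expression="downcast eyes",
    emotional_expression="sadness",
)
print(prompt.to_text())
# prints: Eir, sitting with slumped shoulders, downcast eyes, sadness
```

Validating each field on the backend mirrors the structured input fields of the interface, so a prompt that is missing one of the three components is caught before it reaches the model.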
In the next phase of the project, we plan to conduct tests with a psychologist to evaluate the platform’s usability and effectiveness. The focus will be on UI aspects and on the efficiency of image creation based on textual descriptions.