Starting with a simple question
Under the guidance of Prof. Linga Reddy Cenkeramaddi, I spent my project period working on automatic detection of vehicles based on sound. This blog post is about that student innovation project, which set out to answer one question: can a machine listen to traffic and tell which vehicle is passing by, using sound alone?
I worked on the Melaudis_vehicles dataset. It contains real recordings of six classes: bicycle, motorcycle, truck, tram, car, and bus. The catch: the dataset is strongly imbalanced (for example, many more cars than trams), so a naive model quickly learns to “love” the majority classes and ignore the rest.
My goal was to go beyond that and build a deep learning system that listens carefully to all six classes, including the rare ones.
Turning sound into pictures the model can “see”
Many researchers have already shown that audio can be turned into images and fed to convolutional neural networks. Inspired by that work, I created time-frequency images from the raw waveforms and then pushed the idea further for this dataset.
Instead of only using the traditional Short-Time Fourier Transform (STFT), I systematically compared several transforms and focused on the Wavelet Synchrosqueezed Transform (WSST). WSST gave much sharper and more structured patterns for our vehicle sounds than STFT.
When I trained models on WSST images, they clearly outperformed an STFT-based baseline. Choosing, testing, and validating WSST as the front end was the first major design decision in my pipeline.
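To make the "sound into pictures" step concrete, here is a minimal sketch of the STFT baseline only, written with plain NumPy. It is not the code used in the project: the function name `stft_image`, the window length, the hop size, and the 80 dB dynamic range are all assumptions chosen for illustration, and a synthetic 1 kHz tone stands in for a real recording. A WSST front end would replace the transform inside this function while the final "scale to an image" step stays the same.

```python
import numpy as np

def stft_image(x, n_fft=512, hop=128):
    """Log-magnitude STFT of a 1-D signal, scaled to [0, 1] as a 2-D image.

    Hypothetical helper for illustration; parameters are assumptions.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice overlapping frames, shape (n_frames, n_fft)
    frames = np.stack([x[i * hop: i * hop + n_fft] for i in range(n_frames)])
    # Magnitude spectrum of each windowed frame, shape (n_frames, n_fft // 2 + 1)
    spec = np.abs(np.fft.rfft(frames * window, axis=1))
    log_spec = 20.0 * np.log10(spec + 1e-10)
    # Keep an 80 dB range below the peak, then scale to [0, 1]
    log_spec = np.maximum(log_spec, log_spec.max() - 80.0)
    img = (log_spec - log_spec.min()) / (log_spec.max() - log_spec.min() + 1e-12)
    return img.T  # (freq_bins, n_frames): frequency on rows, time on columns

# Synthetic stand-in for a recording: one second of a 1 kHz tone at 16 kHz
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
img = stft_image(tone)
```

The resulting array can be saved as a grayscale image or passed to a CNN as a single-channel input. For the tone above, the bright horizontal line sits at the frequency bin closest to 1 kHz.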
Designing the custom hybrid model
The second big step was designing the model and loss function myself, building on ideas from recent research but tailoring them to this problem.
I implemented a hybrid CNN–BiGRU architecture in PyTorch:
• A convolutional neural network (CNN) extracts local visual features from the WSST images.
• A bidirectional GRU (BiGRU) then learns how those features evolve, treating the spectrogram like a sequence.
• On top, I added a cosine classifier head with a customized LDAM (label-distribution-aware margin) + DRW (deferred re-weighting) loss to fight the imbalance across the six classes.
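The three pieces above can be sketched in PyTorch roughly as follows. This is a simplified illustration under stated assumptions, not the project's actual model: the layer sizes, the names `CnnBiGru`, `LdamLoss`, and `drw_weights`, the margin and scale values, and the choice to read the image along its time axis are all mine, and the loss follows the standard LDAM + DRW recipe rather than the customized version used in the project. The class counts are made up purely to exercise the code and are not the real Melaudis_vehicles statistics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CnnBiGru(nn.Module):
    """CNN feature extractor -> bidirectional GRU -> cosine classifier head."""
    def __init__(self, n_classes=6, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        # Cosine head: one learnable direction per class
        self.class_weights = nn.Parameter(torch.randn(n_classes, 2 * hidden))

    def forward(self, x):                    # x: (batch, 1, freq, time)
        f = self.cnn(x)                      # (batch, 32, freq/4, time/4)
        f = f.mean(dim=2).transpose(1, 2)    # pool frequency -> (batch, time/4, 32)
        out, _ = self.gru(f)                 # (batch, time/4, 2 * hidden)
        emb = out.mean(dim=1)                # pool over time steps
        # Cosine similarity between the embedding and each class direction
        return F.normalize(emb, dim=1) @ F.normalize(self.class_weights, dim=1).t()

class LdamLoss(nn.Module):
    """LDAM: rarer classes get a larger margin, proportional to n_j^(-1/4)."""
    def __init__(self, class_counts, max_margin=0.5, scale=30.0):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        margins = counts.pow(-0.25)
        self.register_buffer("margins", margins * (max_margin / margins.max()))
        self.scale = scale

    def forward(self, cosines, target, weight=None):
        # Subtract the class margin from the true-class score only
        margin = F.one_hot(target, cosines.size(1)) * self.margins
        return F.cross_entropy(self.scale * (cosines - margin), target, weight=weight)

def drw_weights(class_counts, epoch, start_epoch, beta=0.9999):
    """DRW: uniform weights early on, class-balanced weights from start_epoch."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    if epoch < start_epoch:
        return torch.ones_like(counts)
    w = (1.0 - beta) / (1.0 - beta ** counts)   # inverse "effective number" of samples
    return w / w.sum() * len(counts)

# Made-up counts in the order bicycle, motorcycle, truck, tram, car, bus.
# These are NOT the real dataset statistics.
counts = [400, 300, 250, 50, 1500, 200]
torch.manual_seed(0)
model = CnnBiGru()
criterion = LdamLoss(counts)
x = torch.randn(4, 1, 64, 96)                # random stand-in for a batch of WSST images
y = torch.tensor([0, 3, 4, 5])
cosines = model(x)
loss = criterion(cosines, y, weight=drw_weights(counts, epoch=10, start_epoch=5))
loss.backward()
```

The cosine head keeps every class score in the range [-1, 1], which is what lets a fixed per-class margin mean the same thing for all classes. The scale factor then stretches those scores before the softmax. With deferred re-weighting, the model first learns general features with uniform weights and only later shifts its attention to the rare classes.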
