sense-music
Turn any audio file into structured analysis and annotated visualizations. Detects BPM, musical key, song structure, genre, and mood, and transcribes lyrics. Liner notes for an AI.
Quickstart
from sense_music import analyze
result = analyze("song.mp3")
print(result.bpm.tempo) # 120.0
print(result.key.key) # "A"
print(result.key.mode) # "minor"
print(result.genre) # "electronic"
print(result.mood) # ["energetic", "bright"]
print(result.summary) # Natural language description
result.save("output/") # JSON, HTML, spectrogram, waveform
Analysis Result
The analyze() function returns a result object with all detected features.
| Field | Type | Description |
|---|---|---|
| file_info | FileInfo | Source audio metadata |
| duration | float | Length in seconds |
| bpm | BPMInfo | Tempo detection (tempo, confidence) |
| key | KeyInfo | Key detection (key, mode, confidence) |
| sections | list[Section] | Structural segments (intro, verse, chorus, etc.) |
| lyrics | list[LyricLine] | Transcribed lyrics with timestamps |
| energy_curve | list[float] | Per-second RMS energy (0.0-1.0) |
| genre | str | Classified genre |
| mood | list[str] | Mood tags |
| summary | str | Natural language description |
| spectrogram | Image | Annotated mel spectrogram |
| waveform | Image | Annotated waveform |
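Because energy_curve holds one value per second, plain list operations are enough to work with it. The following is a minimal sketch that assumes only that the curve is a list of floats in the 0.0-1.0 range; `loudest_window` is a hypothetical helper for illustration, not part of sense-music.

```python
def loudest_window(energy_curve: list[float], window: int = 10) -> tuple[int, float]:
    """Return (start_second, mean_energy) of the loudest `window`-second span."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(energy_curve) < window:
        raise ValueError("curve is shorter than the window")

    # Sliding-window sum: add the value entering the window, drop the one leaving.
    current = sum(energy_curve[:window])
    best_start, best_sum = 0, current
    for start in range(1, len(energy_curve) - window + 1):
        current += energy_curve[start + window - 1] - energy_curve[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum / window
```

With a real result this would be called as `loudest_window(result.energy_curve)`, and the returned start second could then be matched against `result.sections`.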
Capabilities
Structural Sections
for section in result.sections:
    print(f"{section.label}: {section.start}s - {section.end}s")
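The label, start, and end attributes shown above are enough for simple lookups, such as finding the section that contains a timestamp or totalling time per label. This is a sketch only: the `Section` dataclass below is a stand-in that mirrors those three attributes so the example runs without an audio file, and `section_at` and `total_time_by_label` are hypothetical helpers, not library functions.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Stand-in with the same label/start/end attributes used in the loop above."""
    label: str
    start: float
    end: float


def section_at(sections, t):
    """Return the section whose [start, end) range contains time t, or None."""
    for s in sections:
        if s.start <= t < s.end:
            return s
    return None


def total_time_by_label(sections):
    """Sum the duration in seconds of all sections sharing a label."""
    totals = {}
    for s in sections:
        totals[s.label] = totals.get(s.label, 0.0) + (s.end - s.start)
    return totals


demo = [
    Section("intro", 0, 10),
    Section("verse", 10, 40),
    Section("chorus", 40, 60),
    Section("verse", 60, 90),
]
print(section_at(demo, 45).label)
print(total_time_by_label(demo))
```

Passing `result.sections` in place of `demo` should work as long as the real objects expose the same three attributes.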
Lyrics
result = analyze("song.mp3", lyrics=True, whisper_model="base")
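Timestamped lyrics can be turned into other formats with a few lines of code. The sketch below writes LRC-style lines (`[mm:ss.xx]text`) from `(start_seconds, text)` pairs. The attribute names on LyricLine are not documented here, so the example deliberately takes plain tuples; mapping each LyricLine to such a pair is left as an assumption, and `to_lrc` is a hypothetical helper.

```python
def to_lrc(lines):
    """Format (start_seconds, text) pairs as LRC lines: [mm:ss.xx]text."""
    out = []
    for start, text in lines:
        # Work in whole centiseconds so rounding never yields "60.00" seconds.
        centis = round(start * 100)
        minutes, rest = divmod(centis, 6000)
        out.append(f"[{minutes:02d}:{rest // 100:02d}.{rest % 100:02d}]{text}")
    return "\n".join(out)


print(to_lrc([(0.0, "First line"), (75.5, "Second line")]))
```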
Visualizations
result.spectrogram # Annotated mel spectrogram
result.waveform # Annotated waveform
result.save("output/") # Save all outputs to directory
Export
result.to_json() # Structured JSON
result.to_html() # Standalone HTML report
result.render_page("analysis.html") # Save HTML to file
Configuration
| Parameter | Default | Description |
|---|---|---|
| source | required | File path or HTTP/HTTPS URL |
| lyrics | True | Transcribe lyrics with Whisper |
| whisper_model | "base" | Whisper model size |
| max_duration | 600 | Max audio length in seconds |
Supported Formats
.mp3 .wav .flac .ogg .m4a .aac .wma .opus
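When processing many files, it can be convenient to filter by extension before calling analyze(). The following is a minimal sketch using only the standard library; `has_supported_extension` is a hypothetical helper, and how sense-music itself validates formats is not described here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions copied from the list above.
SUPPORTED = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".opus"}


def has_supported_extension(source: str) -> bool:
    """Check a file path or HTTP/HTTPS URL against the supported extensions."""
    parsed = urlparse(source)
    # For URLs, inspect only the path so query strings and fragments are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    return PurePosixPath(path).suffix.lower() in SUPPORTED
```

The check is case-insensitive, so `song.MP3` passes. It looks only at the name, not the file contents.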
Security
- SSRF protection — private, loopback, link-local IPs blocked
- XSS protection — all HTML output escaped
- OOM prevention — audio capped at 600s and 500 MB
- Path traversal blocked in save/render paths
- Whisper model allowlist: tiny, base, small, medium, large, large-v2, large-v3
- No network access beyond URL downloads
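To illustrate the kind of address check that SSRF protection involves, here is a sketch built on Python's standard `ipaddress` module. It is not sense-music's actual implementation; `is_blocked_address` is a hypothetical name, and a complete guard would also have to resolve the URL's hostname and test every resolved address.

```python
import ipaddress


def is_blocked_address(ip: str) -> bool:
    """Return True for private, loopback, or link-local IPv4/IPv6 addresses."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return addr.is_private or addr.is_loopback or addr.is_link_local


for candidate in ("127.0.0.1", "10.0.0.5", "169.254.169.254", "8.8.8.8"):
    print(candidate, is_blocked_address(candidate))
```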
Dependencies
librosa matplotlib Pillow numpy openai-whisper
Part of the huje.tools Ecosystem
sense-music is part of huje.tools — open-source tools for the agentic age. Use it with any AI agent that needs to perceive and understand audio content.