sense-music

Turn any audio file into structured analysis and annotated visualizations. Detects BPM, musical key, song structure, genre, mood, and transcribes lyrics. Liner notes for an AI.

pip install sense-music


Quickstart

from sense_music import analyze

result = analyze("song.mp3")
print(result.bpm.tempo)      # 120.0
print(result.key.key)        # A
print(result.key.mode)       # minor
print(result.genre)          # electronic
print(result.mood)           # ['energetic', 'bright']
print(result.summary)        # Natural language description
result.save("output/")       # JSON, HTML, spectrogram, waveform

Analysis Result

The analyze() function returns a result object with all detected features.

Field	Type	Description
`file_info`	`FileInfo`	Source audio metadata
`duration`	`float`	Length in seconds
`bpm`	`BPMInfo`	Tempo detection (tempo, confidence)
`key`	`KeyInfo`	Key detection (key, mode, confidence)
`sections`	`list[Section]`	Structural segments (intro, verse, chorus, etc.)
`lyrics`	`list[LyricLine]`	Transcribed lyrics with timestamps
`energy_curve`	`list[float]`	Per-second RMS energy (0.0-1.0)
`genre`	`str`	Classified genre
`mood`	`list[str]`	Mood tags
`summary`	`str`	Natural language description
`spectrogram`	`Image`	Annotated mel spectrogram
`waveform`	`Image`	Annotated waveform
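
The `energy_curve` field is described as per-second RMS energy on a 0.0-1.0 scale. To show what such a curve can look like, here is a minimal pure-Python sketch that computes one from raw mono samples. It assumes non-overlapping one-second windows and normalization by the loudest window, which may differ from how sense-music actually computes the field. `per_second_rms` is a hypothetical helper, not part of the package.

```python
import math


def per_second_rms(samples: list[float], sample_rate: int) -> list[float]:
    """Hypothetical helper: RMS per 1-second window, scaled so the loudest window is 1.0."""
    curve = []
    for start in range(0, len(samples), sample_rate):
        window = samples[start:start + sample_rate]
        # Root mean square of this window (a trailing partial window is included)
        curve.append(math.sqrt(sum(x * x for x in window) / len(window)))
    peak = max(curve, default=0.0)
    # Normalize to 0.0-1.0; an all-silent signal is returned unchanged (all zeros)
    return [v / peak for v in curve] if peak > 0 else curve


# per_second_rms([0.5] * 4 + [1.0] * 4 + [0.0] * 2, sample_rate=4) -> [0.5, 1.0, 0.0]
```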

Capabilities

Structural Sections

for section in result.sections:
    print(f"{section.label}: {section.start}s - {section.end}s")
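
The loop above prints raw second offsets. If you want minute:second timestamps instead, a small formatter is enough. This sketch uses a stand-in `Section` dataclass that assumes only the `label`, `start`, and `end` attributes used above, with `start` and `end` in seconds; `mmss` and `format_sections` are hypothetical helpers, not sense-music API.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Stand-in for sense-music's Section; only label/start/end (seconds) are assumed."""
    label: str
    start: float
    end: float


def mmss(seconds: float) -> str:
    # Round to whole seconds, then split into minutes and zero-padded seconds
    total = int(round(seconds))
    return f"{total // 60}:{total % 60:02d}"


def format_sections(sections: list[Section]) -> list[str]:
    # One line per section: label, start, end, and length in seconds
    return [
        f"{s.label}: {mmss(s.start)} - {mmss(s.end)} ({s.end - s.start:.1f}s)"
        for s in sections
    ]


for line in format_sections([Section("intro", 0.0, 12.4), Section("verse", 12.4, 45.0)]):
    print(line)  # intro: 0:00 - 0:12 (12.4s), then verse: 0:12 - 0:45 (32.6s)
```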

Lyrics

result = analyze("song.mp3", lyrics=True, whisper_model="base")

Visualizations

result.spectrogram   # Annotated mel spectrogram
result.waveform      # Annotated waveform
result.save("output/")  # Save all outputs to directory

Export

result.to_json()                    # Structured JSON
result.to_html()                    # Standalone HTML report
result.render_page("analysis.html")  # Save HTML to file

Configuration

Parameter	Default	Description
`source`	required	File path or HTTP/HTTPS URL
`lyrics`	`True`	Transcribe lyrics with Whisper
`whisper_model`	`"base"`	Whisper model size
`max_duration`	`600`	Max audio length in seconds
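
`source` accepts either a local file path or an HTTP/HTTPS URL. A caller that wants to route or validate input before calling `analyze()` could classify the string first. This is a sketch assuming that any other URL scheme (for example `ftp://` or `file://`) should be rejected; `classify_source` is hypothetical and not part of sense-music.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Hypothetical helper: label a source as 'url' or 'path', rejecting other schemes."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if "://" in source:
        # Some other scheme, e.g. ftp:// or file://, which is not a documented source kind
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    # No scheme separator: treat it as a local path (this also keeps C:\ drive paths as paths)
    return "path"


print(classify_source("song.mp3"))                      # path
print(classify_source("https://example.com/song.mp3"))  # url
```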

Supported Formats

.mp3 .wav .flac .ogg .m4a .aac .wma .opus
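
A quick pre-filter against this list can skip obviously unsupported files before any decoding happens. This is a minimal sketch; `is_supported` is a hypothetical helper, and because an extension only hints at the container, a file that passes may still fail to decode.

```python
from pathlib import Path

# Extensions listed as supported in this README
SUPPORTED = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma", ".opus"}


def is_supported(filename: str) -> bool:
    """Hypothetical pre-check: case-insensitive match of the final file extension."""
    return Path(filename).suffix.lower() in SUPPORTED


print(is_supported("Track.FLAC"))  # True
print(is_supported("notes.txt"))   # False
```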

Security

SSRF protection — private, loopback, link-local IPs blocked
XSS protection — all HTML output escaped
OOM prevention — audio capped at 600s and 500 MB
Path traversal blocked in save/render paths
Whisper model allowlist: tiny, base, small, medium, large, large-v2, large-v3
No network access beyond URL downloads
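
These protections are listed without implementation detail. The sketch below shows one conventional, standard-library way to implement three of them: the Whisper model allowlist, blocking private, loopback, and link-local addresses, and keeping save paths inside an output directory. It illustrates the general technique, not sense-music's code, and every function name in it is hypothetical. A production SSRF guard would also need to connect to the exact address it validated, because a hostname can resolve differently on a second lookup.

```python
import ipaddress
import socket
from pathlib import Path
from urllib.parse import urlparse

# Allowlist copied from the README
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large", "large-v2", "large-v3"}


def check_model(name: str) -> str:
    # Reject any model name that is not on the allowlist
    if name not in WHISPER_MODELS:
        raise ValueError(f"whisper model not allowed: {name!r}")
    return name


def is_blocked_ip(addr: str) -> bool:
    """True for private, loopback, or link-local addresses (IPv4 or IPv6)."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_url(url: str) -> None:
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    # Check every address the host resolves to, not just the first one
    for info in socket.getaddrinfo(host, None):
        if is_blocked_ip(info[4][0]):
            raise ValueError(f"{host} resolves to a blocked address")


def safe_join(base: str, name: str) -> Path:
    """Resolve name under base and refuse anything that escapes it (e.g. '../')."""
    root = Path(base).resolve()
    target = (root / name).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes {root}: {name!r}")
    return target
```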

Dependencies

librosa matplotlib Pillow numpy openai-whisper

Part of the huje.tools Ecosystem

sense-music is part of huje.tools — open-source tools for the agentic age. Use it with any AI agent that needs to perceive and understand audio content.