Silbo is based on a spoken dialect that is fundamentally non-tonal. This means that the pitch of a spoken word or syllable is used only for inflection or to convey emotion; it doesn’t change the inherent meaning of the word.
The whistled version of Silbo is created by tracking the second formant of the vowel sounds found in the spoken version of the language. This means that the shapes the vocal tract normally makes when pronouncing the spoken language are preserved as much as possible in the whistled version. These whistled melodies retain and mimic the harmonic relationships that characterize the transitions between spoken vowel sounds.
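To make the idea of tracking the second formant a little more concrete, here is a minimal sketch in Python. It is not how Silbo whistlers or any researchers actually model the language. It simply holds a single sine tone at an assumed second-formant frequency for each vowel and glides to the next one. The `F2_HZ` values, the durations, and the function names `f2_contour` and `whistle` are all my own illustrative assumptions, not measurements.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second (assumed)

# Rough, illustrative second-formant targets in Hz.
# These are assumed round numbers, not measured Silbo or Spanish data.
F2_HZ = {"i": 2300.0, "e": 1900.0, "a": 1400.0, "o": 1000.0, "u": 800.0}


def f2_contour(vowels, vowel_dur=0.2, glide_dur=0.05, sample_rate=SAMPLE_RATE):
    """Per-sample frequency track: hold each vowel's F2, then glide to the next."""
    hold = int(round(vowel_dur * sample_rate))
    glide = int(round(glide_dur * sample_rate))
    pieces = []
    for i, vowel in enumerate(vowels):
        pieces.append(np.full(hold, F2_HZ[vowel]))
        if i + 1 < len(vowels):
            nxt = F2_HZ[vowels[i + 1]]
            # Linear glide standing in for the transition between vowels
            pieces.append(np.linspace(F2_HZ[vowel], nxt, glide, endpoint=False))
    return np.concatenate(pieces)


def whistle(freq_track, sample_rate=SAMPLE_RATE):
    """Turn a frequency track into one phase-continuous sine tone."""
    phase = 2.0 * np.pi * np.cumsum(freq_track) / sample_rate
    return np.sin(phase)


track = f2_contour("oa")   # hold "o", glide, hold "a"
signal = whistle(track)    # samples in [-1, 1], ready to write to a WAV file
```

Accumulating the phase with `cumsum` keeps the tone from clicking when the frequency changes, which is why the sketch does not just compute `sin(2πft)` per segment.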
This process of approximation is strikingly similar to the one used by the Yale-affiliated Haskins Laboratories to create sinewave speech, which I posted about previously. Consonants, which are physically hard to reproduce in a whistle, are replaced by a variety of techniques, including pauses, rising or falling tones, and a complex process of designating transitions between various formants as different types of consonants. As the Haskins sinewave speech demonstrates, consonants are not always necessary for comprehending sentences if enough formant information is present.
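For comparison, sinewave speech replaces each of the lowest few formants with its own time-varying sine tone, whereas the whistle keeps only one. The sketch below builds a three-tone version and a single-tone, whistle-like version of the same vowel transition. The formant values in `FORMANTS_HZ` and the helper names are assumptions I made for illustration, and this is not the Haskins synthesis procedure itself.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second (assumed)

# Assumed, illustrative (F1, F2, F3) targets in Hz for two vowels
FORMANTS_HZ = {"a": (750.0, 1400.0, 2500.0), "i": (300.0, 2300.0, 3000.0)}


def formant_tracks(start, end, duration=0.3, sample_rate=SAMPLE_RATE):
    """One linear frequency track per formant, moving from `start` to `end`."""
    n = int(round(duration * sample_rate))
    return [
        np.linspace(f_from, f_to, n)
        for f_from, f_to in zip(FORMANTS_HZ[start], FORMANTS_HZ[end])
    ]


def sine_from_track(track, sample_rate=SAMPLE_RATE):
    """Phase-continuous sine tone that follows a frequency track."""
    return np.sin(2.0 * np.pi * np.cumsum(track) / sample_rate)


tracks = formant_tracks("a", "i")

# Sinewave-speech-style signal: the average of three formant-following tones
sinewave_speech = sum(sine_from_track(t) for t in tracks) / len(tracks)

# Whistle-like signal: only the second formant's tone survives
whistle_like = sine_from_track(tracks[1])
```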
You can read a basic translation of this video here.