[carykh] took a dive into neural networks, training a computer to imitate Baroque music. The results are as interesting as the process he used. Instead of feeding Shakespeare (for example) to a neural network and marveling at how Shakespeare-y the text output looks, the process converts Bach's music into a text format and feeds that to the neural network. There is one character for each key on the piano, making for an 88-character alphabet used during training. The neural net then runs wild and the results are turned back into audio to see (or hear, as it were) how much the output sounds like Bach.
The video embedded below starts with a bit of a skit, but hang in there, because once you hit the 90-second mark things get interesting. Those lacking patience can simply skip to the demo: listen to original Bach followed by early results (4:14) and compare to the results of a full day of training (11:36) on Bach with some Mozart mixed in for variety. For a system completely unaware of any bigger-picture concepts such as melody, the results are not only recognizable as music but can even be pleasant to listen to.
At the core of things is this character-based Recurrent Neural Network, which is itself the work of Andrej Karpathy. In his words, "it takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data." How did [carykh] actually use this for music? With the following process:
- Gather source material (lots and lots of MIDI files of Bach pieces for piano or harpsichord.)
- Convert those MIDI files to CSV format with a tool.
- Tokenize and reformat that CSV data with a custom Processing script: one ASCII character now equals one piano key.
- Feed the RNN with the resulting text.
- Take the output of the RNN and convert it back to MIDI with the reverse of the process.
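To make the tokenization step concrete, here is a minimal sketch of one plausible encoding: each of the 88 piano keys gets a single printable ASCII character, and each time-step becomes one line of text listing the keys sounding at that moment. This is an illustration under stated assumptions, not [carykh]'s actual Processing script; the character mapping and line format are hypothetical.

```python
# Hypothetical 88-character alphabet: one printable ASCII character per
# piano key (MIDI notes 21..108 map to '!' .. 'x').
ALPHABET = [chr(33 + i) for i in range(88)]

def encode_steps(steps):
    """steps: list of sets of MIDI note numbers active at each time-step.
    Returns one line of text per time-step (empty line = silence)."""
    lines = []
    for active in steps:
        lines.append("".join(ALPHABET[note - 21] for note in sorted(active)))
    return "\n".join(lines)

def decode_steps(text):
    """Invert the encoding: text back to per-step sets of MIDI notes."""
    return [{ALPHABET.index(ch) + 21 for ch in line}
            for line in text.split("\n")]

# Round-trip example: a C-major triad held for two steps, then silence.
steps = [{60, 64, 67}, {60, 64, 67}, set()]
text = encode_steps(steps)
assert decode_steps(text) == steps
```

The round trip matters because the last bullet above runs the whole pipeline in reverse: whatever encoding the Processing script uses has to be losslessly invertible so the RNN's text output can become MIDI again.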
[carykh] shares an important question that was raised during this whole process: what was he actually after? How did he define what he actually wanted? It's a bit fuzzy: on one hand he wants the output of the RNN to mimic the input as closely as possible, but he also doesn't actually want complete replication; he just wants the output to take on enough of the same patterns without outright copying the source material. The processing of the neural network never actually "ends"; [carykh] simply pulls the plug at some point to see what the results are like.
Thanks to [Keith Olson] for the tip!
Filed under: digital audio hacks, musical hacks