FIR Podcast Network

For Immediate Release: Podcasts for Communicators

Gorillaz’s Humanz, DNN Sound Effects and Human Translations

April 2, 2017 by Harry Hawk

http://traffic.libsyn.com/fir/Gorillaz_Humanz_Sound_Effects_and_Human_Translations-au.mp3

Podcast duration: 7:07 (3.4MB)

This week we are discussing a virtual band in VR, deep neural networks (DNNs) looking for sound effects, and actual humans translating language.

Fake Musicians, Smart Machines and Human Intelligence

Visual Sound Effects

Noah Wang, a software engineer at Google, announced in a March 23, 2017 blog post entitled “Visualizing Sound Effects” a working system for using AI to identify sound effects like “applause, laughter and bells.”

Once identified, these “effects” are visualized by adding them to the auto-generated captions available for videos on YouTube.

“So what does this actually look like when you are watching a YouTube video? The sound effect is merged with the automatic speech recognition track and shown as part of standard automatic captions…

Click the CC button to see the sound effect captioning system in action.”

On the same day, Sourish Chaudhuri, who is also a software engineer at Google, released a related post, “Adding Sound Effect Information to YouTube Captions,” on the Google Research Blog. Chaudhuri describes the process: “The DNN looks at short segments of audio and predicts whether that segment contains any one of the sound events of interest – since multiple sound effects can co-occur, our model makes a prediction at each time step for each of the sound effects.”
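To make the "one prediction per time step for each sound effect" idea concrete, here is a minimal sketch in Python. It is not Google's model: the raw scores are made-up numbers standing in for DNN outputs, and the label list, the `per_step_probabilities` and `active_labels` helpers, and the 0.5 threshold are all assumptions for illustration. The point is only that each label gets its own independent probability, so two effects can be active in the same segment.

```python
import math

# Illustrative label set only, taken from the effects named in the post.
LABELS = ["applause", "laughter", "bells"]


def sigmoid(x: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def per_step_probabilities(scores):
    """Convert raw per-label scores into independent probabilities.

    `scores` is a list of time steps; each step holds one raw score per label.
    Using a separate sigmoid per label (rather than one softmax across labels)
    lets several sound effects be "on" at the same time step.
    """
    return [[sigmoid(s) for s in step] for step in scores]


def active_labels(probs, threshold=0.5):
    """List the labels whose probability meets the threshold at each step."""
    return [
        [LABELS[i] for i, p in enumerate(step) if p >= threshold]
        for step in probs
    ]


# Made-up raw scores for three short audio segments (hypothetical values).
scores = [
    [2.0, -3.0, -1.0],
    [1.5, 2.5, -2.0],
    [-2.0, 1.0, -4.0],
]
probs = per_step_probabilities(scores)
print(active_labels(probs))
```

In the second segment both applause and laughter clear the threshold, which is the co-occurrence case the quote describes.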

Visualization of the Viterbi Algorithm

“(Left) The dense sequence of probabilities from our DNN for the occurrence over time of single sound category in a video. (Center) Binarized segments based on the modified Viterbi algorithm. (Right) The duration-based filter removes segments that are shorter in duration than desired for the class.” (Google)
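The three panels in that caption can be sketched roughly as follows. This is a simplified stand-in, not Google's modified Viterbi algorithm, whose details the post does not give here: it uses a plain two-state Viterbi decode (sound absent or present) with a fixed cost for switching states, followed by a minimum-duration filter. The function names and the `switch_penalty` and `min_steps` parameters are hypothetical.

```python
import math


def viterbi_binarize(probs, switch_penalty=2.0):
    """Two-state Viterbi decode: 0 = sound absent, 1 = sound present.

    Each step is scored by the log-probability of the chosen state, and every
    change of state costs `switch_penalty`, which discourages brief flickers.
    """
    if not probs:
        return []
    eps = 1e-9  # guards against log(0)
    best = [math.log(1 - probs[0] + eps), math.log(probs[0] + eps)]
    back = []  # back[t][s] = best previous state for state s at step t + 1
    for p in probs[1:]:
        emit = [math.log(1 - p + eps), math.log(p + eps)]
        new_best, pointers = [], []
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] - switch_penalty
            if stay >= switch:
                new_best.append(stay + emit[s])
                pointers.append(s)
            else:
                new_best.append(switch + emit[s])
                pointers.append(1 - s)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the better final state.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path):
    """Turn a 0/1 path into (start, end) index pairs, end exclusive."""
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


def filter_short(segments, min_steps):
    """Drop segments shorter than `min_steps` time steps."""
    return [(a, b) for a, b in segments if b - a >= min_steps]


# Made-up probabilities for one sound category over eight time steps.
probs = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
path = viterbi_binarize(probs)
segments = to_segments(path)
print(path, segments, filter_short(segments, min_steps=3))
```

In a real system the minimum duration would presumably differ per sound class, as the caption's "than desired for the class" suggests; a single `min_steps` value is used here only to keep the sketch short.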

Getting It Wrong: Legal, Ethical and Moral Considerations

Because people with disabilities are required by law to be given the same material and information, and because machine learning (ML) systems can introduce errors, the team ran a study to investigate what happens when the DNN gets it wrong.

“This presented a surprising result: when sound effect information was incorrect, it did not detract from the participant’s experience in roughly 50% of the cases. Based upon participant feedback, the reasons for this appear to be:

  • Participants who could hear the audio were able to ignore the inaccuracies.
  • Participants who could not hear the audio interpreted the error as the presence of a sound event, and that they had not missed out on critical speech information.”

Powering Translation with Human Intelligence

Google has long allowed members of the YouTube community to provide translations for video captions. In a March 30, 2017 blog post, Aviad Rozenhek, a YouTube Product Manager, announced that community members would now also be able to translate the titles and descriptions of videos.

While this does represent a possible risk for brands that enable community translations on their videos, it is also an amazing way to engage with a broad multilingual community.

Gorillaz New Video

The British virtual band Gorillaz released a new music video from their upcoming album Humanz. The video, released in both 2D and VR (360) formats, gives creators a side-by-side look at the creative decisions and options that are possible in a video crafted for VR.

Screenshot showing the different views for Gorillaz's 2D and VR variants of the Saturnz Barz (Spirit House) video

I highly recommend communicators watch both of these videos. Warning: while pixelated cartoon nudity is contained within, I do feel these are both safe for work. Together, these two variants of the same video have received over 25 million views in one week.

VR/360 Video

Traditional 2D video

It is worth noting that this virtual band has, for a number of years, actually been playing out in RL (real life).

Details on Wikipedia.

About Harry Hawk

Harry Hawk is a marketing consultant focused on cross-channel attribution, paid social, community building and content strategy. As an Adj. Prof (CUNY + SWCCD) he has a focus in Hospitality, Business, eCommerce & Event Management/Planning.

Harry has a diverse range of clients in Robotics, Logistics and Baking. Harry has worked with many companies including HubSpot Labs, Momentum Machines, Philip Morris, Paramount Home Pictures, NY Water Taxi, and many startups.

Filed Under: YouTubular Conversations

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

FIR Podcast Network website © 2023 Shel Holtz and Neville Hobson