HOWTO transcribe from MP4 to TXT with Whisper AI

In an era where information is constantly flowing through various forms of media, the need to extract and transcribe audio content has become increasingly important. Whether you’re a journalist, a content creator, or simply someone looking to convert spoken words into written text, the process of transcribing audio can be a game-changer. In this guide, we’ll explore how to transcribe audio from an MP4 file to text using Whisper AI, a powerful automatic speech recognition (ASR) system developed by OpenAI.

What is Whisper AI?

Whisper AI is an advanced ASR system designed to convert spoken language into written text. It has been trained on an extensive dataset, making it capable of handling various languages and accents. Whisper AI has numerous applications, including transcription services, voice assistants, and more. In this guide, we will focus on using it for transcribing audio from MP4 files to text.

Prerequisites

Before you can start transcribing MP4 files with Whisper AI, make sure you have the following prerequisites in place:

  1. Docker: Docker is a platform for developing, shipping, and running applications in containers. You’ll need Docker installed on your system. If you don’t have it, you can download and install Docker.
  2. MP4 to MP3 Conversion: Whisper AI currently accepts MP3 audio files as input. If your audio is in MP4 format, you’ll need to convert it to MP3 first. There are various tools available for this purpose. You can use FFmpeg for a reliable and versatile conversion process.

ffmpeg -i 20230523_111106-Meeting\ Recording.mp4 20230523_111106-Meeting\ Recording.mp3
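
ffmpeg infers the conversion from the file extensions. If you want more control, e.g. dropping the video stream explicitly and picking the MP3 quality, something like this should also work (a sketch; the file names are placeholders):

# -vn drops the video stream; -q:a 2 selects a high-quality VBR MP3
ffmpeg -i input.mp4 -vn -q:a 2 output.mp3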

Transcribing MP4 to TXT with Whisper AI

Now, let’s walk through the steps to transcribe an MP4 file to text using Whisper AI. We’ll assume you already have your MP4 file converted to MP3.

Step 1: Clone the Whisper AI Docker Repository

First, clone the Whisper AI Docker repository to your local machine. Open a terminal and run the following command:

git clone https://github.com/hisano/openai-whisper-on-docker.git

Step 2: Navigate to the Repository

Change your current directory to the cloned repository:

cd openai-whisper-on-docker

Step 3: Build the Docker Image

Build the Docker image for Whisper AI with the following command:

docker image build --tag whisper:latest .

Step 4: Set Up Volume and File Name

Set the VOLUME_DIRECTORY to your current directory and specify the name of your MP3 file. In this example, we’ll use “hello.mp3”:

VOLUME_DIRECTORY=$(pwd)

FILE_NAME=hello.mp3

Step 5: Copy Your MP3 File

Copy your MP3 file (the one you want to transcribe) to the current directory.

cp ../20230503_094932-Meeting\ Recording.mp3 ./$FILE_NAME

Step 6: Transcribe the MP3 File

Finally, use the following command to transcribe the MP3 file to text using Whisper AI. In this example, we’re specifying the model as “small” and the language as “Spanish.” Adjust these parameters according to your needs:

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/$FILE_NAME

Once you execute this command, Whisper AI will process the audio file and provide you with the transcribed text output.

The transcription is written to stdout, so consider redirecting the output of the docker run to a file:

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/$FILE_NAME &> result.txt
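
Alternatively, depending on the Whisper version inside the image, you can ask the CLI to write the transcript files directly into the mounted volume with the --output_dir flag, so they end up on your host:

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish --output_dir /data /data/$FILE_NAME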

If you redirected the output to a file, you can monitor progress with:

tail -f result.txt

If you see a warning like:

/usr/local/lib/python3.9/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead

it means you don’t have a CUDA-capable GPU setup, so Whisper will fall back to your CPU.

Also note that we’re using the small model here, which is accurate enough but can be quite slow on CPU. On my machine, it takes about 2.5 hours to transcribe 3 hours of audio.
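
If CPU speed is a problem, you can trade a bit of accuracy for speed by picking a smaller model such as base or tiny (both standard Whisper model sizes), keeping the rest of the command unchanged:

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model base --language Spanish /data/$FILE_NAME &> result.txt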

Conclusion

Transcribing audio from MP4 to text has never been easier, thanks to Whisper AI and the power of Docker. With this guide, you can efficiently convert spoken content into written text, opening up a world of possibilities for content creation, research, and more. Experiment with different Whisper AI models and languages to tailor your transcription experience to your specific needs. Happy transcribing!

Note: I’ve written this blog post with the help of ChatGPT based on my own experiments with Whisper AI. I’m just too lazy to write something coherent in English. Sorry for that, I hope you liked it anyway.


Prompt: “Write a blog post whose title is HOWTO transcribe from mp4 to txt with Whisper AI. It should explain what Whisper AI is but also explain how to extract mp3 from mp4, and the following commands, ignore first column: 10054 git clone https://github.com/hisano/openai-whisper-on-docker.git 10055 cd openai-whisper-on-docker 10056 docker image build --tag whisper:latest . 10057 VOLUME_DIRECTORY=$(pwd) 10058 FILE_NAME=hello.mp3 10059 cp ../20230503_094932-Meeting\ Recording.mp3 ./hello.mp3 10060 docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/hello.mp3”. After that, I added some extra useful information about performance.

HOWTO restart Cinnamon

Cinnamon is a popular desktop environment used by many Linux users. While it is generally stable and reliable, like any software it can sometimes fail or crash. When this happens, it can be frustrating for users who rely on Cinnamon to get their work done. In this blog post, we will explain why Cinnamon might fail and how to restart it when it does.

Why does Cinnamon fail?

There are several reasons why Cinnamon might fail or crash. Some common causes include:

  1. System updates: Sometimes, updates to the Linux system or other software can cause compatibility issues that result in Cinnamon failing.
  2. Hardware issues: If there is a problem with your computer’s hardware, such as a failing hard drive or faulty RAM, it can cause Cinnamon to crash.
  3. User error: Occasionally, a user may accidentally make changes to their system or Cinnamon configuration that cause it to fail.
  4. Bugs in Cinnamon: While Cinnamon is generally a stable and reliable desktop environment, it is not immune to bugs or other issues that can cause it to fail.

How to restart Cinnamon

If Cinnamon fails, the first step to take is to try restarting it. Here are the steps to follow:

  1. Press Ctrl + Alt + F2 on your keyboard. This will take you to a command line interface.
  2. Enter your username and password to log in.
  3. Type the following command to stop the Cinnamon process: pkill -HUP cinnamon
  4. Wait a few seconds, then type the following command to start Cinnamon again: cinnamon --replace &
  5. Press Ctrl + Alt + F7 on your keyboard to return to the Cinnamon desktop environment.
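
Put together, the console part (steps 3 and 4) looks like this sketch. Note that DISPLAY=:0 is an assumption: it is the usual default for the first X session, and it is needed so that Cinnamon started from the text console attaches to your graphical display:

# Run from the text console (Ctrl + Alt + F2) after logging in
pkill -HUP cinnamon                # stop the running Cinnamon process
sleep 3                            # give it a moment to shut down
DISPLAY=:0 cinnamon --replace &    # restart Cinnamon on the first X display (assumed :0)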

If Cinnamon does not restart using these steps, you may need to try restarting your computer or troubleshooting other potential issues.

In conclusion, while Cinnamon is generally a stable and reliable desktop environment, it can fail or crash for various reasons. When this happens, it can be frustrating, but restarting Cinnamon can often resolve the issue. If you are unable to restart Cinnamon using the steps outlined in this post, you may need to seek additional support or troubleshooting resources.

Bonus track!

There is indeed a more straightforward way to restart Cinnamon. Here are the steps to follow:

  1. Press Alt + F2 on your keyboard. This will open the “Run Command” dialog.
  2. Type the letter “r” into the text field and press Enter. This will restart the Cinnamon process.
  3. Wait a few seconds for Cinnamon to restart. If everything has gone smoothly, you should be able to continue using Cinnamon as normal.

Using Alt + F2 and typing “r” to restart Cinnamon is a quick and easy way to get your desktop environment back up and running if it has failed or crashed. This method does not require logging in to the command line interface or typing any commands, making it more accessible for users who may not be familiar with the command line.

Resizing the Root Partition on an Ext4 File System: A Guide to Swapping Out Your Swap Partition

Have you ever run out of space on your root partition and wished you could make it bigger? Or maybe you had a separate swap partition that you wanted to get rid of? Well, fear not, my friend, because today we’re going to be diving into the world of resizing partitions and making the switch to using a swap file instead of a partition.

First of all, let’s talk about why this is possible. The ext4 file system, which is the default file system for most modern Linux distributions, allows for resizing and modifying the partition layout on the fly. This is thanks to the advanced features of ext4, such as its ability to handle online resizing and the use of an advanced journaling system.

Now that we’ve got the basics out of the way, let’s get down to business.

  1. Back up your data

Before you do anything, it’s essential to back up your data. You never know what might go wrong during the resizing process, so it’s always better to be safe than sorry. You can use tools like rsync or tar to back up your important files to another location.

  2. Disable swap

Before we begin resizing the root partition, we need to disable the swap partition, since it may be in use while we are working on the disk. You might also need to delete it entirely so that the root partition has room to grow. To disable swap, use the following command:

sudo swapoff -a
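
You can verify that swap is fully disabled before continuing; swapon --show prints nothing when no swap is active:

swapon --show   # no output means swap is off
free -h         # the Swap row should read 0B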

  3. Resize the root partition

Next, we need to resize the root partition. We can do this using the resize2fs tool. In this example, we will be increasing the size of the root partition to 20GB:

sudo resize2fs /dev/sda2 20G

Note that you’ll need to replace “/dev/sda2” with the name of your root partition.
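
Keep in mind that resize2fs resizes the filesystem, not the partition itself. If the partition also needs to grow first (for example, into the space freed by the deleted swap partition), a tool like parted can extend it. A sketch, assuming the disk is /dev/sda and the root partition is number 2:

# Grow partition 2 to fill the available space, then grow the filesystem to match
sudo parted /dev/sda resizepart 2 100%
sudo resize2fs /dev/sda2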

  4. Create the swap file

Now that we’ve resized the root partition, it’s time to create the swap file. A swap file is a file on your file system that is used as virtual memory. To create the swap file, we will use the fallocate tool. In this example, we will be creating a 4GB swap file:

sudo fallocate -l 4G /swapfile

  5. Configure the swap file

Once the swap file has been created, we need to configure it as a swap space. To do this, we will use the mkswap tool:

sudo mkswap /swapfile
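
Before enabling it, it’s good practice to restrict the swap file’s permissions to root; otherwise swapon will typically warn about an insecure, world-readable swap file:

sudo chmod 600 /swapfile   # only root should read or write the swap file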

  6. Enable the swap file

Finally, we need to enable the swap file so that it can be used as virtual memory. To enable it, use the following command:

sudo swapon /swapfile

  7. Update /etc/fstab

At this point, the swap file is fully configured and ready to use. However, we need to update /etc/fstab to enable the swap file on boot. To do this, add the following line to /etc/fstab:

/swapfile none swap sw 0 0

Also, make sure you remove the old swap partition line. Otherwise, the system will try to check it on every boot, taking more time!
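
You can quickly check which swap entries remain in the file with:

grep swap /etc/fstab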

And that’s it! You’ve successfully resized your root partition and switched from a swap partition to a swap file. Your system should now boot faster since it no longer has to test the swap partition on each boot.

In conclusion, resizing partitions and switching from a swap partition to a swap file is a simple and effective way to manage your disk space and optimize your system’s performance. With the ext4 file system, the process is straightforward and can be done without having to take your system offline. Whether you’re running out of space on your root partition or just looking to streamline your system, I hope this guide has helped you accomplish your goals.

As always, when working with system configurations and disk partitions, it’s important to proceed with caution and to backup your data before making any changes. If you follow the steps outlined in this guide, you should have no trouble successfully resizing your root partition and switching to a swap file.

So, grab your terminal and get ready to play around with partitions and swap files. Who knows, you might just discover a new love for system administration.

From Eliza to ChatGPT

When I was in college, I studied Eliza, one of the first natural language processing programs developed in the 1960s. Eliza was designed to simulate a psychotherapist and used a set of pre-defined rules and responses to generate replies to user input. At the time, Eliza was considered a significant advancement in the field of natural language processing, but it was limited in its abilities and could not provide detailed or accurate responses to complex questions.

Today, we have programs like ChatGPT, a large language model trained by OpenAI that uses the latest advancements in natural language processing to generate human-like responses to questions and prompts. ChatGPT was trained on a vast amount of text data from a variety of sources, which allows it to have a broad range of knowledge and the ability to provide detailed, accurate responses to a wide range of questions.

Here is a sample snippet of code for the Eliza program:

// Define a set of rules for generating responses
const rules = [
  {key: "i need", response: "Why do you need"},
  {key: "i want", response: "What would it mean to you if you got"},
  {key: "i feel", response: "Do you often feel"}
];

// Define a function for generating a response to user input
function generateResponse(input) {
  // Normalize to lowercase so matching is case-insensitive ("I need" matches "i need")
  const normalized = input.toLowerCase();

  // Use the find() method to look for the first rule whose key appears in the input
  const rule = rules.find(r => normalized.includes(r.key));

  // If a match is found, return the corresponding response
  if (rule) {
    return rule.response;
  }

  // If no rules match, return a default response
  return "I'm sorry, I don't understand what you're saying.";
}

// Example: generateResponse("I need a holiday") returns "Why do you need"

If you want to get a full implementation of Eliza, you can visit the following link on GitHub: https://github.com/brandongmwong/elizabot-js. This repository contains the complete source code for Eliza written in JavaScript, along with detailed instructions on how to use and customize it. In addition, the repository includes a live demonstration of Eliza in action, allowing you to see how it works and how it compares to other artificial intelligence systems.

Compared to Eliza, ChatGPT is much more advanced and can provide more detailed and accurate responses to user input. While Eliza used pre-defined rules and answers to generate its replies, ChatGPT uses machine learning algorithms and a vast amount of training data to generate its responses. This allows ChatGPT to have a much broader range of knowledge and the ability to provide accurate answers to complex questions.

Overall, while Eliza was a significant advancement in its time, it is now limited compared to more advanced programs like ChatGPT. ChatGPT’s ability to generate detailed, accurate responses to a wide range of questions makes it a valuable tool in the field of natural language processing.

There are many books available that can help you understand ChatGPT and the underlying technology behind it. Some books that may be of interest include “Speech and Language Processing” by Daniel Jurafsky and James H. Martin, and “Natural Language Processing with Python” by Steven Bird, Ewan Klein, and Edward Loper. These books provide in-depth information about natural language processing and how it is used in programs like ChatGPT and Eliza.

Additionally, the book “The Master Algorithm” by Pedro Domingos provides an overview of the field of machine learning and discusses how it relates to natural language processing and programs like ChatGPT. This book is a valuable resource for anyone interested in learning more about the technology behind ChatGPT and how it is used in the field of artificial intelligence.

Overall, these books provide a wealth of information about natural language processing and its applications, including ChatGPT and Eliza. They are valuable resources for anyone looking to learn more about these technologies and how they are used in the field of artificial intelligence.

There are many science fiction books that feature artificial intelligence or advanced natural language processing technology related to ChatGPT. These stories may be of interest to readers curious about the capabilities and potential consequences of such technology.

This blog post has been 100% generated by ChatGPT.

In the future, bloggers may have to compete with tools like ChatGPT that can quickly and efficiently generate high-quality content. However, there are also opportunities for bloggers to differentiate themselves from AIs like ChatGPT. For example, bloggers who offer unique perspectives or have a distinct voice can stand out from the crowd and continue to be valuable to their audiences.

My AMD Hackintosh OpenCore triple boot in same disk notes

A few notes about the main points I learnt while installing a triple boot on my new PC:

  1. When picking the hardware components, search for success stories related to those components so that you make sure they’re compatible and someone has already prepared a configuration you can build on instead of starting from zero, e.g. a non-APU Ryzen (without the G suffix) + Gigabyte X570 + Radeon RX580.
  2. Be aware that if you want to use the Hackintosh as your only OS, Intel will be easier and better supported (e.g. Docker with the hypervisor, the Adobe suite…). My plan is to use Linux, leaving the macOS option for Xcode and Windows 10 for gaming and Windows-only software.
  3. OpenCore is currently the only option for AMD, so don’t lose time reading about Clover. See this video as an intro; it’s not enough to get into action, but you’ll get the general idea: https://www.youtube.com/watch?v=l_QPLl81GrY
  4. You can lose data quite easily, e.g. when touching partitions, so make sure you back up if needed.
  5. Make sure you read this guide carefully; it’s more precise and up to date than the video: https://dortania.github.io/OpenCore-Install-Guide/
  6. This guide is also quite interesting: https://github.com/alkalim/ryzen-catalina
  7. Once you’ve seen the video and read the guide, you’ll be ready if you understand these topics: boot USB, SSDT, ACPI, kexts, UEFI, config.plist, SMBIOS.
  8. If you find someone who already succeeded with your exact CPU + motherboard (lucky me!), the setup will be way easier, as you might avoid the pain of testing different kexts and configs, but you still need to make sure you understand what you’re doing (see the previous points). Otherwise your Mac install menu will appear in Russian and you’ll have to figure out why that happens and how to reset the NVRAM.
  9. You need to install the OSs in this order: Windows, Linux, Mac (3 pendrives). Both Windows and Linux need to be running in UEFI mode, and once they are, you’ll need to resize the EFI partition to at least 200MB, as that’s a Mac requirement. (The EFI partition created by default by Windows is 100MB…)
  10. You also need a GParted USB so that you can create the Mac partition in the free space left after installing Windows and Linux. You’ll format it as HFS+, but in the Mac installer’s partition tool you’ll need to enable journaling for it (File > Enable Journaling) and convert it to APFS. Otherwise it will complain about a missing “firmware partition” (EFI) even though you had already prepared it.
  11. In the middle of the installation it will reboot without warning and resume the installation from the disk.
  12. If the latest Realtek kext does not work for you (e.g. you’re unable to configure the NIC during installation), try v2.2.2; it did the trick for me.
  13. Once successfully installed, you typically need to do a few post-install things:
    1. Just in case Windows Update messes up the OpenCore boot loader, make sure you register BootStrap.efi in the BIOS. That way you’ll always have the “OpenCore” option in the BIOS.
    2. You need to update the hard disk’s EFI partition. If you prepared the Mac boot USB drive with gibMacOS you might not have an EFI partition there; you just need to mount the hard disk’s EFI partition manually, delete its EFI folder, and drop in the one you have on the boot USB.
    3. If OpenCore is unable to detect Linux, make sure you installed it in UEFI mode, e.g. in Linux Mint picking the UEFI partition as the boot partition.

Enjoy!

Other links:

https://github.com/ivmos/config/tree/master/ryzentosh

https://github.com/sergeycherepanov/homebrew-docker-virtualbox

Designing Data-Intensive Applications Book: Chapter 1 Summary

I’m starting a series of blog posts with summaries of this interesting book: Designing Data-Intensive Applications


What is a data-intensive application?

It’s an application where raw CPU power is rarely the limiting factor; the problems are the amount of data, the complexity of data, and the speed at which it changes. It is built from standard building blocks that provide commonly needed functionality.

In this chapter, we see the fundamentals of what we are trying to achieve.


Google Photos API, how to use it and why it will probably disappoint you


Recently I needed to close a Google Apps account, and I tried to migrate albums programmatically.  I’ll document here the needed steps and explain why this Google API is useless for most of us:

First you need app credentials, which you can get from the Google Console at https://console.developers.google.com. There you need to register your project and enable the API from the library.

You should now have both client_id and client_secret, so you can fetch the authorization code quite easily with an OAuth2 flow:

#!/bin/sh

# OAuth2 client credentials from the Google Console (placeholders)
client_id="foo"
client_secret="bar"

scope="https://www.googleapis.com/auth/photoslibrary+https://www.googleapis.com/auth/photoslibrary.sharing"

# Authorization URL to open in a browser to obtain the code
url="https://accounts.google.com/o/oauth2/auth?client_id=$client_id&redirect_uri=urn:ietf:wg:oauth:2.0:oob&scope=$scope&response_type=code"

echo "$url"

If you open that URL in a browser you’ll get the code, and with it you can fetch the tokens:

code="lol"
curl --request POST --data "code=$code&client_id=$client_id&client_secret=$client_secret&redirect_uri=urn:ietf:wg:oauth:2.0:oob&grant_type=authorization_code" https://accounts.google.com/o/oauth2/token
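
The response includes an access_token and a refresh_token. For completeness, here is how the refresh token can later be exchanged for a fresh access token (the standard OAuth2 refresh flow; the value below is a placeholder):

refresh_token="baz"
curl --request POST --data "client_id=$client_id&client_secret=$client_secret&refresh_token=$refresh_token&grant_type=refresh_token" https://accounts.google.com/o/oauth2/token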

With the refresh_token you can already do what you need; here is an example Kotlin script I worked on: https://gist.github.com/ivmos/860b5db0ffeeeba8ffd33adebfaaa094

But in the end I just did it manually, zooming out from the web client. It turns out Google only offers consents that allow you to manipulate photos and albums created with your own app, so you can’t move photos between albums created by the official tool. This means you cannot organize your library programmatically unless you only need to work with photos you upload through your own app…

My 2019 Coursera courses

2019 was an awesome year for me, mainly because I became a father 🤗, but I also found time to keep up my learning habit 🤓, something very important 15 years after my first job in the field. So I’d like to list and comment on the Coursera courses I took and why:

  • Conflict Resolution Skills (cert): a good introduction to something that is essential even in an individual contributor position and critical in management.
  • Kotlin for Java developers (cert): a great course for jumping from Java to Kotlin. We’ve been increasingly using Kotlin at work (even for microservices!), so I found it a good way to review the language in general.
  • Programming Languages, Part A (cert): getting into functional programming was something I had wanted to do for a long time. I did some Haskell at uni, but that was ages ago, and I only knew the typical bits used in JavaScript or Kotlin; using a pure FP language is a very different thing.
  • Programming Languages, Part B (cert): Part A used SML; this part used Racket, which was a bit of a parenthesis nightmare at first but turned out to be very fun, as I practiced implementing a little programming language, something I hadn’t done since university.

If you have a recommendation of any online course for 2020 please leave a comment 🙂

HOWTO see Google Calendar events in yearly view


It turns out I have already booked a few events for 2019, so I wanted a yearly view of everything I have. I was disappointed to see that the current Google Calendar yearly view is useless, as it’s just empty. There are lots of comments about this issue in this Google product forums entry.

[Screenshot: the yearly view is just empty. :S]

[Screenshot: a forum comment. Ron Irrelavent is absolutely right.]

So I did some search to look for solutions and I found these 2:

  • Google Calendar Plus extension
    • I haven’t even tried it, as I’m tired of Chrome extensions, but it seems to work.
  • Visual-Planner project
    • A bit ugly, but it works and it’s open-source, so this is what I’m using. You can use it without installing it here (you just need to OAuth into your Gmail account). The only drawback is that it does not display names for multi-day events; as a workaround, you can create a single-day event on the first day, e.g. “Flight to London”.

[Screenshot: Visual-Planner’s yearly view. This is something ¯\_(ツ)_/¯]

Let me know if you have better alternatives.

I also hope Google implements this… Hello, Google PMs? 🙂

Thoughts about React Native after working with it


I faced the following challenge in January:

  • Porting a complex webapp to native Android and iOS. The webapp to be ported is written in ReactJS + Redux; besides, most of its business logic lives in a pure ES5 JavaScript library.

So in this situation, React Native (“RN” from now on) seemed like the way to go, as we wanted a working prototype in a month, and it had to be maintained on both Android and iOS without extra resources.
