/notebooklm-detector

Detect whether or not an audio file was generated by NotebookLM

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

NotebookLM Detector

A simple tool to detect whether an audio file was generated by NotebookLM.

At Listen Notes, we've encountered a growing number of spammers submitting fake, NotebookLM-generated podcasts to our platform. Check out this list of fake podcasts generated by NotebookLM.

We hoped the NotebookLM team would provide a tool to help detect NotebookLM-generated audio. However, after a week of back-and-forth emails, we lost patience.

It's now Friday (Oct 4, 2024), and since we won't hear back from the NotebookLM team until next week, we decided to put together this simple script. Luckily, it seems to work!

Update:

  • October 9, 2024: After further emails with the NotebookLM team, it’s become evident that they are unable to provide tools or guidance to curb the spread of spammy, fake podcasts generated by NotebookLM. This is understandable, as they are typical 9-to-5 Google employees who enjoy a healthy work-life balance. NotebookLM remains an experimental project, and if it fails, the team members can easily transition to another project or team within Google, continuing their careers without significant disruption. There's little incentive for them to address issues that don't directly impact their performance reviews. Unfortunately, this leaves the podcasting industry vulnerable, but it's not a pressing concern for a handful of Googlers.
  • Notebook LM: A threat to the Podcasting World

Detection

Install Dependencies

$ pip install -r requirements.txt

Run the Detection Script

To detect whether an audio file is AI-generated or human-produced, run the following command:

$ python notebooklm_detector.py --action predict --file_path [filename].mp3

You’ll see output like this:

The audio is: AI Generated

or

The audio is: Human
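To check many files at once, the command above can be wrapped in a short Python script. The following is a minimal sketch, not part of this repository: the helper names `parse_verdict` and `classify` are hypothetical, and it assumes the script prints its verdict to standard output on a line containing `The audio is:` as shown above.

```python
import subprocess
import sys
from typing import Optional

# Assumption: the detector prints a line containing this prefix on stdout.
PREFIX = "The audio is:"


def parse_verdict(output: str) -> Optional[str]:
    """Return the text that follows 'The audio is:', or None if it is absent."""
    for line in output.splitlines():
        idx = line.find(PREFIX)
        if idx != -1:
            return line[idx + len(PREFIX):].strip()
    return None


def classify(path: str) -> Optional[str]:
    """Run the detector CLI on one file and return its verdict string."""
    result = subprocess.run(
        [sys.executable, "notebooklm_detector.py",
         "--action", "predict", "--file_path", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return parse_verdict(result.stdout)


if __name__ == "__main__":
    # Print one tab-separated "path, verdict" line per file given as an argument.
    for audio_path in sys.argv[1:]:
        print(f"{audio_path}\t{classify(audio_path)}")
```

Saved under a name of your choice in the repository root, the script takes the audio files to check as command-line arguments. Each file starts a new Python process, so this is convenient rather than fast.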

Training the Model

You can train the model and regenerate model.pkl by following these steps:

Step 1: Organize the Dataset

  • Place NotebookLM-generated audio files (mp3, wav, or mp4) in the datasets/ai/ folder.
  • Place human-produced audio files in the datasets/human/ folder.
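Before training, it can help to confirm that both folders exist and contain audio files. The following is a rough sketch, not part of this repository: `count_audio_files` is a hypothetical helper, and it assumes the human folder accepts the same three extensions listed for the AI folder.

```python
from pathlib import Path

# Extensions named in Step 1; assumed to apply to both folders.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".mp4"}
CLASS_FOLDERS = ("ai", "human")


def count_audio_files(dataset_path):
    """Count audio files directly inside <dataset_path>/ai and <dataset_path>/human."""
    root = Path(dataset_path)
    counts = {}
    for name in CLASS_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        counts[name] = sum(
            1
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
        )
    return counts
```

Calling `count_audio_files("datasets")` returns a dictionary with one count per class. A count of zero, or a large imbalance between the two classes, is worth fixing before you train.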

Step 2: Run the Training Script

To train the model, run:

$ python notebooklm_detector.py --action train --dataset_path datasets
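One way to sanity-check a retrained model is to set aside a few files from each folder before training and run them through `--action predict` afterwards. The sketch below only selects which files to set aside. `pick_holdout` is a hypothetical helper, not part of this repository, and the 20% fraction is an arbitrary assumption.

```python
import random
from pathlib import Path


def pick_holdout(folder, fraction=0.2, seed=0):
    """Return a reproducible, sorted selection of files from `folder` to set aside."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    k = int(len(files) * fraction)  # rounds down; may be 0 for small folders
    # A seeded Random instance keeps the selection identical across runs.
    return sorted(random.Random(seed).sample(files, k))
```

Move the selected files out of `datasets/` before running the training command, since training uses whatever is in the two folders. After training, a file from `datasets/ai/` should be reported as AI Generated and a file from `datasets/human/` as Human.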