diverse_reading

An in-progress application that suggests books by international and diverse authors whose themes are similar to those of books on classic reading lists.

progress

  1. Scraping book lists from Goodreads:

minority and international lists:

  • 12716.Speculative_Fiction_by_Authors_of_Color
  • 116887.2018_Books_by_Authors_of_Color_Native_Authors
  • 113712.Book_Riot_s_100_Must_Read_Classics_by_People_of_Color
  • 96119._ReadPOC_List_of_Books_by_Authors_of_Color
  • 83339.Women_s_Fiction_by_Strong_Women
  • 22135.Around_the_World_One_Book_from_Each_Country
  • 4283.Around_the_World_in_100_Books
  • 5534.Non_American_books_that_every_American_should_read
  • 90194.Women_in_Translation
  • 71912.Africa_s_100_Best_Books_of_the_20th_Century
  • 73176.African_Writers_Series
  • 95678.Multicultural_Female_Authors
  • 31853.Iranian_Fiction
  • These are saved in files named diverse_N_01_25.json, where N = 0:12 follows the order of the lists above.

    young adult minority and international:

  • 104750.Young_Adult_books_with_chronically_ill_physically_or_mentally_disabled_protagonists
  • 84609.South_Asians_in_Contemporary_YA
  • 100873.Anticipated_Diverse_2016_2017_YA_Books
  • 94201.MG_YA_Speculative_Fiction_by_Authors_of_Color
  • 97984.South_Africa_in_YA_Middle_Grade_Fiction
  • 92685.International_YA_Books
  • 104480.Internationally_Minded_YA_Books
  • These are saved in files named diverse_ya_N_01_25.json, where N = 0:6.

    likely assigned books lists:

  • 3751.A_Journey_Through_Literary_America
  • 1126.John_Steinbeck
  • 9370.Best_of_Hemingway_
  • 5465.Best_of_Mark_Twain
  • 21652.Best_of_William_Faulkner
  • 74999.Historical_Novels_of_Early_America
  • 36785.WWII_Historic_Fiction
  • 68.Best_European_Literature
  • 10785.The_French_Revolution
  • 3990.Greatest_Eastern_European_Classics
  • 1339.Best_British_and_Irish_Literature
  • 1077.Modern_British_Novels
  • 2457.Best_Books_Of_The_Decade_1880s
  • 4509.Oh_Canada_
  • 2458.Best_Books_Of_The_Decade_1860s
  • 5464.Best_of_Charles_Dickens
  • These are saved in files named western_N_01_25.json, where N = 0:15.

    likely assigned young adult:

  • 18678.Best_UKYA_Books
  • 7170.Young_Adult_fiction_by_UK_authors
  • These are saved in files named western_ya_N_01_25.json, where N = 0:1.
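
The file naming scheme described above can be sketched in a few lines of Python. This is only an illustration: the names GROUP_SIZES and output_filenames are hypothetical and not part of the repository. The counts are taken from the number of lists in each group above, and the 01_25 suffix is copied from the file names as written.

```python
# Hypothetical helper, not part of the repository: rebuilds the file names
# described above from the number of Goodreads lists in each group.
GROUP_SIZES = {
    "diverse": 13,     # diverse_0 ... diverse_12
    "diverse_ya": 7,   # diverse_ya_0 ... diverse_ya_6
    "western": 16,     # western_0 ... western_15
    "western_ya": 2,   # western_ya_0 ... western_ya_1
}


def output_filenames(prefix, count, suffix="01_25"):
    """Return the JSON file names for one group, with N running from 0 to count - 1."""
    return [f"{prefix}_{n}_{suffix}.json" for n in range(count)]


if __name__ == "__main__":
    for prefix, count in GROUP_SIZES.items():
        names = output_filenames(prefix, count)
        print(prefix, "->", names[0], "...", names[-1])
```

Keeping the counts in one dictionary makes it easy to check that every expected file exists before processing starts.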

  2. Data processing
    • process_data.ipynb contains a first look at the data from one of the lists. Some lines in the book descriptions are duplicated. I'm not sure whether it makes more sense to strip the repeated lines, to reduce each description to its unique words, or to do something else.
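
One possible way to handle the duplicated description lines is to drop repeats while keeping the first occurrence of each line. The sketch below assumes that each description is a plain string and that the duplicates are whole lines. Neither assumption is confirmed by the data, and drop_duplicate_lines is a hypothetical name rather than a function in process_data.ipynb.

```python
def drop_duplicate_lines(description):
    """Remove repeated lines from a description, keeping first occurrences in order.

    Assumes duplicates are whole lines; lines are compared after collapsing
    whitespace and lowercasing, so small spacing differences still match.
    """
    seen = set()
    kept = []
    for line in description.splitlines():
        key = " ".join(line.split()).lower()
        if not key:
            continue  # skip blank lines
        if key in seen:
            continue  # this line already appeared earlier
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "A tale of two sisters.\nA tale of two sisters.\nSet in Lagos."
    print(drop_duplicate_lines(sample))
```

Compared with reducing each description to its unique words, this keeps word order and word counts intact, which may matter if the themes are later compared using word frequencies.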