mitodl/open-discussions

RFC: New Open API LearningResource endpoint

Closed this issue · 8 comments

This document discusses API routes in MIT Open and some implementation details around modifying Django models to align with those routes.

Routes

/api/v0/learning-resources
For the new Open Course Catalog API, I propose, we pull the basics up into the base LearningResource Model. We should use this as the basis for the api endpoint that encompasses all of our learning resources, preferably at /api/v0/learning-resources/.
We should pull the generic fields listed below up to the base model, and allow for null in those where a value does not make sense. Those models that come from AbstractList - namely Lists and Programs - would need to now inherit from it in some way as well at the top level.

LearningResource Object:

  • UUID
  • OfferedBy
  • Topics
  • Title
  • ShortDescription
  • Type
  • Platform
  • Department
  • Professional (property)
  • WithCertificate
  • Audience
  • Price
  • Availability
  • image_src
  • image_description
  • is_favorite
  • published
  • url - the location of the learning resource
  • LearningResourceUri - the URI of the objects's detail record (i.e. /api/v0/courses/[id]) (property)
  • Some sort of generic date field somehow? - point for discussion
    • Is there some way for us to make a generic date field across all learning resources - identify a date for each (next course run, last_modified, etc) and store in the LearningResource as a PublishDate or something to that effect - or is there a universal publish date? Basically - if I have a list of videos, courses, and lists - how do I order those by date? How do I filter those by current?
  • micromasters?
    • Also, not sure where we landed on how to handle micromasters - are we adding it as a boolean field that's filterable or as a type?

learning-resource filters:

  • OfferedBy - The MIT organization offering the resource.
  • Topics - a list (__in filter) of topic objects
  • Title/ShortDescription (Could be helpful for those looking to use the API as a simple search function?) - str
  • Type
  • Platform - The platform that hosts the resource i.e. youtube for a video, edx.org for a course hosted on edx.org
  • Department - The MIT department for the resource. Not available for professional offerings.
  • Professional - Bool value - calculated
  • WithCertificate - Bool value - stored
  • Audience - stored str
  • Price - Free/Paid
  • Availability - Choice field (upcoming/current/prior)
  • Some sort of generic date field somehow? - point for discussion

Sorts that require thought:

  • Start_date - for videos/podcasts this is “last_modified” - should we come up with a generic date field (see above)?
  • Upcoming (only future, starting now, +) as a sort?

Other Routes:

For the other routes, each resource type should get its own structure/serializer, which will include the same fields as the generic model, be at the URI that is noted in the generic model, and will include other model-specific fields such as rss-url for podcasts.
/api/v0/courses
/api/v0/programs
/api/v0/podcasts

  • episode_count
  • apple_podcasts_url
  • google_podcasts_url
  • Rss_url
  • Searchable
  • full_description

/api/v0/videos

  • Duration
  • Transcript
  • Raw_data
  • Last_modified
  • full_description

/api/v0/userlists
/api/v0/micromasters? - pretty sure this is program with micromasters=true which means we need micromasters as a field on learningresource

Permissions

Everything other than lists presently is read-only list and detail routes only. I think this should be the case for now - and we should likely leave the auth for lists as is.

Documentation

Using DRF-spectacular, it would be helpful to start up a redoc or swagger instance (we could technically throw up both - I found redoc was a nicer "out of the box" experience in the past, but either one is fine). The lift isn't super large to add bits that will become docs and there's no real add needed from an infrastructure side - it is easy and helpful to toss up just to have a "look" into how the api is structured.

This came up in the discussion, and I didn't include it here, but there shoud be a UUID field - at the learning resource level (even if this doesn't become a concrete model or index, I mean that the UUID is unique across all models).

* LearningResourceUri - the URI of the objects's detail record (i.e. `/api/v0/courses/[id]`) (property)

DRF can provide this behavior with https://www.django-rest-framework.org/api-guide/relations/#hyperlinkedrelatedfield

So I'd probably have it return something like:

{
  "resource_type": "course",
  "resource": "http://open.mit.edu/api/v0/courses/1"
}

Additionally, if we wanted to make this API a bit more flexible and permit any consumers of it to get data more efficiently, we could use that, plus drf-flex-fields to allow inline expansion of that resource. That would avoid N+1 querying against our APIs for something we can do with a few additional db queries.

* Some sort of generic date field somehow? - point for discussion
  
  * Is there some way for us to make a generic date field across all learning resources - identify a date for each (next course run, last_modified, etc) and store in the LearningResource as a PublishDate or something to that effect - or is there a universal publish date? Basically - if I have a list of videos, courses, and lists - how do I order those by date? How do I filter those by current?

I think possibly the way to do this is to turn published into a nullable datetime field and utilize that. None in the db would mean not published. published=False records are purged from search anyway.

* micromasters?
  
  * Also, not sure where we landed on how to handle micromasters - are we adding it as a boolean field that's filterable or as a type?

We shouldn't have anything product-specific on LearningResource.

/api/v0/micromasters? - pretty sure this is program with micromasters=true which means we need micromasters as a field on learningresource

Is this a real API? I can't find it in our code. At any rate, I don't think there should be anything micromasters specific on LearningResource and this API probably shouldn't exist either.

Using DRF-spectacular, it would be helpful to start up a redoc or swagger instance (we could technically throw up both - I found redoc was a nicer "out of the box" experience in the past, but either one is fine). The lift isn't super large to add bits that will become docs and there's no real add needed from an infrastructure side - it is easy and helpful to toss up just to have a "look" into how the api is structured.

Looks like drf-spectacular uses OpenAPI, which was one thing I was going to mention. DRF is migrating to OpenAPI and it's the go-to standard these days. OpenAPI is also great because it makes code generation possible with standardized tools.

  • Some sort of generic date field somehow? - point for discussion
    I think possibly the way to do this is to turn published into a nullable datetime field and utilize that. None in the db would mean not published. published=False records are purged from search anyway.

I like this - and actually makes sense - thank you!

* micromasters?

Is this a real API? I can't find it in our code. At any rate, I don't think there should be anything micromasters specific on LearningResource and this API probably shouldn't exist either.

So, all of this goes back to Ferdi looking to differentiate Micromasters from other programs - we'd discussed both having it as a boolean field or a separate content type (hence the separate route here, which is conceptual and doesn't exist) - I don't have enough background info I think to FULLY understand how else we can properly denote this, but what I DID take away is that platform isn't correct.

Looks like drf-spectacular uses OpenAPI, which was one thing I was going to mention. DRF is migrating to OpenAPI and it's the go-to standard these days. OpenAPI is also great because it makes code generation possible with standardized tools.

Ah yes, I should have said SwaggerUI vs Redoc - OpenAPI is a bit of a given IMO, was more just advocating for the "prettier" UI option

Some sort of generic date field somehow? - point for discussion
Is there some way for us to make a generic date field across all learning resources - identify a date for each (next course run, last_modified, etc) and store in the LearningResource as a PublishDate or something to that effect - or is there a universal publish date? Basically - if I have a list of videos, courses, and lists - how do I order those by date? How do I filter those by current?

We already have created_on to identify new items (ie new to our site, regardless of when the actual resource was created). And the concept of upcoming date only makes sense for courses and programs.

I think micromasters also only makes sense in the context of courses and programs, so we can just add the boolean to courses and programs and the filter to the course and program apis

/api/v0/learning-resources should also return different data for different resources

If this is meant to be a public API, I feel like it might be confusing to end users if the response includes lots of fields that are always going to have null values for some resource types but not others. Though perhaps we could include an explanation of which fields are relevant to which resource types in the API documentation.

Another option, as Anastasia mentioned:

/api/v0/learning-resources should also return different data for different resources

One way of doing this might be via something like in the existing GenericForeignKeyFieldSerializer where the appropriate class-specific serializer is used for each object.

@abeglova - for created_on - my worry here is - if someone wants to make a page of all physics department videos, courses, and stuff - there's no way to order by date without some sort of date that can relate to the timeliness of material - so we can get more recent podcast episodes bubbled up in those results - my problem is, for created_on - it's a bit of a stale date. You could have an old course that's uploaded somewhere that now shows out of order, a new podcast is added so all of its episodes just show as a big chunk. It doesn't give us as much wiggle room as using, say, published which could allow for the date to be reset not just on creation, but by some other logic as well? I'm not actually sure of the right answer, I'm just trying to find a way to allow for ordering by date where we don't completely lose half of our materials because they DO all have dates on when they're available, it's just a matter of how we display/parse said data.

As for the structure - if we'd rather not, we can pull out some of the fields, but I'm going off of the ones we discussed the other day and I'm not sure how many of them are ACTUALLY going to be nulled for this? Namely like - price - which if we don't WANT a top level filter for it, we can pull it back, I'm really just responding to what I'm hearing from different folks. I can try to reorganize it a bit