Combo819/social-media-archiver

Could you implement an example of an archive website such as headbook or Weibo?


GoYiz commented

I ran into some problems while implementing this. For example, while implementing the Weibo API, clicking Save shows the "backup processing" prompt, but the post content is not displayed after I refresh the page.
Here are the files I changed and the changes I made:

  1. packages/backend/src/Utility/parsePostId/parsePostId.ts
import { getUrlLastSegment } from '../../Utility/urlParse/getLastSegment';
.......
.......
export async function parsePostId(urlStr: string): Promise<string> {
    return getUrlLastSegment(urlStr);
}
  2. packages/backend/src/Components/Post/Service/postApi.ts
function getPostApi(postId: string): AxiosPromise {
    return crawlerAxios.get(`https://weibo.com/ajax/statuses/longtext?id=${postId}`);
}
  3. packages/backend/src/Components/Post/Service/postCrawler.ts
  private transformData(res: any): {
    repostingId: string;
    postInfo: IPost;
    userRaw: unknown;
    embedImages: string[];
  } {
    /* extract the information here. If it's an HTML document,
    try to manipulate the HTML with cheerio */
    return res.postInfo;
  }
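For reference, here is a minimal sketch of what parsePostId needs to do: take the post URL and return its last path segment, ignoring any query string or trailing slash. The helper lastPathSegment is hypothetical and only stands in for the project's getUrlLastSegment, whose exact behaviour isn't shown here. Depending on the URL form, the last segment may be a numeric post ID or a short alphanumeric code, so it is worth checking which one the API call expects.

```typescript
// Hypothetical stand-in for the project's getUrlLastSegment helper:
// returns the last non-empty path segment of a URL. The WHATWG URL parser
// keeps the query string and fragment out of `pathname`, so they are ignored.
function lastPathSegment(urlStr: string): string {
  const { pathname } = new URL(urlStr);
  // Drop empty segments produced by leading/trailing slashes.
  const segments = pathname.split('/').filter((s) => s.length > 0);
  if (segments.length === 0) {
    throw new Error(`No path segment found in URL: ${urlStr}`);
  }
  return segments[segments.length - 1];
}

// Sketch of parsePostId built on the hypothetical helper above.
async function parsePostId(urlStr: string): Promise<string> {
  return lastPathSegment(urlStr);
}
```

Keeping parsePostId async matches the original signature, so callers that `await` it do not need to change.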

Running platform: Gitpod

Hi @GoYiz, this is the headbook instance: https://github.com/Combo819/headbook-archiver

Your transformData is clearly incomplete. It should return an object of this type:

{
  repostingId: string;
  postInfo: IPost;
  userRaw: unknown;
  embedImages: string[];
}

but yours just returns res.postInfo.
You need to transform the res object and convert it into the IPost type. You can see an example here: https://github.com/Combo819/headbook-archiver/blob/master/packages/backend/src/Components/Post/Service/postCrawler.ts#L162
The response from Weibo will probably have a very different structure, so you need to write your own transformation based on the shape of res.
BTW, if you're trying to implement a Weibo instance, m.weibo.cn is easier to work with than weibo.com.
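As a rough illustration, the sketch below turns one Weibo status into the object transformData has to return. Everything about the input is an assumption: the `{ ok, data }` envelope and the field names (id, text, created_at, user, pics, retweeted_status) reflect what an m.weibo.cn status response is assumed to look like and should be checked against a real response, and the IPost interface here is a hypothetical stand-in for the archiver's own type. An endpoint that returns only the long text of a post may not include the user or image data at all, in which case those would have to come from another request.

```typescript
// Hypothetical stand-in for the archiver's IPost; the real fields will differ.
interface IPost {
  id: string;
  content: string;
  createdAt: string;
  userId: string;
}

// ASSUMED shape of a single Weibo status; verify against a real response.
interface WeiboStatus {
  id: string;
  text: string;
  created_at: string;
  user: { id: number | string; screen_name?: string };
  pics?: { large?: { url: string } }[];
  retweeted_status?: { id: string };
}

// ASSUMED response envelope: { ok: 1, data: <status> }.
interface WeiboResponse {
  ok: number;
  data: WeiboStatus;
}

function transformData(res: WeiboResponse): {
  repostingId: string;
  postInfo: IPost;
  userRaw: unknown;
  embedImages: string[];
} {
  if (res.ok !== 1) {
    throw new Error('Weibo API returned a non-OK response');
  }
  const status = res.data;

  // Map the assumed Weibo fields onto the hypothetical IPost fields.
  const postInfo: IPost = {
    id: String(status.id),
    content: status.text,
    createdAt: status.created_at,
    userId: String(status.user.id),
  };

  // Collect large image URLs, skipping picture entries that lack one.
  const embedImages = (status.pics ?? [])
    .map((pic) => pic.large?.url)
    .filter((url): url is string => typeof url === 'string');

  return {
    // Assumption: '' means "not a repost"; match whatever the archiver expects.
    repostingId: status.retweeted_status ? String(status.retweeted_status.id) : '',
    postInfo,
    // Pass the raw user object through so a user transformer can handle it.
    userRaw: status.user,
    embedImages,
  };
}
```

It is written as a free function for brevity; in postCrawler.ts it would stay a private method of the crawler class, and `res: any` would become whatever type describes the real response.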

You can also assign https://weibo.com to BASE_URL in https://github.com/Combo819/headbook-archiver/blob/master/packages/backend/src/Config/constants.ts#L1,
so that you can write the API function like this:

 return crawlerAxios.get(`/ajax/statuses/longtext?id=${postId}`);
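The BASE_URL approach relies on the HTTP client joining the base URL and the request path. The axios documentation states that `baseURL` is prepended to `url` unless `url` is absolute, so assuming crawlerAxios was created with `baseURL: BASE_URL`, the relative path above resolves to the full weibo.com address. The sketch below approximates that joining rule in plain TypeScript; resolveRequestUrl and postApiPath are hypothetical names, and BASE_URL here is a placeholder for the value in constants.ts.

```typescript
// Placeholder for the value exported from Config/constants.ts.
const BASE_URL = 'https://weibo.com';

// True for "scheme://..." or protocol-relative "//..." URLs.
function isAbsoluteUrl(url: string): boolean {
  return /^([a-z][a-z\d+\-.]*:)?\/\//i.test(url);
}

// Approximation of base-URL joining: absolute URLs pass through unchanged,
// relative ones are appended to the base with exactly one slash between them.
function resolveRequestUrl(baseUrl: string, url: string): string {
  if (isAbsoluteUrl(url)) {
    return url;
  }
  return baseUrl.replace(/\/+$/, '') + '/' + url.replace(/^\/+/, '');
}

// Hypothetical helper producing the relative path used by getPostApi;
// encodeURIComponent protects against IDs containing reserved characters.
function postApiPath(postId: string): string {
  return `/ajax/statuses/longtext?id=${encodeURIComponent(postId)}`;
}
```

Because absolute URLs bypass the base, the original full-URL version of getPostApi keeps working even after BASE_URL is changed.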