pip install torch numpy transformers tiktoken
- pytorch
- numpy
- transformers, for loading Hugging Face transformers checkpoints
- tiktoken, for OpenAI's BPE encoding
$ python data/prepare.py
This will create a folder containing all the PDF files scraped from the Berkshire Hathaway website, along with the text files q1.txt, q2.txt, q3.txt, q4.txt, other.txt, and annualreport.txt. Most importantly, it produces train.bin and val.bin, which are used for fine-tuning.
Note that you can use your own data set, as long as you encode it into the final train.bin and val.bin files.
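As a rough illustration of encoding your own text, here is a minimal sketch. It assumes that train.bin and val.bin are flat arrays of uint16 GPT-2 BPE token IDs, which is a common convention for this kind of training script but is not confirmed here; check data/prepare.py for the actual format. The helper names `gpt2_encode` and `encode_to_bins`, and the 90/10 split, are hypothetical.

```python
from pathlib import Path

import numpy as np


def gpt2_encode(text):
    """Encode text with OpenAI's GPT-2 BPE (requires tiktoken)."""
    import tiktoken  # imported lazily so the helper below works without it

    enc = tiktoken.get_encoding("gpt2")
    return enc.encode_ordinary(text)  # ignores special tokens


def encode_to_bins(text, encode, out_dir, val_fraction=0.1):
    """Split text into train/val parts, encode each, and write .bin files.

    Assumption: token IDs fit in uint16 (true for the GPT-2 vocabulary of
    50257 tokens) and the training script reads them back with that dtype.
    """
    split = int(len(text) * (1 - val_fraction))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {}
    for name, part in (("train", text[:split]), ("val", text[split:])):
        ids = np.array(encode(part), dtype=np.uint16)
        ids.tofile(str(out / f"{name}.bin"))
        counts[name] = len(ids)
    return counts
```

With tiktoken installed, calling `encode_to_bins(open("my_corpus.txt").read(), gpt2_encode, "data")` would write data/train.bin and data/val.bin (the file name my_corpus.txt is a placeholder). Splitting by character position is the simplest choice; a split on document boundaries may suit some data sets better.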
Now we can fine-tune the model:
$ python train.py
We obtain the following losses on train and val:
| model | params | train loss | val loss |
| ----- | ------ | ---------- | -------- |
| gpt2 | 123.65M | 2.0619 | 2.0573 |
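To make these numbers easier to interpret, the losses can be converted to perplexity. The sketch below assumes the reported values are mean cross-entropy per token measured in nats (the usual output of a PyTorch cross-entropy loss); that assumption is not stated in this README.

```python
import math

# Losses copied from the table above; assumed to be mean per-token
# cross-entropy in nats.
losses = {"train": 2.0619, "val": 2.0573}


def perplexity(loss_nats):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(loss_nats)


perplexities = {split: perplexity(loss) for split, loss in losses.items()}
for split, ppl in perplexities.items():
    print(f"{split}: loss={losses[split]:.4f} perplexity={ppl:.2f}")
```

Under that assumption, both splits come out at a perplexity a little under 8, meaning the model is on average about as uncertain as a uniform choice among roughly eight tokens.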
Sampling from the fine-tuned model is straightforward:
$ python sample.py
We get new text that tries to imitate Warren Buffett:
To the Shareholders of Berkshire Hathaway Inc.:
Charlie Munger, my long-time partner, and I have
led or are currently managing a team that will manage and implement Berkshire Hathaway shares. As a result of our
commitment to manage and implement Berkshire Hathaway shares, we have acquired the extraordinary shares of common stock that we
currently own.
Berkshire currently owns approximately 88.5% of Berkshire common stock. Our Board of Directors
has designated certain of our Board members, with the remaining remaining three to be chosen at the end of each
year. The total number of members of the Board of Directors can be at an indefinite length determined by the Board, and our Board
may exercise any number of executive, legislative, or other powers that are of such duration, and we may elect to elect the first
member of this Board, so long as he or she remains active in the Board. The Board is also given the option to elect a number of additional members to the Board by
option or by vote of the Board and may elect members by the vote of the Board.
In addition to electing additional members, the Board may vote to approve or reject certain transactions, initiatives,
actions, or initiatives in the future.
See new.txt for more examples.