Lack of Implementation for Large Language Models (LLMs)
joelonsql opened this issue · 14 comments
In the published paper on FunSearch, there is a mention of using pre-trained large language models (LLMs) like Codey (based on the PaLM2 model family) and a reference to StarCoder, an open-source LLM, in the supplementary information. However, the current GitHub repository for FunSearch does not include implementations or integration guidelines for these LLMs.
This issue is particularly evident in the sampler.py file, where the LLM class seems to be a placeholder without an actual implementation:
class LLM:
  """Language model that predicts continuation of provided source code."""

  def __init__(self, samples_per_prompt: int) -> None:
    self._samples_per_prompt = samples_per_prompt

  def _draw_sample(self, prompt: str) -> str:
    """Returns a predicted continuation of `prompt`."""
    raise NotImplementedError('Must provide a language model.')
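For illustration, a minimal sketch of what such an integration could look like (this is an assumption on my part, not code from the repository or the paper; it uses the OpenAI Python SDK and an example model name):

# Hypothetical sketch only -- not code from this repository. Assumes the
# `openai` package (>= 1.0) and OPENAI_API_KEY set in the environment;
# the model name is just an example.
from openai import OpenAI

class OpenAICompletionLLM(LLM):
  """Backs the sampler with an OpenAI completion model."""

  def __init__(self, samples_per_prompt: int, model: str = 'gpt-3.5-turbo-instruct') -> None:
    super().__init__(samples_per_prompt)
    self._client = OpenAI()  # reads OPENAI_API_KEY from the environment
    self._model = model

  def _draw_sample(self, prompt: str) -> str:
    """Returns a predicted continuation of `prompt`."""
    response = self._client.completions.create(
        model=self._model,
        prompt=prompt,
        max_tokens=512,
    )
    return response.choices[0].text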
Suggested Resolution:
- It would be greatly beneficial for the community if the repository could include a basic implementation or integration guide for an open-source LLM, especially StarCoder, which was referenced in the paper.
- Providing such an implementation or guide would enhance the reproducibility and usability of the FunSearch project for researchers and developers looking to explore or build upon this work.
Looking forward to any updates or guidance on this matter.
I made a somewhat simple implementation of the main missing components: https://github.com/jonppe/funsearch
- There's a 'Sandbox' implementation using a container with Podman or Docker
- LLMs can be accessed using the https://llm.datasette.io/en/stable/ package, which allows quite easy access to almost any model (basic usage sketched below)
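For readers unfamiliar with the package, its basic API is roughly the following (a sketch; the model name is just an example, and any installed plugin model such as a local gpt4all model works the same way):

# Basic `llm` package usage; the model name here is illustrative.
import llm

model = llm.get_model("gpt-3.5-turbo-instruct")
response = model.prompt("def priority_v1(el, n):")  # returns a lazy Response
print(response.text())  # forces execution and returns the completion text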
It "seems" to work, i.e., it is able to find some algorithms for the cap set problem (although I've only tested dimensions around ~10).
Asyncio would be a nice addition to make it faster. I also think more diagnostic tools would be quite important (e.g. a tree view of the code evolution); that should allow better prompt engineering.
I could make a PR to this repo, but I'm not sure if the authors intend to maintain the repository.
Besides, I made rather major changes while turning this into an installable Python package.
Interesting work! I would like to try your code out on my Windows PC, but I have no experience with Docker, which is necessary if I understood you correctly. Could I do this in Google Colab? I use the free version of Colab.
This seems to work in Colab:
!pip install git+https://github.com/jonppe/funsearch.git
# examples are not installed. Download one example
!wget https://github.com/jonppe/funsearch/raw/main/examples/cap_set_spec.py
# Set OPENAI_API_KEY or use some other model
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
# Run funsearch to search cap sets for dimension 8
# Use ExternalProcessSandbox, which is reasonably safe since we're in Colab
!funsearch run cap_set_spec.py 8 --sandbox_type ExternalProcessSandbox
If the command-line interface doesn't provide all the features you need, you could copy the contents of the run() method and modify the relevant parts: https://github.com/jonppe/funsearch/blob/main/funsearch/__main__.py#L67
Thanks for the quick response!
Now it does seem to be working in Colab.
When I run this:
!funsearch run /content/funsearch/examples/cap_set_spec.py 8 --sandbox_type ExternalProcessSandbox
My output is:
INFO:root:Writing logs to data/1705692678
INFO:httpx:HTTP Request: GET https://gpt4all.io/models/models2.json "HTTP/1.1 301 Moved Permanently"
INFO:httpx:HTTP Request: GET https://raw.githubusercontent.com/nomic-ai/gpt4all/main/gpt4all-chat/metadata/models2.json "HTTP/1.1 200 OK"
INFO:absl:Best score of island 0 increased to 256
INFO:absl:Best score of island 1 increased to 256
INFO:absl:Best score of island 2 increased to 256
INFO:absl:Best score of island 3 increased to 256
INFO:absl:Best score of island 4 increased to 256
INFO:absl:Best score of island 5 increased to 256
INFO:absl:Best score of island 6 increased to 256
INFO:absl:Best score of island 7 increased to 256
INFO:absl:Best score of island 8 increased to 256
INFO:absl:Best score of island 9 increased to 256
INFO:openai:error_code=invalid_api_key error_message='Incorrect API key provided: "". You can find your API key at https://platform.openai.com/account/api-keys.' error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
Traceback (most recent call last):
  File "/usr/local/bin/funsearch", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/funsearch/__main__.py", line 132, in run
    core.run(samplers, database, iterations)
  File "/usr/local/lib/python3.10/dist-packages/funsearch/core.py", line 44, in run
    s.sample()
  File "/usr/local/lib/python3.10/dist-packages/funsearch/sampler.py", line 70, in sample
    samples = self._llm.draw_samples(prompt.code)
  File "/usr/local/lib/python3.10/dist-packages/funsearch/sampler.py", line 44, in draw_samples
    return [self._draw_sample(prompt) for _ in range(self._samples_per_prompt)]
  File "/usr/local/lib/python3.10/dist-packages/funsearch/sampler.py", line 44, in <listcomp>
    return [self._draw_sample(prompt) for _ in range(self._samples_per_prompt)]
  File "/usr/local/lib/python3.10/dist-packages/funsearch/sampler.py", line 38, in _draw_sample
    self._log(prompt, response, self.prompt_count)
  File "/usr/local/lib/python3.10/dist-packages/funsearch/sampler.py", line 51, in _log
    f.write(str(response))
  File "/usr/local/lib/python3.10/dist-packages/llm/models.py", line 109, in __str__
    return self.text()
  File "/usr/local/lib/python3.10/dist-packages/llm/models.py", line 112, in text
    self._force()
  File "/usr/local/lib/python3.10/dist-packages/llm/models.py", line 106, in _force
    list(self)
  File "/usr/local/lib/python3.10/dist-packages/llm/models.py", line 91, in __iter__
    for chunk in self.model.execute(
  File "/usr/local/lib/python3.10/dist-packages/llm/default_plugins/openai_models.py", line 356, in execute
    completion = openai.Completion.create(
  File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/openai/api_resources/abstract/engine_api_resource.py", line 155, in create
    response, _, api_key = requestor.request(
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 299, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 710, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py", line 775, in _interpret_response_line
    raise self.handle_error_response(
openai.error.AuthenticationError: Incorrect API key provided: "". You can find your API key at https://platform.openai.com/account/api-keys.
I am testing with an open-source LLM, namely orca-mini-3b-gguf2-q4_0.
In /content/funsearch/build/lib/funsearch/__main__.py, I changed the following so that funsearch uses the open-source LLM orca-mini-3b-gguf2-q4_0:
@click.option('--model_name', default="gpt-3.5-turbo-instruct", help='LLM model') # ORIGINAL
Changed this to:
@click.option('--model_name', default="orca-mini-3b-gguf2-q4_0", help='LLM model') # CHANGED VERSION
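(Side note: since --model_name is exposed as a click option, it should also be possible to pick the model on the command line instead of editing the source, e.g.:)

!funsearch run cap_set_spec.py 8 --model_name orca-mini-3b-gguf2-q4_0 --sandbox_type ExternalProcessSandbox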
I changed the API key to an empty string "" using:
!llm keys set openai
To test whether the open-source LLM works in Colab, I did this:
!llm chat -m orca-mini-3b-gguf2-q4_0
Which works fine:
Chatting with orca-mini-3b-gguf2-q4_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
What is the difference between fission and fusion?
The difference between fusion and fission lies in the way they split atoms. Fusion involves the joining of atomic nuclei to form new particles, such as helium or tritium, while fission involves splitting an atom into two smaller ones, typically uranium or plutonium. Fusion is a more complex process that requires very high temperatures and pressures, whereas fission can be achieved with lower energy inputs.
exit
I was also trying to use Podman in Colab, but that does not seem to work (maybe deliberately blocked by Colab's rules?):
Following /content/funsearch/funsearch.egg-info/PKG-INFO ("You can run FunSearch in a container using Podman or Docker"):
!podman build . -t funsearch
STEP 1/10: FROM docker.io/python:3.11.6
Trying to pull docker.io/library/python:3.11.6...
Getting image source signatures
…
Writing manifest to image destination
Storing signatures
STEP 2/10: WORKDIR /workspace
--> 3b3a70f43d1
STEP 3/10: RUN pip install pdm
error running container: error from /usr/bin/crun creating container for [/bin/sh -c pip install pdm]: create keyring buildah-buildah1918306187
: Operation not permitted
: exit status 1
Error: error building at STEP "RUN pip install pdm": error while running runtime: exit status 1
Can you help me out on this?
It seems that I have it working now, technically at least!
Of course the open-source LLM orca-mini-3b-gguf2-q4_0 will not be the best model for the job, but it remains to be seen what kind of solutions it generates.
Also, I have it running on the Colab CPU; I will try a T4 GPU later this weekend (I will not disturb the currently running session... :-))!
Thanks for your support!
Hello,
I made a somewhat simple implementation of the main missing components: https://github.com/jonppe/funsearch
* There's a 'Sandbox' implementation using a container with Podman or Docker
Can you enable issues on your repository?
For example, if we "run the main Python process on a host computer outside of any container and let the process build and run separate sandbox containers (still requires Podman/Docker)", the error now occurs in the sandbox:
/usr/local/bin/python3: can't open file '/main.py': [Errno 13] Permission denied
Any help would be appreciated.
Thank you for your very helpful code additions, @jonppe!
I have tried to implement querying the original Codey LLM over a Google API in your code, and after some smaller code tweaks I have been able to retrieve responses from Codey. For example, on the first iteration, the LLM answers with the response printed at the bottom.
As you can see, the response from Codey is of type MultiCandidateTextGenerationResponse, and besides offering a first new priority function, it comes with all sorts of supplementary information. My main issue now is understanding how to present this response to the function _trim_function_body in the evaluator.py file. I have already tried to sidestep this step by applying some regular-expression corrections, but so far the function of interest is never correctly extracted. As a consequence, all new prompts remain unchanged (i.e., trivial).
Would you mind sharing the response that your LLM returns with me? That would help me to identify which part of the MultiCandidateTextGenerationResponse to discard and which to keep.
Thank you very much for your efforts!
Tim
Codey response to the trivial prompt:
MultiCandidateTextGenerationResponse(
    text='```python\ndef priority_v1(el: tuple[int, ...], n: int) -> float:\n  """Improved version of `priority_v0`.\n\n  This version takes into account the number of 1s in the tuple.\n  """\n  return el.count(1) / n\n\n```',
    _prediction_response=Prediction(predictions=[{'content': '```python\ndef priority_v1(el: tuple[int, ...], n: int) -> float:\n  """Improved version of `priority_v0`.\n\n  This version takes into account the number of 1s in the tuple.\n  """\n  return el.count(1) / n\n\n```', 'score': -9.306792259216309, 'citationMetadata': {'citations': []}, 'safetyAttributes': {'blocked': False, 'scores': [], 'categories': []}}], deployed_model_id='', model_version_id='', model_resource_name='', explanations=None),
    is_blocked=False, errors=(), safety_attributes={},
    grounding_metadata=GroundingMetadata(citations=[], search_queries=[]),
    candidates=[```python
def priority_v1(el: tuple[int, ...], n: int) -> float:
  """Improved version of `priority_v0`.

  This version takes into account the number of 1s in the tuple.
  """
  return el.count(1) / n
```])
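For context, a response of this type can be obtained and unpacked roughly as follows with the Vertex AI SDK (a sketch only; the project, model name, prompt, and parameters are illustrative assumptions):

# Sketch, assuming the Vertex AI SDK (`pip install google-cloud-aiplatform`)
# and a configured GCP project. All names here are illustrative.
import vertexai
from vertexai.language_models import CodeGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project
model = CodeGenerationModel.from_pretrained("code-bison")
prompt = "def priority_v1(el: tuple[int, ...], n: int) -> float:"
response = model.predict(prefix=prompt)  # MultiCandidateTextGenerationResponse
raw = response.text  # text of the first candidate, markdown fences included
# The ```python fence and the `def priority_v1(...)` header still have to be
# stripped before the body is handed to _trim_function_body().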
@timneumann1
The response text should not contain the "def priority_v1(..." part. It should start directly with the implementation, possibly including the method docstring.
Looking at your description of MultiCandidateTextGenerationResponse, there seems to be no ready-made field for that, so I guess you need to strip it out yourself, or alternatively modify the rest of _trim_function_body() so that it simply runs ast.parse() on the generated response without any extra handling (rough sketch below).
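Something along these lines could work (an untested sketch, not code from the fork; it assumes that after removing the fences the remainder is valid Python):

import ast

def extract_body(sample: str, name: str = "priority_v1") -> str:
  """Sketch: drop markdown fences and the `def` header from an LLM sample,
  returning only the function body (docstring included)."""
  # Remove markdown code fences, if any.
  code = "\n".join(l for l in sample.splitlines() if not l.strip().startswith("```"))
  tree = ast.parse(code)  # assumes the remainder parses as Python
  for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name == name:
      lines = code.splitlines()
      # `lineno` fields are 1-based: slice from the first body statement
      # (usually the docstring) through the function's last line.
      return "\n".join(lines[node.body[0].lineno - 1 : node.end_lineno])
  raise ValueError(f"no function named {name!r} in sample")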
Thank you very much @jonppe. Upon making the suggested changes and hardcoding some edge cases, I got the code to run using the Codey API. Interestingly, in every dimension I have tested so far, a program is found within the very first few iterations that improves on the trivial cap set size quite significantly, but then no further improvement is made over the next couple of hundred iterations. I wonder if this is some sort of "local minimum", or if the number of iterations is simply way too small. It would be interesting to know how many programs the authors of the paper needed for the improvement of the bound in dimension 8!
Thank you very much @jonppe. Upon making the suggested changes and hardcoding some edge cases, I got the code to run using the Codey API.
Can you share some data on the cost of using Codey? The paper's estimates are around 1,400 dollars per experiment...
Using the Google Vertex AI API costs me around $1 per 1,000 API calls to the code-bison model.
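At that rate, the paper's estimate of roughly $1,400 per experiment would correspond to on the order of 1.4 million API calls ($1,400 ÷ $1 per 1,000 calls), which gives a feel for the sampling volume involved.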
Hello all, I wanted to share a working implementation, including documentation of how to use it. Based on @jonppe's architectural work, I ran FunSearch on a math problem similar to the cap set problem (finding maximal Sidon sets in affine space) using the code and procedure detailed at https://github.com/timneumann1/FunSearch-Sidon.
There, you can also find the cap set spec file, as described in the README. Please let me know if you have any questions, and I hope this helps to get some basic results.
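For orientation, spec files follow the pattern from the paper: the scoring function is marked @funsearch.run and the function to evolve is marked @funsearch.evolve. A condensed sketch in the spirit of the cap set example (simplified; the real cap_set_spec.py differs):

"""Condensed spec sketch (simplified from the cap set example)."""
import itertools
import funsearch


@funsearch.run
def evaluate(n: int) -> int:
  """Scores a `priority` candidate: size of the greedily built cap set."""
  elements = sorted(itertools.product(range(3), repeat=n),
                    key=lambda el: priority(el, n), reverse=True)
  capset: list[tuple[int, ...]] = []
  for el in elements:
    # A cap set has no three distinct vectors summing to 0 componentwise mod 3.
    if not any(all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, el))
               for i, x in enumerate(capset) for y in capset[i + 1:]):
      capset.append(el)
  return len(capset)


@funsearch.evolve
def priority(el: tuple[int, ...], n: int) -> float:
  """Returns the priority with which we want to add `el` to the cap set."""
  return 0.0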