This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it runs into problems. It is also a first step toward running multiple copies to support multiple models concurrently.
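
As a rough illustration of the approach described above (not the code in this change), the following Python sketch shows a supervisor that starts a worker process on demand, stops it after an idle period, and restarts it if it has exited. `SubprocessRunner`, its methods, and the `idle_timeout` values are hypothetical names chosen for this example, and the worker command is a placeholder rather than an actual llama.cpp server invocation.

```python
import subprocess
import sys
import time


class SubprocessRunner:
    """Hypothetical supervisor for one worker process (a stand-in for a llama.cpp server)."""

    def __init__(self, cmd, idle_timeout=300.0):
        self.cmd = cmd                    # worker command line (placeholder, an assumption)
        self.idle_timeout = idle_timeout  # seconds without use before shutdown
        self.proc = None
        self.last_used = 0.0
        self.starts = 0                   # number of times the worker was launched

    def ensure_running(self):
        """Start the worker if it is absent or has exited, then mark it as used."""
        if self.proc is None or self.proc.poll() is not None:
            self.proc = subprocess.Popen(self.cmd)
            self.starts += 1
        self.last_used = time.monotonic()
        return self.proc

    def reap_if_idle(self):
        """Stop a live worker once it has been unused for idle_timeout seconds."""
        if self.proc is None or self.proc.poll() is not None:
            return False
        if time.monotonic() - self.last_used < self.idle_timeout:
            return False
        self.stop()
        return True

    def stop(self):
        """Terminate the worker, escalating to kill if it ignores the request."""
        if self.proc is None:
            return
        if self.proc.poll() is None:
            self.proc.terminate()
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        self.proc = None


if __name__ == "__main__":
    # Placeholder worker: a Python process that just sleeps.
    worker = [sys.executable, "-c", "import time; time.sleep(60)"]
    runner = SubprocessRunner(worker, idle_timeout=0.2)
    runner.ensure_running()   # first request starts the worker
    time.sleep(0.3)
    runner.reap_if_idle()     # worker is stopped after the idle period
    runner.ensure_running()   # next request starts a fresh worker
    runner.stop()
```

Because the worker lives in its own process, any memory it leaks is returned to the operating system when it is stopped, and a crash only requires the next `ensure_running()` call to launch a replacement. Running several such runners side by side would be one way to serve multiple models, though that is beyond this sketch.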
| Name |
|---|
| CMakeLists.txt |
| httplib.h |
| json.hpp |
| server.cpp |
| utils.hpp |