Add alpaca chat template #7383
I'm also not sure if the Python script above should do something to deal with default system messages that end with a newline, eg:
Doesn't look correct to me for
Perhaps the Python script should detect this and move the newline from the default system message into the template itself? |
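One way the newline handling suggested above could look, as a minimal sketch: split any trailing newlines off the default system message so they can be appended to the template's separator instead. The helper name `split_trailing_newlines` and the example message are hypothetical and are not part of the wiki script.

```python
def split_trailing_newlines(message: str) -> tuple[str, str]:
    """Return (body, trailing), where trailing holds only the final newlines."""
    body = message.rstrip("\n")
    return body, message[len(body):]


# Hypothetical default system message that ends with a newline.
default_system = "You are a helpful assistant.\n"
body, trailing = split_trailing_newlines(default_system)
print(repr(body))      # the message without its trailing newline
print(repr(trailing))  # the newline(s) that could be moved into the template
```

The script could then emit `body` as the system message and prepend `trailing` to whatever separator the template already adds, so the rendered prompt is unchanged.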
Actually I think the safest way to ensure this is robust against refactoring the order of tests is:
} else if (tmpl == "alpaca" || (tmpl.find("### Instruction:") != std::string::npos && tmpl.find("<|EOT|>") == std::string::npos)) {
So I have changed it to that. Feel free to change this if you can think of a better test or if just |
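The detection condition above can be mirrored in Python to see which template strings it accepts. This is only an illustrative sketch of the same boolean logic; the function name `looks_like_alpaca` and the sample strings are hypothetical, and the real check lives in the C++ code.

```python
def looks_like_alpaca(tmpl: str) -> bool:
    """Python mirror of the C++ condition quoted above."""
    # Either the template was requested by name...
    if tmpl == "alpaca":
        return True
    # ...or it contains the Alpaca instruction header and does NOT contain
    # the <|EOT|> marker that the deepseek-coder template uses.
    return "### Instruction:" in tmpl and "<|EOT|>" not in tmpl


# Hypothetical template fragments, for illustration only.
alpaca_like = "### Instruction:\n{prompt}\n\n### Response:\n"
deepseek_like = "### Instruction:\n{prompt}\n### Response:\n{reply}\n<|EOT|>\n"

print(looks_like_alpaca("alpaca"))
print(looks_like_alpaca(alpaca_like))
print(looks_like_alpaca(deepseek_like))
```

Because the `<|EOT|>` exclusion is part of the condition itself, the result does not depend on whether the deepseek-coder test runs before or after this one.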
I just searched some of the closed PRs and found somebody had almost added the 'alpaca' template before:
This isn't true as the (now deleted off Hugging Face) Also, as I mentioned above, people are often using this for the Mistral models that use the "[INST]" template to improve their creative writing ability, etc. |
There's something not right about this as |
It looks like there is an extra space getting added between the
|
So I've completely redone it all using an edited version of the
You can see in the follow up instruction that it doesn't add a space around the BUT:
The code is almost the same:
as is the test template:
and both have:
I just can't for the life of me see why one inserts a space after its Not much else I can do and it's clear this PR won't work as |
Re-quantizing |
Nope the space is still here:
|
Nope it's completely useless with that extra space. I've even tried the "official" phind-codellama template using "### System Prompt", "### User Message" and "### Assistant", and it is just completely lobotomised:
USER
Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values
ASSISTANT
Sure, I can provide you with a simple C++ program that uses GSL (GNU Scientific Library) to perform logistic regression. Please note that you have to install GSL library on your system to run this program.
#include <iostream>
#include <gsl/gsl_multifit.h>
void logistic_regression(const gsl_vector *x, void *params, gsl_vector *y) {
double alpha = gsl_vector_get(x, 0);
double beta = gsl_vector_get(x, 1);
for (size_t i = 0; i < y->size; i++) {
double xi = gsl_vector_get(params, i);
double val = 1.0 / (1.0 + exp(-(alpha * xi + beta)));
gsl_vector_set(y, i, val);
}
}
int main() {
const size_t n = 5;
const size_t p = 2;
gsl_vector *x = gsl_vector_alloc(n);
gsl_vector *y = gsl_vector_alloc(n);
// Fill x and y with hardcoded values
for (size_t i = 0; i < n; ++i) {
gsl_vector_set(x, i, i);
gsl_vector_set(y, i, 1.0 / (1.0 + exp(-(3.0 * i + 2))));
}
gsl_multifit_linear_workspace *work = gsl_multifit_linear_alloc(n, p);
gsl_vector *c = gsl_vector_alloc(p);
gsl_matrix *cov = gsl_matrix_alloc(p, p);
gsl_multifit_linear(x, y, c, cov, &logistic_regression);
std::cout << "alpha = " << gsl_vector_get(c, 0) << ", beta = " << gsl_vector_get(c, 1) << std::endl;
gsl_multifit_linear_free(work);
gsl_vector_free(x);
gsl_vector_free(y);
gsl_vector_free(c);
gsl_matrix_free(cov);
return 0;
}
This program calculates the logistic regression for a set of hardcoded values. The Please note that this is a very basic example and doesn't include any error checking or handling. For real-world applications, you should add proper error handling code.
USER
Can you implement the
ASSISTANT
Sure, I can help with that. Here's a Python implementation of the
import numpy as np
from scipy import special
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def logistic_regression(X, y, beta):
"""
Compute the logistic regression function for given X and parameters beta.
Args:
X : array_like
The input array of shape (n_samples, n_features)
y : array_like
The output array of shape (n_samples,)
beta : array_like
The parameters of the model of shape (n_features,)
Returns:
f : float
The value of the logistic regression function at X with parameters beta.
"""
return np.sum(np.log(sigmoid(np.dot(X, beta))))
def logistic_regression_gradient(X, y, beta):
"""
Compute the gradient of the logistic regression function for given X and parameters beta.
Args:
X : array_like
The input array of shape (n_samples, n_features)
y : array_like
The output array of shape (n_samples,)
beta : array_like
The parameters of the model of shape (n_features,)
Returns:
df : ndarray
The gradient of the logistic regression function at X with parameters beta.
"""
z = np.dot(X, beta)
s = sigmoid(z)
return np.dot(X.T, (s - y)) / y.size
# Testing
from sklearn import datasets
data = datasets.load_breast_cancer()
X = data.data
y = data.target
beta = np.random.randn(X.shape[1])
f = logistic_regression(X, y, beta)
df = logistic_regression_gradient(X, y, beta)
print("f:", f)
print("df:", df)
This code defines the Please note that this code assumes that y values are either 0 or 1, as it's a binary classification problem. If you have more than two classes, you need to modify the code accordingly.
USER
No, I mean add them to the C++ code you created earlier!
ASSISTANT
Sure, I can help with that. But I need to know what you have in your C++ code so far and where exactly you want to add these new functions. Please provide more details or share your current code. |
I'll leave this PR up as it may point to some more serious bug with the initial |
<s>
and ### Instruction:
...)
Amazingly that space is so harmful to
USER
Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values
ASSISTANT
Sure, here is a simple C++ program that uses the GNU Scientific Library (GSL) to perform logistic regression. Note that you'll need to have GSL installed on your system to compile this code.
#include <iostream>
#include <gsl/gsl_multifit.h>
void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(x, 0);
double b = gsl_vector_get(x, 1);
size_t i;
for (i = 0; i < f->size; ++i) {
double yi = gsl_vector_get((gsl_vector *)params, i);
double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
gsl_vector_set(f, i, fi);
}
}
int main() {
const size_t n = 5;
const size_t p = 2;
// independent variables (X) and dependent variable (y)
gsl_vector * X = gsl_vector_alloc(n);
gsl_vector * y = gsl_vector_alloc(n);
// set values of X and y
gsl_vector_set(X, 0, 1.0);
gsl_vector_set(X, 1, 2.0);
gsl_vector_set(X, 2, 3.0);
gsl_vector_set(X, 3, 4.0);
gsl_vector_set(X, 4, 5.0);
gsl_vector_set(y, 0, -1.0);
gsl_vector_set(y, 1, 1.0);
gsl_vector_set(y, 2, -1.0);
gsl_vector_set(y, 3, 1.0);
gsl_vector_set(y, 4, -1.0);
// allocate space for the parameters (a and b)
gsl_vector * params = gsl_vector_alloc(p);
// set initial guess for parameters
gsl_vector_set(params, 0, 0.0);
gsl_vector_set(params, 1, 0.0);
// allocate space for the fitted parameters
gsl_vector * f = gsl_vector_alloc(p);
// create a linear system and solve it
const gsl_multifit_fsolver_type *T;
gsl_multifit_fsolver *s;
T = gsl_multifit_fsolver_lmsder;
s = gsl_multifit_fsolver_alloc(T, n, p);
gsl_multifit_function f_;
f_.f = logistic_regression;
f_.params = y;
gsl_multifit_fsolver_set(s, &f_, params);
int status;
size_t iter = 0;
do {
iter++;
status = gsl_multifit_fsolver_iterate(s);
if (status)
break;
status = gsl_multifit_test_delta(params, f, 1e-7, 1e-7);
} while (status == GSL_CONTINUE && iter < 1000);
// print the results
printf("a = %g\n", gsl_vector_get(params, 0));
printf("b = %g\n", gsl_vector_get(params, 1));
// free memory
gsl_multifit_fsolver_free(s);
gsl_vector_free(X);
gsl_vector_free(y);
gsl_vector_free(params);
gsl_vector_free(f);
return 0;
}
This program calculates logistic regression for a simple dataset with two independent variables (X) and one dependent variable (y). The
USER
Can you implement the f.f and f.df code too?
ASSISTANT
Sure, here's the implementation of
void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(x, 0);
double b = gsl_vector_get(x, 1);
size_t i;
for (i = 0; i < f->size; ++i) {
double yi = gsl_vector_get((gsl_vector *)params, i);
double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
gsl_vector_set(f, i, fi);
}
}
void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
double a = gsl_vector_get(x, 0);
double b = gsl_vector_get(x, 1);
size_t i;
for (i = 0; i < J->size1; ++i) {
double yi = gsl_vector_get((gsl_vector *)params, i);
double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
gsl_matrix_set(J, i, 0, yi * fi * (1 - fi));
gsl_matrix_set(J, i, 1, fi * (1 - fi));
}
}
int main() {
// ...
f_.f = logistic_regression;
f_.df = logistic_regression_df;
// ...
}
In this code,
USER
Please write out the full code so I can run it!
ASSISTANT
Here is the full code:
#include <iostream>
#include <gsl/gsl_multifit.h>
void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(x, 0);
double b = gsl_vector_get(x, 1);
size_t i;
for (i = 0; i < f->size; ++i) {
double yi = gsl_vector_get((gsl_vector *)params, i);
double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
gsl_vector_set(f, i, fi);
}
}
void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
double a = gsl_vector_get(x, 0);
double b = gsl_vector_get(x, 1);
size_t i;
for (i = 0; i < J->size1; ++i) {
double yi = gsl_vector_get((gsl_vector *)params, i);
double fi = 1.0 / (1.0 + exp(-(a * yi + b)));
gsl_matrix_set(J, i, 0, yi * fi * (1 - fi));
gsl_matrix_set(J, i, 1, fi * (1 - fi));
}
}
int main() {
const size_t n = 5;
const size_t p = 2;
// independent variables (X) and dependent variable (y)
gsl_vector * X = gsl_vector_alloc(n);
gsl_vector * y = gsl_vector_alloc(n);
// set values of X and y
gsl_vector_set(X, 0, 1.0);
gsl_vector_set(X, 1, 2.0);
gsl_vector_set(X, 2, 3.0);
gsl_vector_set(X, 3, 4.0);
gsl_vector_set(X, 4, 5.0);
gsl_vector_set(y, 0, -1.0);
gsl_vector_set(y, 1, 1.0);
gsl_vector_set(y, 2, -1.0);
gsl_vector_set(y, 3, 1.0);
gsl_vector_set(y, 4, -1.0);
// allocate space for the parameters (a and b)
gsl_vector * params = gsl_vector_alloc(p);
// set initial guess for parameters
gsl_vector_set(params, 0, 0.0);
gsl_vector_set(params, 1, 0.0);
// allocate space for the fitted parameters
gsl_vector * f = gsl_vector_alloc(p);
// create a linear system and solve it
const gsl_multifit_fsolver_type *T;
gsl_multifit_fsolver *s;
T = gsl_multifit_fsolver_lmsder;
s = gsl_multifit_fsolver_alloc(T, n, p);
gsl_multifit_function f_;
f_.f = logistic_regression;
f_.df = logistic_regression_df;
f_.params = y;
gsl_multifit_fsolver_set(s, &f_, params);
int status;
size_t iter = 0;
do {
iter++;
status = gsl_multifit_fsolver_iterate(s);
if (status)
break;
status = gsl_multifit_test_delta(params, f, 1e-7, 1e-7);
} while (status == GSL_CONTINUE && iter < 1000);
// print the results
printf("a = %g\n", gsl_vector_get(params, 0));
printf("b = %g\n", gsl_vector_get(params, 1));
// free memory
gsl_multifit_fsolver_free(s);
gsl_vector_free(X);
gsl_vector_free(y);
gsl_vector_free(params);
gsl_vector_free(f);
return 0;
}
This code calculates logistic regression for a simple dataset with two independent variables (X) and one dependent variable (y). The
I think it would probably be a good idea to:
|
I'm not even sure if it is the space that is causing it now, as this is the tokenization when using the
that produces "sane" output from |
<s>
and ### Instruction:
...)
I've figured it out now! The and there is no This now agrees with the Ollama-templated version for
USER
Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values
ASSISTANT
Sure, here's a simple C++ program that uses the GNU Scientific Library (GSL) to perform logistic regression. Note that you need to have GSL installed on your system to compile this code.
#include <iostream>
#include <gsl/gsl_multifit.h>
void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(params, 0);
double b = gsl_vector_get(params, 1);
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
double fi = 1.0 / (1.0 + exp(-(a * xi + b)));
gsl_vector_set(f, i, fi);
}
}
int main() {
const size_t n = 5;
const size_t p = 2;
// Hardcoded values for demonstration
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.3, 0.7, 0.85, 0.99};
gsl_vector_view xv = gsl_vector_view_array(x, n);
gsl_vector * X = gsl_vector_alloc(n);
for (size_t i = 0; i < n; ++i) {
gsl_vector_set(X, i, 1.0);
}
gsl_matrix * X2 = gsl_matrix_alloc(n, p);
for (size_t i = 0; i < n; ++i) {
gsl_matrix_set(X2, i, 0, x[i]);
gsl_matrix_set(X2, i, 1, 1.0);
}
gsl_vector * yv = gsl_vector_view_array(y, n);
gsl_multifit_function_fdf f;
f.f = NULL;
f.df = NULL;
f.fdf = &logistic_regression;
f.n = n;
f.p = p;
gsl_vector * params = gsl_vector_alloc(p);
gsl_multifit_fdfsolver * s = gsl_multifit_fdfsolver_alloc(gsl_multifit_fdfsolver_lmsder, n, p);
gsl_multifit_fdfsolver_set(s, &f, params);
int status;
size_t iter = 0;
do {
iter++;
status = gsl_multifit_fdfsolver_iterate(s);
if (status) break;
status = gsl_multifit_test_delta(s->dx, s->x, 1e-8, 1e-8);
} while (status == GSL_CONTINUE && iter < 1000);
std::cout << "a = " << gsl_vector_get(params, 0) << ", b = " << gsl_vector_get(params, 1) << std::endl;
gsl_multifit_fdfsolver_free(s);
gsl_vector_free(params);
gsl_matrix_free(X2);
gsl_vector_free(X);
return 0;
}
This program performs logistic regression on a set of hardcoded data points (x, y). The output will be the parameters 'a' and 'b' of the logistic function. Please note that this is a basic example and may not be suitable for real-world applications without further modifications. For instance, you might want to add error handling or use more advanced optimization algorithms.
USER
Can you implement the f.f and f.df code too?
ASSISTANT
Sure, here's the implementation of
void logistic_regression_f(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(params, 0);
double b = gsl_vector_get(params, 1);
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
double fi = a * xi + b;
gsl_vector_set(f, i, fi);
}
}
void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
gsl_matrix_set(J, i, 0, xi);
gsl_matrix_set(J, i, 1, 1.0);
}
}
int main() {
// ...
f.f = &logistic_regression_f;
f.df = &logistic_regression_df;
// ...
}
These two functions calculate the model function
USER
Please write out the full code so I can run it!
ASSISTANT
Here's the full code with
#include <iostream>
#include <gsl/gsl_multifit.h>
void logistic_regression_f(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(params, 0);
double b = gsl_vector_get(params, 1);
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
double fi = a * xi + b;
gsl_vector_set(f, i, fi);
}
}
void logistic_regression_df(const gsl_vector * x, void * params, gsl_matrix * J) {
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
gsl_matrix_set(J, i, 0, xi);
gsl_matrix_set(J, i, 1, 1.0);
}
}
void logistic_regression(const gsl_vector * x, void * params, gsl_vector * f) {
double a = gsl_vector_get(params, 0);
double b = gsl_vector_get(params, 1);
size_t n = x->size;
for (size_t i = 0; i < n; ++i) {
double xi = gsl_vector_get(x, i);
double fi = 1.0 / (1.0 + exp(-(a * xi + b)));
gsl_vector_set(f, i, fi);
}
}
int main() {
const size_t n = 5;
const size_t p = 2;
// Hardcoded values for demonstration
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.3, 0.7, 0.85, 0.99};
gsl_vector_view xv = gsl_vector_view_array(x, n);
gsl_vector * X = gsl_vector_alloc(n);
for (size_t i = 0; i < n; ++i) {
gsl_vector_set(X, i, 1.0);
}
gsl_matrix * X2 = gsl_matrix_alloc(n, p);
for (size_t i = 0; i < n; ++i) {
gsl_matrix_set(X2, i, 0, x[i]);
gsl_matrix_set(X2, i, 1, 1.0);
}
gsl_vector * yv = gsl_vector_view_array(y, n);
gsl_multifit_function_fdf f;
f.f = &logistic_regression_f;
f.df = &logistic_regression_df;
f.fdf = &logistic_regression;
f.n = n;
f.p = p;
gsl_vector * params = gsl_vector_alloc(p);
gsl_multifit_fdfsolver * s = gsl_multifit_fdfsolver_alloc(gsl_multifit_fdfsolver_lmsder, n, p);
gsl_multifit_fdfsolver_set(s, &f, params);
int status;
size_t iter = 0;
do {
iter++;
status = gsl_multifit_fdfsolver_iterate(s);
if (status) break;
status = gsl_multifit_test_delta(s->dx, s->x, 1e-8, 1e-8);
} while (status == GSL_CONTINUE && iter < 1000);
std::cout << "a = " << gsl_vector_get(params, 0) << ", b = " << gsl_vector_get(params, 1) << std::endl;
gsl_multifit_fdfsolver_free(s);
gsl_vector_free(params);
gsl_matrix_free(X2);
gsl_vector_free(X);
return 0;
}
This code should compile and run on a system with GSL installed. The output will be the parameters 'a' and 'b' of the logistic function. |
This is the code that I used to generate the tests using the
from transformers import AutoTokenizer
VARIANTS_TO_TEST = [
'deepseek-ai/deepseek-coder-33b-instruct',
'meta-math/MetaMath-7B-V1.0',
]
HISTORY = [
{ 'role': 'system', 'content': 'You are a helpful assistant' },
{ 'role': 'user', 'content': 'Hello' },
{ 'role': 'assistant', 'content': 'Hi there' },
{ 'role': 'user', 'content': 'Who are you' },
{ 'role': 'assistant', 'content': ' I am an assistant ' },
{ 'role': 'user', 'content': 'Another question' },
]
for variant in VARIANTS_TO_TEST:
    history = [m for m in HISTORY] # copy
    if 'meta-math' in variant:
        print("\n----- Alpaca -----")
        ALPACA_TMPL = "{%- set ns = namespace(found=false) -%} {%- for message in messages -%} {%- if message['role'] == 'system' -%} {%- set ns.found = true -%} {%- endif -%} {%- endfor -%} {%- if not ns.found -%} {{- '' + 'Below is an instruction that describes a task. Write a response that appropriately completes the request.' + '\n\n' -}} {%- endif %} {%- for message in messages %} {%- if message['role'] == 'system' -%} {{- '' + message['content'] + '\n\n' -}} {%- else -%} {%- if message['role'] == 'user' -%} {{-'### Instruction:\n' + message['content'] + '\n\n'-}} {%- else -%} {{-'### Response:\n' + message['content'] + '\n\n' -}} {%- endif -%} {%- endif -%} {%- endfor -%} {%- if add_generation_prompt -%} {{-'### Response:\n'-}} {%- endif -%}"
        output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[1]).apply_chat_template(history, tokenize=False, add_generation_prompt=True, chat_template=ALPACA_TMPL)
        print(output)
        print("\n[Test String]\n// meta-math/MetaMath-7B-V1.0")
        print(output.replace("\n", "\\n"))
        print('"' + output.replace("\n", "\\n") + '",')
    else:
        print("\n----- " + variant + " -----")
        tokenizer = AutoTokenizer.from_pretrained(variant)
        output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
        print(output)
        print("\n[Test String]\n// " + variant)
        print('"' + output.replace("\n", "\\n") + '",') |
Actually I've just found I have a copy of the 34b model! I will test and report back. |
Here is
USER
Can you write me a C++ program to calculate logistic regression using GSL? Write a short driver in main to test it with hard coded values
ASSISTANT
Sure, here's an example program that calculates logistic regression using GSL:
#include <iostream>
#include <gsl/gsl_multifit.h>
int main() {
// Define the input data
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.2, 0.3, 0.4, 0.5};
// Define the parameters
int n = sizeof(x) / sizeof(x[0]);
gsl_vector *X = gsl_vector_alloc(n);
gsl_vector *yv = gsl_vector_alloc(n);
// Copy the data into GSL vectors
for (int i = 0; i < n; ++i) {
gsl_vector_set(X, i, x[i]);
gsl_vector_set(yv, i, y[i]);
}
// Define the design matrix
gsl_matrix *design = gsl_matrix_alloc(n, 2);
for (int i = 0; i < n; ++i) {
gsl_matrix_set(design, i, 0, 1);
gsl_matrix_set(design, i, 1, x[i]);
}
// Define the coefficients
gsl_vector *c = gsl_vector_alloc(2);
// Solve for the coefficients
int status = gsl_multifit_linear(X, yv, design, c);
if (status) {
std::cerr << "Error: " << gsl_strerror(status) << std::endl;
return 1;
}
// Print the coefficients
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
std::cout << "a: " << a << std::endl;
std::cout << "b: " << b << std::endl;
// Clean up
gsl_vector_free(X);
gsl_vector_free(yv);
gsl_matrix_free(design);
gsl_vector_free(c);
return 0;
}
This program assumes that the input data is in the form of an array Note that this program assumes that you have already installed GSL on your system and linked against it in your compiler settings. If you haven't done this yet, you can find instructions for installing GSL here: https://www.gnu.org/software/gsl/doc/html/usage.html#compiling-and-linking Also note that this program assumes a simple linear model with one independent variable (
USER
This is just linear regression though??? I wanted logistic regression!
ASSISTANT
You're right, this is just linear regression. Here's an example program that calculates logistic regression using GSL:
#include <iostream>
#include <gsl/gsl_multifit.h>
int main() {
// Define the input data
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.2, 0.3, 0.4};
// Define the parameters
int n = sizeof(x) / sizeof(x[0]);
gsl_vector *X = gsl_vector_alloc(n);
gsl_vector *yv = gsl_vector_alloc(n);
// Copy the data into GSL vectors
for (int i = 0; i < n; ++i) {
gsl_vector_set(X, i, x[i]);
gsl_vector_set(yv, i, y[i]);
}
// Define the design matrix
gsl_matrix *design = gsl_matrix_alloc(n, 2);
for (int i = 0; i < n; ++i) {
gsl_matrix_set(design, i, 0, 1);
gsl_matrix_set(design, i, 1, x[i]);
}
// Define the coefficients
gsl_vector *c = gsl_vector_alloc(2);
// Solve for the coefficients
int status = gsl_multifit_linear(X, yv, design, c);
if (status) {
std::cerr << "Error: " << gsl_strerror(status) << std::endl;
return 1;
}
// Calculate the logistic function for each input value
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
for (int i = 0; i < n; ++i) {
double xi = gsl_vector_get(X, i);
double yi = 1.0 / (1.0 + exp(-(a + b * xi));
std::cout << "x: " << x[i] << ", y: " << y[i] << ", predicted: " << yi << std::endl;
}
// Clean up
gsl_vector_free(X);
gsl_vector_free(yv);
gsl_matrix_free(design);
gsl_vector_free(c);
return 0;
}
This program calculates the coefficients
USER
No this is completely wrong... let's start again and make sure to use gsl_multifit_function_fdf
ASSISTANT
Sure, here's an example program that calculates logistic regression using GSL's
#include <iostream>
#include <gsl/gsl_multifit.h>
int main() {
// Define the input data
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.2, 0.3, 0.4};
// Define the parameters
int n = sizeof(x) / sizeof(x[0]);
gsl_vector *X = gsl_vector_alloc(n);
gsl_vector *yv = gsl_vector_alloc(n);
// Copy the data into GSL vectors
for (int i = 0; i < n; ++i) {
gsl_vector_set(X, i, x[i]);
gsl_vector_set(yv, i, y[i]);
}
// Define the design matrix
gsl_matrix *design = gsl_matrix_alloc(n, 2);
for (int i = 0; i < n; ++i) {
gsl_matrix_set(design, i, 0, 1);
gsl_matrix_set(design, i, 1, x[i]);
}
// Define the function and its derivatives
struct {
double (*f)(const gsl_vector *x, void *params);
void (*df)(const gsl_vector *x, void *params, gsl_matrix *J);
void (*fdf)(const gsl_vector *x, void *params, double *f, gsl_matrix *J);
} F;
F.f = [](const gsl_vector *c, void *) -> double {
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
double sum = 0.0;
for (int i = 0; i < n; ++i) {
double xi = gsl_vector_get(X, i);
double yi = gsl_vector_get(yv, i);
sum += log(1.0 + exp(-yi * (a + b * xi));
}
return sum;
};
F.df = [](const gsl_vector *c, void *params, gsl_matrix *J) {
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
for (int i = 0; i < n; ++i) {
double xi = gsl_vector_get(X, i);
double yi = gsl_vector_get(yv, i);
double e = exp(-yi * (a + b * xi));
gsl_matrix_set(J, i, 0, -yi * e / (1.0 + e));
gsl_matrix_set(J, i, 1, -yi * xi * e / (1.0 + e));
}
};
F.fdf = [](const gsl_vector *c, void *params, double *f, gsl_matrix *J) {
F.f(c, params);
F.df(c, params, J);
};
// Solve for the coefficients
gsl_multifit_function f;
f.n = n;
f.p = 2;
f.f = &F.f;
f.df = &F.df;
f.fdf = &F.fdf;
gsl_multifit_function_fdf FDF;
FDF.f = &F.f;
FDF.df = &F.df;
FDF.fdf = &F.fdf;
gsl_vector *c = gsl_vector_alloc(2);
gsl_multifit_fsolver *solver = gsl_multifit_fsolver_alloc(gsl_multifit_fsolver_lmsder, n, 2);
gsl_multifit_fsolver_set(solver, &f, c);
int status = gsl_multifit_fsolver_iterate(solver);
if (status) {
std::cerr << "Error: " << gsl_strerror(status) << std::endl;
return 1;
}
status = gsl_multifit_fsolver_solve(solver);
if (status) {
std::cerr << "Error: " << gsl_strerror(status) << std::endl;
return 1;
}
// Print the coefficients
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
std::cout << "a: " << a << std::endl;
std::cout << "b: " << b << std::endl;
// Clean up
gsl_vector_free(X);
gsl_vector_free(yv);
gsl_matrix_free(design);
gsl_multifit_fsolver_free(solver);
return 0;
}
This program uses
USER
You need to call gsl_multifit_fdfsolver_iterate in a loop though???
ASSISTANT
Yes, you're right. Here's an updated program that iterates until convergence:
#include <iostream>
#include <gsl/gsl_multifit.h>
int main() {
// Define the input data
double x[] = {1, 2, 3, 4, 5};
double y[] = {0.1, 0.2, 0.3, 0.4};
// Define the parameters
int n = sizeof(x) / sizeof(x[0]);
gsl_vector *X = gsl_vector_alloc(n);
gsl_vector *yv = gsl_vector_alloc(n);
// Copy the data into GSL vectors
for (int i = 0; i < n; ++i) {
gsl_vector_set(X, i, x[i]);
gsl_vector_set(yv, i, y[i]);
}
// Define the design matrix
gsl_matrix *design = gsl_matrix_alloc(n, 2);
for (int i = 0; i < n; ++i) {
gsl_matrix_set(design, i, 0, 1);
gsl_matrix_set(design, i, 1, x[i]);
}
// Define the function and its derivatives
struct {
double (*f)(const gsl_vector *x, void *params);
void (*df)(const gsl_vector *x, void *params, gsl_matrix *J);
void (*fdf)(const gsl_vector *x, void *params, double *f, gsl_matrix *J);
} F;
F.f = [](const gsl_vector *c, void *) -> double {
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
double sum = 0.0;
for (int i = 0; i < n; ++i) {
double xi = gsl_vector_get(X, i);
double yi = gsl_vector_get(yv, i);
sum += log(1.0 + exp(-yi * (a + b * xi));
}
return sum;
};
F.df = [](const gsl_vector *c, void *params, gsl_matrix *J) {
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
for (int i = 0; i < n; ++i) {
double xi = gsl_vector_get(X, i);
double yi = gsl_vector_get(yv, i);
double e = exp(-yi * (a + b * xi));
gsl_matrix_set(J, i, 0, -yi * e / (1.0 + e));
gsl_matrix_set(J, i, 1, -yi * xi * e / (1.0 + e));
}
};
F.fdf = [](const gsl_vector *c, void *params, double *f, gsl_matrix *J) {
F.f(c, params);
F.df(c, params, J);
};
// Solve for the coefficients
gsl_multifit_function f;
f.n = n;
f.p = 2;
f.f = &F.f;
f.df = &F.df;
f.fdf = &F.fdf;
gsl_multifit_function_fdf FDF;
FDF.f = &F.f;
FDF.df = &F.df;
FDF.fdf = &F.fdf;
gsl_vector *c = gsl_vector_alloc(2);
gsl_multifit_fsolver *solver = gsl_multifit_fsolver_alloc(gsl_multifit_fsolver_lmsder, n, 2);
gsl_multifit_fsolver_set(solver, &f, c);
int status;
do {
status = gsl_multifit_fsolver_iterate(solver);
if (status) {
std::cerr << "Error: " << gsl_strerror(status) << std::endl;
return 1;
}
} while (status == GSL_CONTINUE);
// Print the coefficients
double a = gsl_vector_get(c, 0);
double b = gsl_vector_get(c, 1);
std::cout << "a: " << a << std::endl;
std::cout << "b: " << b << std::endl;
// Clean up
gsl_vector_free(X);
gsl_vector_free(yv);
gsl_matrix_free(design);
gsl_multifit_fsolver_free(solver);
return 0;
}
This program uses |
See #8159 for a de-conflicted version of the same PR. |
This PR adds the commonly used 'alpaca' chat template.
I can't actually find any models on Hugging Face that have the 'alpaca' template in their tokenizer_config.json files, so I had to manually add it to the Templates-supported-by-llama_chat_apply_template script (edit: see lower for the corrected wiki Python script addition).
I used the Jinja template given by text-generation-webui: https://github.com/oobabooga/text-generation-webui/blob/main/instruction-templates/Alpaca.yaml
I should add that even though this template isn't actually used much now, people still do use it for creative writing as a workaround for the problematic Mistral [INST] type templates, so having the ability to manually specify 'alpaca' for these models would help.
It is also subtly different from the existing deepseek-coder template: there is no '<|EOT|>' and it also uses double newlines.
I also chose meta-math/MetaMath-7B-V1.0 fairly arbitrarily, but this template is used in lots of other commonly used models, such as the wizard family of models, phind-codellama, etc.
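For readers who want to see the prompt layout without running Jinja, here is a minimal plain-Python sketch that follows the same structure as the ALPACA_TMPL string in the test script posted in this thread: a default system line when none is supplied, `### Instruction:` and `### Response:` headers, and double newlines after every message. The function name `render_alpaca` is hypothetical and this is not the llama.cpp implementation.

```python
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def render_alpaca(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts in the Alpaca layout."""
    parts = []
    # Fall back to the default system line only when no system message exists.
    if not any(m["role"] == "system" for m in messages):
        parts.append(DEFAULT_SYSTEM + "\n\n")
    for m in messages:
        if m["role"] == "system":
            parts.append(m["content"] + "\n\n")
        elif m["role"] == "user":
            parts.append("### Instruction:\n" + m["content"] + "\n\n")
        else:
            # Any other role is treated as the assistant, as in the template.
            parts.append("### Response:\n" + m["content"] + "\n\n")
    if add_generation_prompt:
        parts.append("### Response:\n")
    return "".join(parts)


prompt = render_alpaca([{"role": "user", "content": "Hello"}])
print(prompt.replace("\n", "\\n"))
```

Note that every turn ends with a double newline and that no `<|EOT|>` marker is emitted, which are the two differences from the deepseek-coder template mentioned above.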