Abdullah Şamil Güser

PaLM 2 Technical Report

We introduce PaLM 2, the successor to PaLM (Chowdhery et al., 2022), a language model unifying modeling advances, data improvements, and scaling insights. PaLM 2 incorporates the following diverse set of research advances:

While scaling laws can be used to achieve optimal training loss for a given quantity of FLOPs, this does not necessarily transfer to achieving optimal performance on a given task. Moreover, considerations beyond the training loss, such as training throughput and serving latency, also affect the choice of model size.
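A rough way to see what the scaling-law baseline prescribes: using the common approximation that training compute C ≈ 6·N·D (N parameters, D training tokens) together with the report's finding that N and D should grow in roughly equal proportion with compute, the compute-optimal split can be solved in closed form. The sketch below is illustrative only; the 6ND rule and the tokens-per-parameter ratio are standard approximations, not values taken from the report.

```python
import math

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Return an illustrative (params, tokens) pair for a FLOP budget.

    Assumes C ~= 6 * N * D and a fixed tokens-per-parameter ratio r,
    so N = sqrt(C / (6 * r)) and D = r * N. Both the 6ND approximation
    and the default ratio are assumptions for illustration.
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for budget in (1e22, 1e23, 1e24):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e}: ~{n / 1e9:.0f}B params, ~{d / 1e9:.0f}B tokens")
```

In practice one might then pick a smaller-than-optimal model and train it on more tokens when serving latency and throughput matter more than squeezing out the last bit of training loss.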

PaLM 2 is trained on a dataset that includes a higher percentage of non-English data than previous large language models.

For a small fraction of pre-training data, we added special control tokens marking the toxicity of text, using signals from a fixed version of the Perspective API.
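As a rough illustration of how such control tokens could be attached to data, the sketch below buckets a Perspective-style toxicity score and prepends a corresponding token to a small fraction of examples. The token names, thresholds, and sampling fraction are hypothetical; the report only states that control tokens derived from a fixed version of the Perspective API were added to a small fraction of pre-training text.

```python
import random

# Hypothetical control tokens and thresholds; the report does not specify
# the exact vocabulary or cutoffs, only that toxicity-derived control
# tokens were added to a small fraction of pre-training data.
LOW_TOX, MED_TOX, HIGH_TOX = "<tox_low>", "<tox_med>", "<tox_high>"
TAGGED_FRACTION = 0.05  # assumed "small fraction"

def toxicity_bucket(score: float) -> str:
    """Map a [0, 1] toxicity score (e.g. from the Perspective API) to a token."""
    if score < 0.3:
        return LOW_TOX
    if score < 0.7:
        return MED_TOX
    return HIGH_TOX

def maybe_tag(text: str, toxicity_score: float, rng: random.Random) -> str:
    """Prepend a control token to a randomly chosen small fraction of examples."""
    if rng.random() < TAGGED_FRACTION:
        return f"{toxicity_bucket(toxicity_score)} {text}"
    return text

# Example: tagging a toy corpus with pre-computed toxicity scores.
rng = random.Random(0)
corpus = [("you are wonderful", 0.01), ("some rude remark", 0.85)]
tagged = [maybe_tag(text, score, rng) for text, score in corpus]
```

Conditioning on a low-toxicity control token at inference time can then be used to steer generation toward less toxic continuations.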

PaLM 2 outperforms PaLM across all datasets and achieves results competitive with GPT-4.

References