In this talk, Dr. Ibrahim Alabdulmohsin will give an overview of scaling laws, including their application in sample size planning and learning curve extrapolation, with an emphasis on how they have recently been used to optimize model size (e.g., as in Chinchilla). He will then extend those methods to optimizing the model's full shape (e.g., width and depth), demonstrating that scaled-down architectures, trained at their optimal shapes for the right amount of compute, are comparable to, or even better than, fully-scaled models.
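As a rough illustration of the kind of scaling-law fitting the talk covers (not the speaker's method): one common approach assumes a power-law form for the loss as a function of sample size and extrapolates from small-scale runs. The function names and data below are hypothetical.

```python
# Minimal sketch: fit a power-law scaling curve L(n) = c + a * n**(-b)
# to losses observed at small sample sizes, then extrapolate (sample
# size planning / learning curve extrapolation). Data is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Standard scaling-law form: loss decays as a power of sample size n."""
    return c + a * n ** (-b)

# Synthetic "observed" losses from small-scale training runs.
n_obs = np.array([1e3, 2e3, 4e3, 8e3, 16e3])
loss_obs = power_law(n_obs, a=5.0, b=0.3, c=0.8) \
    + np.random.default_rng(0).normal(0, 0.01, n_obs.size)

# Fit the curve to the small-scale observations...
params, _ = curve_fit(power_law, n_obs, loss_obs, p0=[1.0, 0.5, 0.5])

# ...and extrapolate to a much larger sample size.
print("predicted loss at n=1e6:", power_law(1e6, *params))
```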
JRC for AI (KFUPM-SDAIA)