Develop Machine Learning models with SQL and BigQuery ML

If you are a data analyst, chances are high that you often use SQL (Structured Query Language) in data mining and analysis projects. SQL is a standard language used for storing, manipulating and retrieving data in databases. SQL is not only used by data analysts, but according to StackOverflow’s Annual Developer Survey results, 57% of developers use it.


Build Machine Learning models with SQL and BigQuery ML

Machine Learning is used to give computers the ability to learn without being explicitly programmed. Implementing machine learning models is however very complex and, until recently, has been the domain of data scientists and developers with extensive knowledge of and experience in Python, R or Java.

BigQuery ML, part of the Google Cloud platform (GCP), enables anyone with a good knowledge of SQL to build and deploy a machine learning model :

1. Extract, Transform if necessary and Load your data into BigQuery.

2. Use standard SQL queries to create and train your model (with CREATE MODEL)



3. Use standard SQL queries to evaluate your model with data that you have not used for training (with ml.EVALUATE) :



4. Use standard SQL queries to use your model (with ml.PREDICT)



Normally, extensive knowledge of machine learning frameworks and programming (Python, R, Java...) is required to apply machine learning to datasets. This restricts access to ML projects to a small numbers of employees within a company. Data analysts who comprehend the data, but have limited programming expertise and machine learning knowledge, are often excluded.

BigQuery ML allows data analysts to use existing SQL skills to implement machine learning solutions. BigQuery ML is used to build, train and evaluate machine learning models on data stored in BigQuery.

BigQuery ML functionality is available by using:

- BigQuery’s command-line tools
- BigQuery’s web UI
- BigQuery’s REST API

BigQuery ML supports the following types of models:

- Linear regression. It estimates a dependent variable value (y) based on a given independent variable (x).

- Binary logistic regression. One example of this would be to determine whether a customer will make a purchase or not. In this model, labels can only have one of two possible values.

- Multinomial logistic regression (or multi-class). This type of model can be used to predict several possible values such as whether an output is a low, medium, or high value.

More model types will be added soon.

Conclusion

BigQuery ML allows users to write machine learning models with SQL, experiment and iterate right where the data lives, inside of BigQuery for fast and efficient machine learning model development.

The use of ML is democratized by enabling data analysts to build and run models using existing SQL code, thereby allowing predictive analytics to be used for faster business decision making.

Users who already know machine learning can explore the details of their model options and weights very easily. Common adjustments can be made above and beyond what BigQuery ML defaults to include things like the learning rate, regularization, the training evaluation dataset split, predefined weights for classes, and more.

Want to learn more ? We can help you get started with Machine Learning on Google Cloud Platform and BigQuery. Unlock the power of your data, get in touch today!

Talk to us

Need help? Want to learn more? Leave us a message!

Contact us