Building a Question Answering Search Engine with Power Query, OpenAI, MindsDB, Streamlit,and MindsDB SDK using SQL

Building a Question Answering Search Engine with Power Query, OpenAI, MindsDB, Streamlit,and MindsDB SDK using SQL

Since we live in an information age, finding answers to our questions is becoming integral to daily life. So whether we need to know what's happening across the globe or how to make a burger, we turn to search engines to provide us with the information we need. And with just a few clicks, we get access to a vast amount of information—articles, papers, videos, Tiktok short clips, and much more—at our fingertips.

However, it can be challenging to find what we're looking for with so much information available. Thus, question-answering search engines come in handy here. These search engines use natural language processing (NLP) and machine learning (ML) algorithms to understand your questions and provide accurate answers.

This article will explore the process of building a simple question-answering search engine with MindsDB, an open-source database with ML capability. We will also use data cleaning tools like Power Query and deep learning models like OpenAI.

What is MindsDB?

MindsDB is an open-source tool that brings machine learning (ML) into your database through AI Tables. AI tables here are ML models on your databases that exist as virtual tables.

Since MindsDB is designed as a database, users can build, train, and deploy models using simple SQL statements. Users can thus easily query data from multiple sources and build ML-powered applications using ML models. A few of these models are your classification, time series, regressions, and NLP models for customer churn, sentiment analysis, text summarization, and question-answering projects.

What Problem does this Project Address?

The question-answering search engine makes searching for information more efficient and less time-consuming. It also improves productivity and prioritises users getting timely, accurate, concise answers.

You can build and train reliable question-answering models by leveraging MindsDB's cutting-edge machine-learning technology.

What Inspired you to Create this Project?

As data scientists, we are always fascinated by the power of machine learning and its potential to solve real-world problems. This project recognizes and hopes to bridge this gap by showing how to implement machine learning models. Furthermore, it shows that machine learning can be a valuable tool for people to quickly and easily access information.

MindsDB democratizes the use of machine learning by making machine learning more accessible and user-friendly for everyone.

Prerequisites and Project Workflow

As mentioned, we'll build a question-answering search engine with MindsDB using SQL.

To follow through with this article, you will need:

Let's dive in. 🧑🏿‍💻👨‍💻

Our Dataset

Just like every data analysis project, we need data to get started.

This tutorial will use WebQuestions from SEMPRE: Semantic Parsing with Execution (Berant et al., 2013, CC-BY), a benchmarking QA structured knowledge base. This dataset is part of the BRMSON (brmlab) open source question answering project.

You can download the JSON file used in this project from the official repository. The dataset is pretty straightforward and consists of the question ID, questions, and answers.

Data Preparation

MindsDB accepts JSON files. However, we can use Power Query if you want to clean and preprocess the JSON data.

To do this;

  1. Open Microsoft Excel

  2. Click on the Data ribbon, then Get Data.

  3. Select JSON and click on the JSON file by navigating to it.

    A spreadsheet showing JSON get data in Power Query

  4. From Transform in Power Query, click on To TABLE to convert the file’s list format into a table.

  5. You don't need to pick a delimiter. So click Ok.

  6. Use the arrow by your column header to expand the data.

  7. For “Column1.answers” column, use the arrow to Extract Values. You can use a column or space as your delimiter.

  8. Click on your column name to rename your column; “Column1.qid”, “Column1.answers”, and “Column1.Text” as “ID”, “answers”, and “questions”.

    A spreadsheet showing preprocessing in Power Query

  9. Click on Close & Load since our data is clean, needs no more preprocessing, and is error-free.

  10. Save the file as an Excel workbook.

Building with MindsDB ML Tables

We got our data, so it is time to start building.

For this tutorial, I will use the MindsDB cloud editor. Sign up at cloud.mindsdb.com. After signing up, you will be redirected to the cloud editor dashboard. This is where the magic happens.✨

MindsDB's Cloud editor

Getting data into MindsDB

MindsDB allows users to get data from several sources using its numerous integrations. A few of these data sources are

  • Flat files like CSV, JSON, and Excel workbooks

  • Databases like MySQL, MongoDB, Amazon Redshift, and SnowFlake

  • Applications like Airtable, Google Sheets, and Cockroach Labs

  • And social media platforms like Twitter.

To get our data into this MindsDB cloud editor, you will need to:

  1. Click on the Add button on the Cloud editor.

  2. Select Upon File and browse for the file on your local computer. You can use the JSON or Excel workbook.

  3. Lastly, give your data source a name. I named mine “QA_WebQuest”.

Once done, MindsDB will automatically create a data table in its cloud database for you.

Creating the ML Model

MindsDB saves your files as a table into a filesystem database called “files”. You can see the uploaded file by running the syntax below:

SHOW TABLES FROM files;

To view the contents of the "QA_WebQuest" file, run the SQL query below in the cloud editor by clicking on the Run button or pressing Shift + Enter on your keyboard. The results of the query will display the content of the file.

SELECT * FROM files.QA_WebQuest LIMIT 10;

MindsDB's Cloud editor

To create our question-answering model, we would provide it with data (our features). We will also create it by passing the OpenAI model in MindsDB into the USING clause of our CREATE MODEL statement. The OpenAI model in MindsDB uses the gpt-3.5-turbo model by default. You can do that by running this query:

CREATE MODEL ques_answer_model
PREDICT answer
USING
    engine = 'openai',
    prompt_template = 'answer the question of text:{{Questions}} about text:{{Answers}}';

A question answering ML Model

"Ques_answer_model" is the name I gave the AI table MindsDB I created. Answer In the SQL statement is our label, which is what we want to predict. The prompt_template is the message structure to be conveyed by the model.


TIP: To drop your model, use DROP Predictor ques_answer_model.


Checking your Model Status

The model's runtime can vary greatly, ranging from a few seconds to several hours, depending on various factors, such as the size of the dataset, internet connection speed, and the complexity of the parameters used. To monitor your model's progress, you can execute the following SQL query to view the current status.

SELECT * FROM models
WHERE name = 'ques_answer_model';

Checking your model's status in MindsDB

Question Answering Prediction with MindsDB

To make predictions, we can specify data in a query. I am going to ask our model how many states are in Nigeria.

SELECT Questions, answer
FROM ques_answer_model
WHERE Questions = 'how many states in Nigeria?'
AND Answers = 'Nigeria'

Here is the output data:

Questionsanswer
how many states in Nigeria?There are 36 states in Nigeria, including the Federal Capital Territory (FCT) which is home to the capital city, Abuja.

Prediction with MindsDB

Batch predictions

We can also make predictions by joining them with another table. Let's upload new data from the same repository into MindsDB. You should also name it (QA_estT) and ensure that you preprocess the file if you used the Excel version.

Run this query to carry out a batch prediction:

SELECT input.Questions, input.Answers, output.answer
FROM files.QA_estT AS input
JOIN ques_answer_model AS output
LIMIT 2;

Here is our output:

QuestionsAnswersanswer
what does jamaican people speak?Jamaican Creole English Language,Jamaican EnglishThat's correct! Jamaican people speak a unique language called Jamaican Creole English, which is also known as Patois. However, many Jamaicans also speak Jamaican English, which is a more standard form of English.
what did james k polk do before he was president?LawyerBefore becoming the President of the United States, James K. Polk worked as a lawyer. He studied law under Felix Grundy and was admitted to the bar in 1820. Polk practiced law in Tennessee and served as a clerk in the Tennessee State Senate before entering politics.

Integrating your Question Answering MindsDB Model into an Application using an SDK

Software Development Kits, or SDKs, are a group of tools and frameworks created by a specific platform to aid developers in building with it. SDKs are important cause they help speed up the development and integration of other tools with your platform.

MindsDB has a JavaScript and Python SDK. These SDKs will allow you to perform MindsDB operations and use your ML model in a web application.

For this article, we will use the JavaScript SDK. You can learn about it by checking out the official MindsDB Javascript SDK documentation.

Here's a step-by-step guide on how to integrate a MindsDB model into a web application using a JavaScript SDK:

  1. You need to install the SDK first. You can do this by running this syntax in your project directory.

     npm install mindsdb
    
  2. Create a MindsDB client instance that communicates with the MindsDB server. You can create a client object by calling the MindsDB constructor and passing in the server URL and other configuration options.

     const MindsDB = require('mindsdb'); 
    
     const mdb = new MindsDB({ apiUrl: '[http://localhost:47334](http://localhost:47334)' });
    
  3. Load a MindsDB model into memory by calling the loadModel method and pass in the name of your model.

     await mdb.loadModel('ques_answer_model');
    
  4. Make predictions on your data by creating a JavaScript object representing the input data and passing it to the predict method.

     const input = {
       feature1: 0.5,
       feature2: 0.8,
       feature3: 1.0
     };
    
     const result = await mdb.predict('my-model', input);
     console.log(result);
    
  5. In the code below, the predict method returns a Promise that resolves to an object containing the predicted value and other information about the prediction. You can handle the prediction result in your code using the then method or await keyword.

mdb.predict('ques_answer_model', input)
  .then(result => {
    console.log(result);
  })
  .catch(error => {
    console.error(error);
  });

You can use any JavaScript framework or library of your choice to integrate the MindsDB model into a web application. For example, you can .predict() method code to a Vue method and use it to update the state of your Vue component.

However, to use React, here is the code:

import { useState } from 'react';
import MindsDB from 'mindsdb';

const App = () => {
  const [question, setQuestion] = useState('');
  const [answer, setAnswer] = useState('');

  const handleQuestionSubmit = async () => {
    const result = await mdb.predict('ques_answer_model', { question });
    setAnswer(result.answer);
  };

  return (
    <div>
      <input type="text" value={question} onChange={e => setQuestion(e.target.value)} />
      <button onClick={handleQuestionSubmit}>Ask</button>
      {answer && <p>{answer}</p>}
    </div>
  );
};

With this code, you can create a simple form. In the form, the user enters a question and receives an answer from the MindsDB question-answering model we built together. The handleQuestionSubmit function uses the predict method to make a prediction and displays an answer.

Give it a try using perhaps another JavaScript framework.

Deploying your Question Answering MindsDB Model with Streamlit and SDK

Let's explore the MindsDB Python SDK using Streamlit.

Streamlit is a Python library that lets you quickly and easily create interactive web applications with clean and intuitive user interfaces. This way, you can deploy your Ml models and build data-focused applications without requiring web development skills. Here are a few examples from Streamlit's official gallery.

By using Streamlit with MindsDB's Python SDK, you can quickly deploy your ML model and create a user-friendly interface to interact with it. However, you first need to use your Conda prompt, create a virtual environment and Install Pandas and Streamlit.

  • Create a virtual environment.
conda create --name mindenv python==3.9
  • Activate the virtual environment.
conda activate mindenv
  • Install Panda
conda install pandas
  • Install Streamlit
conda install streamlit
  1. Next, install the MindsDB Python SDK. I used Minconda, a minimal version of the Anaconda distribution, a popular Python package management system. Miniconda allows me to create a virtual environment and install only the necessary packages. This makes it a lightweight option for those who want control over the packages they install on their system.

pip install mindsdb_sdk
  1. Create a folder on your desktop and change your conda virtual environment directory. In this folder, create a .py file to hold all the necessary libraries and codes.

  2. Create a MindsDB client instance that communicates with the MindsDB server. Add your credentials and specify the model using project.list_models()[1] The [1] here means your model is number 2 in your cloud MindsDB editor model list.

  3. Now it is time to set up a simple web app using Streamlit. The code will ask the user to input a question which it will store in a dictionary and convert to a Pandas DataFrame.

Here is the code:

import mindsdb_sdk
import pandas as pd
import streamlit as st

# Connecting to MindsDB
server = mindsdb_sdk.connect('https://cloud.mindsdb.com', login='********@gmail.com', password='********')
project = server.get_project("mindsdb")
model = project.list_models()[1] 

# Web App title
st.title("MindsDB Question Answer search engine")
st.subheader("Ask a question")

# Retrieving Input from user
questions=st.text_input("Questions")
answers=st.text_input("Keyword")

# Create a dictionary to store value
variables = {
    "question": questions,
    "answer": answers
}

# Convert result to Dataframe
result = pd.DataFrame(variables, index=[0])

# Handler to predict the result
if st.button("Tell me"):
    st.success(model.predict(questions))

If the user clicks the "Tell me" button, our MindsDB model will make a prediction based on the input.

You can do further modifications by exploring the Streamlit documentation, which highlights various features of Streamlit.

Bring AI to Your Applications Using MindsDB

With MindsDB, you can build sophisticated applications and systems by utilizing the power of SQL and machine learning. It also doesn't matter where you want to get your data from. For example, you can connect to Twitter and build a sentiment analysis model to know how customers feel about your product or get data from a Google sheet linked to a Product survey (Google form) and create a sentiment analysis engine.

So explore the features of MindsDB by checking out their official documentation and joining the Slack community.

Start building today by creating a free account!