ChatGPT to Summarize Research Abstracts

Use the OpenAI API for ChatGPT to summarize a collection of research abstracts, suggest applications, and organize the results by topic.

After experimenting with the ChatGPT web interface and taking the OpenAI API for a spin (using the excellent openai-quickstart-node repository), I was looking for a reasonable first project to further explore what might be possible. At the same time, we have 48 engineering graduate students, each authoring an MS thesis. Could ChatGPT generate a summary of diverse engineering projects to capture the overview of what our students are doing and why it matters?

This turned out to be a good project for getting familiar with the OpenAI API and exploring how the current large language models (LLMs) perform when it comes to organizing and summarizing technical material.

Objective

Given a list of research abstracts (one paragraph each), generate a document that

1. Summarizes each research project with
   - A single sentence describing the major findings of the work.
   - A list of the most important applications of the research.
2. Presents the project summaries clustered within topic categories.
3. Summarizes the summaries with a brief overview of the entire collection.

Overview of Python Example

The complete Python script is summary.py from the ai-escapades git repository. To summarize the process, we do the following…

Read in the Research Abstracts

The one-paragraph abstracts are stored as an Excel spreadsheet with each record (author and abstract) on a single row.

We read this in using the openpyxl Python module…

# Read rows, skipping the header, and generate a list of Resp objects.
records = []
for row in ws.iter_rows(
        min_row=2, max_row=ws.max_row,
        min_col=1, max_col=ws.max_column,
        values_only=True):
    resp = Resp()
    resp.name = row[0]
    resp.abstract = row[1]
    records.append(resp)
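
The excerpt above assumes that the worksheet `ws` and the `Resp` class already exist. As a minimal sketch of how those pieces might fit together (not necessarily how summary.py does it), the version below defines `Resp` as a dataclass and separates row conversion from file loading. The `applications` and `category` fields, the helper names `rows_to_records` and `load_records`, and the `path` argument are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resp:
    """One spreadsheet record; the last three fields are filled in later (assumed names)."""
    name: str = ""
    abstract: str = ""
    summary: str = ""
    applications: str = ""
    category: str = ""


def rows_to_records(rows):
    """Convert (name, abstract, ...) tuples into Resp objects, skipping blank rows."""
    records = []
    for row in rows:
        # Pad short rows so a missing abstract cell does not raise IndexError.
        name, abstract = (tuple(row) + (None, None))[:2]
        if not name and not abstract:
            continue  # spreadsheets often carry trailing empty rows
        records.append(Resp(name=str(name or "").strip(),
                            abstract=str(abstract or "").strip()))
    return records


def load_records(path):
    """Read the active worksheet of an .xlsx file (path is a hypothetical argument)."""
    from openpyxl import load_workbook  # imported lazily; only needed for real files

    wb = load_workbook(path, read_only=True)
    ws = wb.active
    # min_row=2 skips the header row; values_only=True yields plain tuples.
    return rows_to_records(ws.iter_rows(min_row=2, values_only=True))
```

Keeping the row conversion separate from the file access makes it possible to try the logic on a plain list of tuples before pointing it at a real spreadsheet.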

Query ChatGPT to Summarize, Apply and Categorize

For each abstract we are going to ask ChatGPT to do the following:

1. Summarize the abstract with a single sentence.
   Prompt: “Generate a one sentence summary of the following paragraph. The summary should describe what research was conducted and the key conclusion.
   Paragraph: …”
2. Generate potential applications.
   Prompt: “Generate a list of two of the most relevant military and defense applications of the engineering technology described in the following paragraph. Each item in the list should be a single sentence. The list should be formatted in markdown.
   Paragraph: …”
3. Categorize the research described in the abstract.
   Prompt: “Below is a set of categories and a one paragraph research abstract. Select the one best category to fit the following paragraph.
   Categories: …
   Paragraph: …”
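
The three prompts above can be built by small helper functions. The sketch below is one possible arrangement: `summ_prompt` is the name used in the script excerpt, while `app_prompt`, `cat_prompt`, and `match_category` are hypothetical names introduced here, and the wording simply follows the prompts quoted above. Because the model answers in free text, `match_category` shows one simple (assumed) way to map a reply back onto the known category list.

```python
def summ_prompt(abstract):
    """Prompt asking for a one-sentence summary of an abstract."""
    return ("Generate a one sentence summary of the following paragraph. "
            "The summary should describe what research was conducted "
            "and the key conclusion.\n"
            "Paragraph: " + abstract)


def app_prompt(abstract):
    """Prompt asking for two applications as a markdown list (hypothetical helper)."""
    return ("Generate a list of two of the most relevant military and defense "
            "applications of the engineering technology described in the "
            "following paragraph. Each item in the list should be a single "
            "sentence. The list should be formatted in markdown.\n"
            "Paragraph: " + abstract)


def cat_prompt(abstract, categories):
    """Prompt asking the model to pick one category (hypothetical helper)."""
    return ("Below is a set of categories and a one paragraph research abstract. "
            "Select the one best category to fit the following paragraph.\n"
            "Categories: " + ", ".join(categories) + "\n"
            "Paragraph: " + abstract)


def match_category(reply, categories, default="Other"):
    """Return the first known category mentioned in the reply, ignoring case."""
    text = reply.strip().lower()
    for category in categories:
        if category.lower() in text:
            return category
    return default  # the reply named none of the known categories
```

Matching on a substring is deliberately forgiving: replies such as "Category: materials." still resolve to "Materials", and anything unrecognized falls back to a default bucket rather than creating a new category.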

So for each abstract, we make an OpenAI API call similar to this example:

    psumm = summ_prompt(records[ii].abstract)
    rsumm = openai.Completion.create(model=model,
                                     prompt=psumm,
                                     max_tokens=200,
                                     temperature=0.6)
    records[ii].summary = rsumm.choices[0].text
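
Since the same call is repeated for the summary, the applications, and the category, the loop can be factored so that the API call is passed in as a function. The following is a rough sketch under that assumption, not the structure of the actual script: `make_completer` and `annotate` are hypothetical names, and `make_completer` wraps the same legacy `openai.Completion.create` call shown above (available in versions of the `openai` package before 1.0).

```python
def make_completer(model, max_tokens=200, temperature=0.6):
    """Wrap the legacy Completion endpoint as a simple prompt -> text function."""
    import openai  # lazy import; needs the pre-1.0 openai package and an API key

    def complete(prompt):
        resp = openai.Completion.create(model=model,
                                        prompt=prompt,
                                        max_tokens=max_tokens,
                                        temperature=temperature)
        return resp.choices[0].text

    return complete


def annotate(records, complete, prompts):
    """Fill in one attribute per prompt builder for every record.

    `prompts` maps an attribute name (for example "summary") to a function
    that turns an abstract into a prompt string.
    """
    for record in records:
        for attr, build_prompt in prompts.items():
            reply = complete(build_prompt(record.abstract))
            # Completions often begin with blank lines, so strip them.
            setattr(record, attr, reply.strip())
    return records
```

Passing `complete` as an argument means the loop can be exercised with a stand-in function (for instance, one that echoes the prompt) before spending any tokens, and then switched to `make_completer(model)` for a real run.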

Summarize the Summaries

Now that we have ChatGPT-generated one-sentence summaries of each project, we concatenate all of them and ask ChatGPT to summarize its own output!

# Using all the summaries, request a summary of summaries.
# Join with newlines so adjacent summaries do not run together.
ssumm = '\n'.join(record.summary.strip() for record in records)

prompt = 'Summarize the engineering topics described in the text below.  The summary should be a list of the five most important technology categories.  The list should be in markdown list format: \n\n' + ssumm

rsumm = openai.Completion.create(model=model,
                                 prompt=prompt,
                                 max_tokens=400,
                                 temperature=0.3)

Write the Output as Markdown

Finally, we write the output: first the summary-of-summaries, then each project's summary and applications, organized by category.
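
As an illustration of this last step, the sketch below groups records by category and writes them to any file-like object. The heading layout, the function name `write_synopsis`, and the record attributes `name`, `category`, `summary`, and `applications` are assumptions for this example; the layout of the actual synopsis.md may differ.

```python
from collections import defaultdict


def write_synopsis(out, overview, records):
    """Write the overview, then each record's summary and applications by category."""
    # Group the records so that each category becomes one section.
    by_category = defaultdict(list)
    for record in records:
        by_category[record.category].append(record)

    out.write("# Synopsis\n\n## Overview\n\n" + overview.strip() + "\n")
    for category in sorted(by_category):
        out.write(f"\n## {category}\n")
        for record in by_category[category]:
            out.write(f"\n### {record.name}\n\n")
            out.write(record.summary.strip() + "\n\n")
            # The applications text is already a markdown list from the model.
            out.write(record.applications.strip() + "\n")
```

Called as `write_synopsis(f, overview_text, records)` inside a `with open(..., "w") as f:` block, this produces a single Markdown file with one section per category.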

Working Example

To illustrate the process, the ai-escapades git repository includes the processing script, example input (an Excel file with abstracts), and example output (synopsis.md).

Commentary

As an early experiment with the OpenAI API, this task proved to be an enlightening experience with the API and with some of the capabilities and limitations of the current ChatGPT models. Overall, my takeaways are as follows:

- The OpenAI API is very easy to use and well documented.
- Developing the script resulted in sending many queries. I was very liberal in making these calls, using a lot of tokens for each one. In tracking my OpenAI API usage, I was pleasantly surprised by the limited cost: each run of the program cost only a few cents.
- Because this was one of my first experiences putting the LLMs to work, it was rather striking to be able to programmatically summarize and generate the text. It was definitely a new capability that generated many more ideas for projects.
- The summaries were good, but not great. An expert in each field could certainly generate a better summary, and ChatGPT did make mistakes about the technical details, but to the layperson the summaries were useful.
- I found the ChatGPT-generated applications even more valuable than the summaries. I was impressed by the (perceived) creativity evident in the application ideas.