Build Knowledge Graphs with Remyx Triplets

This tutorial shows you how to extract knowledge graph triplets from raw text using the Triplets tool, so you can support GraphRAG-based applications.

Introduction

Knowledge graphs help abstract the information contained in unstructured text. By extracting relationships from a knowledge base, you can condense the original data down to its most salient information, which makes search results more precise.

Anatomy of a Triplet

Relationships in knowledge graphs are encoded as (subject, predicate, object) triplets. By indexing these triplets, along with additional ontological information, you can make robust inferences from limited data.

[Figure: knowledge graph example]

For example, in the triplet (Beatles, performed, 'Hello, Goodbye'), "Beatles" is the subject, "performed" is the predicate, and "'Hello, Goodbye'" is the object. Such triplets can be linked and expanded to build a comprehensive knowledge graph.
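
As a minimal sketch of how triplets link together, the snippet below stores a few triplets as plain Python tuples and indexes them by subject. Only the first triplet comes from the example above; the other two, and the `build_adjacency` helper, are illustrative assumptions and not part of the Triplets tool.

```python
from collections import defaultdict

# Only the first triplet is from the tutorial; the other two are
# illustrative additions showing how shared entities connect triplets.
triplets = [
    ("Beatles", "performed", "'Hello, Goodbye'"),
    ("Paul McCartney", "member of", "Beatles"),
    ("Paul McCartney", "wrote", "'Hello, Goodbye'"),
]


def build_adjacency(triplets):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triplets:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_adjacency(triplets)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Because "Beatles" appears as a subject in one triplet and as an object in another, the three triplets already form a small connected graph.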

Extracting Triplets

Navigate to the Triplets tool to get started. Pick a name for your triplet extraction job and upload your data formatted as a text file. Click the "Create" button to process. You'll then be redirected to the Datasets view where you can see the progress of your triplets job.

Once completed, click the name of your triplets job. You should see a preview of your triplets in a table with four columns like:

Subject | Predicate | Object | Source
Beatles | performed | 'Hello, Goodbye' | In a vibrant performance, the Beatles enchanted the audience with their lively rendition of "Hello, Goodbye."

At the bottom of the view, you'll find the "Download Full Dataset" button, which downloads the triplets. Once downloaded, you can use them to build a knowledge graph for GraphRAG-based applications.
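
Before building anything, it can help to check what the download contains. The sketch below uses only the standard library and assumes the file is a CSV whose columns match the preview (Subject, Predicate, Object, Source). The `group_triplets_by_source` helper is hypothetical, and an in-memory buffer stands in for the downloaded file.

```python
import csv
import io
from collections import defaultdict


def group_triplets_by_source(csv_file):
    """Return {source_text: [(subject, predicate, object), ...]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["Source"]].append(
            (row["Subject"], row["Predicate"], row["Object"])
        )
    return dict(grouped)


# Stand-in for the downloaded file, reusing the preview row shown above.
source = (
    "In a vibrant performance, the Beatles enchanted the audience "
    'with their lively rendition of "Hello, Goodbye."'
)
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Subject", "Predicate", "Object", "Source"])
writer.writerow(["Beatles", "performed", "'Hello, Goodbye'", source])
buffer.seek(0)

grouped = group_triplets_by_source(buffer)
for text, triplets in grouped.items():
    print(len(triplets), "triplet(s) from:", text[:40])
```

To run it on a real download, open the file with `open("triplets.csv", newline="")` and pass the file object to the helper in place of the buffer.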

Build a Knowledge Graph

You can use your triplets in a variety of ways, including building a knowledge graph to query. In this example, we'll use the LlamaIndex KnowledgeGraphIndex class to do just that. First, make sure you have the following dependencies installed:

pip install llama-index-llms-openai==0.1.21 llama-index-readers-file llama-index-embeddings-openai pandas

Then run this example, replacing the placeholder fields with the name of your downloaded triplets CSV file and your desired query. If you want to generate a response from your knowledge graph using OpenAI, make sure to set your API key as an environment variable (OPENAI_API_KEY).

import pandas as pd

from llama_index.core import Document, KnowledgeGraphIndex
from llama_index.core.node_parser import SentenceSplitter

# Create a node parser
node_parser = SentenceSplitter()

# Create an empty index
index = KnowledgeGraphIndex(
    [],
)

# Prepare triplets
df = pd.read_csv("triplets.csv")
grouped = df.groupby("Source").apply(lambda x: list(zip(x["Subject"], x["Predicate"], x["Object"])))

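# Create a knowledge graph from the triplets
for source_text, triplets in grouped.items():
    # Wrap each source passage in a Document and keep its first parsed node
    document = Document(text=source_text)
    node = node_parser.get_nodes_from_documents([document])[0]
    for triplet in triplets:
        # Link every triplet to the node holding its source passage
        index.upsert_triplet_and_node(triplet, node)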

# Query your index
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about <YOUR QUESTION HERE>",
)
print(response.response)
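
The example above relies on an OpenAI key being available in the environment. A small check like the following, run before querying, can fail early with a clearer message. It assumes the key lives in `OPENAI_API_KEY`, the variable the OpenAI client reads by default; `has_openai_key` is a hypothetical helper, not part of LlamaIndex or Remyx.

```python
import os


def has_openai_key(env=os.environ):
    """Return True if a non-empty OPENAI_API_KEY is present in env."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before querying the index.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary, without touching the real environment.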

What's next?

You've extracted triplets from your text data and used them to build a knowledge graph. From here, you can explore the other tools in Remyx Studio.
