10 November 2020

DadenU Day: NodeJS NLP and Grakn


Over the last few months we've been looking at alternatives to Chatscript and Apache Jena for chatbot development. We really like the chatbot-knowledge graph architecture, but we've found Chatscript too idiosyncratic for our purposes and lacking any solid scripting/programming capability, and Jena a bit buggy and limited.

On the chatbot front we've had a play with all the standard offerings, all Machine Learning (ML) based, including Dialogflow, Microsoft LUIS, RASA and IBM Watson. In most cases it wasn't the performance as such that we found wanting but rather the authoring. All of them seemed very fiddly, with no proper "authoring" support (just coding or tables), and in several cases the intent-pattern-action grouping that is the core of the ML part was split over multiple files, so authoring was a real pain (with RASA some people were even writing their own Python pre-processors so they could work with a more logical layout that then spat out the RASA format).

For the knowledge graph Neo4j looked a very good solution - it looks fantastic and has lots of capability. It doesn't have a "pure" semantic triples model, but through the neosemantics plug-in it can read in Turtle files and work more like a triple store. But we then ran into deployment issues: the plug-in isn't available on the community edition, the server hosting needs certificates for clients as well as the server, and the hosted version is very expensive as you're billed for all the time the server is up, not just when it's used. We might come back to it, but at this stage the challenges were too great.

So our front-runners have ended up as NodeJS for the NLP side of the bot and Grakn for the knowledge graph/triple store, and DadenU gave me a day to play with them a bit more and get them talking to each other!

NodeJS

NodeJS is the "server-side" form of Javascript. There's a lot to love about NodeJS - it's good old familiar Javascript and feels relatively lightweight, in the sense that you're not weighed down with IDEs, modules for everything and lines of code that you just have to trust. There is, though, a very full module library if you want to use it, and it's robust and eminently scalable.

For NLP there are libraries that cover both Machine Learning intent-action style chatbots and traditional part-of-speech NLP analysis. The nice thing is that you don't have to choose - you can use both!

For ML there is a wonderful library released into open source by Axa (the insurance people) called simply nlp.js, which takes a standard RASA/LUIS/Watson ML approach but where everything is controlled by a simple single data object such as:

{
  "intent": "agent.age",
  "utterances": [
    "your age",
    "how old is your platform",
    "how old are you",
    "what's your age",
    "I'd like to know your age",
    "tell me your age"
  ],
  "answers": [
    "I'm very young",
    "I was created recently",
    "Age is just a number. You're only as old as you feel"
  ]
}

You can have as many of these as you want, and spread them over multiple files - having each intent's utterances and answers grouped in one place makes things dead easy.
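
As a minimal sketch of how these objects get used (assuming they're stored as an array in a file - intents.json is a hypothetical name), loading and training the model with the node-nlp package looks something like this:

const { NlpManager } = require('node-nlp');

const manager = new NlpManager({ languages: ['en'] });
const intents = require('./intents.json');   // array of objects like the one above

// Feed each intent's utterances and answers into the manager
for (const item of intents) {
  item.utterances.forEach(u => manager.addDocument('en', u, item.intent));
  item.answers.forEach(a => manager.addAnswer('en', item.intent, a));
}

(async () => {
  await manager.train();
  const result = await manager.process('en', "what's your age");
  console.log(result.intent, result.score, result.answer);
})();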

Then there's another library called Natural from NaturalNode which provides part-of-speech tagging, stemming and other NLP functions. Adding this to the bot meant that I could:

  • Use Natural to identify discourse act types (eg W5H questions)
  • Use Natural to extract nouns (which will be the knowledge graph entities) - see the sketch after this list
  • Use Natural to do any stemming I need
  • Use Axa-NLP to identify the intent (using "it" and a dummy word in place of the noun) and pass back the predicate or module needed to answer the question
  • Assess the NLP confidence scores to decide whether to get the answer from the knowledge graph (if needed) or use a fallback response.
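
A rough sketch of the Natural side of that list (the exact tagger setup may vary by version of the library):

const natural = require('natural');

// Set up the Brill part-of-speech tagger with the English lexicon;
// unknown words default to NN (noun), capitalised unknowns to NNP
const lexicon = new natural.Lexicon('EN', 'NN', 'NNP');
const ruleSet = new natural.RuleSet('EN');
const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);
const tokenizer = new natural.WordTokenizer();

const tokens = tokenizer.tokenize('How old is the Discourse platform');
const { taggedWords } = tagger.tag(tokens);           // [{ token, tag }, ...]

// Keep the nouns (NN, NNS, NNP, NNPS) as candidate knowledge graph entities
const nouns = taggedWords.filter(w => w.tag.startsWith('NN')).map(w => w.token);
console.log(nouns);

// Stemming when needed
console.log(natural.PorterStemmer.stem('questions')); // -> 'question'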

Grakn

Grakn is a knowledge graph database which you can install locally or in the cloud. Most of it is open source; just the enterprise management system is paid for. There is a command-line database server (Core), and a graphical visualiser and authoring tool (Workbase).

With Grakn you MUST set up your ontology first, which is good practice anyway. It is defined in Graql, a cross between Turtle and SPARQL, which is very readable, eg:

person sub entity,
  plays employee,
  has name,
  has description,
  has first-name,
  has last-name;

Data can then be imported from a similar file, or created programmatically (eg from a CSV) - and in fact the ontology can also be built programmatically.

insert $dc isa product,
    has description "Discourse is Daden's chatbot and conversational AI platform which uses a combination of machine-learning, semantic knowledge graphs, grammar and rules to deliver innovative conversational solutions.",
    has benefit "Discourse separates out the tasks of language understanding from content management, so clients can focus on information management, not linguistics",
    has name "Discourse";

With the data entered you can then visualise it in Workbase and run queries over either the data or the ontology.
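
For example, a simple Graql query (a sketch against the schema above) to pull back the benefit statement for Discourse would be:

match $p isa product, has name "Discourse", has benefit $b; get $b;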


Interfacing NodeJS and Grakn

Luckily there is a nice simple NodeJS client for Grakn, documented at https://dev.grakn.ai/docs/client-api/nodejs. This lets you make a Graql call to Grakn, retrieve a JSON structure with the result (often a graph fragment), and then use it as required.

const GraknClient = require("grakn-client");

// Run a read-only Graql query against a Grakn keyspace and
// return the collected answers
async function graql (keyspace, query) {
  const client = new GraknClient("localhost:48555");   // default Grakn port
  const session = await client.session(keyspace);
  const readTransaction = await session.transaction().read();
  const answerIterator = await readTransaction.query(query);
  const response = await answerIterator.collect();     // gather all the answers
  await readTransaction.close();
  await session.close();
  client.close();
  return response;
}
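
Calling it is then straightforward - here "daden" is a hypothetical keyspace name:

(async () => {
  const answers = await graql("daden",
    'match $p isa product, has name "Discourse", has benefit $b; get $b;');
  console.log(`Got ${answers.length} answer(s)`);
})();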

(Re)Building Abi

So I think I've got the makings of a very nice knowledge graph driven chatbot/conversational AI system, leveraging Machine Learning where it makes sense, but also having traditional linguistic NLP tools available when needed. The basic flow is an extension of that presented above:

  • Use Natural to identify discourse act types (eg W5H questions)
  • Use Natural to extract nouns (which will be the knowledge graph entities; we can do our own tokenisation for compound nouns and proper names)
  • Use Natural to do any stemming I need
  • Use Axa-NLP to identify the intent (using "it" and a dummy word in place of the noun) - typically the predicate needed of the noun/subject entity in order to answer the question, or a specialised module/function
  • Make the Graql call to Grakn for the object entity associated with the noun/subject and predicate
  • Give the answer back (a rough sketch of this flow follows below)

Of course, having a full programming language like NodeJS available means that we can make things a lot more complex than this.
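
As a very rough sketch of how those steps might hang together - extractNouns() and replaceNouns() are hypothetical helpers built on Natural as above, manager is the trained nlp.js NlpManager, graql() is the function from the previous section, and the 0.7 threshold and "daden" keyspace are purely illustrative:

async function answerQuestion(utterance) {
  const nouns = extractNouns(utterance);              // knowledge graph entities
  const generic = replaceNouns(utterance, 'thing');   // noun swapped for a dummy word
  const result = await manager.process('en', generic);
  // Low confidence, or nothing to look up? Use a fallback response
  if (result.score < 0.7 || nouns.length === 0) {
    return "Sorry, I'm not sure about that.";
  }
  // Treat the matched intent as the predicate to fetch for the subject entity
  const query = `match $s isa entity, has name "${nouns[0]}", ` +
                `has ${result.intent} $o; get $o;`;
  return await graql('daden', query);                 // graph fragment(s) to turn into a reply
}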

As a test case of an end-to-end bot I'm now rebuilding Abi, our old website virtual assistant, to see how it all fares, and if it looks good we'll plug this new code into our existing user-friendly back end, so that chatbot authors can focus on information and knowledge, and leave Discourse to look after the linguistics.









