Using Machine Learning to Increase the Fidelity of Non-Player Characters in Training Simulations

On November 9, 1979, the North American Aerospace Defense Command’s (NORAD’s) early warning system interpreted a training scenario involving Soviet submarines as an actual nuclear attack on the United States. In the six minutes that followed, the American military went on the highest level of alert. Afterward, training simulations were explicitly moved outside of the NORAD complex to prevent such a situation from happening again.

While the outcome of this fall day at NORAD is undeniably terrifying to consider, those of us interested in training and exercise development work hard to bring realism to every scenario we design. But there are obstacles to creating realistic scenarios. In this blog post, extracted from a more detailed SEI technical report, we describe our use of machine-learning (ML) modeling and a suite of software tools to create decision-making preferences for non-player characters (NPCs) so that they will be more credible and believable to game players.

The best-case scenario is for players to be unable to distinguish between an exercise and their daily operations. Experiences that seem real to players in training and exercise scenarios enhance learning. Improving the fidelity of automated NPCs can increase the level of realism experienced by players.

In our research, we test ML solutions and confirm that NPCs can exhibit realistic computer activity that improves over time. We bring the scenarios that we build to life through our GHOSTS framework, which is an NPC simulation-and-orchestration platform for realistic network behavior and resulting traffic. The concepts described in this post, however, could also be adapted to other NPC frameworks.

NPCs simulate real-world user activity and create accurate network traffic. The best cyber-defense teams triangulate their findings based on network traffic, logs, sensor data, and a growing host-based toolchain. We therefore focus on overall activity realism and confirm our approach by comparing NPC activity to real-world users performing the same activity. A large corpus of artifacts and data is necessary for training and exercise scenarios, and designers must often create an entire universe to explain the people, places, and activity that will occur throughout the lifecycle of the training or exercise event.

We improve the realism of NPCs in training exercises with new software we have created called ANIMATOR. The ability of ANIMATOR to increase the realism of NPCs is relevant and useful to anyone who is tasked with creating training for cyberteams. Our primary goal in ANIMATOR is to make our data as realistic as possible by using weighted randomization for as many datapoints about NPCs as we can find datasets for.
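
To make the idea of weighted randomization concrete, here is a minimal Python sketch. It is not ANIMATOR's code: the attribute (operating system), the weights, and the function names are all hypothetical, chosen only to show how a datapoint can be drawn in proportion to a reference distribution rather than uniformly.

```python
import random

# Hypothetical reference distribution for one NPC datapoint.
# The numbers are invented for illustration, not taken from any dataset.
OS_WEIGHTS = {"windows": 70, "macos": 20, "linux": 10}


def weighted_pick(weights, rng):
    """Pick one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def sample_counts(weights, n, seed=0):
    """Draw n values and count how often each key was chosen."""
    rng = random.Random(seed)
    counts = {key: 0 for key in weights}
    for _ in range(n):
        counts[weighted_pick(weights, rng)] += 1
    return counts
```

Calling `sample_counts(OS_WEIGHTS, 10000)` should yield counts that roughly track the 70/20/10 split, so a population of generated NPCs resembles the reference distribution instead of being evenly spread.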


In the training-exercise scenarios we create, ML holds the key to building a thinking teammate or adversary. However, there are challenges to its application. In addition to the need for fidelity of user simulations, a key challenge is the tendency of people to game the system.

Players are always looking for patterns and will quickly exploit NPC weaknesses. This gaming of the system is not cheating, nor is it an attempt to gain an unfair advantage. Rather, it happens in different ways, knowingly or unknowingly, by leveraging game-isms (unrealistic patterns that occur in an exercise).

An example of a game-ism is when an exercise offers a limited, shared internet, where the scope of traffic in or out of a friendly network is unrealistically limited. This scenario makes it easy for players to (1) filter traffic to highlight potential issues quickly or (2) identify traffic from specific IP addresses as problematic. In the worst case, players can place IP blocks in allow lists or deny lists, a technique that would not work in real-world network operations. This example underscores why realism should remain the highest priority for training and exercise developers.

Cybersecurity training requires the coordination of distributed software agents that drive NPCs and their actions. The automation required to achieve maximum fidelity and minimize game-isms is provided only through the use of ML.

Realistic Browsing by NPCs

To improve the fidelity of user simulation, our GHOSTS software agents enable NPCs to browse the Internet using any major browser. We configure agents to associate NPCs with preferences by making requests in a particular order or randomly using a supplied list. Most implementations use randomness, which is a gameable characteristic.

Players using monitoring systems can infer information about browsing sessions, and these inferences enable them to filter and unrealistically track sessions. Our first hint of this problem came when we observed players tracking the NPC browser’s user-agent (UA) string in different ways while monitoring NPC-based outbound web requests. The UA string identifies the browser being used, including its version, operating system, and type of device (e.g., laptops, phones, and other computing devices).

Previously, we built mechanisms to change this UA string periodically for each NPC and even randomize changes to it over time. Changing the UA string simulates how users might update or change their web browsers over time. With this approach, we can also implement UA strings known to be questionable or malicious. However, we observed players gaming the system by looking for UA strings that did not follow the patterns of UA strings in recent releases of major browsers. As a result, players flagged our use of alternative or malicious strings immediately.

The extent to which player teams used this information in their filtering and monitoring forced us to rethink the value of true randomization and to re-examine what real-world browsing behavior looks like on a typical network. We used the GHOSTS framework to examine patterns in NPC browsing behavior and asked questions such as

  • What does realistic web browsing look like to a network team?
  • What is the motivation behind particular browsing patterns?
  • In a large, distributed system, how do we introduce the right degree of randomness without alerting players that the randomness is computer generated?

When researching browsing patterns, we considered what people do when browsing the web. An NPC that browses websites randomly, going from news to sports to shopping, seems artificial and inconsistent with the real world.

People often explore a website in depth. They may read long-form content that is not captured on a single page. They may search through long lists of content that is paginated by design due to its length. They may compare several different items that are showcased in detail on separate pages. They may read news articles that highlight their other interests. As a result, we introduced the notion of a website’s stickiness (an enticement to browse beyond the home page). We implemented this configurable feature with some degree of randomness but also with the ability to have NPCs visit at least some number of additional pages from the page first visited within a site. After we incorporated stickiness into our approach, we were better able to simulate a user clicking relevant links on pages across a website, thereby increasing the fidelity of NPCs and the agents that control them.
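
One way to picture stickiness is the following Python sketch. It is an illustration under our own assumptions rather than the GHOSTS implementation: the site map `SITE_LINKS`, the function name, and the parameters `min_extra_pages` and `stickiness` are hypothetical. The NPC always visits a minimum number of additional pages, and after that a random draw decides whether it keeps browsing the same site.

```python
import random

# Hypothetical same-site link graph: page -> links found on that page.
SITE_LINKS = {
    "/": ["/news", "/sports", "/about"],
    "/news": ["/", "/sports", "/about"],
    "/sports": ["/", "/news", "/about"],
    "/about": ["/", "/news", "/sports"],
}


def plan_site_visit(links_by_page, start_page, min_extra_pages, stickiness, rng):
    """Return the ordered list of pages an NPC visits on one site.

    min_extra_pages: pages to visit beyond the first one, if links allow.
    stickiness: probability (0.0 to 1.0) of continuing once the minimum is met.
    """
    visited = [start_page]
    current = start_page
    while True:
        # Only follow links to pages not yet seen during this visit.
        candidates = [p for p in links_by_page.get(current, []) if p not in visited]
        if not candidates:
            break  # nothing new to click, even if the minimum is unmet
        extra_pages = len(visited) - 1
        if extra_pages >= min_extra_pages and rng.random() >= stickiness:
            break  # minimum satisfied and the random draw says to leave
        current = rng.choice(candidates)
        visited.append(current)
    return visited
```

With `stickiness=0.0` the NPC leaves as soon as the minimum is met; with `stickiness=1.0` it keeps clicking until the site has no unvisited links left. Values in between give the mix of structure and randomness described above.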

NPC Context and Preferences

GHOSTS records every activity a software agent executes to control an NPC, along with the results. Agents can use that data to help the NPC make decisions, and past NPC decisions can affect future ones.

Examples of an NPC’s preferences are certain websites, particular tasks, and how it responds to emails. Preferences may also include some negative partiality (i.e., avoiding certain tasks). Although our primary goal is to improve how an NPC browses relevant links on a website, we also introduce a more ambitious capability: providing context for an NPC to make continuous decisions about its future. Context includes

  • human factors: information about the user, social environment, and user’s task
  • physical environment: location, infrastructure, and physical conditions

Social environment and tasking can be related when NPCs are part of a team that performs tasks specific to that team. In the past, we built training and exercises to model real-world team behaviors. For example, Team A performs one set of specific tasks, and Team B performs a separate set of tasks (much as you might expect a logistics team and a marketing team to do in the corporate world). By assigning these preferences to NPCs, we replicate these team configurations more dynamically and enable them to evolve.

Our approach to solving the challenge of realistic browsing and learning from the context and decisions the NPCs make over time is to use ML techniques that focus on personalization. However, there are related NPC behaviors in GHOSTS that can help us understand and improve these behaviors over time. The user models that are implemented in different exercises via GHOSTS are vast and will continue to grow; therefore, understanding how NPCs make decisions provides important guidelines to help player teams as they train and perform exercises in ever-evolving cyber scenarios.

Utilizing Personas

The term preference as we use it includes comparison, prioritization, and choice ranking. Because preferences are evaluations, they are valuable to an NPC and provide context to help inform decisions. Preferences also enable an NPC to compare similar things.

As GHOSTS NPCs make more informed and more complex decisions, there is a need for each NPC to (1) have an existing system of preferences when it is created and (2) be able to update those preferences over time as it makes decisions and measures the outcomes. To expedite creating NPCs with similar capabilities, the initial preferences are drawn from a predefined persona. Each persona has a set of ranked interest attributes, such as a preference for news, sports, or entertainment. To maintain heterogeneity among NPCs, the values of a persona are copied to the individual NPCs randomly. When a persona has a range for a given preference, an NPC is therefore assigned an initial fixed value from that range.

For example, an enclave of NPCs in logistics is drawn from a persona with several applications used to manage logistics tasks. The persona has a range for each of these applications; when agents are created, they get a random fixed number from that range. Among individual NPCs in the enclave, therefore, some prefer application A over B. Interests are often multi-faceted, so a single NPC can have multiple interests; decisions must account for these multiple interests.
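
The persona-to-NPC step can be sketched in a few lines of Python. This is only an illustration: the persona contents, the ranges, and the function names are hypothetical, and we assume a uniform draw within each range, which the text above does not specify.

```python
import random

# Hypothetical logistics persona: each preference has an inclusive (low, high) range.
LOGISTICS_PERSONA = {
    "application_a": (20, 80),
    "application_b": (20, 80),
    "news": (-10, 40),
}


def create_npc_preferences(persona, rng):
    """Copy a persona to one NPC, fixing each range to a single random value."""
    return {name: rng.randint(low, high) for name, (low, high) in persona.items()}


def create_enclave(persona, size, seed=0):
    """Create `size` NPCs from the same persona; their fixed values differ."""
    rng = random.Random(seed)
    return [create_npc_preferences(persona, rng) for _ in range(size)]
```

In an enclave built this way, every NPC shares the same set of interests, but some end up preferring application A over application B and others the reverse, which is the heterogeneity described above.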

Including Preferences and Decision Making in ML Models

The goal is for a particular NPC’s browsing history to show patterns that reflect its activities (e.g., reading the news when the NPC starts its shift or shopping for new shoes over lunch). Analyzing a browsing history should identify overarching tasks. In this case, even a simple pattern that reflects a task is an improvement over purely random browsing.

Purely random browsing has been a simple, common use case for most user simulations, but this approach does not reflect human behavior. People browse with intent: they look for specific information or carry out a specific task. Purely random browsing, by contrast, produces a browser history that bounces from site to site arbitrarily, with no apparent connection or reason, as if the NPC had no intent behind its browsing actions.

To move away from this arbitrariness, we (1) categorize all the websites an NPC visits and (2) build and apply a preference engine.

Classifying Websites

Classifying the websites that an NPC agent might visit should result in each site being a member of some number of categories. This type of categorization is a machine learning (ML) problem, and ML researchers are continually refining many different approaches to its solution.

Since we control the Internet in any simulation, training, or exercise event, we can pre-classify all websites that an NPC might browse. To do this, we created a list of top sites and categorized them with the same attributes we use to define interests for our NPCs. A simple way to think about categorization is to consider how a web directory might list a particular website. Web searches have become ubiquitous, so web directories are not as widely used, but they still exist. For our purposes, DMOZ (short for directory.mozilla.org) is useful because it offers at least a single category for each website in our listing:

  • arts
  • business
  • computers
  • games
  • health
  • home
  • kids
  • news
  • recreation
  • reference
  • science
  • shopping
  • society

Cross-referencing our list of domains with a category enabled us to align NPC browsing to the sites that match their preferences. We polled each website and captured relevant metadata, including the site’s keywords and description, to cross-reference that information with our chosen NPC categories. We did this cross-referencing by performing simple keyword matching against the keywords we previously built for our NPC categories, which enabled us to tag each site with the appropriate categories, as shown in Table 1:


Table 1: Websites Annotated with Descriptions, Keywords, and Categories
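
The simple keyword matching described above might look something like the following Python sketch. The category keyword sets, the sample inputs, and the function names are hypothetical, and the tokenization is deliberately naive; the actual keyword lists and matching rules are those in the technical report, not these.

```python
# Hypothetical keyword sets for a few NPC categories.
CATEGORY_KEYWORDS = {
    "news": {"news", "headlines", "breaking"},
    "shopping": {"shop", "store", "deals", "cart"},
    "computers": {"software", "hardware", "programming"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of punctuation-free words."""
    words = {word.strip(".,;:!?()\"'").lower() for word in text.split()}
    return words - {""}


def categorize_site(description, meta_keywords):
    """Return the sorted categories whose keywords appear in the site metadata."""
    tokens = tokenize(description) | {keyword.lower() for keyword in meta_keywords}
    return sorted(
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if tokens & words
    )
```

A site can land in more than one category, or in none, which matches the expectation that each site is a member of some number of categories.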

As GHOSTS agents make more informed and complex decisions, each agent needs a system of preferences that exists at the time the agent is created, along with the ability to update those preferences over time as the agent continues to make decisions and measure their outcomes. To implement this capability, we created SPECTRE, an optional package within the GHOSTS framework that enables GHOSTS agents to make preference-based decisions and to use the outcome of those decisions to learn and evaluate future choices more intelligently.

Our GHOSTS NPCs need a preference that motivates them to select which website to browse next. We represented each preference with a simple key/value pair. Keys can be any unique string, while values must be integers ranging from 100 (representing a strong preference) to -100 (representing a particularly strong dislike). Using this approach, an NPC with a strong preference for computers and a strong dislike for printing would be represented as

[{"computers":100}, {"printing":-100}]

An NPC can have any number of preferences, and while it can have general preferences like “computers,” a preference can also be much more precise, perhaps indicating a specific preferred software application, printer, or file share. See Figure 1 for an example.


Figure 1: Precision in Preferences

NPCs can gain new preferences, and their existing preferences can change over time. These changes are handled transactionally, so increases or decreases in a particular preference are tracked. We can therefore go back to any point in time and determine what an NPC’s preference was and how it has changed.
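
A transactional preference store of this kind can be sketched as an append-only ledger. The Python below is our own minimal illustration, not SPECTRE's data model: the class name, the integer timestamps, and the choice to clamp values into the -100 to 100 range are assumptions, and the sketch expects adjustments to arrive in non-decreasing timestamp order.

```python
from dataclasses import dataclass, field

MIN_PREF, MAX_PREF = -100, 100  # value range stated in the text


@dataclass
class PreferenceLedger:
    """Append-only record of preference changes for one NPC (hypothetical)."""

    # Each entry is (timestamp, key, applied_delta).
    transactions: list = field(default_factory=list)

    def value_at(self, key, timestamp):
        """Replay all changes up to `timestamp` to get the value at that time."""
        return sum(
            delta
            for when, name, delta in self.transactions
            if name == key and when <= timestamp
        )

    def adjust(self, timestamp, key, delta):
        """Record an increase or decrease, clamped to the allowed range.

        Assumes timestamps are supplied in non-decreasing order.
        """
        current = self.value_at(key, timestamp)
        clamped = max(MIN_PREF, min(MAX_PREF, current + delta))
        self.transactions.append((timestamp, key, clamped - current))
```

Because nothing is overwritten, asking for `value_at("computers", t)` at an earlier `t` answers what the preference was at that point, and the list of transactions shows how it changed.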

Now that we have NPCs that prefer to do some things over others, we can look more closely at the tasks they might perform from a browser and how they might browse to complete those tasks. We can also align an NPC’s preferences with when it browses, for example so that sports fans check the latest scores over lunch. To accomplish our goal of building an ML model that improves NPC browsing patterns in a way that more closely matches its browsing history to its preferences, we need three sets of data:

  • NPC preferences
  • existing NPC browser history
  • list of categorized websites

With this data, we might consider each NPC in terms of the question, “Does your browser history match the content associated with your role and preferences?” As discussed previously, we have a list of websites and their classifications based on their content, and a mechanism for assigning a persona to an NPC and acquiring the applicable preference settings. Because the detailed history of every GHOSTS NPC’s actions is logged, we can reconstruct any single NPC’s browsing history.

We build an ML model that provides better browsing patterns in the same way that consumer sites use data (e.g., using a consumer’s previous activity or purchase history to recommend products that might interest them). If a consumer is shopping for a new laptop, the consumer site might ask whether they are interested in buying an extra laptop charger as well. In our ML model, we ask the NPC these questions:

  • Based on (1) sites that you have browsed in the past and (2) a site’s alignment to your preferences, would you browse this site in the future?
  • If yes, would you be interested in browsing other sites?
  • What might those sites be?
  • Are those sites similar to this one?

Similar to consumers having a purchase history, we have an NPC’s browsing history. Using browsing history, we can perform the following steps:

  1. Determine if the site matches any NPC preferences, either positive or negative.
  2. Based on the matches found, add or remove the site from the next iteration of sites to browse.
  3. Based on the final set of sites the NPC is interested in, find sites that are similar to this set.

Step 3 incorporates our ML model, which finds sites similar to the NPC’s preferences after an iteration of browsing. NPC activity should also reflect the randomness that humans sometimes exhibit. We must therefore be careful to allow this kind of randomness regardless of how many times the model is run.
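
The three steps, plus the allowance for occasional randomness, can be sketched as follows. This Python is a stand-in for illustration only: it replaces the ML model in step 3 with a plain Jaccard similarity over category sets, and the site catalog, function names, and the `explore_prob` parameter are all hypothetical.

```python
import random

# Hypothetical catalog of pre-categorized sites.
SITE_CATEGORIES = {
    "scores.example": {"sports", "news"},
    "dailynews.example": {"news"},
    "printers.example": {"printing", "shopping"},
    "gadgets.example": {"computers", "shopping"},
    "devblog.example": {"computers", "news"},
    "league.example": {"sports"},
}


def site_score(site, preferences):
    """Step 1: sum the NPC's preference values over the site's categories."""
    return sum(preferences.get(cat, 0) for cat in SITE_CATEGORIES[site])


def jaccard(a, b):
    """Overlap of two category sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def next_sites(history, preferences, rng, top_n=2, explore_prob=0.1):
    """Return the sites for the NPC's next browsing iteration."""
    visited = list(dict.fromkeys(history))  # de-duplicate, keep order
    # Step 2: keep visited sites with a net positive match, drop the rest.
    kept = [s for s in visited if site_score(s, preferences) > 0]
    liked = set()
    for site in kept:
        liked |= SITE_CATEGORIES[site]
    # Step 3 (stand-in for the ML model): rank unvisited sites by similarity.
    scored = [
        (jaccard(SITE_CATEGORIES[s], liked), s)
        for s in SITE_CATEGORIES
        if s not in visited
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    similar = [site for sim, site in ranked if sim > 0]
    chosen = kept + similar[:top_n]
    # Occasionally add a random, non-disliked site to preserve human-like noise.
    leftovers = [
        s for s in SITE_CATEGORIES
        if s not in chosen and site_score(s, preferences) >= 0
    ]
    if leftovers and rng.random() < explore_prob:
        chosen.append(rng.choice(leftovers))
    return chosen
```

For an NPC with preferences `{"computers": 100, "printing": -100}` that has visited `gadgets.example` and `printers.example`, the sketch keeps the first site, drops the second, and suggests `devblog.example` as similar. The `explore_prob` draw happens on every run, so repeated iterations never become fully deterministic.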

Results and Future Research Questions

Using the methodology described here, we iteratively created and adjusted models, leading to a 26 percent improvement in an NPC’s ability to browse sites that closely match its preferences. See our report for the full details of our results.

While our results show that, on average, an NPC’s browsing history is more aligned to its primary preference, we understand that this is a greatly simplified representation of human browsing behavior. There remains great opportunity for future work to expand the notion of personas and the number of preferences that a single NPC might simultaneously maintain. Similarly, the results of the model offer future opportunities to answer questions such as

  • Should the length of content an NPC consumes matter? Does long-form content matter more or less?
  • Does the frequency of content matter? If an NPC sees content aligned to one preference far more than other preferences, how does that affect the NPC’s overall set of preferences?
  • If frequency matters, what happens when an NPC saturates a particular preference? Does an NPC switch from its browser to another application to “take a break”?
  • How should we reason about negative preferences? What impact do they have for an NPC in relation to correlating positive preferences?
  • How do NPCs implement the results of a decision? For example, does the NPC linger on a page longer when it aligns with its preferences?
