热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

算法-编译原理之词法分析器-状态机SM.确定性有限自动机DFA.非确定性有限自动机NFA

算法-编译原理之词法分析器-状态机SM.确定性有限自动机DFA.非确定性有限自动机NFA--Linux通用技术-Linux编程与内核信息,下面是详情阅读。
/***********************************************
Title : StateMachine-NFA-DFA.c
Author :
Time :
************************************************/

/*********************************************************************************
** $Id: acsmx.c,v 1.4 2003/03/05 16:42:11 chrisgreen Exp $
**
** Multi-Pattern Search Engine
**
** Aho-Corasick State Machine - uses a Deterministic Finite Automata - DFA
**
** Copyright (C) 2002 Sourcefire,Inc.
** Marc Norton
**
** Reference - Efficient String matching: An Aid to Bibliographic Search
** Alfred V Aho and Margaret J Corasick
** Bell Labratories
** Copyright(C) 1975 Association for Computing Machinery,Inc
**
** Implemented from the 4 algorithms in the paper by Aho & Corasick
** and some implementation ideas from 'Practical Algorithms in C'
**
** Notes:
** 1) This version uses about 1024 bytes per pattern character - heavy on the memory.
** 2) This algorithm finds all occurrences of all patterns within a
** body of text.
** 3) Support is included to handle upper and lower case matching.
** 4) Some comopilers optimize the search routine well, others don't, this makes all the difference.
** 5) Aho inspects all bytes of the search text, but only once so it's very efficient,
** if the patterns are all large than the Modified Wu-Manbar method is often faster.
** 6) I don't subscribe to any one method is best for all searching needs,
** the data decides which method is best,
** and we don't know until after the search method has been tested on the specific data sets.
**
** May 2002 : Marc Norton 1st Version
** June 2002 : Modified interface for SNORT, added case support
** Aug 2002 : Cleaned up comments, and removed dead code.
** Nov 2,2002: Fixed queue_init() , added count=0
**
***********************************************************************************/




#include
#include
#include




#define ALPHABET_SIZE 256
#define ACSM_FAIL_STATE -1


typedef struct _acsm_pattern {

struct _acsm_pattern *next;
unsigned char *patrn;
unsigned char *casepatrn;
int n;
int nocase;
int offset;
int depth;
unsigned id;
int iid;
} ACSM_PATTERN;

typedef struct {

/* Next state - based on input character */
int NextState[ ALPHABET_SIZE ];

/* Failure state - used while building NFA & DFA */
int FailState;

/* List of patterns that end here, if any */
ACSM_PATTERN *MatchList;

}ACSM_STATETABLE;

/* ----State machine Struct----*/
typedef struct {

int acsmMaxStates;
int acsmNumStates;

ACSM_PATTERN * acsmPatterns;
ACSM_STATETABLE * acsmStateTable;

int bcSize;
short bcShift[256];

}ACSM_STRUCT;




/*==========================function declaration======================*/
ACSM_STRUCT * acsmNew ();
int acsmAddPattern( ACSM_STRUCT * p, unsigned char * pat, int n,
int nocase, int offset, int depth, unsigned id, int iid );
int acsmCompile ( ACSM_STRUCT * acsm );
int acsmSearch ( ACSM_STRUCT * acsm,unsigned char * T, int n,
int (*Match)( unsigned id, int index, void * data ),
void * data );
void acsmFree ( ACSM_STRUCT * acsm );




#define MEMASSERT(p,s) if(!p){fprintf(stderr,"ACSM-No Memory: %s!\n",s);exit(0);}

#ifdef DEBUG_AC
static int max_memory = 0;
#endif





/*==========================function implementation==========================*/
/*----------------MALLOC--------------------*/
static void *AC_MALLOC (int n)
{
void *p;
p = malloc (n);
#ifdef DEBUG_AC
if (p)
max_memory += n;
#endif
return p;
}

/*------------FREE--------------*/
static void AC_FREE (void *p)
{
if(p) free(p);
}

/*------Simple QUEUE NODE---------*/
typedef struct _qnode
{
int state;
struct _qnode *next;
}
QNODE;

/*------------Simple QUEUE Structure-------------*/
typedef struct _queue
{
QNODE * head, *tail;
int count;
}
QUEUE;

/*------------------queue init------------------------*/
static void queue_init(QUEUE * s)
{
s->head = s->tail = 0;
s->count = 0;
}


/*---------------------Add Tail Item to queue----------------------*/
static void queue_add (QUEUE * s, int state)
{
QNODE * q;
if (!s->head){
q = s->tail = s->head = (QNODE *) AC_MALLOC (sizeof (QNODE));
MEMASSERT (q, "queue_add");
q->state = state;
q->next = 0;
}
else{
q = (QNODE *) AC_MALLOC (sizeof (QNODE));
MEMASSERT (q, "queue_add");
q->state = state;
q->next = 0;
s->tail->next = q;
s->tail = q;
}
s->count++;
}


/* -----------Remove Head Item from queue-------------*/
static int queue_remove (QUEUE * s)
{
int state = 0;
QNODE * q;

if (s->head){
q = s->head;
state = q->state;
s->head = s->head->next;
s->count--;
if (!s->head){
s->tail = 0;
s->count = 0;
}
AC_FREE (q);
}

return state;
}


/*---------------queue_count-----------------*/
static int queue_count (QUEUE * s)
{
return s->count;
}


/*----------------------queue_free--------------------------*/
static void queue_free (QUEUE * s)
{
while (queue_count(s)){
queue_remove (s);
}
}


/* ----Case Translation Table ----*/
static unsigned char xlatcase[256];

/*---------init_xlatcase----------*/
static void init_xlatcase ()
{
int i;
for (i = 0; i <256; i++){
xlatcase = toupper (i);
}
}


/*-------------------------------------------*/
static inline void ConvertCase (unsigned char *s, int m)
{
int i;
for (i = 0; i s = xlatcase[s];
}
}


/*---------------------------------------*/
static inline void ConvertCaseEx (unsigned char *d, unsigned char *s, int m)
{
int i;
for (i = 0; i d = xlatcase[s];
}
}


/*------------------------------------------------*/
static ACSM_PATTERN * CopyMatchListEntry (ACSM_PATTERN * px)
{
ACSM_PATTERN * p;
p = (ACSM_PATTERN *) AC_MALLOC (sizeof (ACSM_PATTERN));
MEMASSERT (p, "CopyMatchListEntry");
memcpy (p, px, sizeof (ACSM_PATTERN));
p->next = 0;
return p;
}


/*------------------------------------------------------------------------
* Add a pattern to the list of patterns terminated at this state.
* Insert at front of list.
-------------------------------------------------------------------------*/
static void AddMatchListEntry (ACSM_STRUCT * acsm, int state, ACSM_PATTERN * px)
{
ACSM_PATTERN * p;
p = (ACSM_PATTERN *) AC_MALLOC (sizeof (ACSM_PATTERN));
MEMASSERT (p, "AddMatchListEntry");
memcpy (p, px, sizeof (ACSM_PATTERN));
p->next = acsm->acsmStateTable[state].MatchList;
acsm->acsmStateTable[state].MatchList = p;
}


/* -------------------------------------------------------------------------------------------------------------------
Add Pattern States
---------------------------------------------------------------------------------------------------------------------*/
static void AddPatternStates (ACSM_STRUCT * acsm, ACSM_PATTERN * p)
{
unsigned char *pattern;
int state=0, next, n;
n = p->n;
pattern = p->patrn;

/* Match up pattern with existing states*/
for (; n > 0; pattern++, n--){
next = acsm->acsmStateTable[state].NextState[*pattern];
if (next == ACSM_FAIL_STATE) break;
state = next;
}

/* Add new states for the rest of the pattern bytes, 1 state per byte*/
for (; n > 0; pattern++, n--){
acsm->acsmNumStates++;
acsm->acsmStateTable[state].NextState[*pattern] = acsm->acsmNumStates;
state = acsm->acsmNumStates;
}

AddMatchListEntry (acsm, state, p);
}


/*-----------Build Non-Deterministic Finite Automata--------*/
static void Build_NFA (ACSM_STRUCT * acsm)
{
int r, s;
int i;
QUEUE q, *queue = &q;
ACSM_PATTERN * mlist=0;
ACSM_PATTERN * px=0;

/* Init a Queue */
queue_init (queue);

/* Add the state 0 transitions 1st */
for (i = 0; i s = acsm->acsmStateTable[0].NextState;
if(s){
queue_add (queue, s);
acsm->acsmStateTable[s].FailState = 0;
}
}

/* Build the fail state transitions for each valid state */
while (queue_count (queue) > 0){
r = queue_remove (queue);

/* Find Final States for any Failure */
for (i = 0; i int fs, next;
if ((s = acsm->acsmStateTable[r].NextState) != ACSM_FAIL_STATE){
queue_add (queue, s);
fs = acsm->acsmStateTable[r].FailState;

/* Locate the next valid state for 'i' starting at s*/
while ((next=acsm->acsmStateTable[fs].NextState) ==
ACSM_FAIL_STATE){
fs = acsm->acsmStateTable[fs].FailState;
}

/*Update 's' state failure state to point to the next valid state */
acsm->acsmStateTable[s].FailState = next;

/*
* Copy 'next'states MatchList to 's' states MatchList,
* we copy them so each list can be AC_FREE'd later,
* else we could just manipulate pointers to fake the copy.
*/
for (mlist = acsm->acsmStateTable[next].MatchList;
mlist != NULL ;mlist = mlist->next){
px = CopyMatchListEntry (mlist);

if( !px ){
printf("*** Out of memory Initializing Aho Corasick in acsmx.c ****");
}

/* Insert at front of MatchList */
px->next = acsm->acsmStateTable[s].MatchList;
acsm->acsmStateTable[s].MatchList = px;
}
}
}
}

/* Clean up the queue */
queue_free (queue);
}


/*------------Build Deterministic Finite Automata from NFA---------------*/
static void Convert_NFA_To_DFA (ACSM_STRUCT * acsm)
{
int r, s;
int i;
QUEUE q, *queue = &q;

/* Init a Queue */
queue_init (queue);

/* Add the state 0 transitions 1st */
for (i = 0; i s = acsm->acsmStateTable[0].NextState;
if (s){
queue_add (queue, s);
}
}

/* Start building the next layer of transitions */
while (queue_count (queue) > 0){
r = queue_remove (queue);

/* State is a branch state */
for (i = 0; i if ((s = acsm->acsmStateTable[r].NextState) != ACSM_FAIL_STATE){
queue_add (queue, s);
}
else{
acsm->acsmStateTable[r].NextState =
acsm->acsmStateTable[acsm->acsmStateTable[r].FailState].
NextState;
}
}
}

/* Clean up the queue */
queue_free (queue);
}


/*-----------------------------------------*/
ACSM_STRUCT * acsmNew ()
{
ACSM_STRUCT * p;
init_xlatcase ();
p = (ACSM_STRUCT *) AC_MALLOC (sizeof (ACSM_STRUCT));
MEMASSERT (p, "acsmNew");
if (p)
memset (p, 0, sizeof (ACSM_STRUCT));
return p;
}


/*-----------------------------Add a pattern to the list of patterns for this state machine-------------------------*/
int acsmAddPattern (ACSM_STRUCT * p, unsigned char *pat, int n, int nocase,
int offset, int depth, unsigned id, int iid)
{
ACSM_PATTERN * plist;
plist = (ACSM_PATTERN *) AC_MALLOC (sizeof (ACSM_PATTERN));
MEMASSERT (plist, "acsmAddPattern");
plist->patrn = (unsigned char *) AC_MALLOC (n);
ConvertCaseEx (plist->patrn, pat, n);
plist->casepatrn = (unsigned char *) AC_MALLOC (n);
memcpy (plist->casepatrn, pat, n);
plist->n = n;
plist->nocase = nocase;
plist->offset = offset;
plist->depth = depth;
plist->id = id;
plist->iid = iid;
plist->next = p->acsmPatterns;
p->acsmPatterns = plist;
return 0;
}


/* ---------------Compile State Machine--------------*/
int acsmCompile(ACSM_STRUCT * acsm)
{
int i, k;
ACSM_PATTERN * plist;

/* Count number of states */
acsm->acsmMaxStates = 1;
for (plist = acsm->acsmPatterns; plist != NULL; plist = plist->next){
acsm->acsmMaxStates += plist->n;
}
acsm->acsmStateTable =(ACSM_STATETABLE *) AC_MALLOC (sizeof (ACSM_STATETABLE) *
acsm->acsmMaxStates);
MEMASSERT (acsm->acsmStateTable, "acsmCompile");
memset (acsm->acsmStateTable, 0,sizeof (ACSM_STATETABLE) * acsm->acsmMaxStates);

/* Initialize state zero as a branch */
acsm->acsmNumStates = 0;

/* Initialize all States NextStates to FAILED */
for (k = 0; k acsmMaxStates; k++){
for (i = 0; i acsm->acsmStateTable[k].NextState = ACSM_FAIL_STATE;
}
}

/* Add each Pattern to the State Table */
for (plist = acsm->acsmPatterns; plist != NULL; plist = plist->next){
AddPatternStates (acsm, plist);
}

/* Set all failed state transitions to return to the 0'th state */
for (i = 0; i if (acsm->acsmStateTable[0].NextState == ACSM_FAIL_STATE){
acsm->acsmStateTable[0].NextState = 0;
}
}

/* Build the NFA */
Build_NFA (acsm);

/* Convert the NFA to a DFA */
Convert_NFA_To_DFA (acsm);

/*printf ("ACSMX-Max Memory: %d bytes, %d states\n", max_memory,
acsm->acsmMaxStates);
*/
return 0;
}


static unsigned char Tc[64*1024];

/*Search Text or Binary Data for Pattern matches*/
int acsmSearch(ACSM_STRUCT * acsm, unsigned char *Tx, int n,
int (*Match) (unsigned id, int index, void *data), void *data)
{
int state;
ACSM_PATTERN * mlist;
unsigned char *Tend;
ACSM_STATETABLE * StateTable = acsm->acsmStateTable;
int nfound = 0;
unsigned char *T;
int index;

/* Case conversion */
ConvertCaseEx (Tc, Tx, n);
T = Tc;
Tend = T + n;

for (state = 0; T state = StateTable[state].NextState[*T];

if( StateTable[state].MatchList != NULL ){
for( mlist=StateTable[state].MatchList; mlist!=NULL;
mlist=mlist->next ){
index = T - mlist->n + 1 - Tc;
if( mlist->nocase ){
nfound++;
if (Match (mlist->id, index, data)) return nfound;
}
else{
if( memcmp (mlist->casepatrn, Tx + index, mlist->n)==0){
nfound++;
if (Match (mlist->id, index, data))
return nfound;
}
}
}
}
}
return nfound;
}


/*-----------------Free all memory----------------*/
void acsmFree(ACSM_STRUCT * acsm)
{
int i;
ACSM_PATTERN * mlist, *ilist;
for (i = 0; i acsmMaxStates; i++){
if (acsm->acsmStateTable.MatchList != NULL){
mlist = acsm->acsmStateTable.MatchList;
while (mlist){
ilist = mlist;
mlist = mlist->next;
AC_FREE (ilist);
}
}
}
AC_FREE (acsm->acsmStateTable);
}


//#ifdef ACSMX_MAIN

/*Text Data Buffer*/
unsigned char text[512];

/*---------------------------------A Match is found-----------------------------------*/
int MatchFound(unsigned id, int index, void *data)
{
fprintf (stdout, "%s\n", (char *) id);
return 0;
}


/*=========================main function=========================*/
int main(int argc, char *argv[])
{
int i, nocase = 0;
ACSM_STRUCT * acsm;
if (argc <3){
fprintf (stderr,"Usage: acsmx pattern word-1 word-2 ... word-n -nocase\n");
exit (0);
}
acsm = acsmNew ();
strcpy (text, argv[1]);
for (i = 1; i if (strcmp (argv, "-nocase") == 0)
nocase = 1;
for (i = 2; i if (argv[0] == '-')
continue;
acsmAddPattern (acsm, argv, strlen (argv), nocase, 0, 0,
(unsigned) argv, i - 2);
}
acsmCompile (acsm);
acsmSearch (acsm, text, strlen (text), MatchFound, (void *) 0);
acsmFree (acsm);
printf ("normal pgm end\n");
return (0);
}
//#endif /* */
推荐阅读
  • 2018年人工智能大数据的爆发,学Java还是Python?
    本文介绍了2018年人工智能大数据的爆发以及学习Java和Python的相关知识。在人工智能和大数据时代,Java和Python这两门编程语言都很优秀且火爆。选择学习哪门语言要根据个人兴趣爱好来决定。Python是一门拥有简洁语法的高级编程语言,容易上手。其特色之一是强制使用空白符作为语句缩进,使得新手可以快速上手。目前,Python在人工智能领域有着广泛的应用。如果对Java、Python或大数据感兴趣,欢迎加入qq群458345782。 ... [详细]
  • vue使用
    关键词: ... [详细]
  • Echarts图表重复加载、axis重复多次请求问题解决记录
    文章目录1.需求描述2.问题描述正常状态:问题状态:3.解决方法1.需求描述使用Echats实现了一个中国地图:通过选择查询周期&#x ... [详细]
  • 本文介绍了设计师伊振华受邀参与沈阳市智慧城市运行管理中心项目的整体设计,并以数字赋能和创新驱动高质量发展的理念,建设了集成、智慧、高效的一体化城市综合管理平台,促进了城市的数字化转型。该中心被称为当代城市的智能心脏,为沈阳市的智慧城市建设做出了重要贡献。 ... [详细]
  • 本文介绍了数据库的存储结构及其重要性,强调了关系数据库范例中将逻辑存储与物理存储分开的必要性。通过逻辑结构和物理结构的分离,可以实现对物理存储的重新组织和数据库的迁移,而应用程序不会察觉到任何更改。文章还展示了Oracle数据库的逻辑结构和物理结构,并介绍了表空间的概念和作用。 ... [详细]
  • IhaveconfiguredanactionforaremotenotificationwhenitarrivestomyiOsapp.Iwanttwodiff ... [详细]
  • Python字典推导式及循环列表生成字典方法
    本文介绍了Python中使用字典推导式和循环列表生成字典的方法,包括通过循环列表生成相应的字典,并给出了执行结果。详细讲解了代码实现过程。 ... [详细]
  • 本文讨论了在Windows 8上安装gvim中插件时出现的错误加载问题。作者将EasyMotion插件放在了正确的位置,但加载时却出现了错误。作者提供了下载链接和之前放置插件的位置,并列出了出现的错误信息。 ... [详细]
  • CSS3选择器的使用方法详解,提高Web开发效率和精准度
    本文详细介绍了CSS3新增的选择器方法,包括属性选择器的使用。通过CSS3选择器,可以提高Web开发的效率和精准度,使得查找元素更加方便和快捷。同时,本文还对属性选择器的各种用法进行了详细解释,并给出了相应的代码示例。通过学习本文,读者可以更好地掌握CSS3选择器的使用方法,提升自己的Web开发能力。 ... [详细]
  • “你永远都不知道明天和‘公司的意外’哪个先来。”疫情期间,这是我们最战战兢兢的心情。但是显然,有些人体会不了。这份行业数据,让笔者“柠檬” ... [详细]
  • 本文主要解析了Open judge C16H问题中涉及到的Magical Balls的快速幂和逆元算法,并给出了问题的解析和解决方法。详细介绍了问题的背景和规则,并给出了相应的算法解析和实现步骤。通过本文的解析,读者可以更好地理解和解决Open judge C16H问题中的Magical Balls部分。 ... [详细]
  • 本文讨论了使用差分约束系统求解House Man跳跃问题的思路与方法。给定一组不同高度,要求从最低点跳跃到最高点,每次跳跃的距离不超过D,并且不能改变给定的顺序。通过建立差分约束系统,将问题转化为图的建立和查询距离的问题。文章详细介绍了建立约束条件的方法,并使用SPFA算法判环并输出结果。同时还讨论了建边方向和跳跃顺序的关系。 ... [详细]
  • 生成对抗式网络GAN及其衍生CGAN、DCGAN、WGAN、LSGAN、BEGAN介绍
    一、GAN原理介绍学习GAN的第一篇论文当然由是IanGoodfellow于2014年发表的GenerativeAdversarialNetworks(论文下载链接arxiv:[h ... [详细]
  • [译]技术公司十年经验的职场生涯回顾
    本文是一位在技术公司工作十年的职场人士对自己职业生涯的总结回顾。她的职业规划与众不同,令人深思又有趣。其中涉及到的内容有机器学习、创新创业以及引用了女性主义者在TED演讲中的部分讲义。文章表达了对职业生涯的愿望和希望,认为人类有能力不断改善自己。 ... [详细]
  • 本文介绍了在Win10上安装WinPythonHadoop的详细步骤,包括安装Python环境、安装JDK8、安装pyspark、安装Hadoop和Spark、设置环境变量、下载winutils.exe等。同时提醒注意Hadoop版本与pyspark版本的一致性,并建议重启电脑以确保安装成功。 ... [详细]
author-avatar
yangxin
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有