Los ensambladores de la vieja escuela normalmente se codificaban a mano en el ensamblador y se usaban técnicas de análisis adhoc para procesar las líneas de origen del ensamblaje para producir un código de ensamblador real. Cuando la sintaxis del ensamblador es simple (por ejemplo, siempre OPCODE REG, OPERAND) funcionó bien.
Las máquinas modernas tienen conjuntos de instrucciones sucias y desagradables con muchas variaciones de instrucciones y operandos, que pueden expresarse con una sintaxis compleja que permite que múltiples registros de índice participen en la expresión del operando. Permitir expresiones sofisticadas de tiempo de montaje con constantes fijas y reubicables con varios tipos de operadores adicionales complica esto. Los ensambladores sofisticados que permiten compilación condicional, macros, declaraciones de datos estructurados, etc. añaden nuevas demandas a la sintaxis. Procesar toda esta sintaxis mediante métodos ad hoc es muy difícil y es la razón por la que se inventaron los generadores de analizadores sintácticos.
El uso de un BNF y un generador de analizador es una forma muy razonable de construir un ensamblador moderno, incluso para un procesador heredado como el Z80. He creado dichos ensambladores para máquinas Motorola de 8 bits, como el 6800/6809, y me estoy preparando para hacer lo mismo con un moderno x86. Creo que te diriges hacia el camino correcto.
********** EDIT **************** El OP pidió, por ejemplo, definiciones de lexer y analizador. He proporcionado ambos aquí.
Estos son extractos de especificaciones reales para un ensamblador 6809. Las definiciones completas son 2-3 veces el tamaño de las muestras aquí.
Para mantener el espacio reducido, he eliminado gran parte de la complejidad de la esquina oscura , que es el objetivo de estas definiciones. Uno podría estar consternado por la complejidad aparente; el punto es que con tales definiciones, está tratando de describir la forma del idioma, no codificarlo procesalmente. Pagará una complejidad significativamente más alta si codifica todo esto de manera ad hoc, y será mucho menos menos mantenible.
También será de alguna ayuda para saber que estas definiciones se utilizan con un sistema de análisis de programas de gama alta que cuenta con herramientas léxico/análisis sintáctico como subsistemas, llamado el The DMS Software Reengineering Toolkit. DMS creará automáticamente AST a partir de las reglas de gramática
en la especificación del analizador, lo que lo convierte en un lote más fácil de construir herramientas de análisis. Por último, la especificación del analizador contiene las llamadas declaraciones "prettyprinter" , que permite a DMS re-generar el texto fuente de los AST. (El verdadero propósito de la gramática era que nos permita construir AST representan ensamblador instrucciones, y luego escupir hacia fuera a ser alimentado a un ensamblador de verdad!)
Una cosa de la nota: ¿cómo se expresan los lexemas y reglas gramaticales (¡el metasyntxax!) varía un poco entre diferentes sistemas de generador lexer/analizador. La sintaxis de las especificaciones basadas en DMS no es una excepción. DMS tiene reglas de gramática relativamente sofisticadas , que realmente no son prácticas para explicar en el espacio disponible aquí. Tendrá que vivir con la idea de que otros sistemas usan anotaciones similares, para EBNF para reglas y variantes de expresiones regulares para lexemas.
Teniendo en cuenta los intereses de la OP, que puede poner en práctica similares lexer/analizadores con cualquier herramienta de generador de analizador léxico/analizador, por ejemplo, FLEX/YACC, javacc, antlr, ...
******* *** lexer **************
-- M6809.lex: Lexical Description for M6809
-- Copyright (C) 1989,1999-2002 Ira D. Baxter
%%
#mainmode Label
#macro digit "[0-9]"
#macro hexadecimaldigit "<digit>|[a-fA-F]"
#macro comment_body_character "[\u0009 \u0020-\u007E]" -- does not include NEWLINE
#macro blank "[\u0000 \ \u0009]"
#macro hblanks "<blank>+"
#macro newline "\u000d \u000a? \u000c? | \u000a \u000c?" -- form feed allowed only after newline
#macro bare_semicolon_comment "\; <comment_body_character>* "
#macro bare_asterisk_comment "\* <comment_body_character>* "
...[snip]
#macro hexadecimal_digit "<digit> | [a-fA-F]"
#macro binary_digit "[01]"
#macro squoted_character "\' [\u0021-\u007E]"
#macro string_character "[\u0009 \u0020-\u007E]"
%%Label -- (First mode) processes left hand side of line: labels, opcodes, etc.
#skip "(<blank>*<newline>)+"
#skip "(<blank>*<newline>)*<blank>+"
<< (GotoOpcodeField ?) >>
#precomment "<comment_line><newline>"
#preskip "(<blank>*<newline>)+"
#preskip "(<blank>*<newline>)*<blank>+"
<< (GotoOpcodeField ?) >>
-- Note that an apparant register name is accepted as a label in this mode
#token LABEL [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
(GotoLabelList ?)
>>
%%OpcodeField
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
-- Opcode field tokens
#token 'ABA' "[aA][bB][aA]"
<< (GotoEOLComment ?) >>
#token 'ABX' "[aA][bB][xX]"
<< (GotoEOLComment ?) >>
#token 'ADC' "[aA][dD][cC]"
<< (GotoABregister ?) >>
#token 'ADCA' "[aA][dD][cC][aA]"
<< (GotoOperand ?) >>
#token 'ADCB' "[aA][dD][cC][bB]"
<< (GotoOperand ?) >>
#token 'ADCD' "[aA][dD][cC][dD]"
<< (GotoOperand ?) >>
#token 'ADD' "[aA][dD][dD]"
<< (GotoABregister ?) >>
#token 'ADDA' "[aA][dD][dD][aA]"
<< (GotoOperand ?) >>
#token 'ADDB' "[aA][dD][dD][bB]"
<< (GotoOperand ?) >>
#token 'ADDD' "[aA][dD][dD][dD]"
<< (GotoOperand ?) >>
#token 'AND' "[aA][nN][dD]"
<< (GotoABregister ?) >>
#token 'ANDA' "[aA][nN][dD][aA]"
<< (GotoOperand ?) >>
#token 'ANDB' "[aA][nN][dD][bB]"
<< (GotoOperand ?) >>
#token 'ANDCC' "[aA][nN][dD][cC][cC]"
<< (GotoRegister ?) >>
...[long list of opcodes snipped]
#token IDENTIFIER [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
(GotoOperandField ?)
>>
#token '#' "\#" -- special constant introduction (FDB)
<< (GotoDataField ?) >>
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token NUMBER [NATURAL] "\% <binary_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token CHARACTER [CHARACTER] "<squoted_character>"
<< (= ?:Lexeme:Literal:Character:Value (TokenStringCharacter ? 2))
(= ?:Lexeme:Literal:Character:Format (LiteralFormat:MakeCompactCharacterLiteralFormat 0 0)) ; nothing special about character
(GotoOperandField ?)
>>
%%OperandField
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
-- Tokens signalling switch to index register modes
#token ',' "\,"
<<(GotoRegisterField ?)>>
#token '[' "\["
<<(GotoRegisterField ?)>>
-- Operators for arithmetic syntax
#token '!!' "\!\!"
#token '!' "\!"
#token '##' "\#\#"
#token '#' "\#"
#token '&' "\&"
#token '(' "\("
#token ')' "\)"
#token '*' "\*"
#token '+' "\+"
#token '-' "\-"
#token '/' "\/"
#token '//' "\/\/"
#token '<' "\<"
#token '<' "\<"
#token '<<' "\<\<"
#token '<=' "\<\="
#token '</' "\<\/"
#token '=' "\="
#token '>' "\>"
#token '>' "\>"
#token '>=' "\>\="
#token '>>' "\>\>"
#token '>/' "\>\/"
#token '\\' "\\"
#token '|' "\|"
#token '||' "\|\|"
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
#token NUMBER [NATURAL] "\% <binary_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
-- Notice that an apparent register is accepted as a label in this mode
#token IDENTIFIER [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
>>
%%Register -- operand field for TFR, ANDCC, ORCC, EXG opcodes
#skip "<hblanks>"
#ifnotoken << (GotoRegisterField ?) >>
%%RegisterField -- handles registers and indexing mode syntax
-- In this mode, names that look like registers are recognized as registers
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
#token '[' "\["
#token ']' "\]"
#token '--' "\-\-"
#token '++' "\+\+"
#token 'A' "[aA]"
#token 'B' "[bB]"
#token 'CC' "[cC][cC]"
#token 'DP' "[dD][pP] | [dD][pP][rR]" -- DPR shouldnt be needed, but found one instance
#token 'D' "[dD]"
#token 'Z' "[zZ]"
-- Index register designations
#token 'X' "[xX]"
#token 'Y' "[yY]"
#token 'U' "[uU]"
#token 'S' "[sS]"
#token 'PCR' "[pP][cC][rR]"
#token 'PC' "[pP][cC]"
#token ',' "\,"
-- Operators for arithmetic syntax
#token '!!' "\!\!"
#token '!' "\!"
#token '##' "\#\#"
#token '#' "\#"
#token '&' "\&"
#token '(' "\("
#token ')' "\)"
#token '*' "\*"
#token '+' "\+"
#token '-' "\-"
#token '/' "\/"
#token '<' "\<"
#token '<' "\<"
#token '<<' "\<\<"
#token '<=' "\<\="
#token '<|' "\<\|"
#token '=' "\="
#token '>' "\>"
#token '>' "\>"
#token '>=' "\>\="
#token '>>' "\>\>"
#token '>|' "\>\|"
#token '\\' "\\"
#token '|' "\|"
#token '||' "\|\|"
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
... [snip]
%% -- end M6809.lex
**************** ******** analizador ******
-- M6809.ATG: Motorola 6809 assembly code parser
-- (C) Copyright 1989;1999-2002 Ira D. Baxter; All Rights Reserved
m6809 = sourcelines ;
sourcelines = ;
sourcelines = sourcelines sourceline EOL ;
<<PrettyPrinter>>: { V(CV(sourcelines[1]),H(sourceline,A<eol>(EOL))); }
-- leading opcode field symbol should be treated as keyword.
sourceline = ;
sourceline = labels ;
sourceline = optional_labels 'EQU' expression ;
<<PrettyPrinter>>: { H(optional_labels,A<opcode>('EQU'),A<operand>(expression)); }
sourceline = LABEL 'SET' expression ;
<<PrettyPrinter>>: { H(A<firstlabel>(LABEL),A<opcode>('SET'),A<operand>(expression)); }
sourceline = optional_label instruction ;
<<PrettyPrinter>>: { H(optional_label,instruction); }
sourceline = optional_label optlabelleddirective ;
<<PrettyPrinter>>: { H(optional_label,optlabelleddirective); }
sourceline = optional_label implicitdatadirective ;
<<PrettyPrinter>>: { H(optional_label,implicitdatadirective); }
sourceline = unlabelleddirective ;
sourceline = '?ERROR' ;
<<PrettyPrinter>>: { A<opcode>('?ERROR'); }
optional_label = labels ;
optional_label = LABEL ':' ;
<<PrettyPrinter>>: { H(A<firstlabel>(LABEL),':'); }
optional_label = ;
optional_labels = ;
optional_labels = labels ;
labels = LABEL ;
<<PrettyPrinter>>: { A<firstlabel>(LABEL); }
labels = labels ',' LABEL ;
<<PrettyPrinter>>: { H(labels[1],',',A<otherlabels>(LABEL)); }
unlabelleddirective = 'END' ;
<<PrettyPrinter>>: { A<opcode>('END'); }
unlabelleddirective = 'END' expression ;
<<PrettyPrinter>>: { H(A<opcode>('END'),A<operand>(expression)); }
unlabelleddirective = 'IF' expression EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFDEF' IDENTIFIER EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IFDEF'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFUND' IDENTIFIER EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IFUND'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'INCLUDE' FILENAME ;
<<PrettyPrinter>>: { H(A<opcode>('INCLUDE'),A<operand>(FILENAME)); }
unlabelleddirective = 'LIST' expression ;
<<PrettyPrinter>>: { H(A<opcode>('LIST'),A<operand>(expression)); }
unlabelleddirective = 'NAME' IDENTIFIER ;
<<PrettyPrinter>>: { H(A<opcode>('NAME'),A<operand>(IDENTIFIER)); }
unlabelleddirective = 'ORG' expression ;
<<PrettyPrinter>>: { H(A<opcode>('ORG'),A<operand>(expression)); }
unlabelleddirective = 'PAGE' ;
<<PrettyPrinter>>: { A<opcode>('PAGE'); }
unlabelleddirective = 'PAGE' HEADING ;
<<PrettyPrinter>>: { H(A<opcode>('PAGE'),A<operand>(HEADING)); }
unlabelleddirective = 'PCA' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PCA'),A<operand>(expression)); }
unlabelleddirective = 'PCC' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PCC'),A<operand>(expression)); }
unlabelleddirective = 'PSR' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PSR'),A<operand>(expression)); }
unlabelleddirective = 'TABS' numberlist ;
<<PrettyPrinter>>: { H(A<opcode>('TABS'),A<operand>(numberlist)); }
unlabelleddirective = 'TITLE' HEADING ;
<<PrettyPrinter>>: { H(A<opcode>('TITLE'),A<operand>(HEADING)); }
unlabelleddirective = 'WITH' settings ;
<<PrettyPrinter>>: { H(A<opcode>('WITH'),A<operand>(settings)); }
settings = setting ;
settings = settings ',' setting ;
<<PrettyPrinter>>: { H*; }
setting = 'WI' '=' NUMBER ;
<<PrettyPrinter>>: { H*; }
setting = 'DE' '=' NUMBER ;
<<PrettyPrinter>>: { H*; }
setting = 'M6800' ;
setting = 'M6801' ;
setting = 'M6809' ;
setting = 'M6811' ;
-- collects lines of conditional code into blocks
conditional = 'ELSEIF' expression EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('ELSEIF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional[1])); }
conditional = 'ELSE' EOL else ;
<<PrettyPrinter>>: { V(H(A<opcode>('ELSE'),A<eol>(EOL)),CV(else)); }
conditional = 'FIN' ;
<<PrettyPrinter>>: { A<opcode>('FIN'); }
conditional = sourceline EOL conditional ;
<<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(conditional[1])); }
else = 'FIN' ;
<<PrettyPrinter>>: { A<opcode>('FIN'); }
else = sourceline EOL else ;
<<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(else[1])); }
-- keyword-less directive, generates data tables
implicitdatadirective = implicitdatadirective ',' implicitdataitem ;
<<PrettyPrinter>>: { H*; }
implicitdatadirective = implicitdataitem ;
implicitdataitem = '#' expression ;
<<PrettyPrinter>>: { A<operand>(H('#',expression)); }
implicitdataitem = '+' expression ;
<<PrettyPrinter>>: { A<operand>(H('+',expression)); }
implicitdataitem = '-' expression ;
<<PrettyPrinter>>: { A<operand>(H('-',expression)); }
implicitdataitem = expression ;
<<PrettyPrinter>>: { A<operand>(expression); }
implicitdataitem = STRING ;
<<PrettyPrinter>>: { A<operand>(STRING); }
-- instructions valid for m680C (see Software Dynamics ASM manual)
instruction = 'ABA' ;
<<PrettyPrinter>>: { A<opcode>('ABA'); }
instruction = 'ABX' ;
<<PrettyPrinter>>: { A<opcode>('ABX'); }
instruction = 'ADC' 'A' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADC','A')),A<operand>(operandfetch)); }
instruction = 'ADC' 'B' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADC','B')),A<operand>(operandfetch)); }
instruction = 'ADCA' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCA'),A<operand>(operandfetch)); }
instruction = 'ADCB' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCB'),A<operand>(operandfetch)); }
instruction = 'ADCD' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCD'),A<operand>(operandfetch)); }
instruction = 'ADD' 'A' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADD','A')),A<operand>(operandfetch)); }
instruction = 'ADD' 'B' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADD','B')),A<operand>(operandfetch)); }
instruction = 'ADDA' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADDA'),A<operand>(operandfetch)); }
[..snip...]
-- condition code mask for ANDCC and ORCC
conditionmask = '#' expression ;
<<PrettyPrinter>>: { H*; }
conditionmask = expression ;
target = expression ;
operandfetch = '#' expression ; --immediate
<<PrettyPrinter>>: { H*; }
operandfetch = memoryreference ;
operandstore = memoryreference ;
memoryreference = '[' indexedreference ']' ;
<<PrettyPrinter>>: { H*; }
memoryreference = indexedreference ;
indexedreference = offset ;
indexedreference = offset ',' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' '--' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' '-' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '++' ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '+' ;
<<PrettyPrinter>>: { H*; }
offset = '>' expression ; -- page zero ref
<<PrettyPrinter>>: { H*; }
offset = '<' expression ; -- long reference
<<PrettyPrinter>>: { H*; }
offset = expression ;
offset = 'A' ;
offset = 'B' ;
offset = 'D' ;
registerlist = registername ;
registerlist = registerlist ',' registername ;
<<PrettyPrinter>>: { H*; }
registername = 'A' ;
registername = 'B' ;
registername = 'CC' ;
registername = 'DP' ;
registername = 'D' ;
registername = 'Z' ;
registername = indexregister ;
indexregister = 'X' ;
indexregister = 'Y' ;
indexregister = 'U' ; -- not legal on M6811
indexregister = 'S' ;
indexregister = 'PCR' ;
indexregister = 'PC' ;
expression = sum '=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<<' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '</' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>>' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>/' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '#' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum ;
sum = product ;
sum = sum '+' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '-' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '!' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '!!' product ;
<<PrettyPrinter>>: { H*; }
product = term '*' product ;
<<PrettyPrinter>>: { H*; }
product = term '||' product ; -- wrong?
<<PrettyPrinter>>: { H*; }
product = term '/' product ;
<<PrettyPrinter>>: { H*; }
product = term '//' product ;
<<PrettyPrinter>>: { H*; }
product = term '&' product ;
<<PrettyPrinter>>: { H*; }
product = term '##' product ;
<<PrettyPrinter>>: { H*; }
product = term ;
term = '+' term ;
<<PrettyPrinter>>: { H*; }
term = '-' term ;
<<PrettyPrinter>>: { H*; }
term = '\\' term ; -- complement
<<PrettyPrinter>>: { H*; }
term = '&' term ; -- not
term = IDENTIFIER ;
term = NUMBER ;
term = CHARACTER ;
term = '*' ;
term = '(' expression ')' ;
<<PrettyPrinter>>: { H*; }
numberlist = NUMBER ;
numberlist = numberlist ',' NUMBER ;
<<PrettyPrinter>>: { H*; }
Dupe de http://stackoverflow.com/questions/1305091/writing-an-z80-assembler-lexi ng-asm-and-building-a-parse-tree-using-composition por el mismo usuario –
@Butterworth: no es un duplicado. La otra pregunta tiene que ver con pasar información sobre cualquier árbol que pueda construir usando una gramática. Esta pregunta tiene que ver con si debería usar una gramática, y si es así, cómo sería. La respuesta a esta pregunta es una condición previa para que el otro sea interesante. –