XML

This package is an XML lexer written in Go. It follows the Extensible Markup Language (XML) 1.0 (Fifth Edition) specification. The lexer reads from an io.Reader (wrapped with parse.NewInput) and emits tokens until it reaches EOF.

Installation

Run the following command:

go get -u github.com/tdewolff/parse/v2/xml

or add the following import and run go get in your project:

import "github.com/tdewolff/parse/v2/xml"

Lexer

Usage

The following initializes a new Lexer with io.Reader r:

l := xml.NewLexer(parse.NewInput(r))

To tokenize until EOF or an error, use:

for {
	tt, data := l.Next()
	switch tt {
	case xml.ErrorToken:
		// error or EOF set in l.Err()
		return
	case xml.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != xml.AttributeToken {
				// handle StartTagCloseToken/StartTagCloseVoidToken/StartTagClosePIToken
				break
			}
			// ...
		}
	case xml.EndTagToken:
		// ...
	}
}

All tokens:

ErrorToken TokenType = iota // extra token when errors occur
CommentToken
CDATAToken
StartTagToken
StartTagCloseToken
StartTagCloseVoidToken
StartTagClosePIToken
EndTagToken
AttributeToken
TextToken

Examples

package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/v2"
	"github.com/tdewolff/parse/v2/xml"
)

// Tokenize XML from stdin.
func main() {
	l := xml.NewLexer(parse.NewInput(os.Stdin))
	for {
		tt, data := l.Next()
		switch tt {
		case xml.ErrorToken:
			if l.Err() != io.EOF {
				// a real failure rather than a clean end of input
				fmt.Println("Error:", l.Err())
			}
			return
		case xml.StartTagToken:
			fmt.Println("Tag", string(data))
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != xml.AttributeToken {
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
		// ...
		}
	}
}
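
Attribute values often need their delimiters removed before use. The sketch below assumes that l.AttrVal() returns the raw value including its surrounding single or double quotes, which is an assumption about the lexer's output and not something stated above. unquoteAttr is a hypothetical helper, not part of this package. It strips one matching pair of quotes and leaves anything else untouched, and it does not resolve entity references.

```go
package main

import "fmt"

// unquoteAttr is a hypothetical helper (not part of the xml package).
// It removes one matching pair of surrounding quotes from an attribute
// value and returns the input unchanged when no such pair is present.
func unquoteAttr(val []byte) []byte {
	if n := len(val); n >= 2 {
		if q := val[0]; (q == '"' || q == '\'') && val[n-1] == q {
			return val[1 : n-1]
		}
	}
	return val
}

func main() {
	// In real code the argument would be l.AttrVal().
	fmt.Println(string(unquoteAttr([]byte(`"1.0"`))))
	fmt.Println(string(unquoteAttr([]byte(`'utf-8'`))))
	fmt.Println(string(unquoteAttr([]byte(`bare`))))
}
```

The helper returns a sub-slice of its input and does not copy, so the result shares memory with whatever buffer the value came from.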

License

Released under the MIT license.
