By now, you’re probably familiar with one or more programming languages. But have you ever wondered how you could create your own programming language? And by that, I mean:
A programming language is any set of rules that converts strings to various kinds of machine code output.
In short, a programming language is just a set of predefined rules. To make those rules useful, you need something that understands them: a compiler, an interpreter, and so on. So we can simply define some rules and then use any existing programming language to write a program that understands them. That program will be our interpreter.
Compiler
A compiler converts code into machine code that the processor can execute (e.g. a C++ compiler).
Interpreter
An interpreter goes through the program line by line and executes each command.
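To make the interpreter idea concrete, here is a minimal sketch that is separate from the language we build below. It assumes a made-up language in which every line is `say <text>`; the function name `interpret` and the `say` command are hypothetical and exist only for this illustration.

```javascript
// Hypothetical mini-interpreter: every non-blank line must look like `say <text>`.
// This only illustrates "line by line" execution; it is not the Magenta implementation.
function interpret(program) {
  const output = []
  for (const line of program.split("\n")) {
    const trimmed = line.trim()
    if (trimmed === "") continue // skip blank lines
    if (trimmed.startsWith("say ")) {
      // the one rule of this language: `say` followed by text collects that text
      output.push(trimmed.slice(4))
    } else {
      // anything else breaks the rules of this made-up language
      throw new Error(`Unknown command: ${trimmed}`)
    }
  }
  return output
}

console.log(interpret("say hi\nsay bye").join("\n"))
```

The rules live entirely in the host program: change the `if` condition and you have changed the language.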
Want to give it a try? Let’s create a super simple programming language together that prints magenta-colored output to the console. We’ll call it Magenta.
Our simple programming language creates a codes variable that contains text that gets printed to the console… in magenta, of course.
Setting up our programming language
I’m going to use Node.js, but you can use any language to follow along; the concept will remain the same. Let me start by creating an index.js file and setting things up.
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  run() {
    console.log(this.codes)
  }
}

// For now, we're storing codes in a string variable called `codes`
// Later, we'll read codes from a file
const codes =
`print "howdy world"
print "howdy once more"`
const magenta = new Magenta(codes)
magenta.run()
What we’re doing here is declaring a class called Magenta. That class defines and instantiates an object that is responsible for logging to the console whatever text we provide via a codes variable. And, for the time being, we’ve defined that codes variable directly in the file with a couple of “howdy” messages.
If we were to run this code, we would get the text stored in codes logged in the console.
OK, now we need to create what’s called a Lexer.
What is a Lexer?
OK, let’s talk about the English language for a second. Take the following phrase:
How are you?
Here, “How” is an adverb, “are” is a verb, and “you” is a pronoun. We also have a question mark (“?”) at the end. We can divide any sentence or phrase like this into many grammatical components in JavaScript. Another way we can distinguish these parts is to divide them into small tokens. The program that divides the text into tokens is our Lexer.
Since our language is very tiny, it only has two types of tokens, each with a value:
keyword
string
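As a rough picture of where this is heading, each token can be represented as a plain object with a type and a value. The sketch below hand-writes the tokens we would expect for one line of Magenta code; the variable name `expectedTokens` is used only for illustration.

```javascript
// Hand-written illustration of the tokens for the line: print "howdy world"
const expectedTokens = [
  { type: "keyword", value: "print" },
  { type: "string", value: "howdy world" },
]

// every token carries exactly these two pieces of information
for (const token of expectedTokens) {
  console.log(`${token.type}: ${token.value}`)
}
```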
We could’ve used a regular expression to extract tokens from the codes string, but the performance can be slow. A better approach is to loop through each character of the code string and grab tokens. So, let’s create a tokenize method in our Magenta class, which will be our Lexer.
Full code
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    const length = this.codes.length
    // pos keeps track of current position/index
    let pos = 0
    let tokens = []
    const BUILT_IN_KEYWORDS = ["print"]
    // allowed characters for variable/keyword
    const varChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_'
    while (pos < length) {
      let currentChar = this.codes[pos]
      // if current char is space or newline, continue
      if (currentChar === " " || currentChar === "\n") {
        pos++
        continue
      } else if (currentChar === '"') {
        // if current char is " then we have a string
        let res = ""
        pos++
        // while we are not at the end of the code and the next char is not " or \n
        while (pos < length && this.codes[pos] !== '"' && this.codes[pos] !== '\n') {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the loop ended at a newline or the end of the code without finding the closing "
        if (this.codes[pos] !== '"') {
          return {
            error: `Unterminated string`
          }
        }
        // skip the closing "
        pos++
        // adding the string to the tokens
        tokens.push({
          type: "string",
          value: res
        })
      } else if (varChars.includes(currentChar)) {
        // if the current char is a valid variable/keyword character
        let res = currentChar
        pos++
        // while the next char is a valid variable/keyword character
        while (pos < length && varChars.includes(this.codes[pos])) {
          // adding the char to the string
          res += this.codes[pos]
          pos++
        }
        // if the keyword is not a built-in keyword
        if (!BUILT_IN_KEYWORDS.includes(res)) {
          return {
            error: `Unexpected token ${res}`
          }
        }
        // adding the keyword to the tokens
        tokens.push({
          type: "keyword",
          value: res
        })
      } else { // we have an invalid character in our code
        return {
          error: `Unexpected character ${this.codes[pos]}`
        }
      }
    }
    // returning the tokens
    return {
      error: false,
      tokens
    }
  }
  run() {
    const {
      tokens,
      error
    } = this.tokenize()
    if (error) {
      console.log(error)
      return
    }
    console.log(tokens)
  }
}
If we run this in a terminal with node index.js, we should see a list of tokens printed in the console.
Nice stuff!
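For comparison, the regular-expression approach mentioned earlier might look something like the sketch below. This is only an assumption about how such a version could be written, and the function name `tokenizeWithRegex` is hypothetical. Note one difference in behavior: this version silently skips anything that matches neither pattern and does not check keywords, whereas the character loop above reports unexpected characters and unknown keywords as errors.

```javascript
// Hypothetical regex-based tokenizer, shown only for comparison with the loop-based Lexer
function tokenizeWithRegex(source) {
  // group 1: contents of a double-quoted string on a single line
  // group 2: a run of letters/underscores (a keyword candidate)
  const pattern = /"([^"\n]*)"|([A-Za-z_]+)/g
  const tokens = []
  for (const match of source.matchAll(pattern)) {
    if (match[1] !== undefined) {
      tokens.push({ type: "string", value: match[1] })
    } else {
      tokens.push({ type: "keyword", value: match[2] })
    }
  }
  return tokens
}

console.log(tokenizeWithRegex('print "howdy world"'))
```

The character loop is more code, but it makes error reporting explicit at every position, which is a large part of what a Lexer is for.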
Defining rules and syntaxes
We want to see if the order of our codes matches some sort of rule or syntax. But first we need to define what those rules and syntaxes are. Since our language is so tiny, it only has one simple syntax, which is a print keyword followed by a string.
keyword:print string
So let’s create a parse method that loops through our tokens and checks whether they form a valid syntax. If so, it will take the necessary actions.
class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    /* previous codes for tokenizer */
  }
  parse(tokens) {
    const len = tokens.length
    let pos = 0
    while (pos < len) {
      const token = tokens[pos]
      // if token is a print keyword
      if (token.type === "keyword" && token.value === "print") {
        // if the next token doesn't exist
        if (!tokens[pos + 1]) {
          return console.log("Unexpected end of line, expected string")
        }
        // check if the next token is a string
        let isString = tokens[pos + 1].type === "string"
        // if the next token is not a string
        if (!isString) {
          return console.log(`Unexpected token ${tokens[pos + 1].type}, expected string`)
        }
        // if we reach this point, we have valid syntax
        // so we can print the string
        console.log('\x1b[35m%s\x1b[0m', tokens[pos + 1].value)
        // we add 2 because we also check the token after print keyword
        pos += 2
      } else { // if we didn't match any rules
        return console.log(`Unexpected token ${token.type}`)
      }
    }
  }
  run() {
    const { tokens, error } = this.tokenize()
    if (error) {
      console.log(error)
      return
    }
    this.parse(tokens)
  }
}
And would you take a look at that — we have already got a working language!
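The odd-looking sequences passed to console.log in the parse method are ANSI escape codes, which is where the color comes from. As a small sketch, here they are pulled out into named constants; the helper `toMagenta` is hypothetical and not part of the class above.

```javascript
// ANSI escape sequences: \x1b[35m switches the foreground color to magenta,
// \x1b[0m resets all styling. Terminals without ANSI support may print them as plain text.
const MAGENTA = "\x1b[35m"
const RESET = "\x1b[0m"

// Hypothetical helper that wraps text in the two sequences
function toMagenta(text) {
  return `${MAGENTA}${text}${RESET}`
}

console.log(toMagenta("howdy world"))
```

Swapping 35 for another code in the 30 to 37 range gives the other standard foreground colors, for example 31 for red or 32 for green.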
Okay, but having codes in a string variable is not that fun. So let’s put our Magenta codes in a file called code.m. That way we can keep our Magenta codes separate from the interpreter logic. We’re using .m as the file extension to indicate that this file contains code for our language.
Let’s read the code from that file:
// importing file system module
const fs = require('fs')
// importing path module for convenient path joining
const path = require('path')

class Magenta {
  constructor(codes) {
    this.codes = codes
  }
  tokenize() {
    /* previous codes for tokenizer */
  }
  parse(tokens) {
    /* previous codes for parse method */
  }
  run() {
    /* previous codes for run method */
  }
}

// Reading code.m file
// Some text editors use \r\n for new line instead of \n, so we're removing \r
const codes = fs.readFileSync(path.join(__dirname, 'code.m'), 'utf8').toString().replace(/\r/g, "")
const magenta = new Magenta(codes)
magenta.run()
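To see why the carriage returns are stripped, here is a small sketch using an in-memory string in place of a real file. The helper name `stripCarriageReturns` and the sample text are made up for illustration.

```javascript
// Hypothetical helper showing what the replace(/\r/g, "") call does
function stripCarriageReturns(text) {
  return text.replace(/\r/g, "")
}

// Simulated contents of a code.m file saved with Windows-style (\r\n) line endings
const windowsStyle = 'print "howdy world"\r\nprint "howdy again"\r\n'
const normalized = stripCarriageReturns(windowsStyle)

// JSON.stringify makes the remaining \n characters visible in the output
console.log(JSON.stringify(normalized))
```

This matters because the Lexer only skips spaces and \n. A leftover \r would fall through to the last branch and be reported as an unexpected character.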
Go create a programming language!
And with that, we have successfully created a tiny programming language from scratch. See, a programming language can be as simple as something that accomplishes one specific thing. Sure, it’s unlikely that a language like Magenta here will ever be useful enough to be part of a popular framework or anything, but now you see what it takes to make one.
The sky is really the limit. If you want to dive in a little deeper, try following along with this video I made going over a more advanced example. In that video I’ve also shown how you can add variables to your language.
Let’s Create a Tiny Programming Language originally published on CSS-Tricks.