Camen Design Forum

Kroc's Work Log

Kroc

I've actually been working on programming projects almost daily for the last two years, despite the silence from me on the blog & forum (our son requires our full attention).

This thread will be a log of what I'm coding as I go. Feel free to ask questions.


Replies

#1. Kroc

:: OZ80MANDIAS / blu

Thus far I have managed to get my Z80 assembler to successfully build Sonic 1 SMS, therefore I am now working on making it 'user-ready' in preparation for a '1.0' release.

This week I've been reworking the tokeniser (the routine that turns the source code text into a machine representation) into a more complete lexer; rather than a linear stream of tokens it'll use an Abstract Syntax Tree, and it will also verify the scope (that is, that words follow in the allowed order and that the hierarchy of nested statements is valid).

Previously, the validation of scope was occurring during assembly (when the tree-like order is processed anyway), but this slows down the assembling and duplicates a great deal of validation when we introduce loops.

_Today's work:_

* Removed several lookup tables used to check a Token's category (e.g. Expression, List, Keyword) and replaced them with a bit-mask included in each Token / syntax tree node. This reduces Class initialisation time (quite slow ATM) and makes checking a Token's type much quicker throughout the code base. It also helps ease our progressive move from Token Stream to Syntax Tree (both systems are in use side-by-side ATM)
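As a rough illustration of the bit-mask idea (the constant names and `Token` class here are invented for the sketch, not OZ80's actual internals): each token carries a mask of the categories it belongs to, so a category check becomes a single AND instead of a table lookup.

```python
# Illustrative sketch: category flags packed into one bit-mask per token.
TOKEN_EXPRESSION = 1 << 0
TOKEN_LIST       = 1 << 1
TOKEN_KEYWORD    = 1 << 2

class Token:
    def __init__(self, text, mask):
        self.text = text
        self.mask = mask          # bit-mask of categories this token belongs to

    def is_any(self, mask):
        # one AND replaces scanning a lookup table
        return (self.mask & mask) != 0

tok = Token("BYTE", TOKEN_KEYWORD | TOKEN_EXPRESSION)
tok.is_any(TOKEN_KEYWORD)         # True
tok.is_any(TOKEN_LIST)            # False
```

A token can sit in several categories at once (as `BYTE` does above), which a single lookup table per category makes awkward.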

_Things to Consider:_

* The Repeat Operator `x` is not an operator! It creates Lists out of Expressions, so shouldn't be detected as part of any normal Expression (only one Value allowed). I will need to re-classify it as something else

* I need to solve the problem of Properties (`.property`) differentiating definition and use only by a new line; that is, a line that begins with a property defines that property; otherwise it's considered a read of an existing property. This is problematic for OZ80 because new-lines are entirely optional everywhere except in this one instance, and it breaks the notion that line-breaks can be added freely for readability, e.g.:

% OZ80
TABLE :someData {
    BYTE .a .b .c
.d  .e .f ;this is an error as `.d` is considered a sub-routine definition and not a `BYTE`
}
%

For now the thought has been to require some kind of prefix for using Property names, e.g. `this.property` but I haven't considered what exactly

#2. Kroc

More scope-walking work. Messing about with Property syntax. I hadn't noticed how hopelessly out of date the SYNTAX.TXT documentation was; it'll have to be totally rewritten before release. Whilst `:.property` is logical for the current syntax (`:` is shorthand for the current `PROC`/`TABLE`), there isn't an equivalent if we want to make self-references in a Type-definition, for example (`#.property`?)

I also reworked some of the Master System memory map in 'system.sms.oz80' to reflect the intended Property-spacing functionality.

#3. Kroc

It's the function that never ends! (both in the time it's taking and its length). The `FileParse` function is 1'500 lines long and I've been working on massively re-architecting it for the last few weeks. Now before you get a hernia screaming 'bad programming', you should understand that this function is purposefully written with tons of GOTOs for speed, more speed and extra speed. When you have to parse millions of characters of text in VB6, function calls are no longer viable.

* bluString now partially normalises whitespace when loading a file; this means that OZ80 only needs to consider ASCII Space, CarriageReturn and Tab and no other kind of whitespace during parsing, greatly simplifying and speeding up the character walking

* Completely re-worked the character walking to work on the basis of the first character of the word, words that end with white-space, words that end with line-break and multi-line words like comments. This helped eliminate some messy comment parsing and known-to-crash character look-aheads

* I noticed that the `HELP` statement would not work for `PROC` `PARAMS`/`RETURN` -- the Text is a list, and the next `PARAM` that follows would be treated as another byte for the text. To solve this I've added Documentation Text syntax (double back-tick) e.g.

% OZ80
PROC :doStuff
PARAMS HL ``Address to the data to process
{ }
%

This way documentation text can never clash with real text or other expressions. We also don't have to deal with how documentation text treats white-space compared to real text.
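The reworked character walking from the bullets above could be sketched roughly like this (a toy illustration, not the VB6 code; the comment and word-boundary rules are simplified assumptions): with whitespace already normalised to just Space, Tab and CarriageReturn, the walker dispatches on the first character of each word.

```python
# Illustrative sketch: split normalised source text into words by
# dispatching on each word's first character.
WHITESPACE = " \t"
NEWLINE = "\r"

def split_words(src):
    words, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch in WHITESPACE or ch == NEWLINE:
            i += 1                          # skip separators
        elif ch == ";":                     # comment: word ends at line-break
            j = src.find(NEWLINE, i)
            j = len(src) if j == -1 else j
            words.append(src[i:j]); i = j
        else:                               # ordinary word: ends at whitespace
            j = i
            while j < len(src) and src[j] not in WHITESPACE + NEWLINE:
                j += 1
            words.append(src[i:j]); i = j
    return words

split_words("BYTE .a .b ;note\r.d")   # ['BYTE', '.a', '.b', ';note', '.d']
```

With only three whitespace characters to consider, no look-ahead is needed: each branch knows where its word ends from the first character alone.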

#4. Kroc

Haven't had a lot of chance to update this over the last week, but work has been ongoing.

I fixed up the scope-walking enough to get the whole Sonic 1 source parsing and assembling. Thankfully it's faster than before, especially considering that it's doing more -- both token stream and AST are being built -- and should speed up much more once the old token stream is removed and the now unnecessary validation in the assembler can be cleaned up.

---

More shortcomings with the grammar design: The way OZ80 allows you to define RAM needs to be more flexible than I originally envisioned. The purpose of OZ80 is to allow assembly programs to be as flexible as high-level programs, therefore one can imagine that one has a 'library' of code and also possibly that one code base will extend another (ROM hacks). Much like how the Z80 code/data is arranged in the ROM automatically by OZ80, we should allow multiple blocks of RAM to be independently defined and let OZ80 choose the layout. This would allow discrete parts of the codebase to define their own RAM rather than all RAM having to be defined before the code (preventing 'extensions').

We also need to support more than one 'page' of RAM (the main system memory), as there will be pages of RAM that overlap the same address space; SRAM has two banks that can be switched between in the same Slot. We also need to consider VRAM and the Z80 ports ($00-$FF).

To that end I'll be changing OZ80 to work as such:

----

An instance of an anonymous RAM block will simply define a chunk of RAM to be laid out in the default system memory ($C000-$DFFF on the SMS)

% OZ80
RAM [
    .LIVES BYTE
    .TIME WORD
]
%

These can be littered throughout the codebase freely and OZ80 will fit them into system memory (`$`) automatically.

Some kind of statement to define a new page of RAM will be needed:

% OZ80
RAM PAGE $_SRAM1 $8000 16 KB
%

This will define a new page of RAM with its own address space, auto-numbered from $8000 and 16 KB in length. Then, when you want to add some variables to the SRAM you plop a named RAM block into your code:

% OZ80
RAM $_SRAM1 [
    .CHECKSUM WORD
]
%

#5. Kroc

I pushed a few small changes in the ever-continuing epic saga that is the `FileParse` rewrite:

* Changed the hex-to-decimal conversion to a fast look-up table method instead of using string-concat
* Defined Names are now validated inline, doing away with a very slow function I had for it before
* Defined Names now use one token instead of separate ones, with the token attributes providing the particulars. This makes it easy to handle composite Names (e.g. `::section:label.property`)
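The look-up table approach to hex conversion (first bullet) can be sketched like so; this is an illustration of the general technique, not the VB6 implementation:

```python
# Illustrative sketch: pre-compute the value of every hex digit once,
# so converting a number is pure arithmetic with no string concatenation.
HEX_VALUE = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}

def hex_to_int(s):
    value = 0
    for ch in s:
        value = value * 16 + HEX_VALUE[ch]   # one table look-up per digit
    return value

hex_to_int("C000")   # 49152
```

The win is that the inner loop never allocates strings; in VB6, where string concatenation is costly, that matters a great deal.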

I've now begun work on getting the Abstract Syntax Tree nodes to link together. This is a bit tricky because words are fetched one at a time and the relationships between nodes vary.

Once the AST is linked up, I'll need to redo the assembly pass to use the tree instead (doing away with a lot of duplicate validation) and I should finally see some results for all this work. The purpose of this work is to provide thorough validation during source reading so that grammar and syntax errors can be comprehensively trapped -- a prerequisite for a 1.0 release.

#6. Kroc

This function is such drudgery! I haven't been able to do much in the last week. The Abstract Syntax Tree is now generated with the nodes linked together. I need to write more error messages and carefully check that all the possible validity checks are being made.

Once this is done I will flesh out the RAM Page feature explained a couple of posts up as the Sonic 1 code is not assembling with the correct RAM addresses at the moment.

I had a nice idea for handling appending to data tables, something that is very necessary in assembly and useful for making extensible code. For my needs, I would like all the mobs (enemies and interactive objects) in Sonic 1 insert their required code and table-entries (across the code-base) only when that mob's file is included, allowing fully dynamic inclusion and ordering of mobs.

An elegant way of handling this dynamic inclusion would be to define a table that is initially devoid of data:

% OZ80
TABLE :pointers SECTION ::mobs
%

And then allow us to append a row of data to this table anywhere in the code-base:

% OZ80
ROW :pointers SECTION ::mobs {
WORD :sonic
}
%

This would also make row indexing and enumeration easier and more implicit.

#7. Kroc

Half-way through this Abstract Syntax Tree stuff...

Whilst not required at this stage, I would like to have some kind of source caching in place at a later date. Opening and reading files in Windows is quite slow and for a large code base with 100+ source files this adds a lot of up-front time -- 100 file reads are a roadblock as-is, but the bulk of time then falls upon parsing the text into words.

My plan is that once a project is assembled, the token-streams for all files are written out to a binary file. Should the dates / filesizes of source files not change, the tokenised version of the file can be retrieved from the cache, without having to do any string parsing -- a major speed boost, no doubt.

Now one could cache the Abstract Syntax Tree instead, which would be even faster (as the token-streams would not have to be built into the syntax tree), but this is far, far too dangerous and unreliable. The whole point of the Abstract Syntax Tree is to do away with validation during assembly, having moved that responsibility to the source parsing. If one were to accept a binary file of an Abstract Syntax Tree, the nodes could be entirely wrong, crashing the assembler with little hope of recovery. Also, should the OZ80 language change, loading an outdated AST would have the same effect.

Instead, if we cache the token-stream (a simple word-for-word compact binary form of the source code), we can re-run this through the validation to build an AST. Therefore poisoned and out-of-date caches would be safely parsed with full error handling.
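The validity rule described above might be sketched as follows (a minimal illustration; the cache layout, paths and function names are invented, not OZ80's):

```python
# Illustrative sketch: a cached token-stream is reused only if the source
# file's size and modification time still match what was recorded.
import os

def cache_is_valid(src_path, cached_size, cached_mtime):
    try:
        st = os.stat(src_path)
    except OSError:
        return False                  # source file gone: cache is stale
    return st.st_size == cached_size and st.st_mtime == cached_mtime
```

Any mismatch (or a missing file) simply falls back to parsing the text again, so a stale or corrupt cache can never poison the build.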

#8. Kroc

Has it really been that long since the last post? Crazy. We've been having some difficulty with our son's sleep so I've not been working on the code much. I have begun the major work of replacing the linear Token Stream based assembling with the Syntax Tree.

What I'd also like to solve whilst rewriting the assembler is removing any need to do string-based lookups of Names (i.e. Constants, Label names &c.). Whilst building the Abstract Syntax Tree, we can actually pre-allocate the names and replace them with index numbers so we can get straight to the defined value without having to search any arrays.
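The pre-allocation idea is essentially string interning; a minimal sketch (class and method names are invented for illustration):

```python
# Illustrative sketch: each Name is interned to an index while building
# the AST, so the assembler later uses array indexing, not string search.
class SymbolTable:
    def __init__(self):
        self.index = {}      # name -> slot number
        self.values = []     # slot number -> defined value (None = not yet)

    def intern(self, name):
        if name not in self.index:
            self.index[name] = len(self.values)
            self.values.append(None)
        return self.index[name]

syms = SymbolTable()
i = syms.intern(":doStuff")      # first sighting allocates a slot
syms.values[i] = 0xC000          # the definition writes straight to the slot
syms.intern(":doStuff") == i     # later uses get the same index, no search
```

Because forward references get a slot on first sighting, a Name can be used before it's defined and still resolve to the same index later.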

It's probably going to be quiet again for a while. See you then.

#9. Kroc

Been on holiday, haven't coded much this month. Just poking at the OZ80 code to see where I need to continue.

For speed / simplicity, I want to be able to directly associate the Named Values (`::sections`, `:labels`, `.properties` &c.) to their Value-store; the syntax for differentiating property definition and use has reared its ugly head again and I need to come up with an aesthetic answer.

Within OZ80 I do not want to mandate any whitespace as I believe the author should have that flexibility available for themselves. Currently a property (a sub-routine within a Procedure, or a key in a Hash) is defined when it begins a new line, and is otherwise read. I am pained to come up with a good-looking syntax to separate the two.

% Some possibilities:
;#1. some kind of keyword to do the definition:
; pros: clarity, less burden on the more common property use
; cons: somewhat ugly, inconsistent with other forms of definitions (Hashes)
DEF .loop
    jp .loop

;#2. a 'self' prefix to use the property:
; pros: terse, `self` prefix may find use consistently elsewhere in the syntax
; cons: increases potential typos and places burden on the more common usage scenario
.loop
    jp ~.loop

;#3. conversely, a prefix to define a property:
; pros: terse, less burden on the usage scenario
; cons: consistently ugly and confusing, potentially inconsistent with other forms of definition 
:.loop
    jp .loop
%

#10. Kroc

Weeks of illness, sleep difficulties and general busyness; I haven't had the brain power to get my head around OZ80 and I've been feeling disheartened over how endless development is. It's taken since the last post until now to work out what was making recent work too difficult to nail down -- the Abstract Syntax Tree was not abstract! I made the mistake of re-using the language tokens for the syntax tree so all I was doing was double-implementing the same grammar parsing -- once with tokens and again with the tree. This got seriously confusing, fast.

The AST doesn't need the metadata and semantics of the user-friendly OZ80 language. It just needs to be a series of direct actions for the compiler to take. There doesn't need to be a distinction between Constants, Labels and so on in the AST; they're all just Symbols.
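In sketch form, the simplification might look like this (node shape and action names are invented to illustrate the idea, not OZ80's actual structures): one node type, carrying only an action and the Symbols it touches.

```python
# Illustrative sketch: a single generic node type replaces per-category
# node types; Constants and Labels collapse into plain Symbols.
from dataclasses import dataclass, field

@dataclass
class Node:
    action: str                              # e.g. "define", "emit", "calc"
    symbols: list = field(default_factory=list)
    children: list = field(default_factory=list)

# a Constant definition and a Label definition now share one node shape:
tree = Node("define", symbols=["SONIC_LIVES"],
            children=[Node("calc", symbols=["3"])])
```

The grammar distinctions stay in the parser, where they belong; the tree only records what the compiler must do.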

Now that I can finally make sense of what I'm doing, my next main concern is how to define an object format -- assembled code, but with blanks inserted for later linking (that's an over-simplification, unfortunately, as some 'blanks' require link-time calculations).

#11. Kroc

I looked at a few different Linker Object formats and read the documentation for some 1980s Z80 assemblers, and came to two conclusions: 1.) object formats are very poorly documented, if at all, and 2.) they are typically either too compiler-specific (e.g. WLA DX) or just plain ancient and lacking the features I need (e.g. Intel Hex).

Anyway, this is getting ahead of myself, progress on generating the AST is glacial. The object linking will be done in memory to begin with.

#12. Kroc

Oh man, my brain is so frazzled. Writing a compiler from scratch whilst going through years of sleep deprivation must surely rank high in the list of 'things not to do to yourself'.

I get the feeling that I'm writing this compiler/assembler the wrong way around. A professional would begin with the linker's Intermediate Representation (LISP-like S-expressions), adding one thing at a time and unit-testing as they go, and then build the higher-level language atop the AST of the well-defined LISPy IR. (A final LISP-like IR is how WebAssembly is being compiled)

I'm feeling wary of implementing an AST for an IR I don't yet know. At the same time, I know that I need to execute link-time expressions and function calls; but then I want to avoid implementing the linker in the LISP IR (the actual linking should be metadata-driven, not programmatically driven).

A simple Reverse Polish Notation, FORTH-like will provide the ability to calculate expressions very easily, but this concept falls apart as soon as we allow for OZ80 user functions. What if an expression calls a function that loads an image from disk and manipulates it? FORTH will not handle that as easily as LISP.
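For the simple cases, the appeal of RPN is how little machinery it needs; a minimal sketch of the concept (not OZ80's linker, and the operator set is just an example):

```python
# Illustrative sketch: a link-time expression as a flat Reverse-Polish
# token list, executed with nothing more than a value stack.
def eval_rpn(tokens):
    stack = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for t in tokens:
        if t in ops:
            b = stack.pop()            # right operand is on top
            a = stack.pop()
            stack.append(ops[t](a, b))
        else:
            stack.append(int(t))       # plain value: push it
    return stack.pop()

eval_rpn(["2", "3", "+", "4", "*"])    # (2 + 3) * 4 = 20
```

This handles arithmetic beautifully, but the moment a token has to invoke a user function with its own control flow, the flat list stops being enough -- which is exactly the LISP-vs-FORTH tension described above.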

Ergo, does my AST need to be a direct in-memory representation of a LISP, or should it be something in-between OZ80 and LISP? This is mind-numbingly broad and confusing. Why I struggle is that I'm not sure to what extent an Object file (for linking) is required to be an actual LISP program, vs. a data structure.

#13. Kroc

I'm burnt out on OZ80 and this month is particularly full with various commitments. For a few weeks I've been taking a slow look at PHP with a view to updating all of my webware (DOMTemplate / ReMarkable / NoNonsenseForum / CamenDesign et al.)

#14. Kroc

Being ground into dust by home / work at the moment, so nothing to report atm.

#15. Kroc

I've been meaning to write something here for the last year, and whilst I have been working on various things continuously throughout, I initially avoided posting anything on NNF out of fear of there being too little meaningful activity to report. I've been struggling with new levels of stress and pressure that prevented me from focusing on a project long enough in one go to make any serious progress, and with such a level of mental fatigue that every time I sat down to do some work on my projects it took forever just to get back up to speed with what I was doing before I had to stop again. This awful 'stuttering' drained any fun out of the work I was doing, and I had to find projects that were less pure engineering with nothing demonstrable to show.

:: PortaDOOM

* https://forum.zdoom.org/viewtopic.php?f=19&t=54628

PortaDOOM is a '90s-style disk-zine that provides an easy, zero-config way to play classic DOOM and the ton of creative community content out there. I created this for my own purposes as a way to simplify the complex process of playing DOOM mods -- there's multiple different engines and just loading mods requires swotting up on quite a lot of tech knowledge in a totally new field. I didn't want anybody else to have to repeat that nonsense so I built PortaDOOM so that the specifics of launching any particular mod would all be handled by code and I wouldn't have to think about it again.

If you want to experience some of the best of what the DOOM community has been creating in the last 20 years, PortaDOOM is worth a look.

Appropriately, PortaDOOM is written in QuickBASIC and a few thousand lines of batch script, for that authentic '90s feel.

:: DOOM-Crusher

* https://github.com/Kroc/DOOM-Crusher

PortaDOOM is pretty big and constantly growing, so I needed a way to wrestle back whatever bytes I could. Tools for this already existed, but no automation, and PortaDOOM's expansion to hundreds of WADs (DOOM files) meant that the workload would balloon to an unmanageable amount.

DOOM-Crusher is a JPG/PNG optimiser with support for DOOM files (WAD/PK3). It's one of the most complex sets of batch files I've written, and was brutally difficult to develop given the complete lack of automation for DOOM tools in the community.

What we have, though, is a tool that will walk through a folder and optimise anything it can, including the internals of packed files (WADs and PK3s -- which are just ZIP files), but it uses a cache of file hashes so that it never tries to optimise files it has already crushed before. This means you can incrementally optimise a project without having to wait hours for the PNG tools to retry thousands of already-crushed files. When you have 6+ GB of content this is critical, and there didn't exist anything out there to do it -- what a pain!
