Naming

Currently the naming is a bit of a mess. For example, we often find something like:

TokenSym *ts, **pts;

but pts is not a pointer to a TokenSym; it's a pointer to a pointer to a TokenSym, so it should be called ppts or the like. This may seem a minor point, but it can be a serious time sink for somebody trying to figure out the code. Once you discover the names are not trustworthy, you have little choice but to ignore them and just figure out the logic.
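To make the distinction concrete, here is a minimal sketch using a simplified, hypothetical TokenSym (not tcc's real struct, which has more fields). The function ts_unlink and the parameter pphead are also hypothetical. The ppts name tells the reader at a glance that the variable points at a link field, and that is exactly what lets the loop rewrite the link when it unlinks a symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for tcc's TokenSym. */
typedef struct TokenSym {
    struct TokenSym *next;
    const char *str;
} TokenSym;

/* Unlink the first symbol named `name` from the list whose head is *pphead.
   ts   : a pointer to a TokenSym
   ppts : a pointer to a pointer to a TokenSym (i.e. to a link field) */
TokenSym *ts_unlink(TokenSym **pphead, const char *name)
{
    TokenSym **ppts = pphead;
    TokenSym *ts;

    while ((ts = *ppts) != NULL) {
        if (strcmp(ts->str, name) == 0) {
            *ppts = ts->next;   /* rewrite the link that pointed at ts */
            ts->next = NULL;
            return ts;
        }
        ppts = &ts->next;       /* advance to the next link field */
    }
    return NULL;                /* not found */
}
```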

Proposed Conventions:

global vars prefixed tcc_, e.g. tcc_ch, tcc_file

typed names (typedefs, structs, etc.) suffixed _t, e.g. tok_t, reg_t

use a BOOL return type for predicates, e.g. isid, isspace, etc.
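A minimal sketch of a BOOL-returning predicate. The page does not say how BOOL would be defined, so this assumes a typedef for C99 bool; isidnum is a hypothetical companion predicate. The return type, not just the name, then tells the reader the function answers a yes/no question.

```c
#include <stdbool.h>

/* Assumption: BOOL is a project-wide alias; this sketch maps it to C99 bool. */
typedef bool BOOL;

/* Predicate: can c start an identifier? (ASCII letters and '_' only.) */
static inline BOOL isid(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Predicate: can c continue an identifier? */
static inline BOOL isidnum(int c)
{
    return isid(c) || (c >= '0' && c <= '9');
}
```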

use tcc_char_t (or some other kind of #define wrapper) instead of char, anticipating Unicode source text

in general, prefer toktype to tok, since that's what we mean
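A minimal sketch of the tcc_char_t wrapper, assuming a typedef rather than a #define (a typedef is handled by the compiler and respects scope). The TCC_UNICODE_SOURCE switch and tcc_strlen are hypothetical; the point is that widening the character type later would touch one definition, not every use of char.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper for one character of source text. */
#ifdef TCC_UNICODE_SOURCE
typedef uint32_t tcc_char_t;      /* e.g. Unicode code points */
#else
typedef unsigned char tcc_char_t; /* today: one byte per character */
#endif

/* Number of characters before the terminating 0. This is a character
   count; it equals the byte count only when tcc_char_t is one byte. */
size_t tcc_strlen(const tcc_char_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```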

functions follow the OO-style <object>_<op> pattern, e.g. sect_new, not new_section. For lexing functions this means using a prefix like inbuf_, e.g. inbuf_parse_c_comment. This adds clarity: "parse_btype" tells us we're looking for a basic type, but not where we're looking; "inbuf_parse_btype" tells us we're looking at the input buffer (and not something else).
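A minimal sketch of the <object>_<op> style applied to the input buffer. The inbuf_t struct, inbuf_init, and inbuf_parse_c_comment are hypothetical illustrations, not tcc's actual buffer type or functions; what matters is that every function name starts with the object it operates on.

```c
#include <stddef.h>

/* Hypothetical input-buffer object. */
typedef struct inbuf_t {
    const char *p;    /* current read position */
    const char *end;  /* one past the last character */
} inbuf_t;

void inbuf_init(inbuf_t *b, const char *text, size_t len)
{
    b->p = text;
    b->end = text + len;
}

/* If the buffer is positioned at a C comment, skip it and return 1;
   otherwise leave the position unchanged and return 0.
   An unterminated comment is consumed to the end of the buffer. */
int inbuf_parse_c_comment(inbuf_t *b)
{
    const char *q = b->p;
    if (b->end - q < 2 || q[0] != '/' || q[1] != '*')
        return 0;
    for (q += 2; b->end - q >= 2; q++) {
        if (q[0] == '*' && q[1] == '/') {
            b->p = q + 2;   /* position just past the closing delimiter */
            return 1;
        }
    }
    b->p = b->end;          /* unterminated: consume everything */
    return 1;
}
```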

use <obj>_new and <obj>_delete rather than malloc/free (except in the funcs that actually call malloc/free). Or not; what matters is that we're consistent.
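A minimal sketch of a constructor/destructor pair, using a hypothetical sect_t. Only sect_new and sect_delete touch malloc/free; everything else would create and destroy sections through them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical section object. */
typedef struct sect_t {
    char *name;
    size_t data_size;   /* a byte count, so "size" is accurate */
} sect_t;

/* Allocate a section; returns NULL if allocation fails. */
sect_t *sect_new(const char *name)
{
    sect_t *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->name = malloc(strlen(name) + 1);
    if (!s->name) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->data_size = 0;
    return s;
}

/* Free a section and everything it owns; accepts NULL, like free(). */
void sect_delete(sect_t *s)
{
    if (!s)
        return;
    free(s->name);
    free(s);
}
```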

use typedefs instead of plain int for e.g. token codes

use size_t, ptrdiff_t, etc. where appropriate instead of int
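A minimal sketch of where each type fits, using hypothetical helpers named in the <object>_<op> style: an element count is a size_t, and the distance between two pointers into the same array is a ptrdiff_t (which can also carry a -1 "not found" result).

```c
#include <stddef.h>

/* Count occurrences of c in s[0..len): a count is naturally a size_t. */
size_t buf_count_char(const char *s, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}

/* Offset of the first c in [begin, end) as a pointer difference, or -1. */
ptrdiff_t buf_find_char(const char *begin, const char *end, char c)
{
    for (const char *p = begin; p < end; p++)
        if (*p == c)
            return p - begin;
    return -1;
}
```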

replace v with a meaningful name, e.g. struct_find(tok_t toktype) instead of struct_find(int v). In this case, the argument is a number that both indexes into table_ident (a/k/a lexis) and encodes a token class.
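A simplified, hypothetical sketch modeled on the description above; it is not tcc's actual code. It assumes identifier tokens are numbered from TOK_IDENT upward and uses a fixed-size stand-in for table_ident. With tok_t as the parameter type and toktype as its name, the signature alone says what the argument is.

```c
#include <stddef.h>

/* Hypothetical token-code type, per the "typedefs for token codes" item. */
typedef int tok_t;

/* Assumption: identifier tokens are numbered from TOK_IDENT upward. */
#define TOK_IDENT 256
#define TABLE_IDENT_LEN 16

typedef struct Sym {
    tok_t toktype;
} Sym;

typedef struct TokenSym {
    Sym *sym_struct;   /* struct/union/enum tag bound to this name, if any */
} TokenSym;

/* Simplified stand-in for table_ident: entry (toktype - TOK_IDENT). */
static TokenSym *table_ident[TABLE_IDENT_LEN];

/* toktype both selects a table_ident entry and identifies the token class. */
Sym *struct_find(tok_t toktype)
{
    if (toktype < TOK_IDENT || toktype - TOK_IDENT >= TABLE_IDENT_LEN)
        return NULL;            /* not an identifier in the table */
    TokenSym *ts = table_ident[toktype - TOK_IDENT];
    return ts ? ts->sym_struct : NULL;
}
```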

use size/SIZE only for byte counts; otherwise use something like SIZEN or ELTS. E.g. TOK_MAX_SIZE is a count of ints, not a byte count, so call it TOK_MAX_SIZE4 or TOK_MAX_ELTS, anything except TOK_MAX_SIZE.
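A minimal sketch of keeping element counts and byte counts apart. The ELTS macro, the value 4, TOK_BUF_SIZE, and tok_buf are hypothetical; the convention shown is that the ELTS name goes on the count of ints and the SIZE name only on the byte count.

```c
#include <stddef.h>

/* Hypothetical helper: number of elements in an array
   (only valid on real arrays, not on pointers). */
#define ELTS(a) (sizeof(a) / sizeof((a)[0]))

/* Capacity counted in ints, so it gets an _ELTS name... */
#define TOK_MAX_ELTS 4
/* ...while the byte count of the same buffer gets the SIZE name. */
#define TOK_BUF_SIZE (TOK_MAX_ELTS * sizeof(int))

static int tok_buf[TOK_MAX_ELTS];
```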

Fixing current names

TokenSym **pts is common; should be ppts; ditto for similar pp types

tok_alloc_new is redundant; s/b tok_new

tok_alloc s/b tok_get ("get" accommodates "create if necessary")

find_section s/b section_get ("create if necessary")
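Both renamings rest on the same "get" meaning: look the object up and create it only if it is missing. A minimal sketch of that behavior, using a hypothetical linked list of sections keyed by name (this is not tcc's actual lookup code).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical section list. */
typedef struct section_t {
    struct section_t *next;
    char *name;
} section_t;

static section_t *section_list;

/* Return the section called name, creating it if necessary.
   Returns NULL only if allocation fails. */
section_t *section_get(const char *name)
{
    section_t *s;
    for (s = section_list; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;                       /* found: reuse it */

    s = malloc(sizeof *s);                  /* not found: create it */
    if (!s)
        return NULL;
    s->name = malloc(strlen(name) + 1);
    if (!s->name) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->next = section_list;                 /* push onto the list */
    section_list = s;
    return s;
}
```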

check all functions for <obj>_<op> name

consistency: either is_id and is_space, or isid and isspace

what does "stray" mean? isolated escape char '\'?

some funcs use uint8_t for char; use tcc_char_t instead? in any case, be consistent

some funcs wrap others; e.g. handle_stray1 calls handle_stray, which calls handle_stray_noerror. Use the same name, with a __ prefix for the implementation funcs; e.g. handle_stray calls __handle_stray
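A minimal sketch of the one-name-per-operation pattern, using a hypothetical digit parser rather than the actual handle_stray logic. ISO C reserves identifiers that begin with two underscores, so this sketch assumes an _impl suffix for the implementation function in place of the __ prefix; the structure is otherwise the same: the plain name is the public, error-reporting entry point, and the implementation reports failure only through its return value.

```c
#include <stdbool.h>
#include <stdio.h>

/* Implementation: no error reporting ("noerror" behavior); the caller
   learns of failure only through the return value. */
static bool parse_digit_impl(char c, int *out)
{
    if (c < '0' || c > '9')
        return false;
    *out = c - '0';
    return true;
}

/* Public wrapper with the plain name: same operation, plus an error
   message. Returns the digit value, or -1 on error. */
int parse_digit(char c)
{
    int d;
    if (!parse_digit_impl(c, &d)) {
        fprintf(stderr, "error: '%c' is not a digit\n", c);
        return -1;
    }
    return d;
}
```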

size_t where appropriate

reserve "string" for char strings, so e.g. TokenString -> TokenSeq; also, wherever int *str occurs make it int *seq
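A minimal sketch of what a renamed TokenSeq might look like, assuming a growable array of int token codes. The field names, the doubling growth policy, and tokseq_add are hypothetical. Calling the array seq rather than str signals that it holds token codes, not characters, and len/cap are element counts, consistent with the size naming rule.

```c
#include <stdlib.h>

/* Hypothetical replacement for TokenString: a sequence of int token codes. */
typedef struct TokenSeq {
    int *seq;      /* token codes, not chars */
    size_t len;    /* ints in use (an element count) */
    size_t cap;    /* ints allocated (also an element count) */
} TokenSeq;

/* Append one token code; returns 0 on success, -1 if allocation fails. */
int tokseq_add(TokenSeq *s, int toktype)
{
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? 2 * s->cap : 8;
        int *p = realloc(s->seq, new_cap * sizeof *p); /* realloc(NULL, n) acts as malloc */
        if (!p)
            return -1;
        s->seq = p;
        s->cap = new_cap;
    }
    s->seq[s->len++] = toktype;
    return 0;
}
```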

beware of "last" meaning "most recent"; e.g. TokenString.last_line_num means TokenString.most_recent_line_nbr (I think)

table_ident: inaccurate name; it's actually a table of keywords, preprocessor directives, assembly keywords, etc. (the stuff in tcctok.h), /after/ which identifiers are inserted. What it doesn't include is the one- and two-char terminals. So we should call it something like stack_lexis or just plain lexis, which is reasonably accurate (strictly speaking, the lexis would also include the one- and two-byte operators, etc.).

SYM_ANOM => SYM_ANON

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License