self-modifying code (thing) by ssd

Self-modifying code is a programming technique in which a program modifies its own instructions as it runs. The technique is generally frowned on except when used in extremely limited ways, and has been largely made impossible, undesirable, or useless by modern computer architectures. Self-modifying code was most useful on architectures with very few registers and limited (less than 64 KB) RAM.

ways to self modify code:

store loop index in instruction
save memory & registers
modify instruction as a flag
replace NOPs with instructions or vice versa to add or remove operations
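
The first item above, storing a loop index inside an instruction, can be sketched on a toy machine. This is only an illustration: the three opcodes (`OUT`, `DEC`, `JNZ`) and the `run` interpreter are hypothetical, not any real instruction set. The program counts down by rewriting the operand of its own first instruction instead of using a register.

```python
def run(program):
    """Interpret a toy program stored as a mutable list of [opcode, operand] cells.

    Code and data share the same list, so one instruction can rewrite another.
    Returns the list of values emitted by OUT instructions.
    """
    output = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "OUT":      # emit the operand stored in this instruction
            output.append(arg)
            pc += 1
        elif op == "DEC":    # decrement the operand of the instruction at address arg
            program[arg][1] -= 1
            pc += 1
        elif op == "JNZ":    # jump to address arg while that instruction's operand is non-zero
            pc = arg if program[arg][1] != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return output

program = [
    ["OUT", 3],  # address 0: the operand doubles as the loop counter
    ["DEC", 0],  # address 1: patch the operand of instruction 0
    ["JNZ", 0],  # address 2: loop until that operand reaches zero
]
result = run(program)
```

After the run, instruction 0 holds an operand of 0 rather than 3, so the program text is no longer what was loaded; it would have to be reloaded before being run again.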
Problems with self-modifying code when used fully
- Self-modifying code can be difficult to read. Sometimes this was done intentionally, for job security or as part of copy protection to make cracking the software harder.
- Self-modifying code can be tricky to debug, since it may do different things each time you run it.
- Self-modifying code is tricky to reuse, since it is not reentrant; what one run does depends on what the last one did.
current architecture obstacles

cpu instruction cache
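On architectures where the instruction cache is not kept coherent with data writes, instructions that are modified in memory are not modified in the CPU cache, and thus the change is ignored until the cache line is evicted or explicitly flushed. This could be exploited, of course, but then you have to totally understand how the instruction cache works.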
read only text segments
Executable code in memory may be marked as read only by the operating system so it can be shared...
shared text segments
Executable pages may be shared between separate processes, and thus modifying one page would affect other users' processes. This is generally not allowed in multiuser operating systems.
compiled code vs. machine language
The instructions generated by the compiler are not necessarily known when the code is written, making it difficult to modify code that isn't generated yet.
modern uses of self-modifying code

runtime linker
The linker may patch unresolved jump statements in a jump table or in the code itself at or immediately before runtime; an unresolved symbol may be expressed as a jump to a routine that would backpatch the original jump to the correct address, thus allowing demand linking.
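
The demand-linking scheme described above can be imitated in Python with a jump table whose entries start out as stubs. This is only a sketch: `jump_table`, `make_stub`, `call`, `resolve`, and `resolved_log` are hypothetical names chosen for the example, and a real runtime linker patches machine addresses rather than dictionary entries.

```python
import math

resolved_log = []  # records each time the slow resolution path runs

def resolve(name):
    """Stand-in for the linker's symbol lookup (here: look the name up in math)."""
    resolved_log.append(name)
    return getattr(math, name)

jump_table = {}

def make_stub(name):
    """Create a placeholder that backpatches its own table slot on first call."""
    def stub(*args):
        target = resolve(name)
        jump_table[name] = target  # backpatch: later calls skip the stub
        return target(*args)
    return stub

for symbol in ("sqrt", "floor"):
    jump_table[symbol] = make_stub(symbol)

def call(name, *args):
    """Every call goes indirectly through the table slot for that symbol."""
    return jump_table[name](*args)

first = call("sqrt", 16.0)   # goes through the stub, which resolves and patches
second = call("sqrt", 25.0)  # goes straight to math.sqrt
```

Only the symbols that are actually called ever pay the resolution cost; `floor` stays a stub here because nothing calls it.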
patch kernel to match cpu features available (fpu, etc.)
The Linux kernel does (or at one time did) include CPU instructions and features, such as math instructions, that were not available on all CPUs. When such an instruction is encountered the first time, a trap is generated and code is called to patch the instruction into a more efficient subroutine call, so that the instruction is emulated next time instead of generating the trap again.
trampoline
On the fly generation of temporary code which may load or switch banks to run another piece of code; this was especially popular in bank switched machines, where the addressable memory was smaller than the available memory, and in systems that used overlays.
overflow exploits
Many security holes are exploited by using potential buffer overruns in buggy code and modifying either the stack or the running code, sometimes even by putting a trampoline on the stack.
polymorphic viruses or stealth viruses
So-called "polymorphic viruses" work by modifying their own code to attempt to prevent virus checkers from finding them.
genetic algorithms
Genetic algorithms are inherently self modifying; "code" fragments are mixed and matched and mutated using a search algorithm (random search is common) until an ideal combination is found.
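
The mix, match, and mutate loop can be sketched in a few lines. This example makes several assumptions for brevity: it evolves a plain string toward a fixed `TARGET` rather than evolving executable code, and the names `fitness`, `mutate`, `crossover`, and `evolve`, along with the population size and mutation rate, are arbitrary choices for illustration.

```python
import random

TARGET = "self-modifying"
ALPHABET = "abcdefghijklmnopqrstuvwxyz-"

def fitness(candidate):
    """Count positions where the candidate matches the target string."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng, rate=0.1):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in candidate)

def crossover(a, b, rng):
    """Splice the head of one parent onto the tail of the other."""
    cut = rng.randrange(1, len(TARGET))
    return a[:cut] + b[cut:]

def evolve(generations=200, size=50, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if population[0] == TARGET:
            break
        parents = population[: size // 5]  # the fittest fifth survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history

best, history = evolve()
```

Because the fittest candidates are carried over unchanged, the best fitness never decreases from one generation to the next; whether the target is reached exactly depends on the seed and the parameters.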
Structured languages have better methods that give the same advantages as self-modifying code without actually modifying existing code:

eval
Many languages, especially interpreted languages, have eval, which will take a pregenerated string and run it as program code, thus generating new code rather than modifying existing code.
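
A small Python sketch of the idea: the program builds source text at runtime and hands it to `eval`, producing a new function without touching any existing one. The helper name `make_polynomial` is made up for this example.

```python
def make_polynomial(coefficients):
    """Build source text for a polynomial and turn it into a callable with eval."""
    terms = [f"{c} * x ** {power}" for power, c in enumerate(coefficients)]
    source = "lambda x: " + " + ".join(terms)
    # eval compiles and runs the string; only use it on text you generated yourself.
    return eval(source), source

poly, source = make_polynomial([1, 0, 2])  # represents 1 + 0*x + 2*x**2
value = poly(3)
```

Here `source` is ordinary data until the moment it is evaluated, which is what separates this from patching instructions in place.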
function pointers
Rather than modifying code in place, the code is called through a function pointer (an indirect jump in assembly) which is given a value at runtime. This has the advantage that type checking can still be done, but may be less efficient on some architectures.
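
In Python the closest equivalent is a reference to a function object. The sketch below (all names are illustrative) changes what a loop does by rebinding that reference rather than rewriting the loop.

```python
from typing import Callable, Iterable

def add(a: int, b: int) -> int:
    return a + b

def multiply(a: int, b: int) -> int:
    return a * b

def reduce_with(operation: Callable[[int, int], int],
                values: Iterable[int], start: int) -> int:
    """Apply whichever operation was passed in; the loop itself never changes."""
    total = start
    for v in values:
        total = operation(total, v)  # indirect call through the reference
    return total

operation = add
sum_result = reduce_with(operation, [1, 2, 3, 4], 0)
operation = multiply  # rebind at runtime instead of patching the loop
product_result = reduce_with(operation, [1, 2, 3, 4], 1)
```

The type annotation on `operation` is what lets a checker verify the call, which is the advantage the text mentions.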
dynamic linking and using DLLs or ld.so to add functions
Some operating systems have support for linking in additional code at runtime, either via the use of function pointers to activate the code once linked in, or via unresolved symbols that cause the additional code to be automatically linked. (This uses the same mechanism as shared libraries.)
overloading
Some object oriented languages allow functions to be overloaded (defined multiple times in different ways), and linking of overloaded functions may actually change at runtime depending on what modules are loaded or the current context.
thunk or closure or lazy evaluation
Some languages (Java, Lisp, Perl, and others) allow code to be stored in or with a variable; the key is that the thunk may be created and passed to another piece of code (carrying along with it some of its execution environment) where it is later executed, similar to a trampoline.
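
A minimal Python sketch of a thunk built from a closure, assuming a memoizing style of lazy evaluation: the names `make_thunk`, `expensive`, and `consumer` are hypothetical. The thunk carries its arguments with it and is evaluated only if the receiving code asks for the value.

```python
def make_thunk(compute, *args):
    """Wrap a deferred computation; it runs at most once, when first forced."""
    cache = {}
    def force():
        if "value" not in cache:
            cache["value"] = compute(*args)  # args come from the enclosing scope
        return cache["value"]
    return force

calls = []  # records how often the deferred work actually runs

def expensive(n):
    calls.append(n)
    return n * n

def consumer(thunk, needed):
    """Receives the thunk from elsewhere and decides whether to evaluate it."""
    return thunk() if needed else None

thunk = make_thunk(expensive, 12)
skipped = consumer(thunk, needed=False)
first = consumer(thunk, needed=True)
again = consumer(thunk, needed=True)
```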

This was brought to you by the Save Our Archaic Technical Terms Society.
