was originally designed for teaching programming, not for real-world software development.
The language had a string type suitable for demonstrating the concepts of text manipulation without having to deal with memory management. As this was before the days of object orientation, the string type had to be built into the language as an atomic type, like integer or char. Like these, it occupied a fixed number of bytes on the stack.
Here is an example in standard Pascal:
var
  I: integer;                  // occupies four bytes on a 32-bit OS
  C: char;                     // one byte
  S: string;                   // 256 bytes
  ca: array[1..1024] of char;  // 1024 bytes; C-style strings can also be done
  pc: PChar;                   // pointer to char, i.e. a C string
begin
  S := 'Hello';
  // copy a Pascal string to a char array the hard way
  for I := 1 to Length(S) do
    ca[I] := S[I];
  ca[Length(S) + 1] := #0;     // null-terminate
  // get the address; this is now OK to use as a C string
  pc := @ca;
end.
The layout of a standard Pascal string is quite simple: it occupies a fixed 256 bytes. The first byte stores the length, and the remaining bytes contain the characters. The string is not null-terminated. This size was probably thought at the time to be a reasonable compromise between usable length and not wasting too much precious memory. This was fine for teaching purposes, but falls short for real-world usage in two ways:
- For many uses, it is just too short. 255 characters is not enough to store an HTML page or the contents of a text file, or even a long SQL query.
- It is not compatible with most OS APIs. C or C++ is the language of choice for systems programming, and the vast majority of operating system libraries (including those of Windows and Linux) expose an API of C functions. These functions therefore expect C strings, that is, the address of an array of chars that ends with a null (zero) byte. The Pascal string, as noted, is not null-terminated.
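The layout described above can be inspected directly. The following is a minimal sketch, assuming a compiler that still offers the classic type under the name ShortString (Free Pascal and desktop Delphi do) and that allows index 0 to reach the length byte; the program name is made up for illustration.

```pascal
program ShortStringLayout;
{$ifdef FPC}{$mode objfpc}{$else}{$APPTYPE CONSOLE}{$endif}

var
  S: ShortString;  // the classic length-prefixed Pascal string
begin
  S := 'Hello';
  WriteLn(SizeOf(S));    // total storage is fixed: 256
  WriteLn(Ord(S[0]));    // byte 0 is the length byte: 5
  WriteLn(Length(S));    // Length just reports that byte: 5
  WriteLn(S[1], S[5]);   // the characters live in bytes 1..255: Ho
end.
```

Whatever happens to sit in the bytes after the fifth character is left over from earlier use of that memory, which is why the address of S[1] cannot be treated as a C string without further work.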
You can try to make Pascal strings work with OS APIs (assuming that your Pascal has a few additions such as pointer arithmetic).
Assuming that the string you wish to pass to the C API is shorter than 255 chars, you can forcibly put a null character (#0) after its last character and then pass the address of the first char to the API.
There are even worse problems with return values – the API will generally expect the address of a buffer, which it will fill. It will not correctly set the length byte for you.
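Both directions can be sketched as follows. This is an illustration under assumptions, not a recipe: FakeApiFill is a hypothetical stand-in written in Pascal for a C function that fills a caller-supplied buffer, and FixLength is a made-up helper that repairs the length byte afterwards by scanning for the terminator.

```pascal
program ShortStringToCApi;
{$ifdef FPC}{$mode objfpc}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical stand-in for a C API that writes a null-terminated
// string into a caller-supplied buffer (not a real OS function).
procedure FakeApiFill(Buf: PAnsiChar; BufSize: Integer);
const
  Reply: array[0..5] of AnsiChar = ('r', 'e', 'p', 'l', 'y', #0);
var
  I: Integer;
begin
  I := 0;
  while (I < BufSize - 1) and (Reply[I] <> #0) do
  begin
    Buf[I] := Reply[I];
    Inc(I);
  end;
  Buf[I] := #0;
end;

// Recompute the length byte after something has written into S[1..255].
procedure FixLength(var S: ShortString);
var
  N: Integer;
begin
  N := 0;
  while (N < 255) and (S[N + 1] <> #0) do
    Inc(N);
  S[0] := AnsiChar(N);
end;

var
  S: ShortString;
begin
  // Passing a string in: only safe if there is room for the terminator.
  S := 'input';
  if Length(S) < 255 then
    S[Length(S) + 1] := #0;
  // @S[1] could now be handed to an API that expects a C string.

  // Getting a string back: the API fills the bytes, we repair the length.
  FakeApiFill(@S[1], 255);
  FixLength(S);
  WriteLn(S, ' ', Length(S));  // reply 5
end.
```

Until FixLength runs, the length byte still says 5 only by coincidence of the example; in general it would hold the length of whatever the string contained before the call.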
Alternatives to Pascal strings in Pascal programs
Use char arrays
If you are using a C API from a Pascal program, it is a better idea to go down to the level of a plain C programmer, and use an array of characters. I would think that this can be done in any version of Pascal. You can write routines to pack and unpack these arrays from regular Pascal strings, up to the null char (see the code above).
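A pair of pack and unpack routines along these lines might look like the sketch below. The names TCharBuf, StrToBuf and BufToStr are invented for this example, the buffer is zero-based to match C conventions, and text longer than 255 characters is silently truncated when it is copied back.

```pascal
program PackUnpack;
{$ifdef FPC}{$mode objfpc}{$else}{$APPTYPE CONSOLE}{$endif}

type
  TCharBuf = array[0..1023] of AnsiChar;  // hypothetical buffer type

// Copy a Pascal string into a char array and null-terminate it.
procedure StrToBuf(const S: ShortString; var Buf: TCharBuf);
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    Buf[I - 1] := S[I];
  Buf[Length(S)] := #0;
end;

// Copy a char array back into a Pascal string, stopping at the null
// or at 255 characters, whichever comes first.
function BufToStr(const Buf: TCharBuf): ShortString;
var
  N: Integer;
begin
  Result := '';
  N := 0;
  while (N < 255) and (Buf[N] <> #0) do
  begin
    Result[N + 1] := Buf[N];
    Inc(N);
  end;
  Result[0] := AnsiChar(N);  // set the length byte by hand
end;

var
  Buf: TCharBuf;
begin
  StrToBuf('Hello', Buf);
  WriteLn(BufToStr(Buf));          // Hello
  WriteLn(Length(BufToStr(Buf)));  // 5
end.
```

The address of Buf can be passed to a C function directly, and anything the function writes back is recovered with BufToStr.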
Use a class
If you are working in a Pascal (or Pascal-derived language) that has only Pascal strings but some object orientation (Delphi version 1 fits this category), there may be a class (e.g. TStringList in Delphi) that is capable of storing longer pieces of text.
Delphi strings were introduced in Delphi 2, the first 32-bit version of Delphi, released in 1996.
They are the default string type: if you are using Delphi 2 or later, then unless you are trying hard, you are not actually using Pascal strings at all.
Delphi strings are a complete replacement for Pascal strings. They have a 4-byte length field (i.e. they can hold up to 2 GB, or as much as fits in memory, whichever comes first), are always null-terminated for easy use with OS functions, and have reference counting with copy-on-write.
The layout of a Delphi string is slightly more complex than that of a Pascal string. The Delphi help explains it well:
A long-string variable is a pointer occupying four bytes of memory. When the variable is empty—that is, when it contains a zero-length string—the pointer is nil and the string uses no additional storage. When the variable is nonempty, it points to a dynamically allocated block of memory that contains the string value, a 32-bit length indicator, and a 32-bit reference count. This memory is allocated on the heap, but its management is entirely automatic and requires no user code.
Because long-string variables are pointers, two or more of them can reference the same value without consuming additional memory. The compiler exploits this to conserve resources and execute assignments faster. Whenever a long-string variable is destroyed or assigned a new value, the reference count of the old string (the variable’s previous value) is decremented and the reference count of the new value (if there is one) is incremented; if the reference count of a string reaches zero, its memory is deallocated. This process is called reference-counting. When indexing is used to change the value of a single character in a string, a copy of the string is made if—but only if—its reference count is greater than one. This is called copy-on-write semantics.
The block of memory is sized appropriately to the content. So it looks to the programmer like a simple variable, but under the hood it is a pointer to a buffer. Casting the string to PChar (pointer to character, i.e. a C string) just gets the address of the first character.
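These properties can be observed from code by looking at the pointer behind the variable. The sketch below assumes a compiler with long strings available as AnsiString (Delphi 2 or later, or Free Pascal); it compares raw pointers rather than reading the reference count, so it relies only on casts and on StringOfChar.

```pascal
program LongStringDemo;
{$ifdef FPC}{$mode objfpc}{$else}{$APPTYPE CONSOLE}{$endif}

var
  A, B, E: AnsiString;
  P: PAnsiChar;
begin
  E := '';
  WriteLn(Pointer(E) = nil);          // TRUE: an empty string is a nil pointer

  A := StringOfChar(AnsiChar('x'), 100000);
  WriteLn(Length(A));                 // 100000, well past the 255 limit

  B := A;                             // no copy: both variables share one buffer
  WriteLn(Pointer(A) = Pointer(B));   // TRUE

  B[1] := 'y';                        // copy-on-write: B gets its own buffer
  WriteLn(Pointer(A) = Pointer(B));   // FALSE
  WriteLn(A[1], B[1]);                // xy

  P := PAnsiChar(A);                  // the cast yields the address of the data
  WriteLn(Pointer(P) = Pointer(A));   // TRUE
  WriteLn(P[Length(A)] = #0);         // TRUE: a terminator follows the characters
end.
```

Because the cast costs nothing and the terminator is always present, a long string can be passed to a C API without any of the copying needed for Pascal strings.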