User:Luver/Draft

From Wikipedia, the free encyclopedia.

In programming, the term magic constant has several meanings:

  • A constant numerical or text value used to identify a file format or protocol
  • A distinctive unique value that is unlikely to be mistaken for anything else (for example, a GUID)
  • A constant with an unexplained meaning, which could be replaced with a named constant

Format indicators

Origin

The first magic numbers appeared in the source code of early versions of the Unix operating system. Although the term has since lost its original meaning, it has become part of the computer industry's lexicon.

When Unix was ported to one of the first DEC PDP-11/20 machines, the hardware had no memory protection, so early versions of Unix used a relocatable memory reference model. Later versions began each executable file with a header whose first word was a constant identifying the executable format. One such constant, octal 0407, was also the PDP-11 instruction for a branch past the header, so it doubled as executable code. These header constants were the first magic numbers.

Magic numbers in files

Magic numbers are common in programs across many operating systems. They implement strongly typed data: many files contain such constants at fixed positions to identify the data they hold. Detecting these constants is a simple and effective way to distinguish between file formats, and it can yield further information at run time.

Examples

Some examples:

  • Compiled Java class files (bytecode) begin with the hex value CAFEBABE.
  • GIF image files have the ASCII code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61).
  • JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null-terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66), also as a null-terminated string.
  • PNG image files begin with an 8-byte signature which identifies the file as a PNG file and allows detection of common file transfer problems: \211 P N G \r \n \032 \n (89 50 4E 47 0D 0A 1A 0A). That signature contains various newline characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ASCII transfer mode instead of the binary mode.
  • Standard MIDI music files have the ASCII code for "MThd" (4D 54 68 64) followed by more metadata.
  • Unix script files usually start with a shebang, "#!" (23 21) followed by the path to an interpreter.
  • PostScript files and programs start with "%!" (25 21).
  • PDF files start with "%PDF" (hex 25 50 44 46).
  • MS-DOS EXE files and the EXE stub of the Microsoft Windows PE (Portable Executable) files start with the characters "MZ" (4D 5A), the initials of the designer of the file format, Mark Zbikowski. The definition allows "ZM" (5A 4D) as well, but this is quite uncommon.
  • The Berkeley Fast File System superblock format is identified as either 19 54 01 19 or 01 19 54 depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
  • The Master Boot Record of bootable storage devices on almost all IA-32 IBM PC compatibles has the byte sequence 55 AA as its last two bytes (the 16-bit value AA55 stored in little-endian order).
  • Executables for the Game Boy and Game Boy Advance handheld video game systems have a 48-byte or 156-byte magic number, respectively, at a fixed spot in the header. This magic number encodes a bitmap of the Nintendo logo.
  • Amiga software executable Hunk files running on Amiga classic 68000 machines all started with the hexadecimal number $000003f3, nicknamed the "Magic Cookie."
  • In its first version, the Amiga's black screen of death, called Guru Meditation, displayed the hexadecimal number 48454C50 when the machine hung for uncertain reasons; the number spells "HELP" in ASCII (48=H, 45=E, 4C=L, 50=P).
  • In the Amiga, the only absolute address in the system is hex $0000 0004 (memory location 4), which contains the start location called SysBase, a pointer to exec.library, the so-called kernel of Amiga.
  • PEF files, used by Mac OS and BeOS for PowerPC executables, contain the ASCII code for "Joy!" (4A 6F 79 21) as a prefix.
  • TIFF files begin with either II or MM followed by 42 as a two-byte integer in little or big endian byte ordering. II was for Intel which uses little endian byte ordering, so the magic number is 49 49 2A 00. MM was for Motorola which uses big endian byte ordering, so the magic number is 4D 4D 00 2A.
  • Unicode text files encoded in UTF-16 often start with the Byte Order Mark to detect endianness (FE FF for big endian and FF FE for little endian). UTF-8 text files often start with the UTF-8 encoding of the same character, EF BB BF.
  • LLVM Bitcode files start with BC (0x42, 0x43)
  • WAD files start with IWAD or PWAD (for Doom), WAD2 (for Quake) and WAD3 (for Half-Life).
  • Microsoft Office document files start with D0 CF 11 E0, which is visually suggestive of the word "DOCFILE0".

Note:

  • Zip files often begin with "PK" (50 4B), the initials of Phil Katz, author of DOS compression utility PKZIP, but they are not required to do so, therefore "PK" is not actually a magic number for zip files.[citation needed]
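The TIFF entry above shows that a magic number can carry more than identity: the byte-order marker also tells the reader how to decode every later multi-byte field. A minimal Python sketch, assuming the caller passes the first four bytes of a candidate file; the function name tiff_byte_order is hypothetical, and real TIFF readers validate much more than this:

```python
import struct


def tiff_byte_order(header: bytes) -> str:
    """Return the struct byte-order prefix ('<' or '>') for a TIFF header.

    Raises ValueError if the first four bytes are not a TIFF signature.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    # "II" = Intel (little endian), "MM" = Motorola (big endian)
    if header[:2] == b"II":
        order = "<"
    elif header[:2] == b"MM":
        order = ">"
    else:
        raise ValueError("not a TIFF file")
    # The next two bytes must decode to 42 in the byte order just declared.
    (answer,) = struct.unpack(order + "H", header[2:4])
    if answer != 42:
        raise ValueError("bad TIFF magic number")
    return order


print(tiff_byte_order(bytes.fromhex("49492A00")))  # prints <
print(tiff_byte_order(bytes.fromhex("4D4D002A")))  # prints >
```

The same two bytes "42" are stored differently in the two variants, which is why the check decodes them with the order chosen from the first two bytes.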

Detection

The Unix utility program file can read and interpret magic numbers from files, and the file it uses to parse the information is itself called magic. The Windows utility TrID has a similar purpose.
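The magic database used by file is far richer than a single lookup table, but the basic idea can be sketched as a list of byte prefixes compared against the start of the data. A minimal Python sketch, assuming every signature sits at offset 0 and using a few of the values listed above; SIGNATURES, sniff and sniff_file are hypothetical names, not part of file or TrID:

```python
# (prefix, description) pairs taken from the examples above.
# All are assumed to start at offset 0 of the file.
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG image"),
    (b"GIF89a", "GIF image (89a)"),
    (b"GIF87a", "GIF image (87a)"),
    (b"%PDF", "PDF document"),
    (bytes.fromhex("CAFEBABE"), "Java class file or Mach-O universal binary"),
    (bytes.fromhex("FFD8"), "JPEG image"),
    (b"MThd", "Standard MIDI file"),
    (b"MZ", "MS-DOS or PE executable"),
    (b"%!", "PostScript"),
    (b"#!", "script with a shebang line"),
]


def sniff(data: bytes) -> str:
    """Return the description of the first matching signature, or 'unknown'."""
    for prefix, description in SIGNATURES:
        if data.startswith(prefix):
            return description
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough bytes to cover the longest signature, then sniff them."""
    longest = max(len(prefix) for prefix, _ in SIGNATURES)
    with open(path, "rb") as f:
        return sniff(f.read(longest))


print(sniff(b"%PDF-1.7\n"))       # prints PDF document
print(sniff(b"just some text"))  # prints unknown
```

The CAFEBABE entry shows one limitation: two unrelated formats share the same value, so real tools add further tests at other offsets. Short prefixes such as MZ or #! can also occur by chance in unrelated data.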

Magic numbers in protocols

  • The OSCAR protocol, used in AIM/ICQ, prefixes requests with 2A.
  • In the RFB protocol used by VNC, a client starts its conversation with a server by sending "RFB" (52 46 42, for "Remote Frame Buffer") followed by the client's protocol version number.
  • In the SMB protocol used by Microsoft Windows, each SMB request or server reply begins with FF 53 4D 42 ("\xFFSMB").
  • In the MSRPC protocol used by Microsoft Windows, each TCP-based request begins with 05 at the start of the request (representing Microsoft DCE/RPC Version 5), followed immediately by a 00 or 01 for the minor version. In UDP-based MSRPC requests the first byte is always 04.
  • COM and DCOM marshalled interfaces, called OBJREFs, always start with the byte sequence "MEOW" (4D 45 4F 57). Debugging extensions (used for DCOM channel hooking) are prefaced with the byte sequence "MARB" (4D 41 52 42).
  • Unencrypted BitTorrent peer handshakes begin with a single byte containing the decimal value 19 (hex 13), the length of the protocol name that follows, immediately followed by the phrase "BitTorrent protocol" at byte position 1.
  • eDonkey2000/eMule traffic begins with a single byte representing the client version. Currently E3 represents an eDonkey client, C5 represents eMule, and D4 represents compressed eMule.
  • SSL transactions always begin with a "client hello" message. The record encapsulation scheme used to prefix all SSL packets consists of two- and three-byte header forms. Typically, an SSL version 2 client hello message is prefixed with 80, and an SSLv3 server response to a client hello begins with 16 (though this may vary).
  • DHCP packets use a "magic cookie" value of '63 82 53 63' at the start of the options section of the packet. This value is included in all DHCP packet types.
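Checking a protocol magic number works the same way as for files, except that the value may sit at a fixed offset rather than at the very start. A minimal Python sketch covering two entries above; it assumes, per the BitTorrent peer wire specification, that the length byte is simply the length of the 19-character protocol name, and, per the DHCP specification (RFC 2131), that the options section, and therefore the cookie, begins after the 236-byte fixed BOOTP header. The constant and function names are hypothetical:

```python
BT_PROTOCOL = b"BitTorrent protocol"
BT_PREFIX = bytes([len(BT_PROTOCOL)]) + BT_PROTOCOL  # length byte 19, then the name

DHCP_MAGIC_COOKIE = bytes.fromhex("63825363")
BOOTP_FIXED_HEADER_LEN = 236  # bytes that precede the options section


def is_bittorrent_handshake(data: bytes) -> bool:
    """True if data starts like an unencrypted BitTorrent peer handshake."""
    return data.startswith(BT_PREFIX)


def has_dhcp_magic_cookie(packet: bytes) -> bool:
    """True if the options section of a DHCP packet starts with the cookie."""
    start = BOOTP_FIXED_HEADER_LEN
    return packet[start:start + len(DHCP_MAGIC_COOKIE)] == DHCP_MAGIC_COOKIE


print(len(BT_PROTOCOL))                                        # prints 19
print(is_bittorrent_handshake(BT_PREFIX + bytes(48)))          # prints True
print(has_dhcp_magic_cookie(bytes(236) + DHCP_MAGIC_COOKIE))   # prints True
```

Deriving BT_PREFIX from len(BT_PROTOCOL) instead of writing 19 directly also follows the advice on named constants in the next section.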

Unnamed numerical constants

The term magic number or magic constant also refers to the programming practice of using numbers directly in source code. This has been referred to as breaking one of the oldest rules of programming, dating back to the COBOL, FORTRAN and PL/1 manuals of the 1960s[1]. The use of unnamed magic numbers in code obscures the developers' intent in choosing that number,[2] increases opportunities for subtle errors (e.g. is every digit correct in 3.14159265358979323846 and is this equal to 3.14159?) and makes it more difficult for the program to be adapted and extended in the future.[3] Replacing all significant magic numbers with named constants makes programs easier to read, understand and maintain.[4]

Names chosen should be meaningful in terms of the domain. It is easy to imagine nonsense like int EIGHT = 16 resulting when NUMBER_OF_BITS might have been a better choice of name in the first place.

The problems associated with magic 'numbers' described above are not limited to numerical types and the term is also applied to other data types where declaring a named constant would be more flexible and communicative.[1] Thus, declaring const string testUserName = "John" is better than several occurrences of the 'magic number' "John" in a test suite.

For example, if it is required to randomly shuffle the values in an array representing a standard pack of playing cards, this pseudocode will do the job:

   for i from 1 to 52
       j := i + randomInt(53 - i) - 1
       a.swapEntries(i, j)

where a is an array object, the function randomInt(x) chooses a random integer between 1 and x, inclusive, and swapEntries(i, j) swaps the ith and jth entries in the array. In the preceding example, 52 is a magic number. It is considered better programming style to write the following:

   constant int deckSize := 52
   for i from 1 to deckSize
       j := i + randomInt(deckSize + 1 - i) - 1
       a.swapEntries(i, j)

This is preferable for several reasons:

  • It is easier to read and understand. A programmer reading the first example might wonder, What does the number 52 mean here? Why 52? The programmer might infer the meaning after reading the code carefully, but it's not obvious. Magic numbers become particularly confusing when the same number is used for different purposes in one section of code.
  • It is easier to alter the value of the number, as it is not duplicated. Changing the value of a magic number is error-prone, because the same value is often used several times in different places within a program. Also, if two semantically distinct variables or numbers have the same value, they may both be edited accidentally. To modify the first example to shuffle a Tarot deck, which has 78 cards, a programmer might naively replace every instance of 52 in the program with 78. This would cause two problems. First, it would miss the value 53 on the second line of the example, which would cause the algorithm to fail in a subtle way. Second, it would likely replace the characters "52" everywhere, regardless of whether they refer to the deck size or to something else entirely, which could introduce bugs. By contrast, changing the value of the deckSize variable in the second example would be a simple, one-line change.
  • The declarations of "magic number" variables are placed together, usually at the top of a function or file, facilitating their review and change.
  • It facilitates parameterization. For example, to generalize the above example into a procedure that shuffles a deck of any number of cards, it would be sufficient to turn deckSize into a parameter of that procedure, whereas the first example would require several separate changes. The parameterized version might look like this:
   function shuffle (array a, int deckSize)
      for i from 1 to deckSize
          j := i + randomInt(deckSize + 1 - i) - 1
          a.swapEntries(i, j)
  • It helps detect typos. Using a variable (instead of a literal) takes advantage of a compiler's checking. Accidentally typing "62" instead of "52" would go undetected, whereas typing "dekSize" instead of "deckSize" would result in the compiler's warning that dekSize is undeclared.
  • It can reduce typing in some IDEs. If an IDE supports code completion, it will fill in most of the variable's name from the first few letters.
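The pseudocode above can be made runnable. A minimal Python sketch of the named-constant, parameterized version, assuming 0-based list indices (so the inclusive range i..deckSize of the pseudocode becomes i..deck_size - 1); DECK_SIZE, TAROT_DECK_SIZE and shuffle are illustrative names. In real Python code the standard random.shuffle already performs a Fisher-Yates shuffle, so the sketch only shows where the constant belongs:

```python
import random

DECK_SIZE = 52        # cards in a standard pack
TAROT_DECK_SIZE = 78  # cards in a Tarot deck


def shuffle(a: list, deck_size: int = DECK_SIZE) -> None:
    """Shuffle the first deck_size entries of a in place (Fisher-Yates)."""
    for i in range(deck_size):
        # Choose j uniformly among the positions not yet fixed: i .. deck_size - 1.
        j = random.randint(i, deck_size - 1)
        a[i], a[j] = a[j], a[i]


deck = list(range(DECK_SIZE))
shuffle(deck)

tarot = list(range(TAROT_DECK_SIZE))
shuffle(tarot, TAROT_DECK_SIZE)  # only the argument changes, no search-and-replace
```

Because the size appears only in the constants, switching to a Tarot deck needs no edits inside the loop.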

Disadvantages are:

  • It hurts the locality and comprehensibility of the code. Putting the 52 in a possibly-distant place means that to understand the workings of the for loop completely (for example to estimate the run-time of the loop) one must track down the definition and verify that it is the expected number.
  • It makes the code more complex, adding one line to the three-line example (an increase of about 33% in LOC). An increase in complexity may be justified if there is some likelihood of confusion about the constant, or if there is a likelihood the constant may need to be changed. Neither is likely for a deck of playing cards, which has been well-known to be 52 cards for several hundred years.[dubious]
  • It may be slower for the CPU to process the expression "deckSize + 1" than the expression "53". However, most modern compilers and interpreters are capable of using the fact that the variable "deckSize" has been declared as a constant and pre-calculate the value 53 in the compiled code. There is therefore usually no speed advantage to using magic numbers in code.
  • It can increase the line length of the source code, forcing lines to be broken up if many constants are used on the same line.
  • It can make debugging more difficult, especially on systems where the debugger doesn't display the values of constants.

Accepted limited use of magic numbers

In some contexts the use of unnamed numerical constants is generally accepted (and arguably "not magic"). While such acceptance is subjective, and often depends on individual coding habits, the following are common examples:

  • the use of 0 and 1 as initial or incremental values in a for loop, such as for (int i = 0; i < max; i = i + 1) (assuming i++ is not supported)
  • the use of 2 to check if a number is even or odd, as in isEven = (x % 2 == 0), where % is the modulo operator
  • the use of simple arithmetic constants, e.g., in expressions such as circumference = 2 * Math.PI * radius,[1] or for calculating the discriminant of a quadratic equation as d = b^2 − 4*a*c

The constants 1 and 0 are sometimes used to represent the boolean values True and False in programming languages without a boolean type such as older versions of C. Most modern programming languages provide a boolean or bool primitive type and so the use of 0 and 1 is ill-advised.

In C and C++, 0 is sometimes used to represent the null pointer or reference. As with boolean values, the C standard library includes a macro definition NULL whose use is encouraged. Other languages provide a specific null or nil value and when this is the case no alternative should be used.

Magic GUIDs

Although highly discouraged, it is possible to create or alter GUIDs so that they are memorable, but this compromises their strength as near-unique IDs.[5][6] The specifications for generating GUIDs and UUIDs are quite complex, which is what makes them practically guaranteed to be unique if properly implemented. They should only be generated by a reputable software tool.[citation needed]

Java uses several GUIDs starting with CAFEEFAC.[7]
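Python's standard uuid module is one example of such a tool. In a short sketch: uuid.uuid4() takes 122 random bits and sets the 4 version bits and 2 variant bits itself, while a hand-made "memorable" value (the hexspeak string below is a hypothetical example) can easily get those fields wrong:

```python
import uuid

u = uuid.uuid4()                      # random UUID from the standard library
print(u)                              # a different value on every run
print(u.version)                      # prints 4
print(u.variant == uuid.RFC_4122)     # prints True

# A hypothetical hand-made "memorable" value: its variant bits do not
# follow RFC 4122, so it is not even a well-formed random UUID.
fake = uuid.UUID("deadbeef-dead-beef-dead-beefdeadbeef")
print(fake.variant == uuid.RFC_4122)  # prints False
```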

Magic debug values

Magic debug values are specific values written to memory during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted, and to make it obvious when values taken from uninitialized memory are being used. Memory is usually viewed in hexadecimal, so memorable repeating or hexspeak values are common. Numerically odd values may be preferred so that processors without byte addressing will fault when attempting to use them as pointers (which must fall at even addresses). Similarly, they may be chosen so that they are not valid codes in the instruction set for the given architecture.

Since it is very unlikely, although possible, that a 32-bit integer would take this specific value, the appearance of such a number in a debugger or memory dump most likely indicates an error such as a buffer overflow or an uninitialized variable.
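Python cannot write raw process memory, so the following is only a simulation of the technique in a bytearray. It borrows two patterns from the table below, CD for allocated but unwritten bytes and FD for guard bytes, in the style the table attributes to Microsoft's C++ debugging runtime; the class DebugBlock, its methods and the 4-byte guard width are hypothetical:

```python
UNINIT = 0xCD    # fill byte for allocated but not yet written memory
GUARD = 0xFD     # "no man's land" bytes before and after each block
GUARD_SIZE = 4   # hypothetical guard width, in bytes


class DebugBlock:
    """One simulated allocation laid out as: guard | user bytes | guard."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.raw = bytearray([GUARD] * GUARD_SIZE + [UNINIT] * size + [GUARD] * GUARD_SIZE)

    def write(self, offset: int, data: bytes) -> None:
        # No bounds check against self.size, like a raw pointer write,
        # so an overrun lands in the trailing guard bytes.
        for k, byte in enumerate(data):
            self.raw[GUARD_SIZE + offset + k] = byte

    def guards_intact(self) -> bool:
        head = self.raw[:GUARD_SIZE]
        tail = self.raw[GUARD_SIZE + self.size:]
        return all(b == GUARD for b in head + tail)

    def uninitialized_offsets(self) -> list:
        # Offsets whose byte still equals the fill pattern.
        user = self.raw[GUARD_SIZE:GUARD_SIZE + self.size]
        return [i for i, b in enumerate(user) if b == UNINIT]


block = DebugBlock(8)
block.write(0, b"hello")
print(block.uninitialized_offsets())  # prints [5, 6, 7]
print(block.guards_intact())          # prints True
block.write(5, b"world")              # 5 bytes at offset 5 of an 8-byte block
print(block.guards_intact())          # prints False
```

As the paragraph above notes, a program that legitimately stores the byte CD would be misreported as uninitialized; the pattern makes such errors very likely to stand out, not certain.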

Famous and common examples include:

Magic debug values
Code              Description
..FACADE          Used by a number of RTOSes
8BADF00D          Used by Apple as the exception code in iPhone crash reports when an application has taken too long to launch or terminate.
A5A5A5A5          Used in embedded development because the alternating bit pattern (10100101) creates an easily recognized pattern on oscilloscopes and logic analyzers.
ABABABAB          Used by Microsoft's HeapAlloc() to mark "no man's land" guard bytes after allocated heap memory
ABADBABE          Used by Apple as the "Boot Zero Block" magic number
ABADCAFE          Startup code sets all free memory to this value to catch errant pointers[clarification needed]
BAADF00D          Used by Microsoft's LocalAlloc(LMEM_FIXED) to mark uninitialised allocated heap memory
BADBADBADBAD      Burroughs large systems "uninitialized" memory (48-bit words)
BADC0FFEE0DDF00D  Used on IBM RS/6000 64-bit systems to indicate uninitialized CPU registers
BADCAB1E          Error code returned to the Microsoft eVC debugger when the connection to the debugger is severed
BADDCAFE          On Sun Microsystems' Solaris, marks uninitialised kernel memory (KMEM_UNINITIALIZED_PATTERN)
BEEFCACE          Used by Microsoft .NET as a magic number in resource files
C0DEDBAD          A memory leak tracking tool which it will change the MMU tables so that all references to address zero
CAFEBABE          Used by both Universal Mach-O binaries and Java .class files
CAFEFEED          Used by Sun Microsystems' Solaris debugging kernel to mark kmemfree() memory
CCCCCCCC          Used by Microsoft's C++ debugging runtime library to mark uninitialised stack memory
CDCDCDCD          Used by Microsoft's C++ debugging runtime library to mark uninitialised heap memory
CEFAEDFE          Seen in Intel Mach-O binaries on Apple Inc.'s Mac OS X platform (see FEEDFACE)
DDDDDDDD          Used by MicroQuill's SmartHeap and Microsoft's C++ debugging heap to mark freed heap memory
DEADBABE          Used at the start of Silicon Graphics' IRIX arena files
DEADBEEF          Famously used on IBM systems such as the RS/6000, also used in the original Mac OS operating systems, OPENSTEP Enterprise, and the Commodore Amiga. On Sun Microsystems' Solaris, marks freed kernel memory (KMEM_FREE_PATTERN)
DEADDEAD          A Microsoft Windows STOP Error code used when the user manually initiates the crash.
DEADF00D          Used by Mungwall on the Commodore Amiga to mark allocated but uninitialised memory.[8]
DEADFA11          Used by Apple as the exception code in iPhone crash reports when the user has force-quit the application.
EBEBEBEB          From MicroQuill's SmartHeap
FADEDEAD          Comes at the end to identify every AppleScript script
FDFDFDFD          Used by Microsoft's C++ debugging heap to mark "no man's land" guard bytes before and after allocated heap memory
FEE1DEAD          Used by Linux reboot() syscall
FEEDFACE          Seen in PowerPC Mach-O binaries on Apple Inc.'s Mac OS X platform. On Sun Microsystems' Solaris, marks the red zone (KMEM_REDZONE_PATTERN)
FEEEFEEE          Used by Microsoft's HeapFree() to mark freed heap memory

Note that most of these values are 32 bits long, the dword size of 32-bit computers.

The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's book Writing Solid Code from Microsoft Press. He gives a variety of criteria for these values, such as:

  • They should not be useful; that is, most algorithms that operate on them should be expected to do something unusual. Numbers like zero don't fit this criterion.
  • They should be easily recognized by the programmer as invalid values in the debugger.
  • On machines that don't have byte alignment, they should be odd numbers, so that dereferencing them as addresses causes an exception.
  • They should cause an exception, or perhaps even a debugger break, if executed as code.

Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".

Pietr Brandehörst's ZUG programming language initialized memory to either 0000, DEAD or FFFF in the development environment and to 0000 in the live environment, on the basis that uninitialised variables should be encouraged to misbehave during development so they can be trapped, but encouraged to behave in a live environment to reduce errors.[citation needed]

References

  1. Martin, Robert C. (2009). "Chapter 17: Smells and Heuristics - G25 Replace Magic Numbers with Named Constants". Clean Code: A Handbook of Agile Software Craftsmanship. Boston: Prentice Hall. p. 300. ISBN 0-13-235088-2.
  2. Martin, Robert C. (2009). "Chapter 17: Smells and Heuristics - G16 Obscured Intent". Clean Code: A Handbook of Agile Software Craftsmanship. Boston: Prentice Hall. p. 295. ISBN 0-13-235088-2.
  3. Datamation.com, "Bjarne Stroustrup on Educating Software Developers". http://itmanagement.earthweb.com/features/print.php/12297_3789981_2
  4. IBM Developer, "Six ways to write more comprehensible code". http://www.ibm.com/developerworks/linux/library/l-clear-code/?ca=dgr-FClnxw01linuxcodetips
  5. 'flounder'. "Guaranteeing uniqueness". Message Management. Developer Fusion. Retrieved 16 November 2007.
  6. Larry Osterman (21 July 2005). "UUIDs are only unique if you generate them...". Larry Osterman's WebLog - Confessions of an Old Fogey. MSDN. Retrieved 16 November 2007.
  7. "Java SE 6 Release Notes". Retrieved 18 June 2010.
  8. http://cataclysm.cx/random/amiga/reference/AmigaMail_Vol2_guide/node0053.html