Jump to content

Coding 101, Coding basics, Beyond the simple types


G+_Joe C. Hecht
 Share

Recommended Posts

Coding 101, Coding basics, Beyond the simple types

 

Neither pointers or structures have been covered in TWiT's Coding 101 show, and I have seen a few questions about some of the more advanced types presented, so I thought a scribe might be helpful to some. Corrections, comments, additions, spell and grammar checking are welcomed.

 

The material presented here is general, and is non-specific to implementation (C, C#, Perl). In short, it contains some basic knowledge that any coder will need, and can be applied to most any language.

 

For the sake of new coders, we will start at the very beginning, and quickly being you up to speed. 

 

The computer's memory (RAM for random access memory), is available only when the computer is powered on. Code (computer instructions) and data is copied to (and from) persistent storage (such as a disk) that allows the code and data to persist when the computer is turned off.

 

Each byte in a computer's memory can be accessed via it's "address".

 

A "pointer" is a variable that contains the starting address of a chunk of the computers memory, and "points to" that address. The address of "zero" is considered special, it is considered to point to nowhere, and is called "NULL", 

 

You can allocate memory to get a pointer to some memory to store things, and use should always de-allocate that memory when you are done.

 

You can "cast" a pointer to access at the memory at a given address as a specific type of data. For example:

 

APointer = SomeMemoryAddrress;

integer(APointer^) := 10;

 

You can "cast" a pointer as any type you want, but always cast with care, making sure your cast actually made to the correct type at the correct address, else you will be reading or writing garbage to or from the memory location your pointer to pointing to.

 

 

Structures:

 

A structure is a "record" containing one or more pieces of data (fields) that are grouped together. The types of data contained in a structure can be mixed. For example, you could group together a person's name (a string of characters) and an person's age (a number) into a structure type called "Person".

 

The following demonstration is shown using the "Pascal" language (a language designed for teaching), and it's syntax can easily be adapted to any language (such as C#).

 

Type Person = record

  Name : string;

  Age : integer;

end; 

 

Procedures and Functions are then made to take in a given structure and do something with it's data members. For example, you might make a procedure to change the persons age in a "Person" structure:

 

procedure ChangeAge(var APerson : Person; NewAge: integer);

begin

  APerson.Age := NewAge;

end;

 

To make complex data structures, you simply make new structures containing other structures, building on the structures you have already made.

 

It is worth noting that "Objects" are in fact structures of data that are bound with a table of the functions and procedures that work with the data. Once the functions and procedures are bound to a structure, the structure becomes known as an "object" and  the functions and procedures become known as "object methods".

 

Objects make it easy to keep your data and functions together.

 

Here is the same "Person" structure and ChangeAge() procedure as above, but implemented as an object (again, in Pascal):

 

Type Person = class(TObject)

 Name : string;

 Age : integer;

 procedure ChangeAge(NewAge: integer);

end;

 

procedure Person.ChangeAge(NewAge: integer);

begin

  Age := NewAge;

end;

 

 

It is worth noting that Objects in most languages almost always have a field, method, or some other way to identify the object's type (so you can always ask an object what type it is).

 

Objects also have "pointer" to itself (so the object can always refer to it's own instance of itself).

 

If your object is made from another object, you get the data and methods from the other object, plus any data and methods that you want to add. This is called "inheritance". You make more complex objects from building on the objects you have already made. 

 

When you make an inherited object, it is usually considered  to be of the new type, plus of all the types it inherited from. For example, if you made an "Employee" object that was inherited from a "Person" object, then ask it if it was a "Employee" object, it would reply "yes". It would also reply that it is a "Person" object, since it descends from the "Person" object. Finally, it would also reply that it is a TObject, since the "Person" descends from TObject.

 

 

Since an object's "methods" are called by using the values stored in the object's table of functions, you can change the method that gets

called simply by "overriding" the method that is stored in the function table with new method that does something slightly different. Object's  make "overriding" a method easy, and also allows you to call the original "inherited" method that got overridden. For example, you might want to make an Employee object that descends from the Person object, and then override the ChangeAge() method in an inherited object, so that that looks at the incoming age, and if it is 65, it would send the employee an email wishing them a happy retirement, and then calls the original inherited ChangeAge() method in the person object that you inherited from.

 

Objects have a constructor method that gets called when the object is created, and a destructor method that gets called when the object is freed. You can "override" the constructor method to initialize your object's data members and perform other start up tasks when your object is created, and you can "override" the destructor method to perform any clean up tasks your object requires when it is freed. 

 

Finally, Objects exist as pointers to the actual structures they represent. You can "cast" an object (or pointer) as any type you want, but always cast with care, making sure your cast actually made to the correct type.

 

There are a few other advanced ways that methods can be implemented and overridden (that are beyond the scope of this article). Needless to say, objects extend structures in a very powerful way that allows you to easily leverage the code you have written in the past with the code you need to write today.

 

Other advanced types:

 

Array:

 

The "Array" type is a fixed length group of identical element types that are accessed by "index". For example, you might need to store ten Person structures:

 

var PersonsArray = array[1..10] of Person;

 

And then access the second Person structure (at index number 2) like this:

 

PersonsArray[2].Age := 10;

 

The types of the elements in the array must all be of the same type, noting that if a element type is a pointer to an identifiable structure type (or is an object), this would allow the elements of the array to contain pointers to different types that you would work with. For example, of you had several different types of structures and each had a field called "structureType" at the same location in the structure, you could make an array of pointers (fulfilling the requirement that all the array's elements are of the same type), but the pointers would actually "point" to different types of structures, and you could examine (cast) the pointer and look at the structureType field to find out what kind of structure your pointer is actually pointing.

 

Arrays of objects work the same way. You make an array of "basic" Objects (fulfilling the requirement that all the array's elements are of the same type), fill the array with all sorts of different objects type that descend from the basic Object type, then ask each object what type of object it is before you cast in order to work with it.

 

Dynamic arrays:

 

A "Dynamic array" is a compiler managed array that allows for the addition and subtraction of the number of elements the array holds. Generally, the compiler allocates elements in "chunks" containing a few extra hidden elements (making it efficient to add a few extra elements). When the size of the array is reduced, often the compiler does not actually reduce the true size of array, keeping those slots around for reuse, just in case you later decided to enlarge your array. While accessing array elements is very fast (since the elements exist all together in a row), Dynamic Arrays are often not efficient at changing sizes, as when it's size must truly change, the entire array must be reallocated in memory, and if the reallocation is not able to be done "in place", the elements of the array must be copied to the newly allocated memory. It is also possible for new allocations to fail, since the reallocation may require making room to hold a whole new copy of the array.

 

Linked Lists:

 

Linked lists are an "old style" method of dynamically storing and accessing data, and are very efficient at storage, but may not be very efficient during look up. A structure is made containing a pointer to the data you want to store, along with a pointer to the "next" data structure in the list (the link). The last element in the list has it's "next" member is set to NULL. You remember the first pointer (the head of the list), and travel the list by looping to the "next" of each element, until you find what you are looking for, or you find an element that has it's "next" set to "NULL" (and you have arrived at the end of the list).

 

Growing the list simply requires allocating a new structure, setting it's "next" pointer to NULL, finding the end of the list, and adding the address of the new structure to the last elements "next" pointer.

 

Removing any element simply requires patching the previous element's "next" pointer to point to the "next" in the element you are removing, and then deallocating the element.

 

Likewise, inserting a new element anywhere in the list simply requires allocating a new element, finding the element right before you want to insert at (called the insertAt element), and setting the new elements's "next" to point to the insertAt's "next", then setting the insertAt's "next" to point to the newly inserted element.

 

If you want to travel the list both forward and backward, you need a "doubly linked list", where each element has pointers to both the "next" and the "previous" element. You can make a circular list (great when used for a ring buffer) by having the last element's "next" point back to the first element.

 

For what it's worth, linked lists are one of the most important data concepts used in higher level programming, and is certainly worth learning. If you have trouble understanding linked lists, don't feel alone. This concept was on the test at the Borland, and almost every upcoming engineer failed that section and required assistance.

 

While Linked Lists may seem a bit confusing at first, they are really a quite simple concept, and may very well be the "impress" you need to land a great position in programming. No matter what anyone tells you, pointers, linked lists, and structures are at the core of all great programming efforts, no matter what they may be called, or how they may be exposed. Learn those things, and you can easily figure out the rest.

 

Lists:

 

"Lists" make quick work of the traditional "Linked List". Lists have the same advantage of dynamic arrays, as they can be dynamically re sized, but unlike dynamic arrays, memory usage is usually very efficient, as each element is dynamically allocated usually as part of some sort of linked list or perhaps a well managed dynamic array of pointers, meaning that growing or shrinking the list either does not (or rarely will) require reallocating the entire number of elements in the list. Accessing and list elements sometimes has some small disadvantage in speed over arrays. Most modern compilers and libraries minimize speed hits by pre-allocating additional item pointers in chunks, and sometimes, even provide a dynamic array of pointers to the list items, allowing single items to be accessed quickly by index (like an array). While internal implementation of lists is largely compiler or library dependent, (and falling somewhere between a dynamic array of pointers and a true managed Linked List) in general, you can count on lists to provided reasonably fast item management, using efficient memory techniques, with a generous supply of useful methods.

 

Dictionary Lists:

 

"Dictionary Lists" (aka "Associative Lists" aka "Associative Arrays") are "dual lists" (see above) that work with a "key" (a name) and a "value" (some data). The data is both stored and retrieved via the "key name".

 

Storing and retrieving the "key names" as stings of characters would consume a lot of memory, and the comparisons required for sorting and looking up "key names" made of character strings would be very slow.

 

"Dictionary Lists" use "hashing" for efficiency and speed. In this case, hashing is the word use for taking the string of characters (the key name) and turning into a "unique repeatable" number, using a mathematical function where no two different strings will produce the same value, yet every same string produces the same value.

 

This numeric hashed value allows the "Key Name" to be stored, sorted, and looked up as a number, usually taking up far less memory than a string of characters, and can also be compared and sorted a lot faster than strings of characters.

 

The list's key name hashed value's are stored in a sorted fashion, allowing for very fast searching using a "binary" search, that provides lighting fast item retrieval and insertions.

 

When adding a item to a "Dictionary List" if the key name hash value does not already exist in the "list", then it is added (along with the data), else if it does exist, the data associated with that key name value is (generally) replaced.

 

You can retrieve your data at a later time by asking for it by "key name", where the "key name" is rehashed and it's value is searched for. Hashing a string takes a lot of CPU cycles, so some implementations will allow you to get the hash value of a "key name" so you can store it back and retrieve data from the list by supplying the hash value of the "key name". This technique can give your data retrieval operations a huge boost in speed.

 

Like a regular list, most compilers and libraries provide a dynamic array of pointers to the list items, so the associated data items in the list can  usually be accessed by index, however, the data will probably not come back in the order it was added, since the list items are sorted by the key name's hash value when they are added.

 

"Dictionary Lists" provide a lot of power when you need to associate data with a value, and it is worth more then a note to say that the key name does not need to a string (per say), but can be any unique value that could be encoded as a sting. For example, you could use a customer number as the key, simply by encoding the number as a string, and passing the sting value as the key name. In fact, some implementations of "Dictionary Lists" allow you to send a binary buffer as the "key name", and some even allow you to provide the hash value to use. Smart, very very smart.

 

I hope someone finds this scribe to be usefull.

 

Code On!

 

Joe C. Hecht aka TJoe(h^);

Please Share the Page Link:

 

https://plus.google.com/102020992993114748478/posts/TpHtrQgBfuY

 

COPYRIGHT © 2014 by Code4Sale, LLC - ALL RIGHTS RESERVED

Link to comment
Share on other sites

Mighty fine post Joe!

 

I think pointers are another one of those things that new comers will ask, "Why do I need this? When will I ever use this?". Well, in C# (and .Net in general) pointers aren't used as heavily as they were with C and C++. You have to use pointers in an 'unsafe' environment (you literally have to use 'unsafe' to declare a pointer). There are many downsides to writing unsafe code - one is loosing garbage collection, you're just asking for memory leaks if you don't handle with care. 

 

 public static void Main()

      {

         unsafe

         {

            int var = 10;

            int* p_var = &var;

            Console.WriteLine("var is: {0} " , var);

            Console.WriteLine("var is: {0} " , p_var->ToString());

            Console.WriteLine("Address is: {0} " , (int)p_var);

         }

 

Output:

 

var is: 10

var is: 10

Address is: 0055723 (made this up)

 

You use * to declare pointers and & to 'reference' the variables address (so &var will give us the address in memory of var and we assigned it to our pointer p_var). You can name pointers just like you do any other variables, I always got into the habit of naming mine p_whatever variable I'm using it with. 

 

I have yet to run into a scenario with C# where I needed a pointer, although I'm not using it much anymore. Maybe someone can add some more pointer knowledge? Even if they aren't used much with C#, I still think they are important to learn as you said Joe. 

 

Side note: Linked Lists shudder Just stick with them... You'll get there... twitch 

Link to comment
Share on other sites

Thanks so much for the kind words Nathan Follmer.

 

I did write the article to be general. I probably should have mentioned that pointers are not generally used when writing code in managed environments (such as C#), but for sure, they exist under the hood, so they are worth understanding.

 

So many times, I refer back to a "Gilligan's Island" episode where Ginger says "To catch a mole, you must think like a mole".

 

That same approach works very well when working with compilers. You can write some very good code (and catch some very good bugs) simply by thinking like the compiler. Asking "how does it work", and "what is the compiler going to do" can solve a lot of issues.

 

I'm the sort of guy that counts the CPU cycles required as I write the code, and through my experiences, I have gained a huge distrust of automatic garbage collection (I do disassemble code on occasion to see whats really going on).

 

I can remember arguments breaking out at Borland for even the simple things like the efficiency of "x := x + 1;" V.S. "Inc(x);".

 

FWIW, Inc(x) won by a couple of clocks, but the point is we learned to care what was going on under the hood, and came from the experience as much better coders.

 

Managed code IS the future, but I am old guy usually living somewhere in kernel mode, and pointers are just a way of life.

 

You learn to be very careful. You learn distrust of the compiler, and you learn you investigate what is really going on under the hood. And the reward is some wickedly fast code!

 

Again, thanks so much for the kind words, and the additional info!

 

FWIW, I too use "T_;s" and "P's" to denote things like pointers. I think my worst habit is prefixing pointer variables with "lp" (for long pointer). Wow, that is so old (dating back to the days of 8 and 16 bit land where grabbing up something like a whole 64k block might be a real challenge).

 

Great stuff Nathan Follmer  Thanks so much! I see you are in graphics. That is my forte. I will investigate!

 

TJoe(h^);

Link to comment
Share on other sites

Yeah Joe Hecht  I somehow got thrown into video editing / motion graphics over the past few years and haven't left since LOL I still create apps and websites for this company, but spend a lot of time in Final Cut (I was forced onto a Mac... but now I love them, weird?) But i digress... 

 

I actually started in C++... hard language to learn for a beginner, but I agree with you, best to learn the dark underbelly of code to understand what is really happening. I think starting with C++ really made me a more effective programmer - I do tend to do things the hard way sometimes (like garbage collection... blew my mind when I learned C#, I don't have to do it myself?! I want the controls back!), but I've slowly learned to give up some freedoms for quicker development :P 

Link to comment
Share on other sites

Nathan Follmer  Very nice site you have! I shared a bit with my wife Lynda Hecht (especially your autumn pic - she is going to love that one), and stored back the "contact me" link for web development, as I am considering jobbing out some web work.

 

I do a bit of higher level coding as well (the low level code has to be tested by something), and we do support Windows RT, so sometimes, Visual Studio is the environment of the day.

 

I cannot seem to "get" Final Cut at all, and always end up reverting back to my Windows video tools (some I have written, others received from working with the studios). I do keep trying, and someday, I will get the hang of it.

 

I love the Macs... but only for running Windows and Linux in VM's. Fusion really excels at serving up VM's, and we run the heck out of VM's around here. Mostly for compiling, but just about everything we do is done in a VM. We do a lot of compiling, on a lot of different platforms, with top down build times often requiring 12+ hours, and spread across many machines. It's literately a build farm around here.

 

I think Skype is a very nice experience on the Mac (but sadly, my main video setup runs under Windows). I think the Textual IRC client is also very nice.

 

Again, nice page you got there, and I plan to visit often! It's folks like you that make the contributions that keep "lagger-non-posters" like me entertained when we finally look up from a long night of coding! To be truthful, that tree photo you posted is about as close as I will get to a tree this season. Thanks  for bringing it to me!

 

Joe C. Hecht

Link to comment
Share on other sites

Glad I could help Joe and thank you for the kind words also. I actually dislike my website LOL I just never have time to update it because I'm too busy, I'm glad someone enjoys it though. Glad my photos could help too, I just snapped that tree with the phone coming home this morning.

 

FCP was a hard learn for me, I had a few years on Premiere/After Effects before moving to FCP and Motion... They both have their strengths and learning curves though, like anything else. I think I prefer FCP now - 10 was a huge improvement. 

 

We're steering this ship way off topic here... ;)

Link to comment
Share on other sites

All languages have pointers. Some languages just don't allow you to manipulate them directly. If you are programming in any language, you should know, for example, if your variables or function arguments or assignments use pointers to objects or full copies of objects.

 

I might disagree with Joe a little here, in that in my experience, linked lists really aren't that commonly used. Like trees, you should learn about them because they do come up in specific contexts, but 90% of real programs end up using just dynamic arrays and hash tables. JavaScript doesn't even have arrays--it just simulates them with hash tables!

Link to comment
Share on other sites

  • 4 months later...
 Share

×
×
  • Create New...