Learning Forever

Friday, June 26, 2026

AI in Software Engineering: From Code Writing to System Ownership

Why I Am Writing This

As organizations rapidly adopt tools such as Claude Code, Codex, and other AI-powered development assistants, engineering leaders are naturally beginning to expect significantly higher productivity from their teams. We expect features to be built faster, integrations to be delivered sooner, and implementation timelines to shrink.

AI can certainly generate code, metadata, tests, and documentation in a matter of hours. However, this raises an important question for enterprise software teams:

Can we safely ship software at the same speed at which AI can generate it?

For internal tools, perhaps the answer is often yes. But for enterprise products and customer implementations, where security, compliance, reliability, and long-term maintainability are critical, the answer is far less obvious.

This article explores how AI is changing software engineering and why system ownership, judgment, and verification may become even more important in the AI era.

1. The Real Shift. From Code to Specification

Generative AI is not simply making coding faster. It is changing what matters in software engineering. The core shift is away from writing code and toward defining clear specifications: what we are building, why we are building it, and under what constraints it must operate.

In this model, intent, architecture, and quality expectations become the primary engineering artifacts. Code becomes an output rather than the starting point.

2. Better Inputs Lead to Better Systems

AI does not replace engineering thinking. It magnifies it.

Teams that provide clear requirements, strong architecture, well-defined constraints, and high engineering standards will produce better systems faster. Conversely, unclear requirements, weak architectural decisions, and poor standards can also be accelerated, resulting in systems that are difficult to maintain, troubleshoot, and evolve.

The quality of the output is therefore heavily dependent on the quality of the input. As AI removes much of the effort associated with implementation, clarity of thought and precision in specification become increasingly important.

3. The Partial Understanding Problem. Would You Ship What You Only Partially Understand?

A critical tension emerges in AI-assisted development. Teams can now generate large portions of a system they only partially understand. In my experience, engineers may truly understand only a fraction of what an AI agent produces, especially when working at high velocity.

This raises a fundamental question. Would you confidently ship customer-facing software when no one on the team fully understands the entire system?

For internal tools, this level of partial understanding may be acceptable due to low risk and fast iteration cycles. But for production systems in enterprise environments, the stakes are far higher. Partial understanding increases exposure to hidden security flaws, edge-case failures, and long-term maintainability risks.

4. From Code Writing to System Validation and Governance

The responsibility of senior engineers is shifting from writing code to validating and governing the quality, safety, and design of systems built with AI. Senior engineers move from being code producers to system validators and owners, ensuring that AI-generated implementations are correct, safe, and production-ready.

Regardless of how the code is generated, it will ultimately be committed, reviewed, and owned by an engineer or team. AI may assist in implementation, but responsibility for the code cannot be delegated. The engineer whose name is associated with the change remains accountable for its correctness, maintainability, and operational behavior.

Some organizations may eventually allow AI agents to commit code directly. However, in enterprise software, the review and approval process still requires human judgment. Can we realistically afford a world where one AI agent writes the code and another AI agent reviews and approves it for production, with no human accountability in the loop? For systems that handle sensitive customer data, financial transactions, or business critical operations, human oversight remains essential.

However, this introduces a deeper tension. If engineers themselves may only understand a fraction of the generated system, what does “human review” actually mean in practice? In many cases, review risks becoming a procedural checkpoint rather than true comprehension. The code is approved, but not fully understood. This raises an uncomfortable question about whether governance is still based on understanding, or simply trust in the process.

Ultimately, customers do not buy software from Claude, Codex, or any other AI assistant. They buy it from us. When a production issue occurs, teams cannot tell customers that an AI model generated the faulty implementation. Accountability remains entirely human. AI can assist in building systems, but ownership of failures, fixes, and customer trust continues to rest with engineering teams.

Conclusion

Software engineering is shifting from code production to system stewardship. The winners in this new era will be those who can clearly specify intent, critically evaluate AI-generated outputs, and confidently take ownership of the systems they ship.

Friday, May 29, 2026

HTTPS & mTLS

Every time you visit a website over HTTPS, a sophisticated cryptographic exchange happens in milliseconds — certificates are checked, signatures verified, and a shared encrypted session established before a single byte of your request is sent. This post unpacks exactly what happens, step by step, and then extends it to mutual TLS (mTLS), where both sides of the connection prove who they are.

We will use DigiCert — one of the world's largest certificate authorities — as our real-world example throughout.

What Is a Digital Certificate?

A digital certificate is a signed document that binds a public key to an identity. Think of it as a cryptographic passport. It does not just contain a key — it attaches that key to a name, an organisation, and a validity window, all stamped by a trusted authority called a Certificate Authority (CA).

A typical certificate contains:

Subject — the entity it was issued to (e.g. login.example.com)
Issuer — the CA that signed it (e.g. DigiCert TLS RSA SHA256 2020 CA1)
Public key — the cryptographic key belonging to the subject
Validity dates — Not Before / Not After
Extended Key Usage (EKU) — what the cert can be used for (server auth, client auth, etc.)
CA's digital signature — proves DigiCert vouched for all of the above

The certificate is not secret — it is sent to everyone who connects. Its power comes entirely from the CA's signature at the end.

Public Key vs Certificate — Are They the Same Thing?

This is a common point of confusion. They are different things, but closely related: the certificate contains the public key, plus a great deal more.

A public key alone is identity-less. It is just a number. Anyone could generate one and claim it belongs to google.com. The certificate is what binds that key to an identity — and the CA's signature is what makes that binding trustworthy.

The analogy:

Public key = your fingerprint. Unique, but tells nobody who you are.
Certificate = your passport. Fingerprint + name + issuing authority + expiry + government stamp.
CA's signature = the government stamp. Without it, the passport is worthless.
Private key = the secret only you know, which proves the passport is genuinely yours.

Figure 1 — Anatomy of an X.509 certificate. The public key is one field among many.

Normal HTTPS — One-Way TLS

In normal HTTPS, only the server proves its identity. The client (your browser) never presents a certificate. The server sends its certificate, the browser validates it, and a shared encrypted session is established.

The Four Validation Checks

When your browser receives the server's certificate, it performs four main checks:

Check	What it verifies	How
1. Chain of Trust	Is the certificate signed by a trusted CA?	Walks up the chain until it finds a root CA in the browser's built-in trust store
2. Expiry	Is the certificate currently valid?	Checks Not Before and Not After dates against today
3. Hostname Match	Does the cert belong to this domain?	Matches the hostname in the URL against the Subject Alternative Names (SAN) in the cert; modern browsers no longer fall back to the CN
4. Revocation	Has the cert been revoked by the CA?	Queries the CA via CRL (Certificate Revocation List) or OCSP (real-time check)

The chain of trust check is the most important. The browser does not trust the server's certificate directly — it trusts the root CA (e.g. DigiCert's root), which is pre-installed in the browser. The chain looks like this:

login.example.com cert → signed by DigiCert Intermediate CA → signed by DigiCert Root CA → found in browser trust store ✓

Any tiny tampering with the certificate (changing the domain, swapping the public key, altering the expiry) produces a different hash, and the signature check fails immediately.
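Two of the checks in the table above, expiry and hostname match, are simple enough to sketch in a few lines of Python. This is a simplified illustration, not how any particular browser implements them: the function names are hypothetical, and the wildcard rule shown (a leading `*.` covers exactly one label) is the common case only.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def is_within_validity(not_before: datetime, not_after: datetime,
                       now: Optional[datetime] = None) -> bool:
    """Check 2: the current time must fall between Not Before and Not After."""
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after


def hostname_matches(hostname: str, san_dns_names: Iterable[str]) -> bool:
    """Check 3: the requested hostname must match one of the SAN entries."""
    host = hostname.lower().rstrip(".")
    for name in san_dns_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # A wildcard stands in for exactly one leftmost label.
            first_label, _, rest = host.partition(".")
            if first_label and rest == pattern[2:]:
                return True
    return False


sans = ["login.example.com", "*.api.example.com"]
print(hostname_matches("login.example.com", sans))
print(hostname_matches("v1.api.example.com", sans))
print(hostname_matches("evil.example.org", sans))
```

Chain building and revocation checking are deliberately left out; in practice they are handled by the TLS library rather than by application code.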

What Does a Digital Signature Actually Sign?

This is where most explanations become vague. There are actually two distinct signatures in every TLS connection, and it is important to understand both.

Signature 1 — The CA Signs the Certificate (once, at issuance)

When DigiCert issues a certificate to login.example.com, it does not just sign the public key. It takes a SHA-256 hash of the entire certificate contents — subject, issuer, validity dates, EKU flags, serial number, and the public key all together — and then signs that hash using DigiCert's own private key.

The client verifies this by:

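Taking the issuing CA's public key from the next certificate up the chain, which is ultimately anchored by the DigiCert root in the browser trust store
Using that public key to recover the signed hash from the signature (this "decrypt" description fits RSA; other algorithms such as ECDSA verify differently but serve the same purpose)
Independently hashing the certificate contents it received
Comparing the two hashes: if they match, the certificate is genuine and untampered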
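The hash, sign, and compare steps above can be sketched with textbook RSA in plain Python. This is a toy under loud assumptions: the key is built from two small known primes, there is no signature padding, and the "certificate" is an invented byte string. Real CA signatures use keys of 2048 bits or more and a padded scheme, so treat this only as an illustration of the idea.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes (assumption: illustration only).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(cert_bytes: bytes) -> int:
    """SHA-256 hash of the certificate contents, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(cert_bytes).digest(), "big") % N


def ca_sign(cert_bytes: bytes) -> int:
    """What the CA does once, at issuance: sign the hash with its private key."""
    return pow(digest_as_int(cert_bytes), D, N)


def client_verify(cert_bytes: bytes, signature: int) -> bool:
    """What the client does: recover the hash with the public key and compare."""
    recovered = pow(signature, E, N)
    return recovered == digest_as_int(cert_bytes)


# Hypothetical certificate contents, not a real X.509 encoding.
cert = b"subject=login.example.com;issuer=Example CA;pubkey=abc123"
sig = ca_sign(cert)

print(client_verify(cert, sig))                               # untouched certificate
print(client_verify(cert.replace(b"login", b"l0gin"), sig))   # tampered certificate
```

Changing even one byte of the certificate changes the hash, so the value recovered from the signature no longer matches.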

Signature 2 — The Server Signs the Live Handshake

Just having the certificate is not enough. Anyone could copy a real certificate. So the server must also sign the live handshake transcript with its private key. Only the genuine server, the one holding the matching private key, can produce a valid signature.

In TLS 1.3, the server takes the full handshake transcript so far (ClientHello, ServerHello, and the messages that follow, up to and including its own Certificate message), hashes it, signs the hash with its private key, and sends the result as a CertificateVerify message. In TLS 1.2 the server proves possession of the key differently, typically by signing the key-exchange parameters. In both cases the client verifies the signature using the public key extracted from the certificate it just validated.

This is the critical step that defeats impersonation. Even if an attacker copies a real certificate, they cannot sign the handshake challenge without the private key — which never leaves the legitimate server.

Why both signatures are necessary: Signature 1 proves the certificate is legitimate. Signature 2 proves the server actually owns the private key that matches the public key inside that certificate.

Mutual TLS (mTLS) — Both Sides Authenticate

In mTLS, the same mechanism runs in reverse for the client. The server demands a certificate from the client during the handshake, validates it against its own truststore, and challenges the client to prove private key ownership — identical to what the client does with the server, just in the opposite direction.

Figure 2 — Full mTLS handshake. Dashed arrows are the additional steps beyond normal HTTPS.

The Symmetric Picture

Everything that happens on the server side has an exact mirror on the client side in mTLS:

Aspect	Server Side (both HTTPS and mTLS)	Client Side (mTLS only)
Certificate issued by	Public CA (e.g. DigiCert)	Dedicated Client Auth CA
EKU in certificate	id-kp-serverAuth	id-kp-clientAuth
CA signature verified using	Browser / OS trust store	Server's own truststore
Live handshake signed by	Server's private key	Client's private key
Signature verified using	Server's public key (from cert)	Client's public key (from cert)
What it proves	"This is the real server"	"This is the real client"
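In code, the difference between the two columns often comes down to a few lines of TLS configuration. Here is a minimal sketch using Python's standard `ssl` module; the function names and file-path parameters are hypothetical, and a real server would also need to load its own certificate and key with `load_cert_chain`.

```python
import ssl
from typing import Optional


def make_server_context(require_client_cert: bool,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: plain HTTPS by default, mTLS when a client cert is required."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if require_client_cert:
        # Ask for a client certificate and fail the handshake without one.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca_file:
            # The server's own truststore: CAs allowed to issue client certs.
            ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


def make_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: always validates the server; presents a cert only for mTLS."""
    ctx = ssl.create_default_context()
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


print(make_server_context(False).verify_mode)   # no client certificate requested
print(make_server_context(True).verify_mode)    # client certificate required
```

The server-authentication half is identical in both cases, which is why mTLS is usually described as an extension of HTTPS rather than a replacement.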

DigiCert — A Real-World Certificate Authority

DigiCert is one of the world's largest commercial CAs, trusted by default in all major browsers, operating systems, and application servers. It provides a good illustration of how the CA role works in practice.

What DigiCert actually does

Before issuing a certificate, DigiCert verifies the applicant's identity. There are three levels of verification:

Domain Validation (DV) — Proves you control the domain, by having you place a specific DNS record or a file on the web server. Fast — typically minutes to hours. Used for most standard HTTPS websites.
Organisation Validation (OV) — Verifies the legal organisation behind the domain. DigiCert checks business registration documents and may make phone contact. Takes a few days.
Extended Validation (EV) — The most rigorous level. DigiCert audits legal existence, physical address, and operational status. Used by banks and high-security sites.

The root key is kept offline

DigiCert's root CA private key is stored offline in Hardware Security Modules (HSMs) — physical devices in secured data centres that never connect to the internet. Day-to-day certificate signing is done by intermediate CAs that chain up to the root. If an intermediate is compromised, DigiCert can revoke just that intermediate without touching the root.

Separate hierarchies for server vs client auth

DigiCert maintains separate root hierarchies for different purposes:

The DigiCert TLS RSA SHA256 intermediate — issues server authentication certificates (for websites)
The DigiCert Assured ID Root G2 / G3 — dedicated client authentication roots, used for mTLS client certificates

This separation means a problem in the client auth hierarchy cannot affect the trust of public websites, and vice versa. Each hierarchy is purpose-built and independently audited.

How to Tell if a Server Requires mTLS

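The simplest test is to try connecting without a client certificate. If the connection succeeds, the server does not require one: it is plain HTTPS, or the client certificate is optional. If the TLS handshake fails, or the server completes the handshake and then returns an HTTP 400 error, the server is requiring a client certificate.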

Using curl

Run the following from a terminal:

curl -v https://yourserver.com/api

Plain HTTPS — the handshake completes and you get a normal response.

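mTLS required: the exact wording depends on the client and server, but you will see a handshake failure or an HTTP 400 response along these lines: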

SSL_ERROR_HANDSHAKE_FAILURE_ALERT
HTTP/1.1 400 No required SSL certificate was sent

Using OpenSSL

This method is more revealing — it shows exactly which CAs the server trusts for client certificates:

openssl s_client -connect yourserver.com:443

Look for the following in the output:

Plain HTTPS:

No client certificate CA names sent

mTLS server:

Acceptable client certificate CA names
/C=US/O=DigiCert Inc/CN=DigiCert Assured ID Root G2
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Client Authentication Root R45

The "Acceptable client certificate CA names" section is the server explicitly listing which CAs it trusts for client certificates. In plain HTTPS this section is simply absent.
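If you want to automate this check, the `openssl s_client` output can be scanned for that section. The sketch below is a hypothetical helper, not part of OpenSSL; it assumes the CA names follow the header one per line, as in the sample above, and that you have captured the command's output as text.

```python
from typing import List

HEADER = "Acceptable client certificate CA names"


def acceptable_client_cas(s_client_output: str) -> List[str]:
    """Return the CA names listed under the header, or [] if it is absent."""
    lines = [line.strip() for line in s_client_output.splitlines()]
    if HEADER not in lines:
        return []
    names = []
    for line in lines[lines.index(HEADER) + 1:]:
        if "=" not in line:   # assumption: the first non-DN line ends the list
            break
        names.append(line)
    return names


def requests_client_cert(s_client_output: str) -> bool:
    """True if the server advertised CAs for client certificates."""
    return HEADER in [line.strip() for line in s_client_output.splitlines()]


sample = """Acceptable client certificate CA names
/C=US/O=DigiCert Inc/CN=DigiCert Assured ID Root G2
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Client Authentication Root R45
---
"""
print(requests_client_cert(sample))
print(acceptable_client_cas(sample))
```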

Summary

The entire system of HTTPS and mTLS rests on three interlocking ideas: asymmetric cryptography (public/private key pairs), digital signatures (hash and sign with a private key), and certificate authorities (trusted third parties that bind keys to identities). Remove any one of these and the model collapses.

Property	Normal HTTPS	Mutual TLS (mTLS)
Who presents a certificate?	Server only	Both server and client
Who validates?	Client (browser)	Both sides validate each other
Trust store used for client cert	Not applicable	Server's own configured truststore
Who signs the live handshake?	Server only	Both server and client
Client certificate required?	No	Yes
Typical use case	All public websites	API-to-API, machine identity, zero-trust networks

mTLS does not replace HTTPS — it extends it. The server still proves its identity exactly as before. mTLS simply adds the mirror requirement: the client must also prove its identity, using the same cryptographic mechanism, in the opposite direction.

Saturday, August 23, 2025

Why Iteration Has Been My Biggest Learning

In my 20 years of product development, the biggest lesson I’ve learned has come in the last three years: the power of iteration. For a long time, I believed success was about getting things "right" on the first attempt. But real progress rarely works that way. Whether it's building products, shaping user experiences, or solving complex challenges, the best solutions evolve gradually, not instantly.

Why iteration works:

It mirrors the way humans naturally solve problems. We don't leap to perfect answers in one go. Instead, we observe a challenge, form ideas, test them out, and then refine based on what we learn. Each cycle gets us closer to a meaningful solution. What seemed impossible at first becomes manageable when broken down into smaller, repeated improvements.

Iteration isn't about failure; it's about discovery. Every version is valuable because it reveals insights we couldn't see before. Even a design that doesn't "work" teaches us something crucial about what to avoid or improve. This mindset takes the pressure off perfection and shifts the focus to progress.

In my own work, I've seen how embracing iteration creates resilience and innovation. Teams become more open to experimentation because they know mistakes are not dead-ends but stepping stones. Stakeholders gain confidence because they see visible progress rather than waiting for one "big reveal". Most importantly, products grow stronger because they've been tested, challenged, and refined repeatedly.

The beauty of iteration is that it turns complexity into clarity and uncertainty into momentum. It's not just a method; it is a mindset. And once you adopt it, you start to realize that every step forward, no matter how small, is part of building something extraordinary.

Friday, March 28, 2025

Parable of the Pottery Class

The Parable of the Pottery Class is a well-known story that illustrates the power of practice and iteration over perfectionism. It originates from the book Art & Fear by David Bayles and Ted Orland.

The Story:

A ceramics teacher divided his class into two groups. One group was graded solely on the quantity of pots they produced, while the other group was graded on the quality of a single pot.

The quantity group was instructed to make as many pots as possible, with their total output weighed at the end of the semester.
The quality group was tasked with creating just one perfect pot.

The Result:

By the end of the semester, the students in the quantity group produced the highest-quality pots—not just in volume, but also in craftsmanship. They improved through constant practice, experimentation, and learning from mistakes. Meanwhile, the quality group spent too much time theorizing and planning, resulting in inferior work.

The Lesson:

Practice leads to mastery. Repetition and doing the work help refine skills faster than overanalyzing.
Failure is a teacher. Making mistakes and iterating lead to better outcomes than trying to be perfect from the start.
Action beats overthinking. Creativity and skill develop through hands-on experience, not just planning.

This principle applies beyond pottery—to writing, programming, business, and any creative field. The more you do, the better you become.

Saturday, March 9, 2024

UUID (Universally Unique Identifier)

UUID stands for Universally Unique Identifier. It is a 128-bit identifier originally standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE) and later specified in RFC 4122. UUIDs are used to uniquely identify information within a computer system and across distributed systems.

A UUID is written as 32 hexadecimal digits (32 × 4 bits = 128 bits), like this: eaf20a8a-7687-4acb-8253-218682888dd8

Version 1 UUIDs embed a timestamp, so it helps to recall how timestamps are usually represented. In many computer systems and programming languages, timestamps use the Unix time format, also known as POSIX time or Unix epoch time: the number of seconds (or milliseconds) that have elapsed since midnight Coordinated Universal Time (UTC) on January 1, 1970 (the Unix epoch). For example, at the time of writing (March 8, 2024), Unix time is roughly 1.71 billion seconds. In languages such as Python, JavaScript, and Java, timestamps are often integers or floating-point numbers, and 64 bits (a double-precision float or a long integer) is a common size. A version 1 UUID uses a different clock: a 60-bit count of 100-nanosecond intervals since midnight UTC on October 15, 1582, the start of the Gregorian calendar.

A version 1 UUID is grouped into five sections:

Time Low: The first 32 bits of the UUID, representing the low 32 bits of the timestamp.
Time Mid: The next 16 bits of the UUID, representing the middle 16 bits of the timestamp.
Time High and Version: The next 16 bits of the UUID: a 4-bit version number (indicating the UUID version) followed by the high 12 bits of the timestamp.
Clock Sequence and Variant: The next 16 bits of the UUID: the variant bits followed by a 14-bit clock sequence, which keeps UUIDs unique if the clock is set backwards or the node identifier changes. The clock sequence is typically initialized to a random or pseudo-random 14-bit number.

Node: The last 48 bits of the UUID, representing the node identifier, typically the MAC address of the computer generating the UUID.

UUID Version 2 (Domain Identifier-Based): Version 2, also called the DCE Security version, is similar to version 1 but embeds a local domain and identifier. The low 32 bits of the timestamp are replaced by a local identifier (such as a POSIX user or group ID), and part of the clock sequence field is replaced by a domain number.

UUID Version 3 and 5 (Name-Based): Version 3 and version 5 UUIDs are produced by hashing a namespace identifier together with a name. Version 3 uses MD5 hashing, while version 5 uses SHA-1 hashing. They are reproducible: the same namespace and name always produce the same UUID. Version 5 is generally preferred because SHA-1 is the stronger hash of the two, although neither version is intended as a security mechanism.

UUID Version 4 (Random): Version 4 UUIDs are generated from random numbers rather than from a timestamp, node, or name. Of the 128 bits, 6 are fixed (4 version bits and 2 variant bits) and the remaining 122 are random. Implementations generally draw those bits from a cryptographically secure pseudo-random number generator (CSPRNG), an algorithm designed to produce random numbers suitable for cryptographic use, which makes collisions extremely unlikely. This makes version 4 a good default wherever identifiers must be unique without coordination, such as in distributed systems. Whether they are unpredictable enough for security-sensitive uses depends on the generator behind them.

Whatever the version, the canonical textual representation of a UUID is a series of 32 hexadecimal characters separated by hyphens into five groups, in the form 8-4-4-4-12.
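Python's standard `uuid` module exposes most of the fields and versions described above, which makes them easy to inspect. The following is a small exploratory sketch; the constant name `GREGORIAN_TO_UNIX_100NS` is my own, and the name `example.com` is just a placeholder.

```python
import time
import uuid

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

# Version 1: timestamp + clock sequence + node.
u1 = uuid.uuid1()
unix_seconds = (u1.time - GREGORIAN_TO_UNIX_100NS) / 1e7
print(u1, "version", u1.version)
print("time_low=%08x time_mid=%04x time_hi_version=%04x"
      % (u1.time_low, u1.time_mid, u1.time_hi_version))
print("clock_seq=%d node=%012x" % (u1.clock_seq, u1.node))
print("embedded time:", unix_seconds, "system time:", time.time())

# Versions 3 and 5: the same namespace and name always give the same UUID.
u5_first = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
u5_second = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(u5_first == u5_second)

# Version 4: random bits, with fixed version and variant bits.
u4 = uuid.uuid4()
print(u4, "version", u4.version)

# The sample UUID from this post: the third group starts with 4, so it is version 4.
sample = uuid.UUID("eaf20a8a-7687-4acb-8253-218682888dd8")
print(sample.version, [len(group) for group in str(sample).split("-")])
```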

Monday, February 26, 2024

Encodings

ASCII

ASCII stands for American Standard Code for Information Interchange. It is a character encoding standard that assigns numerical values (codes) to represent characters, including letters, numbers, punctuation marks, and control characters.

ASCII was developed in the early 1960s by the X3.2.4 working group of the X3 committee of the American Standards Association (ASA). The ASA later became the United States of America Standards Institute (USASI) and ultimately the American National Standards Institute (ANSI).

ASCII is a 7-bit character set containing 128 characters (2⁷ = 128). It contains the digits 0 to 9, the upper- and lower-case English letters A to Z, punctuation and other special characters, and 33 control characters.
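A quick way to see the 7-bit codes is to ask Python for them. This is a small sketch; the helper `ascii_code` is hypothetical and exists only to reject anything outside the 0 to 127 range.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII code of a single character, or fail if it has none."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


for ch in ("A", "a", "0", " ", "~"):
    code = ascii_code(ch)
    # Every ASCII code fits in 7 binary digits.
    print(repr(ch), code, format(code, "07b"))

print(chr(65))                    # A
print("Hello".encode("ascii"))    # b'Hello'
print("é".isascii())              # False
```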

ASCII has been widely used in computers and communication equipment for encoding text data. However, with the need to represent characters beyond the original 128, ASCII-compatible encodings were developed, such as the 8-bit ISO 8859 family and UTF-8 (an encoding of Unicode), which support a wider range of characters including those from non-English languages and special symbols.

ASCII Printable Characters

 
Char	Number	Description
	0 - 31	Control characters (see below)
	32	space
!	33	exclamation mark
"	34	quotation mark
#	35	number sign
$	36	dollar sign
%	37	percent sign
&	38	ampersand
'	39	apostrophe
(	40	left parenthesis
)	41	right parenthesis
*	42	asterisk
+	43	plus sign
,	44	comma
-	45	hyphen
.	46	period
/	47	slash
0	48	digit 0
1	49	digit 1
2	50	digit 2
3	51	digit 3
4	52	digit 4
5	53	digit 5
6	54	digit 6
7	55	digit 7
8	56	digit 8
9	57	digit 9
:	58	colon
;	59	semicolon
<	60	less-than
=	61	equals-to
>	62	greater-than
?	63	question mark
@	64	at sign
A	65	uppercase A
B	66	uppercase B
C	67	uppercase C
D	68	uppercase D
E	69	uppercase E
F	70	uppercase F
G	71	uppercase G
H	72	uppercase H
I	73	uppercase I
J	74	uppercase J
K	75	uppercase K
L	76	uppercase L
M	77	uppercase M
N	78	uppercase N
O	79	uppercase O
P	80	uppercase P
Q	81	uppercase Q
R	82	uppercase R
S	83	uppercase S
T	84	uppercase T
U	85	uppercase U
V	86	uppercase V
W	87	uppercase W
X	88	uppercase X
Y	89	uppercase Y
Z	90	uppercase Z
[	91	left square bracket
\	92	backslash
]	93	right square bracket
^	94	caret
_	95	underscore
`	96	grave accent
a	97	lowercase a
b	98	lowercase b
c	99	lowercase c
d	100	lowercase d
e	101	lowercase e
f	102	lowercase f
g	103	lowercase g
h	104	lowercase h
i	105	lowercase i
j	106	lowercase j
k	107	lowercase k
l	108	lowercase l
m	109	lowercase m
n	110	lowercase n
o	111	lowercase o
p	112	lowercase p
q	113	lowercase q
r	114	lowercase r
s	115	lowercase s
t	116	lowercase t
u	117	lowercase u
v	118	lowercase v
w	119	lowercase w
x	120	lowercase x
y	121	lowercase y
z	122	lowercase z
{	123	left curly brace
|	124	vertical bar
}	125	right curly brace
~	126	tilde

ASCII Control Characters

 
Char	Number	Description
NUL	00	null character
SOH	01	start of header
STX	02	start of text
ETX	03	end of text
EOT	04	end of transmission
ENQ	05	enquiry
ACK	06	acknowledge
BEL	07	bell (ring)
BS	08	backspace
HT	09	horizontal tab
LF	10	line feed
VT	11	vertical tab
FF	12	form feed
CR	13	carriage return
SO	14	shift out
SI	15	shift in
DLE	16	data link escape
DC1	17	device control 1
DC2	18	device control 2
DC3	19	device control 3
DC4	20	device control 4
NAK	21	negative acknowledge
SYN	22	synchronize
ETB	23	end transmission block
CAN	24	cancel
EM	25	end of medium
SUB	26	substitute
ESC	27	escape
FS	28	file separator
GS	29	group separator
RS	30	record separator
US	31	unit separator
DEL	127	delete (rubout)

URL Encodings

URL encoding, also known as percent-encoding, is a mechanism used to convert certain characters in a URL (Uniform Resource Locator) into a format that can be safely transmitted over the internet.

In URLs, certain characters have special meanings or functions. For example, the characters ?, &, /, and = are used to separate different parts of the URL or to denote query parameters. If you want to include these characters in the URL as part of data rather than as delimiters or special characters, they need to be encoded.

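URL encoding works by replacing each character that may not appear literally with a '%' followed by the two-digit hexadecimal value of its byte. Letters, digits, and a few punctuation marks (-, ., _, ~) never need encoding. For instance, a space character ' ' is represented as '%20', since its ASCII value is 20 in hexadecimal (32 in decimal). Non-ASCII characters are first converted to bytes, normally using UTF-8, and each byte is encoded separately.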
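Python's standard library implements percent-encoding in `urllib.parse`, which makes the rule easy to try out. In this short sketch the sample string is invented, and `safe=""` tells `quote` to encode the slash as well, which it leaves alone by default.

```python
from urllib.parse import quote, unquote

value = "rock & roll/50% off?"

# Encode every reserved character, including "/".
encoded = quote(value, safe="")
print(encoded)             # rock%20%26%20roll%2F50%25%20off%3F

# Decoding restores the original text.
print(unquote(encoded))    # rock & roll/50% off?

# A non-ASCII character becomes one %XX triplet per UTF-8 byte.
print(quote("é", safe=""))  # %C3%A9
```

Encoding each data value separately before placing it in the URL keeps characters such as & and = from being mistaken for delimiters.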

ANSI

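The American National Standards Institute (ANSI) is a private, non-profit organization that administers and coordinates the U.S. voluntary standards and conformity assessment system. In the context of Windows, the "ANSI character set" informally means the 8-bit Windows code pages, most commonly Windows-1252. The name is a misnomer: Windows-1252 was based on an ANSI draft that later became ISO 8859-1, not on a finished ANSI standard. These code pages were the standard character sets of older Windows versions such as Windows 95, before Unicode became the norm.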

ISO-8859

ISO 8859 is a series of character encoding standards developed by the International Organization for Standardization (ISO). Its first part, ISO 8859-1 (also known as Latin alphabet No. 1), was published in 1987. These standards extend the ASCII character set to cover additional languages: several parts serve languages written in Latin script, such as French, German, and Spanish, while others cover scripts such as Cyrillic, Greek, Arabic, and Hebrew. Each ISO 8859 part defines an 8-bit character encoding, allowing for a total of 256 code points. The parts are numbered ISO 8859-1, ISO 8859-2, ISO 8859-3, and so on, up to ISO 8859-16, each tailored to specific languages or language groups. Each part adds characters beyond the original ASCII set while remaining identical to ASCII for the first 128 code points.

In ISO-8859-1, the codes from 128 to 159 are not assigned to printable characters. The next part of ISO-8859-1 (codes 160 to 191) contains commonly used special characters.

These characters, like any others, can be written in HTML as character entities. Entities are essential for reserved characters: if you use the less than (<) or greater than (>) signs in your HTML text, the browser might mix them up with tags. Entity names are written as &entity_name; and entity numbers are written as &#entity_number;
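The relationship between a character, its ISO-8859-1 code, and its HTML entities can be checked with Python's standard `html` module. This is a small sketch; the helper `entity_number` is hypothetical and simply formats a character's code point as a numeric entity.

```python
import html


def entity_number(ch: str) -> str:
    """Numeric HTML entity for a single character, e.g. the copyright sign."""
    return f"&#{ord(ch)};"


# In ISO-8859-1 each character is one byte whose value equals its code.
print("é".encode("latin-1"))     # b'\xe9', i.e. 233
print("é".encode("utf-8"))       # b'\xc3\xa9', two bytes in UTF-8

# Entity numbers and entity names decode to the same character.
print(entity_number("©"))        # &#169;
print(html.unescape("&#169;"))   # ©
print(html.unescape("&copy;"))   # ©

# Reserved characters must be escaped so they are not read as tags.
print(html.escape("a < b > c"))  # a &lt; b &gt; c
```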

ISO-8859-1 Symbols (160-191)

  
Character	Entity Number	Entity Name	Description
	&#160;	&nbsp;	non-breaking space
¡	&#161;	&iexcl;	inverted exclamation mark
¢	&#162;	&cent;	cent
£	&#163;	&pound;	pound
¤	&#164;	&curren;	currency
¥	&#165;	&yen;	yen
¦	&#166;	&brvbar;	broken vertical bar
§	&#167;	&sect;	section
¨	&#168;	&uml;	spacing diaeresis
©	&#169;	&copy;	copyright
ª	&#170;	&ordf;	feminine ordinal indicator
«	&#171;	&laquo;	angle quotation mark (left)
¬	&#172;	&not;	negation
­	&#173;	&shy;	soft hyphen
®	&#174;	&reg;	registered trademark
¯	&#175;	&macr;	spacing macron
°	&#176;	&deg;	degree
±	&#177;	&plusmn;	plus-or-minus
²	&#178;	&sup2;	superscript 2
³	&#179;	&sup3;	superscript 3
´	&#180;	&acute;	spacing acute
µ	&#181;	&micro;	micro
¶	&#182;	&para;	paragraph
·	&#183;	&middot;	middle dot
¸	&#184;	&cedil;	spacing cedilla
¹	&#185;	&sup1;	superscript 1
º	&#186;	&ordm;	masculine ordinal indicator
»	&#187;	&raquo;	angle quotation mark (right)
¼	&#188;	&frac14;	fraction 1/4
½	&#189;	&frac12;	fraction 1/2
¾	&#190;	&frac34;	fraction 3/4
¿	&#191;	&iquest;	inverted question mark

ISO-8859-1 Characters (192-255)

 
Character	Entity Number	Entity Name	Description
À	&#192;	&Agrave;	capital a, grave accent
Á	&#193;	&Aacute;	capital a, acute accent
Â	&#194;	&Acirc;	capital a, circumflex accent
Ã	&#195;	&Atilde;	capital a, tilde
Ä	&#196;	&Auml;	capital a, umlaut mark
Å	&#197;	&Aring;	capital a, ring
Æ	&#198;	&AElig;	capital ae
Ç	&#199;	&Ccedil;	capital c, cedilla
È	&#200;	&Egrave;	capital e, grave accent
É	&#201;	&Eacute;	capital e, acute accent
Ê	&#202;	&Ecirc;	capital e, circumflex accent
Ë	&#203;	&Euml;	capital e, umlaut mark
Ì	&#204;	&Igrave;	capital i, grave accent
Í	&#205;	&Iacute;	capital i, acute accent
Î	&#206;	&Icirc;	capital i, circumflex accent
Ï	&#207;	&Iuml;	capital i, umlaut mark
Ð	&#208;	&ETH;	capital eth, Icelandic
Ñ	&#209;	&Ntilde;	capital n, tilde
Ò	&#210;	&Ograve;	capital o, grave accent
Ó	&#211;	&Oacute;	capital o, acute accent
Ô	&#212;	&Ocirc;	capital o, circumflex accent
Õ	&#213;	&Otilde;	capital o, tilde
Ö	&#214;	&Ouml;	capital o, umlaut mark
×	&#215;	&times;	multiplication
Ø	&#216;	&Oslash;	capital o, slash
Ù	&#217;	&Ugrave;	capital u, grave accent
Ú	&#218;	&Uacute;	capital u, acute accent
Û	&#219;	&Ucirc;	capital u, circumflex accent
Ü	&#220;	&Uuml;	capital u, umlaut mark
Ý	&#221;	&Yacute;	capital y, acute accent
Þ	&#222;	&THORN;	capital THORN, Icelandic
ß	&#223;	&szlig;	small sharp s, German
à	&#224;	&agrave;	small a, grave accent
á	&#225;	&aacute;	small a, acute accent
â	&#226;	&acirc;	small a, circumflex accent
ã	&#227;	&atilde;	small a, tilde
ä	&#228;	&auml;	small a, umlaut mark
å	&#229;	&aring;	small a, ring
æ	&#230;	&aelig;	small ae
ç	&#231;	&ccedil;	small c, cedilla
è	&#232;	&egrave;	small e, grave accent
é	&#233;	&eacute;	small e, acute accent
ê	&#234;	&ecirc;	small e, circumflex accent
ë	&#235;	&euml;	small e, umlaut mark
ì	&#236;	&igrave;	small i, grave accent
í	&#237;	&iacute;	small i, acute accent
î	&#238;	&icirc;	small i, circumflex accent
ï	&#239;	&iuml;	small i, umlaut mark
ð	&#240;	&eth;	small eth, Icelandic
ñ	&#241;	&ntilde;	small n, tilde
ò	&#242;	&ograve;	small o, grave accent
ó	&#243;	&oacute;	small o, acute accent
ô	&#244;	&ocirc;	small o, circumflex accent
õ	&#245;	&otilde;	small o, tilde
ö	&#246;	&ouml;	small o, umlaut mark
÷	&#247;	&divide;	division
ø	&#248;	&oslash;	small o, slash
ù	&#249;	&ugrave;	small u, grave accent
ú	&#250;	&uacute;	small u, acute accent
û	&#251;	&ucirc;	small u, circumflex accent
ü	&#252;	&uuml;	small u, umlaut mark
ý	&#253;	&yacute;	small y, acute accent
þ	&#254;	&thorn;	small thorn, Icelandic
ÿ	&#255;	&yuml;	small y, umlaut mark

Variants of ISO-8859

 
Number	Description	Covers
8859-1	Latin 1	North America, Western Europe, Latin America, the Caribbean, Canada, Africa.
8859-2	Latin 2	Eastern Europe.
8859-3	Latin 3	SE Europe, Esperanto, miscellaneous others.
8859-4	Latin 4	Scandinavia/Baltics (and others not in ISO-8859-1).
8859-5	Latin/Cyrillic	The Cyrillic alphabet. Bulgarian, Belarusian, Russian and Macedonian.
8859-6	Latin/Arabic	The Arabic alphabet.
8859-7	Latin/Greek	The modern Greek alphabet and mathematical symbols derived from the Greek.
8859-8	Latin/Hebrew	The Hebrew alphabet.
8859-9	Latin/Turkish	The Turkish alphabet. Same as ISO-8859-1 except Turkish characters replace Icelandic.
8859-10	Latin/Nordic	Nordic alphabets. Lappish, Nordic, Eskimo.
8859-15	Latin 9 (Latin 0)	Similar to ISO-8859-1 but replaces some less common symbols with the euro sign and some other missing characters.
2022-JP	Latin/Japanese 1	The Japanese alphabet part 1.
2022-JP-2	Latin/Japanese 2	The Japanese alphabet part 2.
2022-KR	Latin/Korean 1	The Korean alphabet.

However, ISO 8859 has limitations: each part is a single-byte encoding with only 256 code points, so no single part can cover languages from several regions at once. This led to the development of more comprehensive encoding standards like Unicode.
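To see why the choice of ISO 8859 part matters, here is a minimal sketch that decodes the same byte under several parts using Python's built-in codecs. The byte values are picked only for illustration.

```python
raw = bytes([0xE4])  # one byte, decimal value 228

# The same byte maps to a different character in each part.
for codec in ("iso8859_1", "iso8859_5", "iso8859_7"):
    print(codec, raw.decode(codec))

# ISO-8859-15 replaces the currency sign at 0xA4 with the euro sign.
currency = bytes([0xA4])
latin1_char = currency.decode("iso8859_1")
latin9_char = currency.decode("iso8859_15")
print(latin1_char, latin9_char)
```

A byte stream therefore cannot be read correctly without knowing which part it was written in, which is one of the problems Unicode removes.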

ANSI Code Page & Windows-1252

ANSI code pages (officially called "Windows code pages" after Microsoft accepted that the former term is a misnomer) are used for native non-Unicode (byte-oriented) applications using a graphical user interface on Windows systems. The term "ANSI" is a misnomer because these Windows code pages did not comply with any ANSI standard. Code page 1252 was based on an early ANSI draft that later became the international standard ISO 8859-1. Windows-1252 was the first default character set in Microsoft Windows. Browsers also commonly fall back to Windows-1252 when an HTML page does not declare a charset. Windows-1252 is identical to ISO-8859-1 except for the code points 128-159 (0x80-0x9F). In ISO-8859-1, the characters from 128 to 159 are not defined. Windows-1252 assigns several letters, punctuation marks, and arithmetic and business symbols to these code points.

 
Character	Number	Entity Name	Description
€	128	&euro;	euro sign
	129		NOT USED
‚	130	&sbquo;	single low-9 quotation mark
ƒ	131	&fnof;	Latin small letter f with hook
„	132	&bdquo;	double low-9 quotation mark
…	133	&hellip;	horizontal ellipsis
†	134	&dagger;	dagger
‡	135	&Dagger;	double dagger
ˆ	136	&circ;	modifier letter circumflex accent
‰	137	&permil;	per mille sign
Š	138	&Scaron;	Latin capital letter S with caron
‹	139	&lsaquo;	single left-pointing angle quotation mark
Œ	140	&OElig;	Latin capital ligature OE
	141		NOT USED
Ž	142	&Zcaron;	Latin capital letter Z with caron
	143		NOT USED
	144		NOT USED
‘	145	&lsquo;	left single quotation mark
’	146	&rsquo;	right single quotation mark
“	147	&ldquo;	left double quotation mark
”	148	&rdquo;	right double quotation mark
•	149	&bull;	bullet
–	150	&ndash;	en dash
—	151	&mdash;	em dash
˜	152	&tilde;	small tilde
™	153	&trade;	trade mark sign
š	154	&scaron;	Latin small letter s with caron
›	155	&rsaquo;	single right-pointing angle quotation mark
œ	156	&oelig;	Latin small ligature oe
	157		NOT USED
ž	158	&zcaron;	Latin small letter z with caron
Ÿ	159	&Yuml;	Latin capital letter Y with diaeresis
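The difference between the two encodings in the 128-159 range can be checked with a short sketch using Python's `cp1252` and `latin-1` codecs. Note that Python's `latin-1` codec maps bytes 0x80-0x9F to invisible control characters rather than rejecting them; the sample bytes are chosen for illustration.

```python
data = bytes([0x80, 0x93, 0xE9, 0x94])

as_cp1252 = data.decode("cp1252")
as_latin1 = data.decode("latin-1")

# Windows-1252 yields the euro sign and curly quotes;
# latin-1 yields control characters for the same bytes.
print(repr(as_cp1252))
print(repr(as_latin1))

# Bytes from 0xA0 upward decode identically under both encodings.
shared = bytes(range(0xA0, 0x100))
print(shared.decode("cp1252") == shared.decode("latin-1"))
```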

HEXADECIMAL

Hexadecimal, often abbreviated as "hex," is a base-16 numeral system used in mathematics and computer science. In hexadecimal, numbers are represented using 16 symbols: the digits 0-9 and the letters A-F (where A represents 10, B represents 11, and so on up to F representing 15). Hexadecimal is commonly used in computing because it provides a more concise way to represent binary data, as each hexadecimal digit corresponds to four binary digits (bits).

Decimal	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Hexadecimal	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F

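After 9, hexadecimal uses the letters A-F to represent the values from 10 to 15. Let's convert the decimal value 269 to hexadecimal by repeatedly dividing by 16 and noting the remainder at each step: 269 ÷ 16 = 16 remainder 13 (D), 16 ÷ 16 = 1 remainder 0, and 1 ÷ 16 = 0 remainder 1.

The hexadecimal number is the sequence of remainders read in reverse order, which gives 10D.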
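The repeated-division procedure can be sketched in a few lines of Python. The function name `to_hex` is hypothetical and chosen for this example; Python's built-in `format` is shown alongside for comparison.

```python
HEX_DIGITS = "0123456789ABCDEF"

def to_hex(n: int) -> str:
    """Convert a non-negative integer to hexadecimal by repeated division."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 16)
        remainders.append(HEX_DIGITS[r])
    # Remainders come out least-significant digit first, so reverse them.
    return "".join(reversed(remainders))

print(to_hex(269))        # 10D
print(format(269, "X"))   # built-in formatting, for comparison
```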

Base64

Base64 is a binary-to-text encoding scheme that represents binary data in an ASCII string format by translating it into a radix-64 representation. It's used to encode binary data, such as images, audio files, or any other binary content, into a format that can be transmitted over text-based channels, such as email or URLs, without corruption due to special characters or encoding issues. The binary data is divided into groups of 6 bits. Each group of 6 bits is then represented by a character from a predefined set of 64 printable ASCII characters. These characters typically include uppercase and lowercase letters (A-Z, a-z), digits (0-9), and two additional symbols (usually '+' and '/'). The input is processed three bytes (24 bits) at a time, producing four output characters; if the input length is not a multiple of three bytes, the last group is filled with zero bits and padding characters ('=') are added so that the output length is a multiple of four. For example, consider the text Hi\n, where \n represents a newline. The first step in the encoding process is to obtain the binary representation of each ASCII character: H is 01001000, i is 01101001, and \n is 00001010. Regrouping these 24 bits into 6-bit chunks gives 010010 000110 100100 001010, that is, the values 18, 6, 36, and 10, which map to the characters S, G, k, and K. So Hi\n encodes to SGkK.
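To make the 6-bit regrouping concrete, here is a minimal sketch of a Base64 encoder written for readability rather than speed. The function name `b64_encode` is hypothetical; the result is compared against Python's standard `base64` module.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64 by regrouping the bit stream into 6-bit chunks."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last chunk with zero bits up to a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with '=' up to a multiple of 4 characters.
    return "".join(chars) + "=" * (-len(chars) % 4)

print(b64_encode(b"Hi\n"))                       # SGkK
print(base64.b64encode(b"Hi\n").decode("ascii"))  # standard library, for comparison
print(b64_encode(b"Hi"))                          # two bytes need one '=' of padding
```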

UTF-8

UTF-8, which stands for Unicode Transformation Format 8-bit, is a variable-width character encoding capable of encoding all possible Unicode code points. It's the most commonly used encoding on the internet and in computing systems worldwide because it efficiently represents a wide range of characters while maintaining backward compatibility with ASCII.

UTF-8 converts a code point (which represents a single character in Unicode) into a sequence of one to four bytes. UTF-8 is compact and efficient, especially for languages that use mostly ASCII characters. For example, English text encoded in UTF-8 takes the same space as the same text in ASCII. A code point is a number assigned to represent an abstract character in Unicode. The code point for a character is typically represented in hexadecimal notation. For example, the code point for the letter "A" is U+0041, where "U+" indicates that the following digits represent a Unicode code point, and "0041" is the hexadecimal representation of the code point.
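The U+ notation and the variable byte count can be explored with a short sketch. The helper name `code_point_label` is hypothetical, and the sample characters are chosen for illustration.

```python
def code_point_label(ch: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"

print(code_point_label("A"))  # U+0041

text = "plain English text"
# ASCII-only text occupies one byte per character in UTF-8.
print(len(text.encode("utf-8")) == len(text.encode("ascii")))

# Characters outside ASCII take two to four bytes each.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(code_point_label(ch), len(ch.encode("utf-8")), "byte(s)")
```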

In Unicode, a "plane" refers to a continuous group of 65,536 (2^16) code points. Unicode is organized into a multilevel hierarchical structure, and planes are one of the key components of this structure. The Unicode Standard assigns code points to different planes to accommodate a vast number of characters from various scripts and symbol sets. Unicode divides its code space into 17 planes, labeled from 0 to 16 (0x0 to 0x10 in hexadecimal). Each plane contains 65,536 code points, providing a total of 1,114,112 (17 * 65,536) code points. The first plane, Plane 0 (U+0000 to U+FFFF), known as the Basic Multilingual Plane (BMP), contains most commonly used characters for modern text processing, covering scripts such as Latin, Cyrillic, Greek, Hebrew, Arabic, Chinese, Japanese, and Korean, as well as many symbols, punctuation marks, and control characters. Planes 1 through 16 are referred to as "supplementary planes." They contain additional characters and symbols, including historical scripts, rare characters, emoji, mathematical symbols, musical symbols, and more. Plane 1 in Unicode, also known as the Supplementary Multilingual Plane (SMP), consists of code points ranging from U+10000 to U+1FFFF. Likewise Plane 2 consists of code points ranging from U+20000 to U+2FFFF.
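Because every plane holds exactly 65,536 (0x10000) code points, the plane number is simply the code point divided by 0x10000. A minimal sketch, with the hypothetical helper name `plane_of`:

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Shifting right by 16 bits divides by 0x10000 (65,536).
    return code_point >> 16

for cp in (0x0041, 0xFFFF, 0x10348, 0x2FFFF, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")

print(17 * 0x10000)  # total number of code points: 1114112
```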

In UTF-8 encoding, code points in Plane 0 are represented using sequences of one to three bytes, depending on the code point's value:

For code points in the range U+0000 to U+007F (0 to 127), UTF-8 uses a single byte whose value directly corresponds to the code point's value.
For code points in the range U+0080 to U+07FF, UTF-8 uses two bytes. These code points fit in 11 bits: the high-order 5 bits are stored in the first byte, and the low-order 6 bits are stored in the second byte.
For code points in the range U+0800 to U+FFFF, UTF-8 uses three bytes. These code points fit in 16 bits: the high-order 4 bits are stored in the first byte, the next 6 bits in the second byte, and the low-order 6 bits in the third byte.

Here's a general pattern for representing code points in Plane 0 using UTF-8:

One-byte sequence: 0xxxxxxx
Two-byte sequence: 110xxxxx 10xxxxxx
Three-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx

Where 'x' represents bits from the code point.
For example, take the code point U+0041. Its binary representation is 0000 0000 0100 0001. Since it falls in the range U+0000 to U+007F, it is represented using a single byte, so the UTF-8 representation of U+0041 is 01000001.

Let's take another example, U+0081.

Its binary representation is: 0000 0000 1000 0001

High-order 5 bits = 00010 and low-order 6 bits = 000001

U+0081 = 11000010 10000001

Let's take another example, U+FFFF.

Its binary representation is: 1111 1111 1111 1111

High-order 4 bits = 1111

Next 6 bits = 111111

Low-order 6 bits = 111111

U+FFFF = 11101111 10111111 10111111

Finally, let's take U+10348, which is a Plane 1 code point. Code points beyond Plane 0 (U+10000 to U+10FFFF) use a four-byte sequence, 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, which carries 21 bits.

Its binary representation is: 0001 0000 0011 0100 1000

Written this way it has 20 bits, so we prefix a 0 to pad it to 21 bits: 0 0001 0000 0011 0100 1000

High-order 3 bits = 000

Next 6 bits = 010000

Next 6 bits = 001101

Low-order 6 bits = 001000

U+10348 = 11110000 10010000 10001101 10001000
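The bit patterns can be turned into a small encoder to check worked examples like these by machine. This is a minimal sketch, not a production encoder: the function name `utf8_encode` is hypothetical, it adds the four-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx for code points above U+FFFF, and its output is compared with Python's built-in `str.encode`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])

for cp in (0x0041, 0x0081, 0xFFFF, 0x10348):
    encoded = utf8_encode(cp)
    bits = " ".join(f"{b:08b}" for b in encoded)
    # Compare against the built-in encoder as a sanity check.
    print(f"U+{cp:04X} = {bits}", encoded == chr(cp).encode("utf-8"))
```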

Char	Number	Description
	0 - 31	Control characters (see below)
	32	space
!	33	exclamation mark
"	34	quotation mark
#	35	number sign
$	36	dollar sign
%	37	percent sign
&	38	ampersand
'	39	apostrophe
(	40	left parenthesis
)	41	right parenthesis
*	42	asterisk
+	43	plus sign
,	44	comma
-	45	hyphen
.	46	period
/	47	slash
0	48	digit 0
1	49	digit 1
2	50	digit 2
3	51	digit 3
4	52	digit 4
5	53	digit 5
6	54	digit 6
7	55	digit 7
8	56	digit 8
9	57	digit 9
:	58	colon
;	59	semicolon
<	60	less-than
=	61	equals-to
>	62	greater-than
?	63	question mark
@	64	at sign
A	65	uppercase A
B	66	uppercase B
C	67	uppercase C
D	68	uppercase D
E	69	uppercase E
F	70	uppercase F
G	71	uppercase G
H	72	uppercase H
I	73	uppercase I
J	74	uppercase J
K	75	uppercase K
L	76	uppercase L
M	77	uppercase M
N	78	uppercase N
O	79	uppercase O
P	80	uppercase P
Q	81	uppercase Q
R	82	uppercase R
S	83	uppercase S
T	84	uppercase T
U	85	uppercase U
V	86	uppercase V
W	87	uppercase W
X	88	uppercase X
Y	89	uppercase Y
Z	90	uppercase Z
[	91	left square bracket
\	92	backslash
]	93	right square bracket
^	94	caret
_	95	underscore
`	96	grave accent
a	97	lowercase a
b	98	lowercase b
c	99	lowercase c
d	100	lowercase d
e	101	lowercase e
f	102	lowercase f
g	103	lowercase g
h	104	lowercase h
i	105	lowercase i
j	106	lowercase j
k	107	lowercase k
l	108	lowercase l
m	109	lowercase m
n	110	lowercase n
o	111	lowercase o
p	112	lowercase p
q	113	lowercase q
r	114	lowercase r
s	115	lowercase s
t	116	lowercase t
u	117	lowercase u
v	118	lowercase v
w	119	lowercase w
x	120	lowercase x
y	121	lowercase y
z	122	lowercase z
{	123	left curly brace
|	124	vertical bar
}	125	right curly brace
~	126	tilde
