Fmars Blog: September 2014

C++

Use the review part in " Crack Code"

C++ vs Java

1.Java runs in a virtual machine2.C++ natively supports unsigned arithmetic

Either signed int or unsigned int, it is a sequence of binary bits in the memory. The difference is how software gives it semantic meaning. In C++ a binary sequence can be treat as a signed int or unsigned int. But Java can only treat it as signed int. Thus if you want to use unsigned int and related operation, Java will automatically use unsigned long to guarantee the range of value.

3 In Java, parameters are always passed by value (or, with objects, their references are passed by value) In C++, parameters can be passed by value, pointer, or by reference

C++ doesn't give a strict requirement on how to implement reference. Compiler usually implements reference by using a pointer const. But whether they are identical internally, they are totally different usage. 1. Pointer can point to some else but reference cannot change once it has been binding with something.2. A reference must be binding with something when it is declared. A pointer can point to null but reference can not.3. Reference doesn't have algorithm operations like int *p ++

4 Java has built-in garbage collection5 C++ allows operator overloading

TODO: why doesn't Java support operator overloading?

6 C++ allows multiple inheritance of classes

13.2 Compare Hash table vs an STL map

The first in mind difference is STL map is implemented in Binary Search Tree that each operation runs in O(logn). However unordered_map is implemented in actual hash.

The trade off for larger time complexity is STL map maintains the order in the mean time. That means it support operations related to the

1. find min/max 2. print the elements in sorted 3. find the exact or the most closed (STL::map::lower_bound, find the first equal or bigger; STL::map::upper_bound, find the first

The implementation of a hash

1. a good hash function to ensure uniform 2. a collision resolving 3. dynamically increase or decrease table size

13.3 Virtual functions in C++?

Virtual functions is a very important part in C++ to represent Polymorphism. Let's talk about Virtual function first then step further to Polymorphism.

What's virtual function

The class function defined with virtual keyword is a virtual function. The difference between a virtual function and a normal class function is :

if a function is defined as 'virtual' in the base class, the most derived class's implementation of function is called according to the actual type of the object referred to, regardless of the declared type of pointer or reference.

The result of program would be "Derived" rather then "Base".

Static binding & Dynamic binding

In static binding compiler will know each function's address(offset) at compile time. In dynamic binding, compiler can know the exact function called only during the run time. Thus need a additional structure to store this information called vtable for each class.

Insight of Virtual Function implementation

Each class X will have an virtual table for each base class which X derives from (C++ is multiple inheritance language).
Each object of class X will have a vptr variable pointing to the first virtual table of class X.
If a virtual function is not overridden by the derived class, the vtable of the derived class stores the address of the function in his parent class.
When a virtual function is called, the virtual table got by vptr is used to resolve the address function.

The vptr is usually stored at the very first position in a object. vptr is a pointer which points to the first virtual table. Within each virtual table, the pointer to virtual functions are stored. Consider the following program.

Because there are two vptr in the derived class, thus the size of derived class is 8 bytes (each pointer has 4 bytes in 32 bit system).
Explicit convert the vptr to a int pointer. We will find out the address of first virtual table.
Add 1 to the address of first virtual table, we will get the address of second virtual table.
Refer to the address of second virtual table, and plus 1 then we will get the address of second virtual function in the second virtual table
Use a function pointer then we can call this virtual function manually. Looks cool aha.

Trade-off of the virtual function

It takes more memory space because each class also needs to store a virtual table and each instance of the class needs to store a pointer to the virtual table. Also the function address is not know during the compile time rather it is known during the run-time which calculation is needed because it will first find the pointer to virtual table and then use the offset to find out the address of the virtual function.

Pure Virtual Function & Abstract Class

A pure virtual function is a virtual function whose deceleration is ended as '=0'. A pure virtual function makes its class as a abstract class which can not be instantiated until all the pure virtual function has been implemented.

Object Oriented Language

Modern OO Language provides three capacities

Encapsulation
Inheritance
Polymorphism

Polymorphism in Programming Language

In programming language 'polymorphism' means some code or operations behave differently in different contexts. Things like Template, overloading, overriding and virtual function are all representation of polymorphism.

Overloading & Overriding

Overloading is when you define two function with same name in same class but distinct signature.
Overriding is when you define the function already defined in the parent class with the same signature

TODO: What programming experience do you have related to Polymorphism

13.4 Deep copy & Shallow copy
Shallow copy only copy the pointer to the data.
Deep copy copies the exact data.

TODO: The bug about memcpy in Database Project related to overlay

13.5 Keyword "volatile"

For compiler, keyword 'volatile' does two things.

Don't cache the value in a temp register. Instead always read the value from memory
Avoid compiler optimizing the code seemed useless within current scope however may be modified from a outer operation

In practice keyword 'volatile' does two things.

For hardware access like interrupt routine
For inter- thread communication

C++11

According to the C++11 ISO Standard, the volatile keyword is only meant for use for hardware access; do not use it for inter-thread communication. For inter-thread communication, the standard library provides std::atomic<T> templates.

For multi-thread application programming, volatile doesn't It does not provide any synchronization, it does not create memory fences, nor does it ensure the order of execution of operations. It does not make operations atomic. It does not make your code magically thread safe

TODO: Review CS295 about consistency of multithread programming and hardware arcitecture
TODO: Find out about memory fence, atomic operation in this post
http://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading

13.6 Name Hiding
In C++, when you have a class with an overloaded method, and you then extend and override that method, you must override all of the overloaded

methods

13.7 Why destructor virtual
The constructor and destructor of a derived class will call all the based classes's corresponding constructor and destructor function.
If a based class pointer points to a derived class object, the derived class's destructor won't be called if the destructor is not a virtual function.

13.7 Smart Pointer
"Note that after Delete or Free operation, the value of a pointer may not change."
Use a class that contains a pointer to point the wanted block of memory which could avoid memory leak because when a smart point is out of scope the destructor will be called automatically to free the space. Within the class it stores a pointer to point to the block of memory and a integer which stores the reference number that points to this memory.

13.8 Iterator
C++ uses iterator to give a general way for programmer to access the container. It is a general interface which hides the detail of how to implement. It also provide a way for some common algorithm to be implemented beyond different types of containers like sort, reverse function.

腾讯电面二面 09.21.2014

刚刚二面结束，心情比较沉重，目测就到这儿了。。面试过程不畅和一直听不清电话那头对方说啥一定是有原因的。。
------------------------------------------------------------------------------------------
这次面试和上次内容一样，还就是挑了简历上一个项目问。但是问题完全不一样啊。
先是大概讲一下虚拟内存是什么，再问如果物理内存用完了怎么办。之后问你知道哪些置换算法。然后重点来了。。
先是问你有没有读过linux或者windows之类系统的置换算法，如果你没读过源码那你自己怎么弄出来的。然后问到数据库，问有没有读过mysql之类数据库的源代码，如果没读过那自己设计岂不是会出现很多问题而且没有个比较么。

我一想，确实说的很有道理，在做数据库的时候阅读mysql的源码想必是很酷的一件事。不过我确实没有读啊。准确的说应该是，未曾敢于尝试去读。。
总之问了些类似的问题，感觉自己表现还是比较糟糕的。
-------------------------------------------------------------------------------------------
留了两道算法题，用邮件发过来，给半小时做然后邮件回复。（这里想吐槽下，感觉在规范化啊工具啊等方面还是和国外差距比较大。没有在线的交互代码测评，用一下googledoc也是可以接受的啊。。。）

就不透露题目具体内容了。两道题。一个是双向链表的节点删除。另一个是二叉树上的搜索。

遇到了一个非常非常古怪的问题。写完题目之后我就自己写了点testcase跑。结果每次都段错误，搞得我非常紧张，觉得自己代码没有问题，到最后还是段错误。
后来觉得是不是因为是在Windows下用的CodeBlocks的原因。于是在面试结束后，把同样的程序放在VC++6.0和Linux下G++跑，都完全没有问题，但是CodeBlocks还是段错误。好奇怪啊，导致我在面试时候至少花了十五分钟琢磨到底自己哪儿错了。。

有兴趣的朋友也可以试试在你的CodeBlocks下跑一下这个代码看看会不会出段错误。如果哪位高手知道是什么原因还请指点啊
我用的是windows7 Code::Blocks 13.12。在运行完test_2后test_3会段错误

没有修改后继节点的前继节点。。想必挂，是妥妥的了

INTERVIEW DIRECTORY

腾讯电面 09.17.2014

无论毕业之后去哪儿，几年后的打算还是回国，BAT当然不能错过。
昨天早上五点电话响，还没清醒接起电话，对方说是腾讯HR，问我现在是九点钟么。我表示我是西海岸啊现在是五点多。非常尴尬。然后说看我工作意向填了北京和广州，问我可以接受深圳么，我表示因为网申至少要填两个意向，我没办法只好又填了个广州，其实我只想去北京。然后对方表示现在北京还没有对于职位。。顿时好尴尬，不过还是安排了电面，时间就是当天晚上十一点，效率倒是很高。
=======================================================================
晚上十一点，等了大概十分钟还是没来电话，以为被耍了，刚打算躺下电话就行了。极其强烈的广东腔啊，对方表示一时半会没找到咋拨美国电话。。
有点习惯了英文面试，突然一下用中文好不习惯啊，名词啥的也不太清楚怎么讲，对方貌似是广东仔，普通话说得也不顺溜，两个人就都结结巴巴的，对方名字都没听清，职位一串也就只听清了微信两个字，不过就够了，微信掉渣天的牛逼产品。

面试三十多分钟，先是问了下你对简历上哪个项目还记得比较清楚，我说数据库。然后接下来百分之八十五的时间就在问这个，问的还是比较深入有水平。包括硬盘、内存、b树、一致性等的问题。所以，我的一面很简单，几乎全部时间就是自己讲了一下自己的一个project。

之后就是我提问了。。
问了下腾讯之后如果打算以微信为平台开发各种其他产品的话，本身因为用户数目已经超级胖达，之后数据量恐怕又会在数量级上上升，现在腾讯是否具备处理更大规模数据的能力，已经今后在大数据方面有哪些技术挑战，需要哪方面的码农。

整个过程比较愉快。唯一比较尴尬的就是技术面试难道不应该来个算法题么。。

=======================================================================
在第二天早上五点接到电话安排了二面，HR开口第一句又是”你现在是九点么。。“

Thinknear Phone Interview Fulltime 09.15.2014

When I's applying summer intern last, I unintended applied a Thinknear full time position. It's really embarrassed. However it seems that they had record my information because a very nice HR got in touch with me and scheduled a phone interview.

Then I begun to have a look at this company and found that it is a really amazing company. It has a really amazing idea about the the usage of location information. By analysis customers daily data besides with geo location, they perform a refined ads method and do the ads recommendation. Based on such background, it's obvious that Thinknear is a latest technical company related to big data and the cloud. So it is unexpected fit for me.

The phone interview was pretty common. First ask me to talk about what I did in last internship. Then ask some details about how do I read paper to find the idea.
Then ask what's your most interested project and you want to continue in the further. I said is the database project. Then he asked why.

After this we built up a shared google doc and did a algorithm problem. It's a pretty common one. Check if a binary tree is a binary search tree. And asked about the time complexity.

After that it's time to ask him questions and then finished.
Quite great company and quite short an interview.

----------------------------------------------------------------------
UPDATED three days later
----------------------------------------------------------------------
I always thought that although the algorithm for binary search tree check works, it looks really ugly. When I trying to figure out a elegant way, I suddenly realized that my method was wrong. Here is my code.

class Node {

Node *left, *right;

int value;

}

bool is_bst(Node *root) {

vector<int> v;

preorder_traverse(root,v);

for(int i=1;i<v.size();i++)

if(!v[i-1]<=v[i])

return false;

return true;

}

void preorder_traverse( Node*p,vector<int>&v){

if(p==NULL)

return ;

preorder_traverse(p->left);

v.push_back(p->val);

preorder_traverse(p->right);

}

After wrote down the code, I asked the interviewer that if it is possible to have duplicated items and it it is , is should be placed at left child or right child. The interviewer said there could be and would be placed at left child. So I updated the code like this "if(!v[i-1]<=v[i])". However it was totally wrong.
I think one solution could work.

Let the function bool is_bst ( Node *p, int min, bool min_is_equal, int max, bool max_is_equal) denote that if the subtree whose the root is p is a binary search tree that all the elements in the subtree are bigger than min and smaller then max. The min_is_equal and max_is_equal denote that if it could be equal. Do the check function recursively and update the parameter would work.

[Leetcode Solution] Merge k Sorted Lists

Analysis

This is a problem not hard but interesting. Here we go and have a look.
In order to merge k sort lists, there are many solutions.Time complexity analysed based on a simplified situation that that are n lists and each of them contains m elements.

The first one came into mind is do it just like how to merge two lists in the merge sort. Each time compare the current element of all the lists. Find out the smallest one and insert it into result list. However this solution could be really slow because it is O(m^(n*m)) time complexity.
Another solution is each time merge two lists, and then merge the result list with another one until no list. If so the time complexity is O(n*n*m). However if use binary merge that first we merge all the original lists into n/2 lists and then merge these n/2 lists into n/4 and so forth until only one list left. This could be more efficient.
One straightforward solution is just store all the elements in a vector, and then sort them. It's simple but works with O((n*m)*log(n*m)).
A good solution in theoretical perspective is that we can maintain a priority queue with m elements. Each time we pop up the smallest element in the priority queue in O(1) time and then push in the element which is the next node of the popped one in O(log(m)) time. The overall time complexity is O((n*m)*log(m)). It should be faster then the above algorithm however in fact it is a little slower then the above one which may because it has a bigger coefficient.

Note

The signature of priority_queue constructor in C++ STL found here

The difference between comparison parameter for priority_queue and for sort found here
operator () found here
const qualifier found here

Code

[Leetcode Solution] Swap Nodes in Pairs

Analysis

Each time swap the following two nodes

Note

Code

[Leetcode Solution] Reverse Nodes in k-Group

Analysis

Each time reverse one group of nodes

Use pre_last denote the last node of previous group
Use cur_first denote the first node of current group
Each time after reversing, make pre_last->next=cur_first

Adding a auxiliary node ahead of the head to avoid the specific situation of the first node

Note

Code

[Leetcode Solution] Remove Duplicates from Sorted Array

Analysis

Note

Code

[Leetcode Solution] Remove Element

Analysis

Use two pointers. One points to the current element. The other points to the current position where to store the next unique element.

Note

Code

[Leetcode Solution] Implement strStr()

Analysis

Note

Code

[Leetcode Solution] Substring with Concatenation of All Words

Analysis

Brutal force search

Note

Code

[Leetcode Solution] Next Permutation

Analysis

Let's consider a simpler situation. Given n integers [1..n] and a permutation generated by these n integers. How to find the next permutation.

It seems easy right. If we treat the permutation as a integer number, our purpose is to increase value of this number to the nearest one by reorder the integers.

Say the permutation is like this [a(1),a(2),...,a(n)]. if i<j and a(i) < a(j). If we swap ai and aj the value of permutation will increase. In order to get the nearest bigger number, we would like to increase a bit in least significant bit by swap it with a bit at more least significant direction. So the solution is find the first a(i) that there is an a(j) that a(i) < a(j) and i<j. After that we swap the value of a(i) and a(j) then resort the integers from a(i+1) to a(n) in order to change it to the smallest one.

Now consider the given scenario. There are different integers in the permutation and duplicates.

First we can just treat different integers as the sequence of 1 .. n because they share the same order. Only order makes a difference rather then the value.
Second if duplicates exist, we will only update the part which finds the first a(i).

Note

Code

[Leetcode Solution] Longest Valid Parentheses

GOOD PROBLEM

Analysis

Very similar to problems like 'Best Time to Buy and Sell Stock' and 'Largest Rectangle in Histogram' which can be easily solved in O(n*n) but has a tricky O(n) solution by two pointers scan or stack.
As the same of 'Largest Rectangle in Histogram' this problem use a stack to maintain the previous state and calculate the current state in constant time which eventually leads to a O(n) time solution.

Use a stack to store the previous information. Iterate the input string.

If string[i] == '(', push it back to the stack and store the position of '('
If string[i] == ')' then pop the last element in the stack and calculate the length of the substring formed by these ')' and '(' based on the position information stored before.

The tricky is, if a test case exists like this
"( ) ( )"
the above solution will get two parentheses pairs whose length is 2 for each but cannot sum up them together.

So the solution is adding another pointer 'start' which used to store the start position of a entire valid paired sub string. Because we know that the sub string will become invalid if the number of ')' is larger then '(' when we iterate from left to right. So as long as the number of ')' is not larger then '(', the sub string is still valid and it could be formed as a entire paired sub string.

Thus we will only reset the 'start' pointer when stack is empty and the current char is ')'.
When we calculate the length of sub string, if the stack is empty, we should use the 'start' as the start position rather then the position information stored in the stack.

Note

Code

[Leetcode Solution] Count and Say

Analysis

Note

Code

[Leetcode Solution]Combination Sum II

Analysis

Note

Code

[Leetcode Solution]Combination Sum

Analysis

Note

Code

[Leetcode Solution]First Missing Positive

Analysis

Iterate elements in the array. Put each element at the corresponding position just like bucket sort
Then check which positive integer is missing

Note

Code

[Leetcode Solution] Trapping Rain Water

Analysis

The volume of water can be trapped above one point depends on the highest integers in its bi-directions.
Use three passes. First find out the left side highest integer of each node, second find out the right side. Last compute the volume of water trapped at each node

Note

Code

[Leetcode Solution] Multiply Strings

Analysis

Similar with string addition
Iterate each digit in num1 and num2, multiply them and store the result at correct position

Note

Code

[Leetcode Solution] Wildcard Matching

GOOD PROBLEM

Analysis

Directly use DFS of course ETL
Use DP also ETL

f[i][j]=f[i-1][j-1](a[i]==b[j] || b[j]=='?')

f[i][j]=f[k][j-1](i<=k<=a.size())(b[j]=='*')

PRUNING ONE: Knowing that if there are multiple consecutive * in the string b, then eliminate them and remain one * is ok. This pruning passes all the testcases except for the last one.

PRUNING TWO: Another pruning comes up with this post. Here let me show the details about this.

Let f(p,q) denote if a[0..p] matches b[0..q]. Then our task is to find out the value of f(a.size,b.size()). The searching procedure of finding the answer is a tree structure like this.

Each node(p,q) will have one child if b[q]=='?' or a[p]==b[q], or multi children if b[q]=='*'. ( In the figure we can see that node(p,q) has multi children while node(p,q+1) has one child).
The subtle trick is if we reached the node say (p+2+2,q+1+2) then there is no need to search upper layer's nodes which circled by a dotted line. This can be proved by contradiction in following figure.

Let two lines denote the two strings a and b. Here b[q]=='*'. And we do the search to the q+1+t. Then we can know there is no need to do the remained search at q. Because if there is a possible solution from the remaining cases, there must be also a possible solution at the current scenario. Because the second '*' can fill the gap between A and B in above figure.

PRUNING THREE. after two pruning above the program still is blocked at the last test case. So the last pruning comes that let aux(k) denote the number of alphabet in b[k..end]. If a search node(p,q) which a.size() - p < aux(q) then we know there won't be a solution from this point.

My Blog List

C++ vs Java

13.2 Compare Hash table vs an STL map

C++ Basics *

C++ Unsigned Int *

Bit Manipulation *

Reverse Ordered Pairs *

20th popularist URLs *

Analysis

Note

The signature of priority_queue constructor in C++ STL found here

The difference between comparison parameter for priority_queue and for sort found here

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

GOOD PROBLEM

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

Analysis

Note

Code

GOOD PROBLEM

Analysis

Note

Code